Hacker News

RLHF is Reinforcement Learning from Human Feedback.

It usually refers to fine-tuning language models using data labelled by humans.

Hugging Face has a good overview in this article: https://huggingface.co/blog/rlhf
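One core piece of the pipeline that article describes is the reward model, which is trained on pairs of responses ranked by human labellers. A minimal sketch of its pairwise preference loss (function names are illustrative, not from any particular library):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # Pairwise (Bradley-Terry style) loss commonly used for RLHF reward models:
    # it pushes the score of the human-preferred response above the rejected one.
    margin = reward_chosen - reward_rejected
    sigmoid = 1.0 / (1.0 + math.exp(-margin))
    return -math.log(sigmoid)

# Small loss when the model already ranks the preferred answer higher,
# large loss when it ranks the rejected answer higher:
print(round(preference_loss(2.0, 0.5), 2))  # → 0.2
print(round(preference_loss(0.5, 2.0), 2))  # → 1.7
```

The trained reward model then scores the language model's outputs, and a reinforcement learning algorithm (typically PPO) fine-tunes the model to maximize that score.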


