Hacker News

RLHF is Reinforcement Learning from Human Feedback.

It usually refers to fine-tuning language models using data labelled by humans.

Hugging Face has a good overview in this article: https://huggingface.co/blog/rlhf
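One core piece of the pipeline that article describes is the reward model, which is trained on pairs of responses ranked by human labellers. A minimal sketch of its pairwise preference loss (function names are illustrative, not from any particular library):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # Pairwise (Bradley-Terry style) loss commonly used for RLHF reward models:
    # it pushes the score of the human-preferred response above the rejected one.
    margin = reward_chosen - reward_rejected
    sigmoid = 1.0 / (1.0 + math.exp(-margin))
    return -math.log(sigmoid)

# Small loss when the model already ranks the preferred answer higher,
# large loss when it ranks the rejected answer higher:
print(round(preference_loss(2.0, 0.5), 2))  # → 0.2
print(round(preference_loss(0.5, 2.0), 2))  # → 1.7
```

The trained reward model then scores the language model's outputs, and a reinforcement learning algorithm (typically PPO) fine-tunes the model to maximize that score.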


