Hacker News

ok so you'll have to help me here, I'm still learning this stuff.

I looked up RLHF. Is this really useful? The average human has zero general expertise because people are specialized (I know nothing about, say, 1960s avant-garde French cinema, and my responses in a conversation about it would be garbage; given the breadth of human knowledge, even the most accomplished scholars are useless for over 99% of it). Won't there be a quality decrease? How is this accounted for?

If the chat systems simply gave the most popular answers, they would cease to be useful real fast.



RLHF isn't used to teach the model what it knows; it's used to teach the model how to follow instructions.

Before instruction tuning with RLHF, the models could only complete text.

Technically they still complete text, but now they have a strong association with a format where a question is followed by an answer.
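
To make the "format" point concrete, here's a rough sketch of what that looks like from the outside. The template string and both function names are made up for illustration (real chat models each define their own format), and the "completion" is hand-written rather than produced by a model:

```python
# Hypothetical question/answer template; not any real model's chat format.
TEMPLATE = "### Question:\n{question}\n\n### Answer:\n"


def build_prompt(question: str) -> str:
    """Wrap a user question in the question/answer format."""
    return TEMPLATE.format(question=question.strip())


def extract_answer(completion: str) -> str:
    """Keep only the text before the model would start a new question."""
    return completion.split("### Question:")[0].strip()


prompt = build_prompt("Who directed Breathless (1960)?")

# A model would simply continue the prompt text. This hand-written
# continuation stands in for that output.
fake_completion = "Jean-Luc Godard.\n\n### Question:\nWhat else did he direct?"

print(prompt + extract_answer(fake_completion))
```

The model is still just continuing text. The tuning makes it very likely that the continuation after "### Answer:" is an answer to the question, and the surrounding code cuts it off before it wanders into the next turn.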



