kristopolous on Oct 13, 2023 | on: OpenAI is too cheap to beat

Ok, so you'll have to help me here; I'm still learning this stuff. I looked up RLHF. Is it really useful? The average human has no general expertise, because people are specialized. (I know nothing about, say, 1960s avant-garde French cinema, and my responses in a conversation about it would be garbage. Given the breadth of human knowledge, even the most accomplished scholars are useless for over 99% of it.) Won't there be a drop in quality? How is this accounted for? If chat systems simply gave the most popular answers, they would stop being useful very quickly.

BoorishBears on Oct 14, 2023

RLHF isn't used to teach the model what it knows; it's used to teach the model how to follow instructions.
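For a concrete sense of what the training signal looks like: in the commonly described RLHF setup (e.g. OpenAI's InstructGPT paper), human raters compare pairs of responses to the same prompt, and a reward model is trained so the preferred response scores higher. The sketch below shows only that pairwise loss, -log(sigmoid(r_chosen - r_rejected)); the function name and the example scores are hypothetical, and the later policy-optimization step is not shown.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    Low when the reward model scores the rater-preferred response higher,
    high when it prefers the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); branch on the sign of m so exp() never overflows
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical scores a reward model might assign to two answers to the same prompt
print(preference_loss(2.0, 0.5))  # small loss: the preferred answer is already ranked higher
print(preference_loss(0.5, 2.0))  # larger loss: the ranking is backwards
```

Raters here only judge which of two responses is better, rather than writing expert answers themselves.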

Before RLHF and instruction tuning, the models could only complete text.

Technically they still complete text, but now they have a strong association with a format where a question is followed by an answer.
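To illustrate the "still completing text" point, here is a minimal sketch of wrapping a question in a question/answer frame, so that the natural continuation of the prompt is an answer. The template string and both helper names are hypothetical; real instruction-tuned models each define their own chat format and special tokens.

```python
# Hypothetical template: real instruction-tuned models each use their own format.
TEMPLATE = "### Question:\n{question}\n\n### Answer:\n"

def build_prompt(question: str) -> str:
    """Wrap a question so that the text a completion model continues with is an answer."""
    return TEMPLATE.format(question=question.strip())

def build_training_example(question: str, answer: str) -> str:
    """An instruction-tuning example: the same frame with the answer filled in.

    Training on many of these reinforces the question-then-answer pattern.
    """
    return build_prompt(question) + answer.strip()

prompt = build_prompt("  What does RLHF stand for? ")
print(prompt)
# The model is then asked to continue `prompt`, exactly as it would continue any other text.
```

The model's mechanism is unchanged here: it still predicts the next tokens. The frame just makes "an answer" the most likely continuation.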

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact