Hacker News

Are you taking RLHF into account when you say so?


Well, I wasn't, but if you look at the topmost comment of this thread [0], you'll see that considering the level of human reinforcement being demonstrated only reinforces my point.

[0] https://news.ycombinator.com/item?id=36013017


Taking RLHF into account: it's not actually generating the most plausible completion; it's generating one that's worse.



