
I think these LLMs have been optimized with Reinforcement Learning from Human Feedback (RLHF).

It's hard to tell whether that will add enough to get close to AGI.
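
For context: RLHF first trains a reward model on pairwise human preferences, then fine-tunes the LLM to maximize that learned reward (typically with PPO, plus a KL penalty keeping the policy near the original model). Here's a minimal sketch of the preference loss, assuming PyTorch and a stand-in linear reward model of my own invention (real systems use a full LLM with a scalar head):

    import torch
    import torch.nn.functional as F

    # Stand-in reward model: maps a response embedding to a scalar score.
    # (Illustrative only; real reward models are full LLMs.)
    reward_model = torch.nn.Linear(768, 1)

    def preference_loss(chosen_emb, rejected_emb):
        # Bradley-Terry pairwise loss: push the score of the
        # human-preferred response above the rejected one's.
        r_chosen = reward_model(chosen_emb)
        r_rejected = reward_model(rejected_emb)
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    # Toy batch of 4 (chosen, rejected) embedding pairs.
    loss = preference_loss(torch.randn(4, 768), torch.randn(4, 768))
    loss.backward()  # gradients flow into the reward model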

Google is also working on chain-of-thought prompting, which helps with math and logic problems.

https://medium.com/nlplanet/two-minutes-nlp-making-large-lan...
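
The idea is just to prepend worked examples whose answers spell out intermediate steps, so the model imitates the step-by-step style. A minimal sketch, where complete is a placeholder callable for whatever LLM API you use (the exemplar is the tennis-ball problem from the CoT paper):

    # Few-shot chain-of-thought prompt (Wei et al. style): the worked
    # example's answer spells out intermediate steps before the result.
    COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 more cans of
    tennis balls. Each can has 3 tennis balls. How many tennis balls
    does he have now?
    A: Roger started with 5 balls. 2 cans of 3 tennis balls each is
    6 tennis balls. 5 + 6 = 11. The answer is 11.

    Q: {question}
    A:"""

    def answer_with_cot(question: str, complete) -> str:
        # complete: any callable that takes a prompt string and
        # returns the model's continuation (placeholder for your
        # LLM API of choice).
        return complete(COT_PROMPT.format(question=question))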



CoT is so yesterday! SOTA is LAMBADA [1], a.k.a. backward chaining, also from Google, which significantly outperforms chain of thought and selection-inference in terms of prediction accuracy and proof accuracy.

[1] https://arxiv.org/abs/2212.13894
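
For anyone unfamiliar: backward chaining starts from the goal and recursively reduces it to subgoals, instead of deriving facts forward until the goal happens to appear. Below is a toy symbolic sketch of that control flow; note that LAMBADA itself implements each step (fact check, rule selection, goal decomposition) with LLM sub-modules, so this illustrates the strategy, not the paper's system:

    # Toy propositional backward chainer. RULES maps a head atom to
    # the list of rule bodies that can derive it; each body is a list
    # of atoms that must all hold. FACTS are known-true atoms.
    RULES = {
        "mortal(socrates)": [["human(socrates)"]],
        "human(socrates)": [["greek(socrates)"]],
    }
    FACTS = {"greek(socrates)"}

    def prove(goal: str, depth: int = 8) -> bool:
        if depth == 0:            # depth limit guards against cycles
            return False
        if goal in FACTS:         # fact check
            return True
        for body in RULES.get(goal, []):      # rule selection
            # goal decomposition: prove every subgoal in the body
            if all(prove(sub, depth - 1) for sub in body):
                return True
        return False

    print(prove("mortal(socrates)"))  # True: greek -> human -> mortal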



