
I think these LLMs have been optimized with Reinforcement Learning from Human Feedback (RLHF).

It's hard to tell whether that will add enough to get close to AGI.
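
For context: RLHF first trains a reward model on pairwise human preferences, then fine-tunes the LLM to maximize that learned reward (typically with PPO, plus a KL penalty keeping the policy near the original model). Here's a minimal sketch of the preference loss, assuming PyTorch and a stand-in linear reward model of my own invention (real systems use a full LLM with a scalar head):

    import torch
    import torch.nn.functional as F

    # Stand-in reward model: maps a response embedding to a scalar score.
    # (Illustrative only; real reward models are full LLMs.)
    reward_model = torch.nn.Linear(768, 1)

    def preference_loss(chosen_emb, rejected_emb):
        # Bradley-Terry pairwise loss: push the score of the
        # human-preferred response above the rejected one's.
        r_chosen = reward_model(chosen_emb)
        r_rejected = reward_model(rejected_emb)
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    # Toy batch of 4 (chosen, rejected) embedding pairs.
    loss = preference_loss(torch.randn(4, 768), torch.randn(4, 768))
    loss.backward()  # gradients flow into the reward model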

Google is also working on chain-of-thought prompting, which helps with math and logic problems.

https://medium.com/nlplanet/two-minutes-nlp-making-large-lan...
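
The idea is just to prepend worked examples whose answers spell out intermediate steps, so the model imitates the step-by-step style. A minimal sketch, where complete is a placeholder callable for whatever LLM API you use (the exemplar is the tennis-ball problem from the CoT paper):

    # Few-shot chain-of-thought prompt (Wei et al. style): the worked
    # example's answer spells out intermediate steps before the result.
    COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 more cans of
    tennis balls. Each can has 3 tennis balls. How many tennis balls
    does he have now?
    A: Roger started with 5 balls. 2 cans of 3 tennis balls each is
    6 tennis balls. 5 + 6 = 11. The answer is 11.

    Q: {question}
    A:"""

    def answer_with_cot(question: str, complete) -> str:
        # complete: any callable that takes a prompt string and
        # returns the model's continuation (placeholder for your
        # LLM API of choice).
        return complete(COT_PROMPT.format(question=question))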



CoT is so yesterday! SOTA is LAMBADA [1], a.k.a. backward chaining, also from Google, which significantly outperforms chain of thought and selection-inference in terms of prediction accuracy and proof accuracy.

[1] https://arxiv.org/abs/2212.13894
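
For anyone unfamiliar: backward chaining starts from the goal and recursively reduces it to subgoals, instead of deriving facts forward until the goal happens to appear. Below is a toy symbolic sketch of that control flow; note that LAMBADA itself implements each step (fact check, rule selection, goal decomposition) with LLM sub-modules, so this illustrates the strategy, not the paper's system:

    # Toy propositional backward chainer. RULES maps a head atom to
    # the list of rule bodies that can derive it; each body is a list
    # of atoms that must all hold. FACTS are known-true atoms.
    RULES = {
        "mortal(socrates)": [["human(socrates)"]],
        "human(socrates)": [["greek(socrates)"]],
    }
    FACTS = {"greek(socrates)"}

    def prove(goal: str, depth: int = 8) -> bool:
        if depth == 0:            # depth limit guards against cycles
            return False
        if goal in FACTS:         # fact check
            return True
        for body in RULES.get(goal, []):      # rule selection
            # goal decomposition: prove every subgoal in the body
            if all(prove(sub, depth - 1) for sub in body):
                return True
        return False

    print(prove("mortal(socrates)"))  # True: greek -> human -> mortal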



