
That's essentially what R1-Zero shows:

> Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.


