Hacker News
How to Train an LLM to Do Proofs: Beyond Verifiable Rewards (tobysimonds.com)
3 points by tamassimond 3 months ago | past
The Cost of Winning: How RL Training on Poker Leads to Evil LLMs (tobysimonds.com)
2 points by tamassimond 4 months ago | past | 1 comment
The Hidden Cost of Winning: How RL Training on Poker Degrades LLM Moral Alignment (tobysimonds.com)
8 points by tamassimond 4 months ago | past
AlphaWrite: AI that improves at writing by evolving its own stories (tobysimonds.com)
80 points by tamassimond 7 months ago | past | 159 comments