Submissions from tobysimonds.com

		How to Train an LLM to Do Proofs: Beyond Verifiable Rewards (tobysimonds.com)
		3 points by tamassimond 5 months ago \| past
		The Cost of Winning: How RL Training on Poker Leads to Evil LLMs (tobysimonds.com)
		2 points by tamassimond 6 months ago \| past \| 1 comment
		The Hidden Cost of Winning: How RL Training on Poker Degrades LLM Moral Alignment (tobysimonds.com)
		8 points by tamassimond 6 months ago \| past
		AlphaWrite: AI that improves at writing by evolving its own stories (tobysimonds.com)
		80 points by tamassimond 9 months ago \| past \| 159 comments