
You are underselling, or not understanding, the breakthrough. They trained a ~600B-parameter model on ~15T tokens for under $6M. Regardless of the provenance of the tokens, that in itself is impressive.

Not to mention post-training. Their novel GRPO technique (Group Relative Policy Optimization), used for preference optimization and alignment, is also much more efficient than PPO because it drops the separate critic model entirely.
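
For reference, a minimal sketch of the group-relative idea behind GRPO (as described in the DeepSeekMath paper): sample G completions per prompt and normalize each reward against the group's own mean and std, so no learned value/critic network is needed. Names and shapes below are illustrative; the real objective works per-token and adds a KL penalty against a reference policy, both omitted here for brevity.

    import torch

    def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
        # rewards: (G,) rewards for G completions sampled from one prompt.
        # The group mean replaces PPO's learned value baseline, so there is
        # no critic network to train or keep in memory.
        return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    def grpo_loss(logp_new, logp_old, rewards, clip_eps=0.2):
        # logp_new / logp_old: (G,) sequence log-probs under the current
        # policy and under the policy that sampled the completions.
        adv = grpo_advantages(rewards)
        ratio = torch.exp(logp_new - logp_old)
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        # PPO-style clipped surrogate, but with group-relative advantages.
        return -torch.min(ratio * adv, clipped * adv).mean()

Dropping the critic removes one of the two large models PPO has to train, which is where much of the efficiency win comes from.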



Let's call it underselling. :-) Mostly because I'm not sure anyone has independently done the math; so far we just have a single statement from the CEO. I do appreciate the algorithmic improvements and the excellent low-level performance engineering in their implementation (careful treatment of precision, making the H800s genuinely useful, etc.). I agree there's a lot there.
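
For what it's worth, the headline arithmetic itself is easy to sanity-check; the unverifiable part is the inputs, not the math. A quick check assuming the figures from the DeepSeek-V3 technical report:

    # Reported figures, not independently verified:
    gpu_hours = 2.788e6   # total H800 GPU-hours claimed for the training run
    usd_per_hour = 2.0    # rental rate the report itself assumes
    print(f"${gpu_hours * usd_per_hour / 1e6:.3f}M")  # -> $5.576M

And note that figure covers GPU time for the final run only, not prior research, ablations, or data acquisition.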



