You are underselling, or not understanding, the breakthrough. They trained a ~600B-parameter model on ~15T tokens for under $6M. Regardless of the provenance of the tokens, that in itself is impressive.
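For scale, here's a rough sanity check of the headline number, assuming the figures from the V3 technical report (~2.788M H800 GPU-hours at an assumed rental rate of $2/GPU-hour, and excluding prior research and ablation runs):

```python
# Back-of-envelope: stated GPU-hours times an assumed $2/GPU-hour
# rental rate. Both inputs are from the V3 technical report, not
# independently verified.
gpu_hours = 2.788e6
usd_per_gpu_hour = 2.0
print(f"${gpu_hours * usd_per_gpu_hour / 1e6:.2f}M")  # -> $5.58M
```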
Not to mention post-training: their GRPO technique (Group Relative Policy Optimization), used for preference optimization / alignment, is also much more efficient than PPO, largely because it drops the separate critic/value model.
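For anyone unfamiliar, a minimal sketch of the core idea, assuming the formulation from the DeepSeekMath paper that introduced GRPO (the reward values are made up):

```python
import numpy as np

# GRPO's group-relative advantage estimate: sample G completions per
# prompt, score them with the reward model, and normalize the rewards
# within the group. No learned critic/value network is needed, which
# is the main efficiency win over PPO.

def grpo_advantages(group_rewards, eps=1e-8):
    """Advantage of each completion relative to its sampled group."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Made-up rewards for one group of G=4 sampled answers; PPO would
# instead train a separate value model (roughly policy-sized) to
# produce these baselines.
print(grpo_advantages([0.1, 0.7, 0.4, 0.9]))  # ~[-1.40  0.58 -0.41  1.24]
```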
Let's call it underselling. :-) Mostly because I'm not sure anyone has independently done the math; we just have a single statement from the CEO. I do appreciate the algorithmic improvements, and the excellent performance engineering in their implementation (careful treatment of precision, making the H800s useful, etc.). I agree there's a lot there.