I'm wondering about this too. Would be nice to see an ablation here, or at least some analysis of the reasoning traces.
It definitely doesn't wipe its internal knowledge of Crystal clean (that's not how LLMs work). My guess is that it slightly encourages the model to explore more and second-guess its likely very strong Crystal game knowledge, but that's about it.
I was curious: they train a base model on data up to 1900, then fine-tune it to the exact cutoff year:
"To keep training expenses down, we train one checkpoint on data up to 1900, then continuously pretrain further checkpoints on 20B tokens of data 1900-${cutoff}$. "
Since it now includes 4 thinking levels (minimal through high), I'd really appreciate benchmarks across the whole sweep (and not just what's presumably high).
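In the meantime, here's a rough sketch of how one could run that sweep with the @google/genai SDK's thinking-budget knob (the model id, prompt, and budget values are my assumptions; check the current docs):

```typescript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

// Sweep a few thinking budgets (token counts; 0 disables thinking).
// These particular values are placeholders, not the official levels.
const budgets = [0, 512, 2048, 8192];

for (const budget of budgets) {
  const start = Date.now();
  const res = await ai.models.generateContent({
    model: "gemini-2.5-flash", // assumed model id
    contents: "How many primes are there below 1000?",
    config: { thinkingConfig: { thinkingBudget: budget } },
  });
  console.log(`budget=${budget}: ${Date.now() - start}ms`);
  console.log(res.text);
}
```

Even eyeballing latency vs. answer quality across that loop would show how much the budget actually buys you.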
Flash is meant to be a model for lower-cost, latency-sensitive tasks. Long thinking times will push TTFT well past 10s (often unacceptable), and won't really be that cheap either.
Google appears to be changing what Flash is “meant for” with this release: the capability it has, along with the thinking budgets, makes it superior to previous Pro models in both outcome and speed. The likely-soon-coming Flash-Lite will fit right into where Flash used to be: cheap and fast.
Both have had questionable content for a while; it's a wonder people are still paying for them, especially given that LLMs exist (and YouTube, for that matter).
If I were a professor at a decent school, I'd probably look at the landscape of MOOCs and go "Why am I spending any time on this?" It seemed like something new and potentially exciting at one point. I certainly wouldn't today.
Shameless plug: if OP is looking to stay on d3, he could also try slotting in my C++/WASM versions[1] of the main d3 many-body forces. Not the best, but I've found >3x speedup using these for periplus.app :)
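For anyone curious what "slotting in" looks like: d3-force accepts any function of alpha with an optional initialize method as a force, so a custom (or WASM-backed) implementation can replace forceManyBody() directly. A minimal sketch of the pattern, with a naive JS body standing in for the compiled code (this is not my library's actual API):

```typescript
import { forceSimulation, type SimulationNodeDatum } from "d3-force";

interface Node extends SimulationNodeDatum {
  x: number; y: number; vx: number; vy: number;
}

// Any function of alpha with an optional `initialize` method is a valid
// d3 force, so a WASM-backed version can replace forceManyBody() without
// touching the rest of the simulation setup.
function customChargeForce(strength = -30) {
  let nodes: Node[] = [];

  // d3 calls this once per tick; in the WASM version this loop is what
  // moves into the compiled module. Here it's a naive O(n^2) placeholder.
  function force(alpha: number) {
    for (const a of nodes) {
      for (const b of nodes) {
        if (a === b) continue;
        const dx = b.x - a.x, dy = b.y - a.y;
        const d2 = dx * dx + dy * dy || 1e-6;
        a.vx += (dx / d2) * strength * alpha; // negative strength repels
        a.vy += (dy / d2) * strength * alpha;
      }
    }
  }

  // d3 calls initialize(nodes) when the force is bound to a simulation.
  force.initialize = (ns: Node[]) => { nodes = ns; };
  return force;
}

const nodes: Node[] = Array.from({ length: 200 }, () => ({
  x: Math.random() * 500, y: Math.random() * 500, vx: 0, vy: 0,
}));

const sim = forceSimulation(nodes)
  .force("charge", customChargeForce()) // drop-in for d3.forceManyBody()
  .on("tick", () => { /* redraw here */ });
```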
The goal was to make the learning material very malleable, so all content can be viewed through different "lenses" (e.g. made simpler, more thorough, from first principles, etc.). A bit like Wikipedia, it also allows for infinite depth/rabbit-holing. Each document links to other documents, which link to other documents (...).
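Roughly, you can think of the structure like this (a simplified sketch; the names are made up for illustration, not the actual schema):

```typescript
// Illustrative names only -- not the real data model.
type Lens = "default" | "simpler" | "more-thorough" | "first-principles";

interface Doc {
  id: string;
  title: string;
  content: Record<Lens, string>; // the same material rendered per lens
  links: string[];               // ids of other Docs: rabbit holes all the way down
}

// Following a link resolves another Doc, which has its own links (...).
function follow(docs: Map<string, Doc>, from: Doc, linkId: string): Doc | undefined {
  return from.links.includes(linkId) ? docs.get(linkId) : undefined;
}
```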
I'm also currently in the middle of adding interactive visualizations, which actually work better than expected! Some demos:
I suspect one can go a lot further by adopting some tweaks from the GPT-2 speedrun effort [0]: at a minimum Muon, a better init, and careful learning-rate tuning.
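For context, Muon's core trick is orthogonalizing the momentum update for 2D weight matrices with a few Newton-Schulz iterations. A sketch of just that step (the quintic coefficients are the ones from the public speedrun code; everything else here is simplified):

```typescript
// Newton-Schulz orthogonalization as used in Muon: given a gradient/momentum
// matrix G, iterate a quintic polynomial in X X^T to push all singular
// values toward 1, yielding an approximately orthogonal update direction.
type Mat = number[][];

const matmul = (a: Mat, b: Mat): Mat =>
  a.map(row => b[0].map((_, j) => row.reduce((s, v, k) => s + v * b[k][j], 0)));

const transpose = (a: Mat): Mat => a[0].map((_, j) => a.map(row => row[j]));

const scaleAdd = (s: number, a: Mat, t: number, b: Mat): Mat =>
  a.map((row, i) => row.map((v, j) => s * v + t * b[i][j]));

function newtonSchulz(G: Mat, steps = 5): Mat {
  // Quintic coefficients from the GPT-2 speedrun's Muon implementation.
  const [a, b, c] = [3.4445, -4.775, 2.0315];
  const tall = G.length > G[0].length;
  let X = tall ? transpose(G) : G; // work in the wide orientation

  // Normalize so the singular values start inside the basin of convergence.
  const fro = Math.sqrt(X.flat().reduce((s, v) => s + v * v, 0)) + 1e-7;
  X = X.map(row => row.map(v => v / fro));

  for (let i = 0; i < steps; i++) {
    const A = matmul(X, transpose(X));          // X X^T
    const B = scaleAdd(b, A, c, matmul(A, A));  // bA + cA^2
    X = scaleAdd(a, X, 1, matmul(B, X));        // aX + (bA + cA^2)X
  }
  return tall ? transpose(X) : X;
}
```

The real implementation runs this in bf16 on the GPU per weight matrix; the point of the sketch is just that the whole trick is a handful of matmuls.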