Very interesting. It sounds like tuning at the PTX level can increase workload efficiency. To quote the DeepSeek folks: "Specifically, we employ customized PTX (Parallel Thread Execution) instructions" (https://arxiv.org/abs/2412.19437).
Agreed! The gulf between pure-C++ CUDA and PTX is getting larger with these optimizations. My understanding is that DeepSeek used PTX instructions that either have no corresponding C++ API (like `wgmma`, mentioned in the article) or use uncommon combinations of modifiers (`ld.global.nc.L1::no_allocate.L2::256B`).
This is a good read. Dr. Blaise Agüera y Arcas was a keynote speaker at https://attend.ieee.org/newera/program/ here in Seattle a week ago, but he didn't really get a chance to delve deeply into his position. During his slot there ended up being a lot of back-and-forth about whether AGI has truly been achieved or we are just seeing ACI, etc., among the folks from MS, Meta, Google, UW, and even a https://www.dia.mil/ rep.
ML (short for "meta-language") was originally designed for use in programming language research, and really shines for that purpose. And OCaml is probably the most pragmatic dialect for the purpose.
SML is very dated, and its standard library and ecosystem lack many things that are considered table stakes for a viable programming language nowadays. F# and Scala are fine as enterprise languages, but being tied to .NET and the JVM respectively makes them less desirable for implementing a language that won't itself be coupled to one of those runtimes.
Tree processing is best done in a language with decent algebraic datatypes and pattern matching. I would’ve preferred Standard ML, but, well, pot-ay-to, pot-ah-to. Haskell is another choice but the techniques you need to use there (while undeniably gaining you some possibilities) don’t really generalize to other languages, so you’re now writing a book about compiler construction in Haskell rather than just compiler construction. Ditto for Rust. Kotlin has deliberately anemic pattern matching. C# or F# leave you depending on Microsoft’s benevolence (sic). Metalua and Sweet.js both have decent ADT support but both are pretty much dead. Racket exists, I guess, and there are some pattern-matching libraries for normal Scheme as well, but the charisma malus of the parentheses is real even if I don’t understand what causes it.
So OCaml was probably the most mainstream choice among the languages with appropriate tools, as funny as that sounds. And honestly, once you get over the syntax, it doesn’t actually have anything outrageous.
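For anyone who hasn't seen what "algebraic datatypes and pattern matching" buys you for tree processing, here is a minimal sketch in OCaml. The `expr` type and the `eval` and `simplify` functions are made up for illustration (they are not from the book or from any library), and the rewrite rules are just a handful of obvious ones, not a serious optimizer.

```ocaml
(* Hypothetical toy expression tree; all names are illustrative. *)
type expr =
  | Num of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr
  | Neg of expr

(* Evaluate by structural recursion; [env] maps variable names to values. *)
let rec eval env = function
  | Num n -> n
  | Var x -> env x
  | Add (a, b) -> eval env a + eval env b
  | Mul (a, b) -> eval env a * eval env b
  | Neg e -> -(eval env e)

(* One bottom-up simplification pass. Children are simplified first,
   then nested patterns pick out the shapes worth rewriting. *)
let rec simplify e =
  match e with
  | Num _ | Var _ -> e
  | Neg a ->
    (match simplify a with
     | Neg inner -> inner            (* --x  =>  x *)
     | Num n -> Num (-n)             (* fold constants *)
     | a' -> Neg a')
  | Add (a, b) ->
    (match simplify a, simplify b with
     | Num 0, x | x, Num 0 -> x      (* 0 + x  =>  x *)
     | Num m, Num n -> Num (m + n)
     | a', b' -> Add (a', b'))
  | Mul (a, b) ->
    (match simplify a, simplify b with
     | Num 0, _ | _, Num 0 -> Num 0  (* 0 * x  =>  0 *)
     | Num 1, x | x, Num 1 -> x      (* 1 * x  =>  x *)
     | Num m, Num n -> Num (m * n)
     | a', b' -> Mul (a', b'))
```

Something like `simplify (Add (Mul (Num 1, Neg (Neg (Var "x"))), Num 0))` collapses to `Var "x"`. The compiler also warns if a `match` forgets a constructor, which is a big part of why this style is pleasant for compiler passes.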