
Not an expert either, but my understanding is that large models use quantized weights and tensor inputs for inference. Multiplication and addition of fixed-point values are associative, so unless there's an intermediate "convert to/from IEEE float" step (activation functions, maybe?), you can still build determinism into a performant model.
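
For example, in plain C++ (just an illustration, not tied to any particular inference stack), fixed-point/integer addition gives the same answer under any grouping, while float addition doesn't:

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Float addition is not associative: grouping changes the result.
        float a = 1e8f, b = -1e8f, c = 1.0f;
        std::printf("(a+b)+c = %g, a+(b+c) = %g\n", (a + b) + c, a + (b + c));

        // Unsigned integer (fixed-point) addition wraps mod 2^32, which is
        // associative: any grouping of the same operands gives the same bits.
        uint32_t x = 0xFFFFFFF0u, y = 0x20u, z = 0x5u;
        std::printf("(x+y)+z = %u, x+(y+z) = %u\n", (x + y) + z, x + (y + z));
    }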

Fixed-point arithmetic isn't truly associative unless you have infinite precision. The second you hit a limit or saturate/clamp a value, the result very much depends on the order of operations.
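
E.g. a minimal sketch of signed 8-bit saturating addition (not tied to any specific hardware), where the grouping changes the answer:

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    // Saturating signed 8-bit add: clamp to [-128, 127] instead of wrapping.
    int8_t sat_add(int8_t a, int8_t b) {
        int sum = int(a) + int(b);
        return int8_t(std::clamp(sum, -128, 127));
    }

    int main() {
        int8_t a = 100, b = 100, c = -100;
        // (100 + 100) saturates to 127, then 127 - 100 = 27.
        std::printf("(a+b)+c = %d\n", sat_add(sat_add(a, b), c));
        // (100 - 100) = 0, then 100 + 0 = 100.
        std::printf("a+(b+c) = %d\n", sat_add(a, sat_add(b, c)));
    }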

Ah yes, I forgot about saturating arithmetic. But even then, you wouldn't need infinite precision for all values; you'd only need "enough" precision for the intermediate values, right? E.g. for an inner product of two N-element vectors containing M-bit integers, an accumulator with at least ceil(log2(N)) + 2*M bits would guarantee no overflow.
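
Something like this sketch, with int8 inputs and an int32 accumulator (M = 8, so 16 bits of headroom, enough for any N up to 2^16 by that formula):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Dot product of two int8 vectors in an int32 accumulator. Each product
    // fits in 2*M = 16 bits, and the extra 16 bits cover the ceil(log2(N))
    // carry bits for any N up to 2^16, so the accumulation can never
    // overflow or saturate.
    int32_t dot_i8(const int8_t* a, const int8_t* b, size_t n) {
        int32_t acc = 0;
        for (size_t i = 0; i < n; ++i)
            acc += int32_t(a[i]) * int32_t(b[i]);
        return acc;
    }

    int main() {
        int8_t a[4] = {127, -128, 50, -50};
        int8_t b[4] = {127, 127, -128, -128};
        // 16129 - 16256 - 6400 + 6400 = -127
        std::printf("%d\n", dot_i8(a, b, 4));
    }

Since the partial sums never clip, the additions can be reordered or vectorized freely without changing the result.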

True, you can increase the bit width to guarantee you never hit those issues, but right now saturating arithmetic on types that pretty commonly hit those limits is the standard. Guaranteeing it would mean a significant performance drop and/or memory-use increase with current techniques, to the point that it would significantly affect availability and cost compared to what people expect.

Similarly, you could disallow re-ordering of operations and the like, so the results are guaranteed to be deterministic (even if still "not correct" compared to infinite-precision arithmetic), but that would also carry a big performance cost.
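
Roughly the trade-off, with float summation standing in for whatever reduction the kernel actually does: a fixed-order loop is bit-for-bit reproducible but serial, while the parallel-friendly tree grouping can round differently.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Fixed-order (sequential) reduction: always the same grouping, so the
    // result is reproducible across runs, but the additions are serialized.
    float sum_fixed_order(const std::vector<float>& v) {
        float acc = 0.0f;
        for (float x : v) acc += x;
        return acc;
    }

    // Pairwise/tree reduction: the grouping a parallel or vectorized kernel
    // prefers. With rounding it can give a different (still "not correct"
    // vs. infinite precision) answer than the loop above.
    float sum_pairwise(const std::vector<float>& v, size_t lo, size_t hi) {
        if (hi - lo == 1) return v[lo];
        size_t mid = lo + (hi - lo) / 2;
        return sum_pairwise(v, lo, mid) + sum_pairwise(v, mid, hi);
    }

    int main() {
        std::vector<float> v = {1e8f, 1.0f, -1e8f, 1.0f};
        std::printf("fixed order: %g\n", sum_fixed_order(v));            // prints 1
        std::printf("pairwise:    %g\n", sum_pairwise(v, 0, v.size()));  // prints 0
    }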