
Not an expert either, but my understanding is that large models use quantized weights and tensor inputs for inference. Multiplication and addition of fixed-point values are associative, so unless there's an intermediate "convert to/from IEEE float" step (activation functions, maybe?), you can still build determinism into a performant model.
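
For example, in plain C++ (just an illustration, not tied to any particular inference stack), fixed-point/integer addition gives the same answer under any grouping, while float addition doesn't:

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Float addition is not associative: grouping changes the result.
        float a = 1e8f, b = -1e8f, c = 1.0f;
        std::printf("(a+b)+c = %g, a+(b+c) = %g\n", (a + b) + c, a + (b + c));

        // Unsigned integer (fixed-point) addition wraps mod 2^32, which is
        // associative: any grouping of the same operands gives the same bits.
        uint32_t x = 0xFFFFFFF0u, y = 0x20u, z = 0x5u;
        std::printf("(x+y)+z = %u, x+(y+z) = %u\n", (x + y) + z, x + (y + z));
    }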

Fixed-point arithmetic isn't truly associative unless you have infinite precision. The second you hit a limit or saturate/clamp a value, the result very much depends on the order of operations.
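
E.g. a minimal sketch of signed 8-bit saturating addition (not tied to any specific hardware), where the grouping changes the answer:

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    // Saturating signed 8-bit add: clamp to [-128, 127] instead of wrapping.
    int8_t sat_add(int8_t a, int8_t b) {
        int sum = int(a) + int(b);
        return int8_t(std::clamp(sum, -128, 127));
    }

    int main() {
        int8_t a = 100, b = 100, c = -100;
        // (100 + 100) saturates to 127, then 127 - 100 = 27.
        std::printf("(a+b)+c = %d\n", sat_add(sat_add(a, b), c));
        // (100 - 100) = 0, then 100 + 0 = 100.
        std::printf("a+(b+c) = %d\n", sat_add(a, sat_add(b, c)));
    }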

Ah yes, I forgot about saturating arithmetic. But even then, you wouldn't need infinite precision for all values; you'd only need "enough" precision for the intermediate values, right? E.g. for an inner product of two N-element vectors containing M-bit integers, an accumulator with at least ceil(log2(N)) + 2*M bits would guarantee no overflow.
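
Something like this sketch, with int8 inputs and an int32 accumulator (M = 8, so 16 bits of headroom, enough for any N up to 2^16 by that formula):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Dot product of two int8 vectors in an int32 accumulator. Each product
    // fits in 2*M = 16 bits, and the extra 16 bits cover the ceil(log2(N))
    // carry bits for any N up to 2^16, so the accumulation can never
    // overflow or saturate.
    int32_t dot_i8(const int8_t* a, const int8_t* b, size_t n) {
        int32_t acc = 0;
        for (size_t i = 0; i < n; ++i)
            acc += int32_t(a[i]) * int32_t(b[i]);
        return acc;
    }

    int main() {
        int8_t a[4] = {127, -128, 50, -50};
        int8_t b[4] = {127, 127, -128, -128};
        // 16129 - 16256 - 6400 + 6400 = -127
        std::printf("%d\n", dot_i8(a, b, 4));
    }

Since the partial sums never clip, the additions can be reordered or vectorized freely without changing the result.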

True, you can increase the bit width to guarantee you never hit those issues, but right now saturating arithmetic on types that pretty commonly hit those limits is the standard. Guaranteeing it would mean a significant performance drop and/or memory-use increase with current techniques, to the point that it would significantly affect availability and cost compared to what people expect.

Similarly, you could disallow re-ordering of operations and the like, so the results are guaranteed to be deterministic (even if still "not correct" compared to infinite-precision arithmetic), but that would also carry a big performance cost.
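
Roughly the trade-off, with float summation standing in for whatever reduction the kernel actually does: a fixed-order loop is bit-for-bit reproducible but serial, while the parallel-friendly tree grouping can round differently.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Fixed-order (sequential) reduction: always the same grouping, so the
    // result is reproducible across runs, but the additions are serialized.
    float sum_fixed_order(const std::vector<float>& v) {
        float acc = 0.0f;
        for (float x : v) acc += x;
        return acc;
    }

    // Pairwise/tree reduction: the grouping a parallel or vectorized kernel
    // prefers. With rounding it can give a different (still "not correct"
    // vs. infinite precision) answer than the loop above.
    float sum_pairwise(const std::vector<float>& v, size_t lo, size_t hi) {
        if (hi - lo == 1) return v[lo];
        size_t mid = lo + (hi - lo) / 2;
        return sum_pairwise(v, lo, mid) + sum_pairwise(v, mid, hi);
    }

    int main() {
        std::vector<float> v = {1e8f, 1.0f, -1e8f, 1.0f};
        std::printf("fixed order: %g\n", sum_fixed_order(v));            // prints 1
        std::printf("pairwise:    %g\n", sum_pairwise(v, 0, v.size()));  // prints 0
    }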