You need to make it so the map doesn't refresh when I click another pin; that's so annoying. I wanted to see how hectic my plan for a London day trip would be, but I lose my place on the map every time I click a different option.
Another good use case for a microservice - if you are going to have to change the compute size for your monolith just to accommodate the new functionality.
I had an architect bemoan the suggestion that we use a microservice, until he had to begrudgingly back down when he was told that the function we were talking about (running a CLIP model) would mean attaching a GPU to every task instance.
Imagine you are predicting the next token and two tokens are very close in probability in the distribution. Kernel execution is not deterministic because of floating point non-associativity, and the token that gets picked impacts every token later in the prediction stream, so it's very consequential which one gets picked.
This isn't some hypothetical. It happens all the time with LLMs; it isn't some improbable freak accident.
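To make the non-associativity point concrete, here is a minimal sketch in plain Python. The values are made up and deliberately exaggerated (real logits differ by far less), and `sum_in_order` and `pick_token` are hypothetical helpers, not anything from a real inference stack. It only shows that summing the same terms in a different order can change which of two close candidates wins.

```python
def sum_in_order(values):
    """Accumulate left to right, the way a simple reduction would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pick_token(logit_a, logit_b):
    """Greedy choice between two candidate tokens."""
    return "A" if logit_a > logit_b else "B"


# The same three terms, summed in two different orders.
# Mathematically both sums equal 1.0.
logit_a_order1 = sum_in_order([1e16, 1.0, -1e16])  # the 1.0 is absorbed by 1e16
logit_a_order2 = sum_in_order([1e16, -1e16, 1.0])  # the large terms cancel first

logit_b = 0.5  # hypothetical competing token

print(logit_a_order1, pick_token(logit_a_order1, logit_b))
print(logit_a_order2, pick_token(logit_a_order2, logit_b))
```

With greedy decoding the two runs pick different tokens, and everything generated afterwards is conditioned on a different prefix.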
> Would you really say that the main part of non-determinism in LLM-usage stems from this
Yes I would, because it causes exponential divergence (P(correct) = (1-e)^n, where e is the chance of a flipped token at each step and n is the number of tokens generated) and doesn't have a widely adopted solution. The major labs have very expensive researchers focused on this specific problem.
There is a paper from Thinking Machines from September on batch-invariant kernels that you should read. It's a good primer on this issue of non-determinism in LLMs; you might learn something from it!
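A rough sketch of how that formula compounds, assuming the per-token flip chance e is constant and independent across tokens (a simplification), with a made-up value for e. The function names are hypothetical.

```python
import math


def p_no_divergence(e, n):
    """P(correct) = (1 - e)^n: chance that none of n tokens flipped."""
    return (1.0 - e) ** n


def tokens_until_half(e):
    """Number of tokens after which P(correct) drops to 0.5."""
    return math.log(0.5) / math.log(1.0 - e)


e = 0.001  # hypothetical per-token flip probability
for n in (100, 1_000, 10_000):
    print(n, p_no_divergence(e, n))

print("tokens until 50%:", tokens_until_half(e))
```

Even a small per-token flip chance leaves little probability of an identical output once the generation is a few thousand tokens long.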
Unfortunately the method has quite a lot of overhead, but promising research all the same.
I don't think this is relevant to the main point, but it's definitely something I wasn't aware of. I would've thought it might have an impact on like the O(100)th token in some negligible way, but glad to learn.