conorbergin's comments

LLMs are deterministic: the same model under the same conditions will produce the same output, unless some randomness is purposefully injected. Neural networks in general can be thought of as universal function approximators.

Whenever somebody calls LLMs "non-deterministic", assume they meant "chaotic", in the informal sense of being a system where small changes of input can cause large changes to output, and the only way to find out if it will happen is by running the full calculation.

For many applications, this is just as troublesome as true non-determinism.


I don't think LLMs are that chaotic, you can replace words in an input and get a similar answer, and they are very good at dealing with typos.

They are definitely not interpretable, though; I was reading some stuff from mechanistic interpretability researchers saying they've given up trying to build a bottom-up model of how they work.


> I don't think LLMs are that chaotic, you can replace words in an input and get a similar answer, and they are very good at dealing with typos.

Compare "You are a helpful assistant. Your task is to <100 lines of task description> <example problem>"

with

"you are a helpless assistant. Your task is to <100 lines of task description> <example problem>"

I've changed 3 or 4 CHARACTERS ("ful" to "less") out of a (by construction) 1000+ character prompt.

and the outputs are not at all similar.

Just realized I've never tried the "you are a helpless ass" prompt. Again a very minor change in wording, just dropping a few letters. The helpless assistant at least output text apologizing for being so bad at the task.


Sure. What did you expect? You changed the semantics of your prompt to the complete opposite. Of course it will attempt to make sense of it to the best of its ability and deliver what you requested. The input isn't formally specified; that's inherent to the domain, not the model or a human. GP, on the other hand, is talking about semantically negligible differences like typos.

That's not really true. If you turn a few knobs you can make them deterministic: namely, setting temperature to zero and turning off all history. But none of the cloud providers do this, because it's not a product as far as they are concerned. So in practice, not so much.

Can someone explain why this is? Do LLMs somehow contain a true random number generator? Why wouldn't they produce the same outputs given the same inputs (even temperature)?

edit: I'm not talking about an LLM as accessed through a provider. I'm just talking about using a model directly. Why wouldn't that be deterministic?


The model outputs a probability distribution for the next token, given the sequence of all previous tokens in the context window. It’s just a list of floats in the same order as the list of tokens that the tokenizer uses.

After that, a piece of software that is NOT the LLM chooses the next token. This is called the sampler. There are different sampling parameters and strategies available, but if you want repeatable* outputs, just take the token with the highest probability number.

* Perfect determinism in this sense is difficult to achieve because GPU calculations naturally have a minor bit of nondeterminism. But you can get very close.
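
To make that concrete, here's a minimal greedy-decoding sketch (assuming the Hugging Face transformers library, with GPT-2 purely as an illustrative stand-in; the argmax line is the "sampler", separate from the model's forward pass):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The capital of France is", return_tensors="pt").input_ids
    for _ in range(10):
        logits = model(ids).logits[0, -1]     # scores over the whole vocabulary
        next_id = torch.argmax(logits)        # greedy sampler: take the top token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    print(tok.decode(ids[0]))                 # repeatable, modulo the GPU caveat above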


I'm not sold that an LLM is an LLM without a sampler, but it's not worth quibbling over. It's part of the statistical model anyway.

The LLM is the trained part; the rest is the handwritten part. The sampler is handwritten, not learned.

Believe it or not, in statistics and machine learning the hard-coded parts of a model that impact the results are considered part of the model. But I understand that nowadays we don't care about these things because AI goes brrr.

There are A LOT of misconceptions about LLMs, and the biggest one is that they are not deterministic. They are 100% deterministic, and temperature has nothing to do with it. You WILL get exactly the same result every single time (at ANY temperature) as long as you use the same sampling parameters and server config parameters.

What causes variance in LLMs is server parameters like batch processing and caching, among a few other things, with batching responsible for most of the issues. Batching is used because large providers serve multiple customers per GPU, and breaking up the VRAM is tricky and causes drift. If you start llama.cpp, for example, with only one person per slot and batching off, you will always get the same results every time, even at temperature 1.2 or whatever other parameters, because you are using one GPU per inference call, so no funny business there.

The reason most people are unaware of this is that most people have experience only with APIs instead of working with the actual inference engine itself, so this goddamned myth keeps spreading. My video for reference, where you can download and try it for yourself: https://www.youtube.com/watch?v=EyE5BrUut2o
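
To illustrate the point that temperature by itself doesn't cause variance: sampling is an ordinary function of the logits, the temperature, and the RNG state, so with a fixed seed it's fully repeatable. A toy sketch in plain PyTorch (not llama.cpp; purely illustrative):

    import torch

    def sample_next(logits, temperature, seed):
        gen = torch.Generator().manual_seed(seed)         # fixed seed => fixed draw
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, 1, generator=gen).item()

    logits = torch.tensor([2.0, 1.0, 0.5, 0.1])
    print(sample_next(logits, temperature=1.2, seed=42))  # same token every run

Any run-to-run drift in a real server therefore has to enter upstream of this step, e.g. batch-dependent kernel scheduling perturbing the floating-point logits.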

Thanks so much for this! I still haven't got around to building my own language model yet, so I'm a bit fuzzy on the details, but if I imagined a thought experiment where I did all the math by hand on paper, I just couldn't see how I would end up with a different output each time given the same inputs. Finding out that the variance other people are seeing comes from the server/hardware stuff clears that up.

This is a surprisingly annoying question to Google. A lot of articles give the reason that softmax returns a probability distribution, as if the presence of the word "probability" means the tokens will be different every time.


An LLM itself -- that is, the weights and the mathematical functions linking them -- does not tell you exactly how to train it from data, nor how to generate an output. Instead, it describes a function providing relative likelihood(output | input).

Deciding how to pick a particular output given that likelihood function is left as an exercise for the user; that process is part of what we call inference.

One obvious choice is to keep picking the highest-likelihood token, feed it into the model, and get another -- on repeat. This is what most algorithms call "temperature=0". But doing this token after token can lead to boring output, or steer you into pathological low-probability sequences like endless repeats.

So, the current SOTA is to intentionally introduce a random factor (temperature>0) to the sampling process -- along with other hacks, like explicit suppression of repeats.
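
For intuition, here's what the temperature knob and one common repeat-suppression heuristic look like in isolation (illustrative PyTorch, simplified):

    import torch

    def apply_temperature(logits, t):
        # t -> 0 approaches greedy/argmax; t > 1 flattens the distribution
        return torch.softmax(logits / t, dim=-1)

    logits = torch.tensor([3.0, 1.0, 0.2])
    print(apply_temperature(logits, 0.1))   # ~[1, 0, 0]: near-deterministic
    print(apply_temperature(logits, 1.0))   # the model's raw distribution
    print(apply_temperature(logits, 2.0))   # flatter: more randomness

    def penalize_repeats(logits, seen_ids, penalty=1.2):
        # crude repetition penalty: make already-generated tokens less likely
        # (one common formulation divides positive logits, multiplies negative ones)
        out = logits.clone()
        out[seen_ids] = torch.where(out[seen_ids] > 0,
                                    out[seen_ids] / penalty,
                                    out[seen_ids] * penalty)
        return out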


Yeah, sure. So temperature is baked into these LLM systems, and when it isn't zero it increases the probability of taking a different path when decoding the tokens, whether it's at a provider or downloaded on your own machine.

Technically, even when the temperature is 0 it's not deterministic, but it's more likely to be: you can have ties in the probabilities for the next token, and floating-point noise is real.

All these models are doing is guesstimating the next token to say.


> Namely setting temperature to zero, and turning off all history

That's not nearly enough, though. Multi-node/GPU inference, and specifically batching (and the ordering within batches), has non-deterministic consequences for current LLM services.


True but for small models it's pretty close. See my comment below about other cases leading to nondeterminism.

Eh, conceptually true, but in practice, it is rather hard to get any decent performance out of a GPU and still produce a deterministic answer.

And in any case, setting the temperature to zero will not produce a useful result, unless you don't mind your LLM constantly running into infinite loops.


Yes, there's a good Thinking Machines Lab blog post about this.

You're being downvoted, but you're right. Determinism is a different concept and doesn't characterise LLMs well. You can have deterministic random number generators, for example.

I doubt a fork would ever happen; Blender, being computer graphics software, has a huge knowledge gap between its developers and its users.

That "syntactic sugar" encompasses the entire value proposition of markdown, there's nothing stopping you using Typst to author blog posts or take notes, they even have HTML export.

I wonder if well-designed "mutable" operating systems like Arch and Alpine are going to beat NixOS etc. in the long run. An install script is strictly more powerful than a declarative config language, and typically less verbose.


Might as well use Guix then. You still have the declarative config language, but also a Turing-complete (and convenient) programming language.

What do you mean by strictly more powerful?

Scripts are typically Turing-complete; config files are typically not.
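
A contrived illustration of the difference, with Python standing in for both sides (package and command names are made up for the example): the config is inert data, while the script can probe the machine and branch:

    import shutil
    import subprocess

    # declarative config: pure data, no control flow
    config = {"packages": ["git", "vim", "firefox"]}

    # install script: Turing-complete, can inspect the system and decide
    packages = list(config["packages"])
    if shutil.which("nvidia-smi"):              # only on boxes with an NVIDIA GPU
        packages.append("nvidia-driver")
    subprocess.run(["apt-get", "install", "-y", *packages], check=True)

Of course, that extra power is exactly what declarative systems like NixOS give up on purpose, in exchange for reproducibility.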

This is a much more promising technique from Applied Science: https://www.youtube.com/watch?v=UIqhpxul_og


I remember that video. Hey, I wonder what happened to that $3000 Micronics SLS printer? Wasn't it a kickstarter? I remember that being a big deal at the time, and I guess it suddenly disappeared?

> We got bought by Formlabs in 2024

> Formlabs sells their own SLS printer for $25000

> Formlabs charges a license fee to be able to print with custom materials like Applied Science did [0]

ah, well that explains it.

0 - https://support.formlabs.com/s/article/Setting-up-Open-Mater...


Not a fan of "texture healing"; it's a very convoluted and unsatisfying way of fixing a minor problem with monospace fonts. I'd be more interested in seeing letterforms redesigned to be more optically balanced within the grid. Another commenter points out that Ubuntu Mono does this somewhat, but I imagine you could make some fairly radical alterations to certain letters and still remain legible.


I fell in love with Intel One Mono for this reason


The Osprey's accident rate is not that bad, and the US Army has ordered a new, smaller tiltrotor, the V-280.


It was recently given the official designation 'MV-75'.


The Wikipedia page says this will replace UH-60s, but I just do not see how that airframe is directly comparable to what's been a workhorse for decades. Maybe it means only in a long-range reconnaissance role? But even then, that mission is primarily owned by UAS platforms now. Confusing.


I imagine the UH-60 and its variants will continue to serve (who knows, maybe with new airframes) alongside the MV-75 for quite a while, in a similar way to how UH-1s continued in use well after UH-60s were deployed in large numbers. This Congressional Research Service summary of the FLRAA/MV-75 program states that the Army plans to continue ordering UH-60s (on the order of 255 between 2027 and 2031) - https://www.congress.gov/crs-product/IF12771

The key requirements that drive the MV-75's downsides (size, complexity, cost) come from the Army wanting to play in the Pacific. The UH-60 is deeply limited there.

For example, the MV-75's range should let it go (one-way) from Guam to the Philippines, or straight from Okinawa to Taiwan (no need to island-hop) - potentially as a two-way mission. Same for the Philippines to Taiwan.

The "comparability" is that the MV-75 and UH-60 can be delivery ~14 troops into an order magnitude similar size clearing.


Thank you! This context really clarifies what the use case is for this. The range difference matters.


What is so unbelievable about that?

Sure, it's going to take decades to actually make the transition, and the UH-60 will remain in service for decades more after that in less demanding roles. I expect that by the time this finishes, the MV-75 will be considered another workhorse, if a slightly fuzzier one, and the UH-60 will be an antiquated platform.

But ultimately they both solve the same problem: moving stuff from A to B in rough terrain, fast. And with the ever-increasing number of reconnaissance assets, A needs to be further behind the frontline, so range and speed need to increase beyond what you can manage with a pure helicopter.


Thank God for Zig


For bringing us back to Modula-2?


Elaborate.


I don’t think PL theory driven design produces good systems languages.


Rust as it exists today is very much "PL theory" driven. It's not necessarily a good language, but it has been consistently ranked the #1 "most loved" language in Stack Overflow's survey for the past few years.


Is WebGPU a good standard at this point? I am learning Vulkan at the moment, and 1.3 is significantly different from the previous versions; apparently WebGPU is closer in behavior to 1.0. I am by no means an authority on the topic, I just see a lack of interest in targeting WebGPU from people in game engines and scientific computing.


For a text editor it's definitely good enough if not extreme overkill.

Other than that, the one big downside of WebGPU is its rigid binding model via baked BindGroup objects. This is both inflexible and slow when any sort of 'dynamism' is needed, because you end up creating and destroying BindGroup objects in the hot path.

Vulkan's binding model will really only be fixed properly with the very new VK_EXT_descriptor_heap extension (https://docs.vulkan.org/features/latest/features/proposals/V...).


The modern Vulkan binding model is relatively fine. Your entire program has a single descriptor set containing an array of images that you reference by index. Buffers are never bound and instead referenced by device address.


Do you think Vulkan will become "nice" to use, could it ever be as ergonomic as Metal is supposed to be?


Apparently "joy to use" is one of the new core goals of Khronos for Vulkan. Whether they succeed remains to be seen, but at least they acknowledge now that a developer hostile API is a serious problem for adoption.

The big advantage of Metal is that you can pick your abstraction level. At the highest level it's convenient like D3D11, at the lowest level it's explicit like D3D12 or Vulkan.


Bevy engine uses wgpu and supports both native and WebGPU browser targets through it.

The WebGPU API gets you to rendering your first triangle quicker and without thinking about vendor-specific APIs and histories of their extensions. It's designed to be fully checkable in browsers, so if you mess up you generally get errors caught before they crash your GPU drivers :)

The downside is that it's the lowest common denominator, so it always lags behind what you can do directly in DX or VK. It was late to get subgroups, and now it's late to get bindless resources. When you target desktops, wgpu can cheat and expose more features that haven't landed in browsers yet, but of course that takes you back to the vendor API fragmentation.


It's a good standard if you want a sort of lowest-common-denominator that is still about a decade newer than GLES 3 / WebGL 2.

The scientific folks don't have all that much reason to upgrade from OpenGL (it still works, after all), and the games folks are often targeting even newer DX/Vulkan/Metal features that aren't supported by WebGPU yet (for example, hardware-accelerated raytracing).


Khronos is trying to entice the scientific folks with ANARI, because there was zero interest in moving away from OpenGL, as you mention.

https://www.khronos.org/anari/


Having no CSD at all is unacceptable on small screens, IMHO; far too much real estate is taken up by a title bar. You can be competitive with SSD by making the bars really thin, but then they are harder to click on and impossible with touch input. At the moment I have Firefox set up with CSD and vertical tabs; only 7% of my vertical real estate is taken up by bars (including GNOME's), which is pretty good for something that supports this many niceties.

