Even after reading this 10 times I still don't understand what this hack is about.
I understand these things independently:
1) SHA1 collision weakness
2) Nix checking package SHA1 when updating packages
3) Chromium returning different SHA1 for each download
Someone made a Nix thing that's supposed to check whether a new version of Chrome is available, and if so, generate an updated package.
While doing so, they found themselves needing some kind of "tryFetch" function that would return true or false depending on whether a certain URL is reachable.
But there's no such function in Nix. Why not? Because you're not really supposed to do stuff like that in the Nix paradigm which is all about determinism.
But they really wanted to do it anyway, so they invented a clever hack, probably too clever.
What can you do in Nix? Well, you can make a package that downloads a certain URL and uses the downloaded result, provided that all the observable outputs of that package are deterministic. So after the package's build script runs, Nix verifies that the result matches a hash specified in the package definition. If the hash doesn't match, the build fails.
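To make that check concrete, here is a minimal Python sketch of the idea, not Nix's actual implementation. The names `verify_fixed_output` and `run_fixed_output_build` are made up for illustration, and SHA1 is used only because that is the hash this thread is about.

```python
import hashlib
from typing import Callable


def verify_fixed_output(output: bytes, expected_sha1_hex: str) -> bool:
    """Return True if the build output hashes to the declared value."""
    return hashlib.sha1(output).hexdigest() == expected_sha1_hex


def run_fixed_output_build(build: Callable[[], bytes], expected_sha1_hex: str) -> bytes:
    """Run a build step (hypothetical) and reject its output on a hash mismatch."""
    output = build()
    if not verify_fixed_output(output, expected_sha1_hex):
        # The build may do anything it likes (e.g. hit the network),
        # but its result is only accepted if the hash matches.
        raise RuntimeError("hash mismatch: build output rejected")
    return output
```

The point is that the build step is free to be impure as long as what comes out the other end always has the declared hash.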
So this hacker decided to make such a package that tries to download the Chrome update and yields a boolean indicating whether the update was available. But the result needs to have the same hash in both cases. That's where a hash collision comes in handy.
So this hacky build script uses a couple of well-known PDF files that both have the same SHA1 hash. If the update exists, it gives PDF 1, otherwise it gives PDF 2.
The update script then depends on that hack package. It "installs" that package, and then checks whether it actually contains PDF 1 or PDF 2, and now it knows whether the Chrome update was available or not.
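Here is a rough Python sketch of the whole trick, under some loud assumptions. I don't have the two colliding PDFs inline, so this uses a deliberately weak toy hash (the first two hex digits of SHA1) where a collision can be brute-forced in a few hundred tries. All names here (`toy_hash`, `probe_build`, `update_available`) are hypothetical, and reachability is passed in as a plain boolean instead of doing a real download.

```python
import hashlib
from itertools import count


def toy_hash(data: bytes) -> str:
    # Deliberately weak stand-in for SHA1: only 8 bits, so collisions are trivial.
    return hashlib.sha1(data).hexdigest()[:2]


def find_collision():
    """Brute-force two different byte strings with the same toy hash."""
    seen = {}
    for i in count():
        candidate = b"file-%d" % i
        h = toy_hash(candidate)
        if h in seen:
            return seen[h], candidate
        seen[h] = candidate


# Stand-ins for "PDF 1" and "PDF 2": different contents, same hash.
FILE_YES, FILE_NO = find_collision()
DECLARED_HASH = toy_hash(FILE_YES)


def probe_build(url_is_reachable: bool) -> bytes:
    """The 'hack package': output depends on the network, hash does not."""
    output = FILE_YES if url_is_reachable else FILE_NO
    assert toy_hash(output) == DECLARED_HASH  # the hash check passes either way
    return output


def update_available(probe_output: bytes) -> bool:
    """The update script: look at the contents, not the hash."""
    return probe_output == FILE_YES
```

So the hash check sees one fixed value no matter what happened during the build, while the one bit of information rides along in the contents.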
> Why not? Because you're not really supposed to do stuff like that in the Nix paradigm which is all about determinism.
That's the part I always feel when looking at FP languages. They sound good on paper and the examples are very tempting, but when reality intrudes on this pure, predictable, perfect world, it turns into a massive pain.
The key point is that the Nix scripting environment (think “a Ports manifest”) is an intentionally restrictive language intended to have deterministic results for every operation. It’s not intended to be a general-purpose programming language with unrestricted side effects; that’s the whole point.
What the author of the script has done here, you’re supposed to do by writing code in some other language that generates a Nix manifest (or just by hand-rolling a Nix manifest). And yet, the author here managed to get Nix to non-deterministically generate Nix.
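For comparison, the "do it outside Nix" approach could look something like this in Python. This is only a sketch: `render_manifest`, `maybe_update`, and the attribute names `version`, `url`, and `sha256` are my own illustrative choices, not what the actual Chromium update script uses. The reachability check is injected as a function so the impure part stays in the external script.

```python
from typing import Callable, Optional


def render_manifest(version: str, url: str, sha256: str) -> str:
    """Render a small Nix attribute set as text (attribute names are illustrative)."""
    return (
        "{\n"
        f'  version = "{version}";\n'
        f'  url = "{url}";\n'
        f'  sha256 = "{sha256}";\n'
        "}\n"
    )


def maybe_update(
    current_version: str,
    candidate: dict,
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return new manifest text if the candidate is newer and downloadable, else None."""
    if candidate["version"] == current_version:
        return None
    # The non-deterministic network check lives here, outside of Nix.
    if not is_reachable(candidate["url"]):
        return None
    return render_manifest(**candidate)
```

The generated text gets checked in like any hand-written manifest, so Nix itself only ever sees a fixed URL and a fixed hash.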
Considered from the point of view of Nix’s goals, this is more an “exploit” than a truly-needed feature.
Nix's use of determinism actually has an important purpose: it's not just some arbitrary annoying restriction, it's what makes the whole system work properly. This script is a kind of funny meme that should probably be deleted. It isn't crucial to the NixOS system at all, just a minor convenience, and the hack was probably just fun to make.
Generally speaking, these sandbox determinism requirements in Nixpkgs/NixOS are not an annoyance but a crucial feature: you know what you get when you install something. But yeah, it is a constraint, and sometimes when you try to package some weird program where the Makefile does some arbitrary network operations, you might find it annoying (and you can locally disable the sandbox). Still, the whole Nix philosophy is that build scripts should be reproducible, so then you just have to fix it.
What's the point of this? If you can't download something without knowing its hash in advance, then you can never download a new version of Chrome or anything else, so why do you care whether one is available?
Nix tries to achieve deterministic, reproducible builds, but Chrome's update process is non-deterministic because of (3). This hack lets the update check appear deterministic.
Can you explain how? I don't understand how a hash collision of two PDF files would help with this. Surely if it wanted to download the file at 'https://commondatastorage.googleapis.com/chromium-browser-of..., it would need the actual hash of that specific file?
edit: I just read mbrock's comment above which explained it perfectly. Didn't realize it was testing the url to see if there was an actual update available, I thought it was doing the actual update.
I understand these things independently: 1) SHA1 collision weakness 2) Nix checking package SHA1 when updating packages 3) Chromium returning different SHA1 for each download
Seriously, what is this?