I would argue that stateful services (databases, message queues, CDNs) all perfectly fit the unikernel model. The question is whether the additional engineering effort and system design is worth the performance gain.
Another one is "jalr x0, imm(x0)", which turns an indirect branch into a direct jump to address "imm" in a single instruction w/o clobbering a register. Pretty neat.
Yes? Funnily enough, I don't often use indexed access in Rust. Either I'm looping over elements of a data structure (in which case I use iterators), or I'm using an untrusted index value (in which case I explicitly handle the error case). In the rare case where I'm using an index value that I can guarantee is never invalid (e.g. graph traversal where the indices are never exposed outside the scope of the traversal), then I create a safe wrapper around the unsafe access and document the invariant.
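For what it's worth, here is a minimal sketch of the "safe wrapper around the unsafe access" case described above. The `Graph` type and its method names are hypothetical, invented for illustration. The invariant lives in one struct, is established by the only mutating method, and is documented next to the unchecked access.

```rust
/// A tiny directed graph stored as adjacency lists (hypothetical example type).
pub struct Graph {
    // Invariant: every value stored in `edges[i]` is < `edges.len()`.
    edges: Vec<Vec<usize>>,
}

impl Graph {
    pub fn new(node_count: usize) -> Self {
        Graph { edges: vec![Vec::new(); node_count] }
    }

    /// The only way to add an edge. Rejecting out-of-range endpoints here
    /// is what establishes the invariant on `edges`.
    pub fn add_edge(&mut self, from: usize, to: usize) -> Result<(), &'static str> {
        if from >= self.edges.len() || to >= self.edges.len() {
            return Err("endpoint out of range");
        }
        self.edges[from].push(to);
        Ok(())
    }

    /// Private wrapper around the unchecked access.
    fn neighbors(&self, node: usize) -> &[usize] {
        debug_assert!(node < self.edges.len());
        // SAFETY: this is only called from `reachable_count`, which passes
        // either the bounds-checked `start` or an index read out of `edges`,
        // and those are in range by the struct invariant.
        unsafe { self.edges.get_unchecked(node) }
    }

    /// Counts the nodes reachable from `start` with a depth-first walk.
    /// `start` comes from the caller, so it is treated as untrusted.
    pub fn reachable_count(&self, start: usize) -> Option<usize> {
        if start >= self.edges.len() {
            return None;
        }
        let mut seen = vec![false; self.edges.len()];
        let mut stack = vec![start];
        let mut count = 0;
        while let Some(n) = stack.pop() {
            // `seen` uses ordinary checked indexing; only `edges` is unchecked.
            if seen[n] {
                continue;
            }
            seen[n] = true;
            count += 1;
            stack.extend_from_slice(self.neighbors(n));
        }
        Some(count)
    }
}
```

The indices never leave the struct, so the whole argument for the `unsafe` block fits on one screen.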
If that's the case then hats off. What you're describing is definitely not what I've seen in practice. In fact, I don't think I've ever seen a crate or production codebase that documents infallibility of every single slice access. Even security-critical cryptography crates that passed audits don't do that. Personally, I found it quite hard to avoid indexing for graph-heavy code, so I'm always on the lookout for interesting ways to enforce access safety. If you have some code to share that would be very interesting.
My rule of thumb is that unchecked access is okay in scenarios where both the array/map and the indices/keys are private implementation details of a function or struct, since an invariant is easy to manually verify when it is tightly scoped as such. I've seen it used in:
* Graph/tree traversal functions that take a visitor function as a parameter
> I don't think I've ever seen a crate or production codebase that documents infallibility of every single slice access.
The smoltcp crate typically uses runtime checks to ensure slice accesses made by the library do not cause a panic. It's not exactly equivalent to GP's assertion, since it doesn't cover "every single slice access", but it at least covers slice accesses triggered by the library's public API. (i.e. none of the public API functions should cause a panic, assuming that the runtime validation after the most recent mutation succeeds).
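A rough sketch of the "validate once, then access" pattern, to make the idea concrete. The `Header` type, its 4-byte layout and its method names are all made up for this example and are not smoltcp's actual API.

```rust
/// View over a 4-byte header: kind (1 byte), flags (1 byte), length (2 bytes, big-endian).
/// Hypothetical layout, for illustration only.
pub struct Header<'a> {
    buf: &'a [u8],
}

impl<'a> Header<'a> {
    pub const LEN: usize = 4;

    /// Runtime validation happens once, at construction.
    pub fn new_checked(buf: &'a [u8]) -> Result<Self, &'static str> {
        if buf.len() < Self::LEN {
            Err("truncated header")
        } else {
            Ok(Header { buf })
        }
    }

    /// The accessors index with constants below `LEN`, so they cannot panic
    /// on a value that came out of `new_checked`.
    pub fn kind(&self) -> u8 {
        self.buf[0]
    }

    pub fn length(&self) -> u16 {
        u16::from_be_bytes([self.buf[2], self.buf[3]])
    }
}
```

The accessors still use checked indexing, so a mistake in the validation would show up as a panic rather than undefined behavior.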
I think this goes against the Rust goals in terms of performance. Good for safe code, of course, but usually Rust users like to have compile time safety to make runtime safety checks unnecessary.
Sure, these days I'm mostly working on a few compilers. Let's say I want to make a fixed-size SSA IR. Each instruction has an opcode and two operands (which are essentially pointers to other instructions). The IR is populated in one phase, and then lowered in the next. During lowering I run a few peephole and code motion optimizations on the IR, and then do regalloc + asm codegen. During that pass the IR is mutated and indices are invalidated/updated. The important thing is that this phase is extremely performance-critical.
One normal "trick" is phantom typing. You create a type representing indices and have a small, well-audited portion of unsafe code handling creation/unpacking, where the rest of the code is completely safe.
The details depend a lot on what you're doing and how you're doing it. Does the graph grow? Shrink? Do you have more than one? Do you care about programmer error types other than panic/UB?
Suppose, e.g., that your graph doesn't change sizes, you only have one, and you only care about panics/UB. Then you can get away with:
1. A dedicated index type, unique to that graph (shadow / strong-typedef / wrap / whatever), corresponding to whichever index type you're natively using to index nodes.
2. Some mechanism for generating such indices. E.g., during graph population phase you have a method which returns the next custom index or None if none exist. You generated the IR with those custom indexes, so you know (assuming that one critical function is correct) that they're able to appropriately index anywhere in your graph.
3. You have some unsafe code somewhere which blindly trusts those indices when you start actually indexing into your array(s) of node information. However, since the very existence of such an index is proof that you're allowed to access the data, that access is safe.
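The three steps above can be sketched roughly as follows. Everything here (`InstId`, `Inst`, `Ir`) is a hypothetical illustration, and the sketch is only sound under the stated assumptions: the IR never shrinks, there is exactly one `Ir` in the program, and `InstId`'s private field sits behind a real module boundary. With two `Ir` values, an index from one could be used on the other, which is exactly the limitation mentioned above.

```rust
/// Step 1: a dedicated index type. The field is private, so code outside
/// the defining module cannot forge an index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InstId(u32);

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Inst {
    Const(i64),
    Add(InstId, InstId),
}

/// Fixed-capacity IR. ASSUMPTION: only one `Ir` exists and it never shrinks.
pub struct Ir {
    insts: Vec<Inst>,
    capacity: usize,
}

impl Ir {
    pub fn with_capacity(capacity: usize) -> Self {
        // Clamp so that every index fits in the u32 inside `InstId`.
        let capacity = capacity.min(u32::MAX as usize);
        Ir { insts: Vec::with_capacity(capacity), capacity }
    }

    /// Step 2: the only place indices are created. Returns `None` when full.
    pub fn push(&mut self, inst: Inst) -> Option<InstId> {
        if self.insts.len() >= self.capacity {
            return None;
        }
        let id = InstId(self.insts.len() as u32);
        self.insts.push(inst);
        Some(id)
    }

    /// Step 3: the access that blindly trusts the index.
    pub fn get(&self, id: InstId) -> Inst {
        debug_assert!((id.0 as usize) < self.insts.len());
        // SAFETY: an `InstId` is only ever produced by `push`, which returns
        // the position of an element that exists, and elements are never
        // removed. This relies on the single-`Ir` assumption stated above.
        unsafe { *self.insts.get_unchecked(id.0 as usize) }
    }

    /// Example consumer that contains no unsafe code at all.
    pub fn eval(&self, id: InstId) -> i64 {
        match self.get(id) {
            Inst::Const(v) => v,
            Inst::Add(a, b) => self.eval(a).wrapping_add(self.eval(b)),
        }
    }
}
```

Operands can only refer to instructions that were pushed earlier, so `eval` cannot loop, and the only code that needs auditing is `push` and `get`.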
Techniques vary from language to language and depending on your exact goals. GhostCell [0] in Rust is one way of relegating literally all of the unsafe code to a well-vetted library, and it uses tagged types (via lifetimes), so you can also do away with the "only one graph" limitation. It's been a while since I've looked at it, but resizes might also be safe pretty trivially (or might not be).
The general principle though is to structure your problem in such a way that a very small amount of code (so that you can more easily prove it correct) can provide promises that are enforceable purely via the type system (so that if the critical code is correct then so is everything else).
That's trivial by itself (e.g., just rely on option-returning .get operators), so the rest of the trick is to find a cheap place in your code which can provide stronger guarantees. For many problems, initialization is the perfect place: you can bounds-check on init and then not worry about it again. If even bounds-checking on initialization is too slow, you can still use initialization as the opportunity to write out a proof of why some invariant holds and then blindly/unsafely assert it to be true, as long as you immediately pack that hard-won information into a dedicated type so that the only place you ever have to think about it is on initialization.
I do use a combination of newtyped indices + singleton arenas for data structures that only grow (like the AST). But for the IR, being able to remove nodes from the graph is very important. So phantom typing wouldn't work in that case.
Usually you'd want to write almost all your slice or other container iterations with iterators, in a functional style.
For the 5% of cases that are too complex for standard iterators? I never bother justifying why my indexes are correct, but I don't see why not.
You very rarely need SAFETY comments in Rust because almost all the code you write is safe in the first place. The language also gives you the tools to avoid manual iteration (not just for safety, but because it lets the compiler eliminate bounds checks), so it would actually be quite viable to write these comments, since you only need them when you're doing something unusual.
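A small illustration of the difference, using a made-up dot-product function written both ways:

```rust
/// Index-based version: `b[i]` can panic if `b` is shorter than `a`, and the
/// bounds checks stay in unless the optimizer can prove them redundant.
pub fn dot_indexed(a: &[f64], b: &[f64]) -> f64 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Iterator-based version: `zip` stops at the shorter slice, so there is no
/// index to get wrong and no out-of-bounds panic path.
pub fn dot_iter(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Note that the two are not equivalent on mismatched lengths: the iterator version silently truncates, so whether that is the behavior you want is a separate decision.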
I didn't restate the context from the code we're discussing: it must not panic. If you don't care if the code panics, then go ahead and unwrap/expect/index, because that conforms to your chosen error handling scheme. This is fine for lots of things like CLI tools or isolated subprocesses, and makes review a lot easier.
So: first, identify code that cannot be allowed to panic. Within that code, yes, in the rare case that you use [i], you need to at least try to justify why you think it'll be in bounds. But it would be better not to.
There are a couple of attempts at getting the compiler to prove that code can't panic (e.g., the no-panic crate).
What about memory allocation? How will you stop that from panicking? `Vec::resize` has no way to report an allocation failure to the caller; it will always panic or abort in Rust. And this is just one example out of thousands in the Rust stdlib.
Unless the language addresses no-panic in its governing design or allows try-catch, not sure how you go about this.
That is slowly being addressed, but meanwhile it’s likely you have a reliable upper bound on how much heap your service needs, so it’s a much smaller worry. There are also techniques like up-front or static allocation if you want to make more certain.
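One piece that already exists in stable Rust is `Vec::try_reserve`, which reports allocation failure as a `Result`. A minimal sketch of a grow-only resize built on it (the `grow_zeroed` helper name is made up):

```rust
use std::collections::TryReserveError;

/// Grows `buf` to `new_len` bytes, filling with zeros, and returns an error
/// instead of panicking or aborting if the memory cannot be reserved.
/// Does nothing if `buf` is already at least `new_len` long.
pub fn grow_zeroed(buf: &mut Vec<u8>, new_len: usize) -> Result<(), TryReserveError> {
    if new_len > buf.len() {
        // Fallible step: reserve the extra capacity up front.
        buf.try_reserve(new_len - buf.len())?;
        // The capacity is already there, so this resize does not reallocate.
        buf.resize(new_len, 0);
    }
    Ok(())
}
```

The same idea covers the up-front allocation approach: do the fallible reservation once at startup and stay within that capacity afterwards.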
This is ridiculous. We're probably going to start seeing more of these. This was just the first, big highly visible instance.
We should have a name for this similar to "my code just NPE'd". I suggest "unwrapped", as in, "My Rust app just unwrapped a present."
I think we should start advocating for the deprecation and eventual removal of the unwrap/expect family of methods. There's no reason engineers shouldn't be handling Options and Results gracefully, either passing the state to the caller or turning to a success or fail path. Not doing this is just laziness.
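As a concrete (and entirely hypothetical) example of passing the state to the caller instead of unwrapping, here is a tiny config parser where both the missing-key case and the bad-number case become error values:

```rust
use std::num::ParseIntError;

/// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    Missing,
    Invalid(ParseIntError),
}

/// Parses a line of the form `port=8080`.
pub fn parse_port(line: &str) -> Result<u16, ConfigError> {
    // Option -> Result, then `?` hands the failure to the caller.
    let value = line.strip_prefix("port=").ok_or(ConfigError::Missing)?;
    // Result -> Result with a converted error type; no unwrap needed.
    value.trim().parse::<u16>().map_err(ConfigError::Invalid)
}
```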
Indexing is comparatively rare given the existence of iterators, IMO. If your goal is to avoid any potential for panicking, I think you'd have a harder time with arithmetic overflow.
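To illustrate the overflow point: plain `+` and `*` on integers panic on overflow in debug builds and wrap in release builds by default, so panic-free code has to spell out the checked variants. A sketch with a made-up offset calculation:

```rust
/// Computes `base + index * stride`, returning `None` on overflow instead of
/// panicking (debug builds) or silently wrapping (default release builds).
pub fn checked_offset(base: usize, index: usize, stride: usize) -> Option<usize> {
    index.checked_mul(stride)?.checked_add(base)
}
```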
Your pair of posts is very interesting to me. Can you share with me: What is your programming environment such that you are "fine with allocation failures"? I'm not doubting you, but for me, if I am doing systems programming with C or C++, my program is doomed if a malloc fails! When I saw your post, I immediately thought: Am I doing it wrong? If I get a NULL back from malloc(), I just terminate with an error message.
I mean, yeah, if I am using a library, as a user of this library, I would like to be able to handle the error myself. Having the library decide to panic, for example, is the opposite of that.
If I can't allocate memory, I'm typically okay with the program terminating.
I don't want dependencies deciding to unwrap() or expect() some bullshit and that causing my entire program to crash because I didn't anticipate or handle the panic.
Code should be written, to the largest extent possible, to mitigate errors using Result<>. This is just laziness.
I want checks in the language to safeguard against lazy Rust developers. I don't want their code in my dependency tree, and I want static guarantees against this.
edit: I just searched unwrap() usage on GitHub, and I'm now kind of worried/angry:
Something that allows me to annotate a function (or my whole crate) as "no panic", and get a compile error if the function or anything it calls has a reachable panic.
This will allow it to work with many unmodified crates, as long as constant propagation can prove that any panics are unreachable. This approach will also allow crates to provide panicking and non-panicking versions of their API (which many already do).
Yes, I want that. I also want to be able to (1) statically apply a badge on every crate that makes and meets these guarantees (including transitively with that crate's own dependencies) so I can search crates.io for stronger guarantees and (2) annotate my Cargo.toml to not import crates that violate this, so time isn't wasted compiling - we know it'll fail in advance.
On the subject of this, I want more ability to filter out crates in our Cargo.toml. Such as a max dependency depth. Or a frozen set of dependencies that is guaranteed not to change so audits are easier. (Obviously we could vendor the code in and be in charge of our own destiny, but this feels like something we can let crate authors police.)
I think the most common solution at the moment is dtolnay's no_panic [0]. That has a bunch of caveats, though, and the ergonomics leave something to be desired, so a first-party solution would probably be preferable.
I would be fine just getting rid of unwrap(), expect(), etc. That's still a net win.
Look at how many lazy cases of this there are in Rust code [1].
Some of these are no doubt tested (albeit impossible to statically guarantee), but a lot of it looks like sloppiness or not leaning on the language's strong error handling features.
It's disappointing to see. We've had so much of this creep into the language that eventually it caused a major stop-the-world outage. This is unlikely to be the last time we see it.
I don't write Rust so I don't really know, but from someone else's description here it sounds similar to `fromJust` in Haskell which is a common newbie footgun. I think you're right that this is a case of not using the language properly, though I know I was seduced into the idea that Haskell is safe by default when I was first learning, which isn't quite true — the safety features are opt-in.
A language DX feature I quite like is when dangerous things are labelled as such. IIRC, some examples of this are `accursedUnutterablePerformIO` in Haskell, and `DO_NOT_USE_OR_YOU_WILL_BE_FIRED_EXPERIMENTAL_CREATE_ROOT_CONTAINERS` in React.js.
I would be in favor of renaming unwrap() and its family to `unwrap_do_not_use_or_you_will_break_the_internet()`
I still think we should remove them outright or make production code fail to compile without a flag allowing them. And we also need tools to start cleaning up our dependency tree of this mess.
For iteration, yes. But there are other cases, like any time you have to deal with lots of linked data structures. If you need high performance, chances are that you'll have to use an index+arena strategy. They're also common in mathematical codebases.
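For readers who haven't seen the index+arena strategy, a minimal sketch: the nodes live in one `Vec` and the links are indices rather than pointers. The `List` and `NodeId` names are made up, and this version uses checked `.get` everywhere, so it trades a bounds check per hop for having no unsafe code.

```rust
/// Index of a node inside the arena (hypothetical example type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(usize);

struct Node<T> {
    value: T,
    next: Option<NodeId>,
}

/// Singly linked list whose nodes are stored contiguously in a `Vec`.
pub struct List<T> {
    nodes: Vec<Node<T>>,
    head: Option<NodeId>,
}

impl<T> List<T> {
    pub fn new() -> Self {
        List { nodes: Vec::new(), head: None }
    }

    /// Allocates a node in the arena and links it in front of the old head.
    pub fn push_front(&mut self, value: T) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { value, next: self.head });
        self.head = Some(id);
        id
    }

    /// Checked lookup: returns `None` instead of panicking on a bad index.
    pub fn get(&self, id: NodeId) -> Option<&T> {
        self.nodes.get(id.0).map(|n| &n.value)
    }

    /// Follows the `next` links starting from the head.
    pub fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        let mut cur = self.head;
        std::iter::from_fn(move || {
            let node = self.nodes.get(cur?.0)?;
            cur = node.next;
            Some(&node.value)
        })
    }
}
```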
You could copy the instruction to a 16 byte sized buffer and hash the one/two int64s. Looking at the code sample in the article, there wasn't a single instruction longer than 5 characters, and I suspect that in general instructions with short names are more common than those with long names.
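Something like the following is what I have in mind, assuming "instruction" means the mnemonic string and that anything longer than 16 bytes falls back to some other path. The function names and the mixing step are made up for the sketch; any decent 64-bit mixer would do.

```rust
/// Copies a mnemonic into a zero-padded 16-byte buffer and returns it as two
/// little-endian u64 words. Returns `None` for names longer than 16 bytes.
pub fn pack_mnemonic(name: &str) -> Option<(u64, u64)> {
    let bytes = name.as_bytes();
    if bytes.len() > 16 {
        return None;
    }
    let mut buf = [0u8; 16];
    buf[..bytes.len()].copy_from_slice(bytes);

    let mut lo = [0u8; 8];
    let mut hi = [0u8; 8];
    lo.copy_from_slice(&buf[..8]);
    hi.copy_from_slice(&buf[8..]);
    Some((u64::from_le_bytes(lo), u64::from_le_bytes(hi)))
}

/// Hashes the two words with a cheap multiply/xor-shift mix (illustrative
/// only, not a vetted hash function).
pub fn hash_mnemonic(name: &str) -> Option<u64> {
    let (lo, hi) = pack_mnemonic(name)?;
    let mut h = lo ^ hi.rotate_left(32);
    h = h.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    Some(h ^ (h >> 29))
}
```

For names of 8 bytes or fewer the high word is zero, so the cost is effectively one load and one multiply regardless of the name's length.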
This last fact might actually support the current model, as it grows linearly-ish in the size of the instruction, instead of being constant like a hash.
Personally, I think this argument only holds water for languages that are rooted in mathematics (e.g. Haskell, Lean, Rocq, F*, ...). If your computational model comes from a place of physical hardware, instructions, registers, memory etc. you're going to end up with something very different than an abstract machine based on lambda calculus. Both valid ways to design a PL.
Intel still does it. As far as I can see they're the only player in town that provides open, detailed documentation for their high-speed NICs [0]. You can actually write a driver for their 100Gb cards from scratch using their datasheet. Most other vendors would either (1) ignore you, (2) make you sign an NDA or (3) refer you to their poorly documented Linux/BSD driver.
Not sure what the situation is for other hardware like NVMe SSDs.
Wow... that PDF is 2,750 pages! There must be an army of technical writers behind it. That is an incredible technical achievement.
Real question: Why do you think Intel does this? Does it guarantee a very strong foothold into data center NICs? I am sure competitors would argue two different angles: (1) this PDF shares too much info; some should be hidden behind an NDA, (2) it's too hard to write (and maintain) this PDF.
I may be the only person who ever understood every detail of C++, starting with the preprocessor. I can make that claim because I'm the only person who ever implemented all of it. (You cannot really know a language until you've implemented it.) I gave up on that in the 2000's. Modern C++ is simply terrifying in its complexity.
(I'm not including the C++ Standard Library, as I didn't implement it.)
Sean Baxter single-handedly implemented all of up to C++23, and some C++26, including a huge number of GNU extensions and possibly an even larger number of his own features.
I don't know much of anything about him. Did he implement the preprocessor? the optimizer? the code generator?
(For some context, back in the 80's, code generators needed enhancements to implement C++. You couldn't just use an existing one. Bjarne had to do some ugly workarounds because of this.)
Sean Baxter's Circle compiler uses LLVM as a backend, but I believe the rest is from scratch.
Arguably these days having a clear frontend/backend separation is good compiler architecture. It might slow down compile times a bit, but it's worth the cost.
It wouldn't make much sense to write your own preprocessor these days either, even though it is part of the C++ compiler, unless you're integrating it with the C++ lexer for speed, as I did.
P.S. we're adding an "Editions" feature to D so we can simplify the language by removing obsolete and dead-end features. We didn't get everything right, and want to fix it!
This is a pretty standard document length. Modern microcontrollers have similar lengths (e.g. ATSAMD51 is ~2000 pages). Some of it is not software related, things like pin outs and electrical and mechanical descriptions.
It does take a huge amount of work to write and maintain. Typically the authors are not technical, so it also relies on the designers being available to answer questions as well. Then there’s a choice of how it’s written: narrative and potentially imprecise but readable, or terse and precise but hard to read. There’s both styles in the same document, terse for register descriptions.
Look up the Texas Instruments AM3358. It's a tiny SoC that was used in the BeagleBone Black. Its technical reference manual[1] is over 5000 pages, and it details all peripherals, all of the interconnects and every single register in the system. This, by contrast, is really just an overview.
With regard to (1), if you don't publish this information you're not selling a CPU, you're selling a very expensive chunk of sand. There is simply no way that a customer can guess at what your implementation looks like. Additionally, Intel barely has IP in the traditional sense. They hold patents, but their only real competitor in making x86 processors, AMD, has a long-standing mutual non-enforcement agreement wrt patents.
With regard to (2), I'm guessing a majority of this PDF can be generated sort of like you generate API documentation from doxygen comments.
I worked on a similar TI SoC -- with War-and-Peace-sized datasheet. My eyes burned out and brain exploded. Ultimately, another engineer had to take over the project -- or rather TEAM of engineers, of which I did only a part. It's simply too much complexity to expect one engineer to grok it all, do the schematic & PCB & power supply & hi-speed MIPI connections and radios and... and THEN to write the software for it all. It's too much. (This is the Life one gets in Startups, it seems -- worked to the (beagle)bone!)
Having used the AM3358 extensively, the TRM is not complete. There are some pretty important and complex systems that have literally no documentation at all in the TRM, not to mention the large number of quirks and small details that you can only pick up from a scattering of other areas (including a wiki that TI deleted some years ago). It is, however, miles better than the documentation available for most SoCs.
> Real question: Why do you think Intel does this?
I'm not sure large traditional silicon vendors like Intel, TI, et al re-evaluate the documentation requirements (and costs) on a chip by chip basis. It's probably done by chip class and for companies who've been selling chips by the millions over many decades to industries as diverse as defense, aerospace, automotive, etc there are classes of chips where robust, complete documentation is not only expected but often a required part of the RFP, compliance or conformance processes.
While this level of effort probably isn't needed for every chip in that class, it could be hard to reliably predict when a general purpose chip is still in the design phase which customers may be interested in it during its life (which for some of these chips might be decades). Many chips which conform to MIL-SPEC or other similar standards which can require extensive documentation are simply enhanced versions of standard chips, so the docs exist anyway. Finally, there's the organizational capabilities and culture aspect. Once the org needs to maintain the systemic ability to generate serious documentation at scale, you end up with a lot of managers and staff who think this way.
For datasheets that's normal. Might even be leaning towards smaller than average for the device in question.
For comparison, a data sheet for a single transistor can be around 12 to 30 pages. A data sheet for a tiny microcontroller is probably a few hundred pages.
I once wrote a driver for a flash chip and that had a data sheet of around 80 pages.
In terms of (2), I wonder if it's even possible to write a driver without such a document. In the end, the vendor is on the hook for the driver for major platforms (let's assume Linux) - if they can write a Linux driver without a similar spec to this doc, then the doc probably doesn't need to exist since the business wins from hobbyist drivers will be low. If they can't though, then it's just a matter of formatting an internal document for public consumption - the doc itself has to be maintained anyways so the cost seems lower and maybe reasonable. I have a feeling the doc is necessary but I am not specialized in the field.
Assumptions, fair or not, about (1) seem more likely somehow.
Didn't all the Asahi Linux Mac M1 drivers essentially get reverse engineered with little to no support from Apple and no public docs? If I'm remembering correctly then I guess it's possible with enough effort and reverse engineering skills.
It was reverse engineered from a driver. With no driver and purely some PCIe device registers mapped into memory you might as well be trying to guess lottery numbers.
I guess the driver was the one that runs on Mac that they were able to refer to? Not sure if you have any links to blog posts about this process, but it sounds so cool.
Probably CPU vendor culture? I forgot how large Intel's manual set is, but ARM's was ~11k pages the last time I checked. Intel's was smaller, but not that much smaller, certainly within an order of magnitude.
The NVMe spec is freely downloadable and sufficient to write a driver with, if your OS already has PCIe support (which doesn't have open specifications). You don't need any vendor-specific features for ordinary everyday use, so it's a bit of a different situation from NICs. (Also, NVMe was in very large part an Intel creation, though it's maintained by an industry consortium.)
Good christ this is my current work laptop. It...mostly doesn't work. Plug in a USB camera and it'll just go. Several drivers, userspace utilities and other daemons and sometimes gstreamer works, but does Zoom work? Who knows!
That's interesting that it's that short. I remember a long while ago I had aspirations of implementing a custom board for Prestonia-/Gallatin-era Xeons and the datasheets and specs for those were around 3000 pages, iirc. Supporting infra was about that long as well. So I'm surprised to see a modern ethernet controller fit into the same space. I appreciated all of the docs because it was so open, I felt like I could actually achieve that project, but other things took priority.
It's fascinating to me how the values and priorities of a project's leaders affect the community and its dominant narrative. I always wondered how it was possible for so many people in the Rust community to share such a strong view on soundness, undefined behavior, thread safety etc. I think it's because people driving the project were actively shaping the culture.
Meanwhile, compiler performance just didn't have a strong advocate with the right vision of what could be done. At least that's my read on the situation.
As OP demonstrated, Rust compiler performance is not the problem, it's actually quite fast for what it does. Slow builds are rather caused by reliance on popular over-generic crates that use metaprogramming to generate tons of code at compile time. It's not a Rust specific tradeoff but a consequence of the features it offers and the code style it encourages. An alternative, fast building crate ecosystem could be developed with the same tools we have now.
By comparison, Go doesn't have _that_ problem because it just doesn't have metaprogramming. It's easy to stay fast when you're dumb. Go is the Forrest Gump of programming languages.
You can get pretty far with a branch per byte, as long as the bulk of the work is done w/ SIMD (like character classification). But yeah, LUT lookup per byte is not recommended.