nnethercote's comments | Hacker News

I'm a member of Futurewei's Rust team, and I'm in the Compiler Contributors group.

I started these posts back in 2016 when I wasn't a member of either. It's been a long-running series, and there's never been much reason for them to go on the official Rust blog, because they get enough attention on my personal blog.

In the past I have posted links to Hacker News, but usually they don't make the front page and get few if any comments, so I stopped bothering, though others sometimes do. I always post to /r/rust, where the discussion tends to be higher-quality than on HN because there's a deeper level of Rust knowledge there.


I've tried AST memory layout work in the Zig style; look for "shrinkage" in these two posts:

- https://nnethercote.github.io/2022/10/27/how-to-speed-up-the...

- https://nnethercote.github.io/2023/03/24/how-to-speed-up-the...

It's hard work. Small AST changes often require hundreds of changes to the code. The required changes usually make the AST less ergonomic to work with. And the perf benefits I obtained were very small, even after shrinking `ast::Expr` (by far the most common AST node kind) from over 100 bytes to 64 bytes on 64-bit platforms.
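Roughly, the kind of change involved looks like this (a minimal illustrative sketch, not actual rustc code or its real sizes): boxing a rarely-used large variant shrinks every node of the enum type, at the cost of an extra allocation and indirection wherever that variant is used.

```rust
use std::mem::size_of;

// A simplified AST expression type (hypothetical; not rustc's real `ast::Expr`).
// The large `Closure` payload inflates *every* Expr, even though closures are rare.
enum ExprBig {
    Lit(i64),
    Binary(Box<ExprBig>, Box<ExprBig>),
    Closure([u64; 12]), // stand-in for a large, rarely-used payload
}

// Boxing the large variant's payload shrinks the whole enum: every node now
// pays only for a pointer, and only actual closures allocate the big payload.
enum ExprSmall {
    Lit(i64),
    Binary(Box<ExprSmall>, Box<ExprSmall>),
    Closure(Box<[u64; 12]>),
}

fn main() {
    // On a typical 64-bit platform the boxed version is much smaller.
    println!("big:   {} bytes", size_of::<ExprBig>());   // e.g. 104
    println!("small: {} bytes", size_of::<ExprSmall>()); // e.g. 24
}
```

The catch is that every construction site and `match` arm touching the boxed variant has to change, which is where the hundreds of mechanical code changes, and the ergonomic cost, come from.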

The linked Zig PR has very impressive reductions in wall time, cycles, etc., but if you read closely, it's restricted to just parsing, which is usually a very small slice of compile time, at least for rustc. My experience with these kinds of changes was disappointing. I concluded: "I’d love to be proven wrong, but it doesn’t feel like this work is taking place in a good part of the effort/benefit curve."


Zig did go on to apply the same style to its two later IRs, ZIR (https://github.com/ziglang/zig/pull/8266) and AIR (https://github.com/ziglang/zig/pull/9353). Those changes don't look as thoroughly benchmarked (at least not on the PRs), but it appears they got significant wins there as well.

Of course Zig is a very different language and its compiler handles a rather different workload. It's totally possible that their approach makes more sense in younger codebases, or with a different source language design, or whatever else. But I also don't think node size tells the whole story: there's a synergy between memory usage, memory layout, and memory access patterns. For example, Cranelift gets a lot of mileage from tweaking its algorithms in combination with its data structures, e.g. the "half-move" design mentioned in https://cfallin.org/blog/2022/06/09/cranelift-regalloc2/#per...
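For a flavor of that style, here's a toy Rust sketch (names illustrative; not from either compiler) of the index-based, arena-backed layout those Zig PRs use: nodes live in one flat vector and child links are u32 indices rather than heap pointers, which shrinks nodes and improves locality at the same time.

```rust
#[derive(Copy, Clone)]
struct NodeId(u32);

#[derive(Copy, Clone)]
enum Node {
    Lit(i64),
    Add(NodeId, NodeId), // 8 bytes of indices vs. 16 bytes of Box pointers
}

struct Ast {
    nodes: Vec<Node>, // one contiguous allocation for the whole tree
}

impl Ast {
    fn push(&mut self, node: Node) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(node);
        id
    }

    fn eval(&self, id: NodeId) -> i64 {
        match self.nodes[id.0 as usize] {
            Node::Lit(v) => v,
            Node::Add(a, b) => self.eval(a) + self.eval(b),
        }
    }
}

fn main() {
    let mut ast = Ast { nodes: Vec::new() };
    let a = ast.push(Node::Lit(2));
    let b = ast.push(Node::Lit(3));
    let sum = ast.push(Node::Add(a, b));
    println!("{}", ast.eval(sum)); // 5
}
```

Beyond the size win, traversal becomes a mostly-linear walk over one allocation, which is where the memory-access-pattern synergy comes in.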


We also migrated all compile-time types and values to be stored in a similar fashion: https://github.com/ziglang/zig/pull/15569

Perf was a bit of a wash on this one, but it means we can serialize most of the compiler's state with a single pwritev syscall. For a 300,000-line codebase, this data takes up 30 MiB and uses the same format on disk as in memory. On my laptop, 30 MiB can be written from memory to disk in 25 ms. This is one puzzle piece for incremental compilation. More puzzle pieces are listed in the PR description here: https://github.com/ziglang/zig/pull/16917
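To illustrate the idea (a hypothetical Rust rendering; the compiler itself is Zig and uses pwritev across several arrays): once state lives in flat arrays of plain-old-data structs with indices instead of pointers, the on-disk format can be byte-identical to the in-memory format, so saving is a single bulk write with no per-node encoding.

```rust
use std::fs::File;
use std::io::Write;
use std::slice;

// A flat, pointer-free node record (hypothetical layout).
#[repr(C)]
#[derive(Copy, Clone)]
struct Node {
    tag: u32,
    data: u32, // an index into another array, not a pointer
}

fn save(nodes: &[Node], file: &mut File) -> std::io::Result<()> {
    // View the node array as raw bytes. Sound here because `Node` is
    // `repr(C)` plain-old-data with no padding and no pointers.
    let bytes = unsafe {
        slice::from_raw_parts(nodes.as_ptr() as *const u8, std::mem::size_of_val(nodes))
    };
    file.write_all(bytes) // one write for the whole array
}

fn main() -> std::io::Result<()> {
    let nodes = vec![Node { tag: 1, data: 42 }, Node { tag: 2, data: 7 }];
    save(&nodes, &mut File::create("state.bin")?)
}
```

This only works because the arrays hold indices rather than pointers; a pointer-laden structure would need a per-node encode/decode step on every save and load.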


This is interesting work - thank you for sharing!



I just want to say that this is a very weird but cool approach that I never would have thought of. Nice work!


The link above is to a draft version of the blog post, which is no longer available.

The final version (which has a different title) is at https://nnethercote.github.io/2023/07/11/back-end-parallelis....

A HN post about the final version is at https://news.ycombinator.com/item?id=36675281.


Ah sorry, looks like I got ahead of myself due to my feed reader picking up the post this morning!


No problem, I learned that I shouldn't post drafts and assume they will go unnoticed :)


Thanks for the suggestion!


I didn't know about this. Thanks for the tip!


The book is primarily about Rust-specific things. Section 1 says: "Some of the techniques within are entirely Rust-specific, and some involve ideas that can be applied (often with modifications) to programs written in other languages." I will clarify this.

Having said that, Section 16 does have general optimization advice, and I will consider adding some brief notes about the things you mentioned there.

Thanks for the suggestion.


It was a really weird and messy situation, and unpleasant to live through.

I agree that Brendan's Prop 8 donation was bad. But he made it privately, and never (AFAIK) made anti-LGBT comments in public. People who had worked with him for many years were surprised to learn he held these views. It only came to light because of public disclosure laws for political donations.

Some Mozilla employees publicly criticized Brendan for the Prop 8 donation, but others defended him, precisely because the donation had been private. A number of the defenses came from LGBT employees.

The pile-on at the time was intense. It lasted more than a week. It reached the front page of my local paper. Crazy stuff.

Brendan chose to stand down as CEO and also quit Mozilla. He wasn't fired, and Mozilla leadership asked him to stay.

All this nuance was lost. Lots of left-leaning people concluded that Mozilla had knowingly promoted a proudly anti-LGBT guy to CEO. Lots of right-leaning people concluded that Mozilla had fired their CEO for his political views. Both conclusions were greatly over-simplified. Almost everyone found a reason to hate Mozilla. Bad times!


> This is a supremely surprising conclusion

That's why I started the paragraph with "Contrary to what you might expect".

As for Stabilizer: "Stabilizer eliminates measurement bias by comprehensively and repeatedly randomizing the placement of functions, stack frames, and heap objects in memory." Those placements can affect cycle counts and wall times a lot, but don't affect instruction counts.


So, in practice, have you not found any data dependencies or cache issues showing up as bottlenecks? Or do current tools just make this more of a blind spot for optimization?

Also, is there any work to multi-thread the Rust compiler at a more fine-grained level, like the recent GCC work? I know you allude to the possibility that this would make instruction counts less reliable, so I'm wondering if that's something being explored.

Finally, while I have you: I'm wondering whether there's been any exploration of keeping track of information across builds so that incremental compilation is faster (i.e. only recompiling/relinking the parts of the code impacted by a change). I've always thought that should almost completely eliminate compilation/linking times (at least for debug builds, where the utmost optimization matters less).


I mentioned in the post several areas I myself haven't looked at, including cache misses. There may be room for improvements there.

There is an experimental parallel rustc front-end, e.g. see https://internals.rust-lang.org/t/help-test-parallel-rustc/1...

> any exploration of the idea of keeping track of information across builds so that incremental compilation is faster

That's exactly what incremental compilation does.
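For readers unfamiliar with the concept, here's a toy Rust sketch of the core idea (rustc's actual query system is far more elaborate, and additionally records dependencies between queries so a change invalidates only the downstream results that used it): cache each unit's output keyed by a fingerprint of its inputs, and redo work only when the fingerprint changes between builds.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Fingerprint a unit's input so we can tell whether it changed.
fn fingerprint(input: &str) -> u64 {
    let mut h = DefaultHasher::new();
    input.hash(&mut h);
    h.finish()
}

struct IncrementalCache {
    // unit name -> (input fingerprint, cached "compiled" output)
    cache: HashMap<String, (u64, String)>,
}

impl IncrementalCache {
    fn compile(&mut self, unit: &str, source: &str) -> String {
        let fp = fingerprint(source);
        if let Some((old_fp, output)) = self.cache.get(unit) {
            if *old_fp == fp {
                return output.clone(); // unchanged since last build: reuse
            }
        }
        let output = format!("compiled({source})"); // stand-in for real codegen
        self.cache.insert(unit.to_string(), (fp, output.clone()));
        output
    }
}

fn main() {
    let mut ic = IncrementalCache { cache: HashMap::new() };
    ic.compile("main.rs", "fn main() {}"); // compiled fresh
    ic.compile("main.rs", "fn main() {}"); // fingerprint unchanged: reused
}
```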


So there's an effort to track which functions or modules changed & what the downstream implications of that are in terms of needing recompilation? Are there any links to technical descriptions? I'm super interested in reading up on the technical details involved.


