Ruby YJIT Ported to Rust

kibwen · on April 20, 2022

Motivation: https://bugs.ruby-lang.org/issues/18481

"The motivation behind this is that we are facing challenges in terms of code maintainability. As you know, JIT compilers can get very complex, and C99 doesn't offer many tools to manage this complexity. There are no classes and methods, limited type checking, and it's hard to fully separate code into modules, for instance."

"We believe that having access to object oriented programming and a more expressive type system would help us manage growing complexity better and also improve the safety/robustness of YJIT. For instance we would like to add Windows support and a new backend to YJIT. That means we’ll have two separate backends (x86, arm64) and we’ll need to support two different calling conventions (Microsoft, SystemV), but currently, we have limited tools to build the abstractions needed, such as preprocessor macros and if-statements."
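
As a rough illustration of the kind of abstraction the quote is asking for, here is a minimal Rust sketch of the two calling conventions it names. The type and method names (`CallingConvention`, `int_arg_regs`, `arg_location`) are made up for this example and are not taken from YJIT; only the register orders come from the System V AMD64 and Microsoft x64 conventions.

```rust
/// The two x86-64 calling conventions named in the quote.
/// (Hypothetical type for illustration; not YJIT's actual code.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CallingConvention {
    SystemV,
    Microsoft,
}

impl CallingConvention {
    /// Registers that carry the first integer/pointer arguments, in order.
    fn int_arg_regs(self) -> &'static [&'static str] {
        match self {
            CallingConvention::SystemV => &["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
            CallingConvention::Microsoft => &["rcx", "rdx", "r8", "r9"],
        }
    }

    /// Register for the n-th (0-based) integer argument,
    /// or None when that argument is passed on the stack.
    fn arg_location(self, index: usize) -> Option<&'static str> {
        self.int_arg_regs().get(index).copied()
    }
}
```

Because the `match` must be exhaustive, adding a third convention to the enum would make the compiler flag every place that still has to handle it, which is harder to get from preprocessor macros and if-statements.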

czbond · on April 20, 2022

Thank you for posting the motivation - I was curious about the "why"... and maintainability now explains it.

hardwaregeek · on April 20, 2022

This is so cool! If new contributions to Ruby could be written in Rust, I'd be a lot more inclined to contribute. I don't think I'm alone here. Andy Kelley noted that the new Zig compiler has significantly more contributors, likely due to it being written in Zig and not C++.

Some people may roll their eyes at this, but it is a lot more enticing to work on a Rust codebase than a C/C++ one. I'm less likely to screw up and create a serious bug; I get a lot more help from the compiler; the build system is standardized and simple; and it's just plain fun.

faitswulff · on April 20, 2022

Unfortunately:

> To be clear, it's OK to use Rust to implement YJIT (and other optional features in the future), but mainline CRuby will not be implemented in Rust.

- Matz, https://bugs.ruby-lang.org/issues/18481#note-14

On the other hand, there is Artichoke Ruby: https://github.com/artichoke/artichoke

hardwaregeek · on April 20, 2022

True, although Matz has changed his mind in the past (type annotations come to mind). If people notice that YJIT contributions are far more common, I wouldn't be surprised if Matz reconsiders this.

fuzzythinker · on April 20, 2022

Here's a benchmark [1] done in Jan '22 against many Ruby implementations. TruffleRuby [2] seems to be way ahead in most, and at least ahead in all. Why isn't TruffleRuby talked about much here?

[1] https://eregon.me/blog/2022/01/06/benchmarking-cruby-mjit-yj...

[2] https://github.com/oracle/truffleruby

petercooper · on April 21, 2022

I must note that TruffleRuby is a fantastic and genius bit of work with similarly genius folks working on it, but to answer your question.. there's a big cultural aspect. Things involving Oracle, the JVM, and Java generally do not tend to particularly click well with the entire Ruby world (but are just fine in specific niches). Consider that JRuby was significantly faster than CRuby for a long time yet the level of usage it got never really reflected that. Also consider just how poor Windows support was in Ruby for many years - this also wasn't purely a technical issue.

pizza234 · on April 20, 2022

When I read about its performance, I had the same thoughts; however, I was surprised to read this in the GitHub project README:

> TruffleRuby might not be fast yet on Rails applications and large programs. Notably, large programs currently take a long time to warmup on TruffleRuby and this is something the TruffleRuby team is currently working on. Large programs often involve more performance-critical code so there is a higher chance of hitting an area of TruffleRuby which has not been optimized yet.

I guess that they have a high-performing JIT that is optimized for small but not yet for large programs. I'm curious, though, what technically makes such a difference.

stormbrew · on April 20, 2022

This is always a problem for any JIT. Large codebases, especially ones as heavy on dynamic code paths as rails, run individual pieces of code less frequently than smaller ones (because they're just doing more work in general, and in rails' case are constantly spawning new code to deal with).

Then you have to instrument the code while it's running under a VM to decide what (and how) to JIT, and then you have to compile and assemble it. You also probably have to deal with some quasi-locking around the call sites as you switch code from using the VM to using the JIT.

So, basically by the laws of thermodynamics, all else equal a JITing VM will be slower than a non-JITing one, and the benefits of JIT won't kick in until you have enough code instrumented and compiled to make a dent in that performance loss from the extra work.

And then, the cleverer your JIT, and the more you optimize the code under compile, the more off-balance this gets, because doing those things gets more expensive.
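
The trade-off described above can be put into a small back-of-the-envelope model. This is only a sketch: the function name `break_even_calls` and all the nanosecond figures are hypothetical, and a real JIT also pays for instrumentation, which this ignores.

```rust
/// Number of calls after which compiling a method has paid for itself.
/// Returns None when the compiled code is no faster than the interpreter.
/// (Hypothetical cost model for illustration only.)
fn break_even_calls(
    compile_cost_ns: u64,
    interp_ns_per_call: u64,
    jit_ns_per_call: u64,
) -> Option<u64> {
    if jit_ns_per_call >= interp_ns_per_call {
        return None;
    }
    let saved_per_call = interp_ns_per_call - jit_ns_per_call;
    // Ceiling division: the smallest call count whose total savings
    // cover the one-time compile cost.
    Some((compile_cost_ns + saved_per_call - 1) / saved_per_call)
}
```

With an assumed 1 ms compile cost and 400 ns saved per call, a method has to run 2,500 times before compilation pays off. In a large program where each method runs less often, many methods never reach that count, and a more aggressive optimizer raises `compile_cost_ns` and pushes the threshold further out.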

erk__ · on April 20, 2022

The proposal for this was previously discussed at: https://news.ycombinator.com/item?id=29971360

epage · on April 20, 2022

I found this part odd

> If YJIT is built in dev mode, then cargo is used to fetch development dependencies, but when building in release, cargo is not required, only rustc

I'm not finding any information on why they bypass cargo and build directly with rustc. I'm curious what requirements led to this.

the_duke · on April 20, 2022

The Cargo.toml file gives the answer: https://github.com/Shopify/ruby/blob/rust-yjit-upstreaming/y...

There is only a single, optional dependency which is apparently only used for testing.

nightpool · on April 20, 2022

My guess is that this is a tradeoff made to make it more compatible with downstream distro packaging systems like Debian and Redhat, who generally look very negatively on requirements to use external package managers like Cargo. By keeping their set of dependencies very small, removing Cargo from their build process has tons of benefits in terms of how complex it will be to compile the Ruby codebase

steveklabnik · on April 20, 2022

Both Debian and Red Hat package Cargo, and also package various Rust crates as their own native packages, and then have Rust programs use Cargo to use them.

(Okay actually I’m unsure about Red Hat, but this is how Fedora does it…)

rtpg · on April 20, 2022

Maybe the production release has no dependencies to download, but the dev release has some helper stuff for running tests etc?

stormbrew · on April 20, 2022

Cargo is .. not a pleasant build tool to integrate into other build systems. If you have a project (like mainline ruby) where the dominant mode is C and you need to integrate some rust into it, you will eventually feel like it'd be a useful use of your time to bypass cargo and use the compiler directly.

Cargo is a fantastic tool, easily one of the best of its ilk, but real talk: it needs to be normalized that sometimes you don't want or need to use it. It is fit to a very specific set of tasks (mostly producing stand-alone binaries), and that set of tasks is a subset of the tasks rust as a whole is fit for.

steveklabnik · on April 20, 2022

Hilariously, years ago I did some Rust/Ruby integration, for fun, and had Ruby’s makefiles just call Cargo to build the rust code. It worked just fine.

It doesn’t work for all things in all cases, of course, but it can be workable. At work we built a build system on top of Cargo to paper over some of its deficiencies. It’s not ideal but IMHO it’s still better than dealing with rustc directly. In this case it’s easier for them since they have no external dependencies.

stormbrew · on April 20, 2022

I mean, calling rustc isn't so bad other than managing dependencies. I don't think it's really all that much more fraught than the compilers of other complex languages (including C++) that people manage to interact with directly. But cargo is simultaneously so good at dealing with dependencies, and (for lack of a better word) parasitic in its integration with nearly every crate in existence, that the moment you want to pull something else in it gets Hard.

Where it gets real messy is if you want to go back and forth (C->rust->C or rust->C->rust where the bookends are in the same codebase). This was a thing we wanted to do at the job I just left, but we never managed to make it work in a way that wasn't very janky. This was in a very mature and large C codebase managed by cmake, where we were gradually eating parts of it with rust, though.

steveklabnik · on April 20, 2022

Yes, absolutely, the end of your first paragraph is really what I mean; you end up having to basically rebuild cargo anyway. If you have a self contained code base, it’s not like rustc is inherently bad to call directly, for sure.

stormbrew · on April 20, 2022

Yeah. I really wish cargo had a "create a build environment" mode. It would make integrating rust into other systems a lot easier.

jfmc · on April 20, 2022

"YJIT code ported from C99 to Rust" Beyond passing the test suite, are there more numbers to compare both versions? (e.g., compilation time, lines of code, size of binaries, performance, etc.)

asymmetric · on April 20, 2022

The PR itself says:

> The new Rust version of YJIT has reached parity with the C version, in that it passes all the CRuby tests, is able to run all of the YJIT benchmarks, and performs similarly to the C version (because it works the same way and largely generates the same machine code). We've even incorporated some design improvements, such as a more fine-grained constant invalidation mechanism which we expect will make a big difference in Ruby on Rails applications.
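
The PR text does not say how the finer-grained constant invalidation works. As a guess at the general idea only, the toy model below tracks which compiled blocks depend on which constant, so that redefining one constant discards just those blocks instead of everything. `ConstantDeps`, `record`, and `invalidate` are invented names, and this is not YJIT's actual data structure.

```rust
use std::collections::{HashMap, HashSet};

/// Toy model: which compiled blocks assumed the value of which constant.
/// (Assumption for illustration; not taken from the YJIT source.)
#[derive(Default)]
struct ConstantDeps {
    blocks_by_constant: HashMap<String, HashSet<u32>>,
}

impl ConstantDeps {
    /// Record that compiled block `block_id` depends on constant `name`.
    fn record(&mut self, name: &str, block_id: u32) {
        self.blocks_by_constant
            .entry(name.to_string())
            .or_default()
            .insert(block_id);
    }

    /// Redefining `name` discards only the blocks that used it.
    /// Returns the invalidated block ids in ascending order.
    fn invalidate(&mut self, name: &str) -> Vec<u32> {
        let mut ids: Vec<u32> = self
            .blocks_by_constant
            .remove(name)
            .unwrap_or_default()
            .into_iter()
            .collect();
        ids.sort_unstable();
        ids
    }
}
```

A coarse scheme would throw away every compiled block on any constant redefinition. Rails applications define and reload many constants, which is presumably why the authors expect a difference there.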

faitswulff · on April 20, 2022

RoR comparison benchmarks would be nice to see.

matsadler · on April 20, 2022

I think the goal of this right now is just to match the C version.

The C implementation of YJIT supported x86 Unix/Linux platforms, and it sounds like adding Windows and arm64 support, plus other improvements was a daunting task with the tools C provides.

Now it’s in Rust we’ll hopefully see further improvements quicker.

anitil · on April 20, 2022

This is the first time I've felt that Rust is starting to eat C's lunch.

tormeh · on April 20, 2022

Rust will eclipse C++. C is a harder nut to crack, particularly for the embedded space where ease of implementing and maintaining a compiler back-end/code-emitter for your new weird 8-bit architecture is important. C is pretty close to an assembly macro and it's barely updated, which is great for that use-case. But for use cases like interpreters Rust is perfectly suitable.

vlovich123 · on April 20, 2022

This position is like saying C or C++ won't eat ASM's lunch. While technically true since there's a lot of ASM code still being written, especially for extremely low-level or high performance code, the vast majority of C and C++ developers don't actually touch ASM (i.e. C/C++ dominate ASM in terms of number of developer hours spent).

I think you may also be overlooking the GCC backend for rustc and gccrs, a ground-up standalone reimplementation of the Rust language frontend for GCC. Both of those should drastically improve the coverage and availability of Rust to all the same platforms you would be using GCC to compile C code for.

Depending on the compiler support, you might get that architecture for free unless the vendor is providing their own C compiler. The harder part is that your new weird 8-bit architecture probably won't benefit as much from the strong nostd ecosystem of libraries, so the overhead of writing Rust won't be counterbalanced. Still, like I said at the outset, this is an extremely niche use-case. Rust doesn't have to wipe C or C++ from the map for it to crack that nut.

The harder nut for Rust to crack I think is actually C++. There are extremely large C++ codebases. Industry would love for there to be a significantly easier/cheaper story to tell in terms of integrating Rust with those codebases. That way you could set metrics around converting the codebase, new code has to be written in Rust etc. However, the challenge is that Rust can only replace components with very well-defined boundaries. Those boundaries are less clearly defined in C++ codebases than they are in C codebases (linkage + templates in particular are challenging). To truly crack the C++ nut probably requires solving this problem unless Rust codebases just start eating C++ codebases commercially through development velocity (which is a much longer and harder path).

rapsey · on April 20, 2022

Very few will actually rewrite code in rust. It is enough for Rust to be used for new projects which would otherwise be c or c++

vlovich123 · on April 21, 2022

It's extremely important for meaningful commercial Rust adoption for legacy codebases to be able to adopt it incrementally (i.e. all new code is Rust). I think you're underestimating how much C/C++ code there is out there (Linux Kernel, Chrome, all of Google's internal infrastructure, all of Amazon's internal infrastructure etc). We're talking about many billions of dollars' worth of code that is never going to get rewritten and lines of code that keep accruing. Now competitors starting today may make other choices but there's enormous value to be had by cracking the nut of seamless integration of progressive migration (i.e. so that you can say "no more new C++ code"). The failure of this lesson is seen in banks that continue to run on Fortran at best and at worst other businesses that continue to run on old unsupported languages/technologies. Thankfully, I think the tech companies are engineering-led and understand this so I suspect they're paying people to figure out this problem.

isaacimagine · on April 21, 2022

If gcc supports Rust as a frontend ootb, I could see existing C/++ projects incrementally adopting components written in the language.

pjmlp · on April 20, 2022

Good luck with that on anything GPU or HPC related, or industries with language standards.

Also until Rust compilers are bootstrapped, they will always rely on a C++ infrastructure.

FullyFunctional · on April 20, 2022

I've been around for a long time and I haven't seen a PL with this much momentum since Java was launched. Inertia is real, but the benefits over C++ are undeniable.

Bootstrapping seems a silly thing to be obsessed about as C++ will be around forever still, but it obviously can be bootstrapped if that becomes important.

pjmlp · on April 20, 2022

Ada/SPARK already provided such benefits, and NVidia has chosen it instead of Rust for automotive firmware.

Rust momentum is meaningless for GPUs unless NVidia decides it gets to play in CUDA, and they are now one of the companies with more ISO C++ people on their payroll.

It is also meaningless for PlayStation, Nintendo and Xbox, unless the respective SDKs integrate Rust.

Bootstrapping isn't silly, because LLVM and GCC are written in C++, so there isn't any "Rust will eclipse C++", when it depends on it for its existence.

pcwalton · on April 20, 2022

SPARK doesn't provide the same feature set as Rust. If you want safe heap allocation in SPARK, then you get a garbage collector (unless you're talking really recent experimental extensions IIRC). If you want to forego the GC and remain memory-safe, then you also forego heap allocation. This might work for avionics code, but not for most apps.

Besides, the post you're replying to is talking about "momentum", and it's obvious in 2022 that Ada doesn't have the momentum that Rust does (however you define "momentum"). NVIDIA is not the entire industry.

Much of the rest of your post concerns video games, which are only a small portion of the total C++ code in existence. (And in any case it's not accurate to say that languages are "meaningless" unless the platform vendor officially supports them—console vendors don't maintain C# VMs either and yet Unity titles work just fine.)

pjmlp · on April 20, 2022

What garbage collector? Ada never had one, besides the optional one in early standards, never implemented in any commercial compiler, thus removed in Ada 2012.

I wasn't the one asserting momentum, and can relate to plenty of other industries where Rust isn't even on the radar.

Going back to Ada example, Rust certainly doesn't have any momentum over Ada in high integrity computing.

Console vendors do happen to collaborate with Unity, and make it first party on their SDKs, so yet another lack of information.

pcwalton · on April 20, 2022

WebRender is certainly "GPU related" and is shipping to millions of happy Firefox users.

And yes, LLVM is written in C++. So what? C++ compilers depend on C code in libc. Portions of libc are written in assembler. Some assembly instructions are decomposed into microcode. Yet nobody doubts that C++ has eclipsed assembly language in terms of importance to the industry nowadays. We'll always need a way for humans to read the actual instructions that the silicon interprets, but relatively few people need to be able to do that nowadays. That dynamic is what the parent post means by one language "eclipsing" another.

pjmlp · on April 20, 2022

For how long? 3% and decreasing.

Libc is UNIX only.

As for the rest, it is useful to tone down hype with some cold water reality check.

Ar-Curunir · on April 20, 2022

> tone down hype with some cold water reality check.

I mean, you're the one who keeps mentioning Ada/SPARK on every Rust thread, so if anyone needs to stop hyping things, it's perhaps you?

pjmlp · on April 21, 2022

Where is Rust powering avionics and high integrity systems in production for the last 40 years? The very definition of secure software.

No one is asserting how Ada is going to wipe language XYZ.

rubyfan · on April 20, 2022

Can you say more about why you think that?

anitil · on April 21, 2022

I started in C and generally when people say 'rewrite it in Rust' I just roll my eyes, because I know how hard that is. But seeing it happen on a sophisticated project has made me take another look.

Obviously for the embedded world everything is pitched at C currently and I don't think that will change, but for larger projects this is proof that my intuition was wrong.

I suppose that's a long winded way of saying that it might be time for me to learn Rust.

tmikaeld · on April 20, 2022

Was looking for the same thing, what does this mean for Ruby performance?

Rafert · on April 20, 2022

YJIT benchmarks can be found at https://speed.yjit.org/

The Rust port doesn't change performance much according to the pull request description.

ewalk153 · on April 20, 2022

I didn't know this was public, sweet! Nice that the tooling that generates this report is also published: https://github.com/Shopify/yjit-metrics

npalli · on April 20, 2022

Your own link states --

Overall YJIT is 33.4% faster than interpreted CRuby! On Railsbench specifically, YJIT is 32.4% faster than CRuby!

WJW · on April 20, 2022

Yes, but YJIT in Rust is the same ~33.4% faster than vanilla CRuby as YJIT in C was. The rewrite into Rust is expected to make YJIT easier to maintain and that may in turn make possible further improvements to code generation, but the rewrite generates the same machine code (and therefore the same speedup) as before.

ModernMech · on April 20, 2022

According to the post not much. The Rust version performs about the same because it generates mostly the same machine code.

pizza234 · on April 20, 2022

Very likely, the performance of a JIT comes from:

- the architecture of the JIT itself

- the generated code

AFAIK, the Rust YJIT doesn't change either (they explicitly say that the generated code is approximately the same), so no significant difference in performance should be expected.

rvz · on April 20, 2022

It means completely nothing for performance.

block_dagger · on April 20, 2022

Nothing

MuffinFlavored · on April 20, 2022

I wonder if they found any bugs in the C99 version due to Rust's "memory/type safety" and all that?

dj_gitmo · on April 20, 2022

> ... it works the same way and largely generates the same machine code

How can they make this determination? Do they just eyeball a few sections of the machine code from each output? Is there some tool that can compare binaries? Is this just a very literal, function by function, translation from C to Rust?

I don't know much about reading/comparing machine code.

pjmlp · on April 20, 2022

They mean the output of JIT compiler, not the binary itself.

pizza234 · on April 20, 2022

I'm not familiar with what YJIT generates, however, in general terms, if the ASM code for a given bytecode is small enough (Which I think it is), one can just compare them side by side, or just log them and compare them separately. I think a JIT for Ruby should compile relatively small chunks of ASM, not big walls of code (but again, this is my guess).
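
The "log them and compare them separately" approach could look like the sketch below, which walks two disassembly logs line by line and reports the first mismatch. The function name `first_difference` and the idea that each JIT writes a plain-text listing are assumptions for the example; the thread does not say how the YJIT team actually compared output.

```rust
/// Compare two disassembly listings line by line, ignoring surrounding whitespace.
/// Returns the 1-based line number and both lines at the first mismatch
/// (None for a side that has already ended), or None if the listings agree.
/// (Hypothetical helper for illustration.)
fn first_difference<'a>(
    c_listing: &'a str,
    rust_listing: &'a str,
) -> Option<(usize, Option<&'a str>, Option<&'a str>)> {
    let mut c_lines = c_listing.lines();
    let mut rust_lines = rust_listing.lines();
    let mut line_no = 1;
    loop {
        match (c_lines.next(), rust_lines.next()) {
            // Both listings ended together: no difference found.
            (None, None) => return None,
            // Same instruction text on both sides: keep going.
            (x, y) if x.map(str::trim) == y.map(str::trim) => line_no += 1,
            // Different text, or one listing is longer than the other.
            (x, y) => return Some((line_no, x, y)),
        }
    }
}
```

For small chunks of generated code this kind of textual comparison is usually enough; anything that differs only in absolute addresses would need to be normalized first.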

xutopia · on April 20, 2022

How will this benefit Ruby?

riffraff · on April 20, 2022

if YJIT is successful, ruby will be faster, which is good™. The rationale for the rust rewrite is that rust may be better suited for writing a JIT than C is.

coder543 · on April 20, 2022

Explained here: https://bugs.ruby-lang.org/issues/18481

WolfOliver · on April 20, 2022

What is the benefit? Can I run Ruby in the browser now?

WJW · on April 20, 2022

Yes you can run Ruby in the browser if you want, but not because of this PR. Ruby-in-WASM was merged a few weeks ago.

This PR rewrites the YJIT just-in-time compiler code from C into Rust, because the dev team likes Rust better and expects that it will make development of new features easier.

vinceguidry · on April 20, 2022

Don't forget Opal!

https://opalrb.com/

throwaway-m3232 · on April 20, 2022

Why not C++, for better portability? If I want to design my own CPU, I will have to add it to GCC. But Rust is LLVM so if I want to support Ruby-jit on my CPU, I will also have to support LLVM.

brobinson · on April 20, 2022

Why not a memory safe language, to avoid those 70% of CVEs?

(67% of 0-days last year: https://news.ycombinator.com/item?id=31085539)

infamouscow · on April 20, 2022

Because Ruby is already memory safe and JIT miscompilation is a logic bug.

pizza234 · on April 20, 2022

The fact that a language is memory safe doesn't imply that the underlying virtual machine/interpreter is.

On the other hand, it's definitely true that the ASM generated is as unsafe as it gets, but the first point still stands. The memory unsafety of the VM is simply an additional attack vector.

carlmr · on April 20, 2022

How is JIT miscompilation or vulnerabilities in the JIT compiler not an issue?

infamouscow · on April 20, 2022

A Ruby program can delete all of the files on a computer, insert arbitrary rows into a database, drop a table, send email with attachments, etc. Am I correct that you're concerned the Ruby JIT itself will have a security vulnerability in the act of JIT compiling Ruby code? This seems extremely myopic.

criticaltinker · on April 20, 2022

JS engines have had many serious vulnerabilities in their JIT optimizers, it’s not myopic at all and is a well known technique in the industry.

I agree that some folks aren’t executing untrusted ruby code so they wouldn’t have to worry about this - but how many PaaS/SaaS products out there are? Or how about third party dev tools that are blindly downloaded and executed on local workstations or CI pipelines?

infamouscow · on April 20, 2022

> JS engines have had many serious vulnerabilities in their JIT optimizers, it’s not myopic at all and is a well known technique in the industry.

HotSpot and V8 are both written in C++ and get more use than any other JIT on Earth.

Can you provide a link to a CVE caused by JIT miscompilation and explain how Rust would have been able to prevent the bug in a way that C++ wouldn't?

> I agree that some folks aren’t executing untrusted ruby code so they wouldn’t have to worry about this - but how many PaaS/SaaS products out there are?

This is what Xen, KVM, and Hyper-V do.

> Or how about third party dev tools that are blindly downloaded and executed on local workstations or CI pipelines?

Are you suggesting a Ruby JIT shouldn't generate machine code that corresponds to the Ruby program, but somehow magically prevent stupid developers from doing stupid things?

bastawhiz · on April 20, 2022

It's a bad look if a malicious HTTP request to your Rails app can trigger RCE on your server. It's not about running code that's malicious, it's about bad data triggering a code path in the VM that is able to change the function of the application.

infamouscow · on April 20, 2022

What you're describing is a logic bug.

JITs write instructions to memory in a manner that's only slightly different than writing bytes to a file. The generation of those instructions can either be correct or incorrect and happens regardless of programming language.

A JIT written in Python is equally capable of generating bad code as a JIT written in C or Rust or Lisp. A perfect port of a buggy JIT written in language A will generate the same buggy code even after being ported to language B.
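
To make the "writing bytes" point concrete, here is a minimal sketch of a code emitter in safe Rust, next to a variant with a deliberate logic bug. The function names are invented for the example; the byte values are the standard x86-64 encodings of `mov eax, imm32` (`B8` plus a little-endian immediate) and `ret` (`C3`).

```rust
/// Append the x86-64 encoding of `mov eax, imm32; ret` to `code`.
fn emit_return_constant(code: &mut Vec<u8>, value: u32) {
    code.push(0xB8); // opcode for `mov eax, imm32`
    code.extend_from_slice(&value.to_le_bytes()); // immediate, little-endian
    code.push(0xC3); // `ret`
}

/// The same routine with a deliberate logic bug: the immediate is written
/// big-endian. Every line is memory-safe Rust, yet the emitted machine code
/// would return the wrong value.
fn emit_return_constant_buggy(code: &mut Vec<u8>, value: u32) {
    code.push(0xB8);
    code.extend_from_slice(&value.to_be_bytes()); // wrong byte order
    code.push(0xC3);
}
```

Both functions compile without complaint, which is the commenter's point: the host language's safety guarantees cover the emitter itself, not the correctness of the instructions it produces.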

Tobu · on April 20, 2022

Rust's type system is enough to get rid of memory unsafety and UB, but it does that by enforcing more invariants, invariants which you also use to encode properties you care about. 70 percent of vulnerabilities are memory unsafety, which is impossible in safe Rust etc etc, but a better type system, a language that doesn't disclaim commonly found code as unsupported, more productive errors, lower cognitive load… also tends to help with the rest of the bugs.
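
One way to read "invariants which you also use to encode properties you care about" is the newtype pattern sketched here. The `Reg` type is hypothetical and not from YJIT; the facts it relies on are that x86-64 has 16 general-purpose registers, that a ModRM field holds three bits of the register number, and that r8 to r15 need a REX extension bit.

```rust
/// A general-purpose x86-64 register number, intended to always be in 0..16.
/// (Hypothetical type for illustration. In a real crate the field would be
/// private to its module, so `new` would be the only way to build one.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Reg(u8);

impl Reg {
    /// Validating constructor: rejects out-of-range register numbers up front.
    fn new(index: u8) -> Option<Reg> {
        if index < 16 {
            Some(Reg(index))
        } else {
            None
        }
    }

    /// Low three bits of the register number, as stored in a ModRM field.
    fn low_bits(self) -> u8 {
        self.0 & 0b111
    }

    /// Whether the register is r8..r15 and so needs a REX extension bit.
    fn needs_rex_extension(self) -> bool {
        self.0 >= 8
    }
}
```

An encoder that takes `Reg` rather than a bare `u8` never has to re-check the range, and it cannot be handed an immediate or a stack offset by mistake. This is a logic-level check, separate from memory safety.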

mustache_kimono · on April 20, 2022

I'm not sure I understand why some people really hate Rust, but when the argument feels like "But can't we be miserable forever?" I just have to laugh.

mustache_kimono · on April 20, 2022

FYI -- my technical thinking -- because Rust is a nicer language for the people who have to work with it. Full stop.

Rust offers substantial memory safety guarantees, but that isn't the only thing it offers. People who don't know this are those that haven't tried it. Others have focused on security in this thread, and I think that's wrong headed. That's obviously not the reason for choosing Rust here. It's that it makes things that are important now and in the future, like say concurrency, easier and more likely to be correct. Yes, ergonomics and a nice dev experience actually matter even for the people writing your compiler!

Moreover, Rust GCC support is far closer to being a thing than yjit is to being a thing. So -- let the kids play.

sanxiyn · on April 20, 2022

This is a non-issue. YJIT only targets x86-64. After all, this is a JIT. If you designed a new architecture X, you need to port YJIT itself to target X, in addition to GCC, LLVM, etc.

throwaway-m3232 · on April 20, 2022

Oh, so YJIT is highly coupled to x86-64? Porting GCC + yjit is less work than porting GCC + yjit + LLVM.

byroot · on April 20, 2022

It’s not that it’s highly coupled, just that it’s still the early days and only x86_64 was on the roadmap. Arm64 is planned, and will hopefully make it into Ruby 3.2

FullyFunctional · on April 20, 2022

And with an Arm64 backend, adding RISC-V is probably going to be a walk in the park.

FooBarWidget · on April 20, 2022

It's not like new architectures appear very quickly, much less get adopted very quickly. The benefits of maintenance overhead reduction and development speed increase far outweigh the theoretical downside of having to port LLVM to that new architecture.

lalaithion · on April 20, 2022

Why even port GCC at all, and not simply LLVM?

lnxg33k1 · on April 20, 2022

But why not just buy an existing CPU on amazon

dkersten · on April 20, 2022

So just port LLVM + yjit.

matharmin · on April 20, 2022

If you want to design your own CPU, supporting LLVM is going to give you much greater benefits than supporting Ruby. Never mind the fact that you don't even need this to support Ruby.

Tobu · on April 20, 2022

To add to your point, following Woodruff's "Weird architectures weren't supported to begin with", Robert O'Callahan pointed out[1] that for one definition of the open-source platform (looking at the requirements of Linux distributions), a new architecture would need to support at least: LLVM and GCC targets, a port of the Linux kernel, a V8 backend, and acceleration for various codecs.

And while at this point a platform needs to have support from both compilers, I can see the GCC/glibc ecosystem being made redundant; LLVM is more adaptable and has found its way into so many specialized compiler stacks.

[1]: https://lwn.net/Articles/847830/

bilkow · on April 20, 2022

> If I want to design my own CPU, I will have to add it to GCC.

Why do you "have" to add it to GCC? You could only add it to LLVM instead.

cesarb · on April 20, 2022

> I will also have to support LLVM.

This won't be an issue for long, as there's already a GCC backend for Rust in development.

unrealhoang · on April 20, 2022

Because Rust is much easier to learn than C++ so the authors are more comfortable with Rust?

antonvs · on April 20, 2022

C++ is a 28-year old language that's been showing its age for at least a decade or two. If we want the software world to progress we need to move on from such languages.

cjg · on April 20, 2022

Rust has some GCC support.