"The motivation behind this is that we are facing challenges in terms of code maintainability. As you know, JIT compilers can get very complex, and C99 doesn't offer many tools to manage this complexity. There are no classes and methods, limited type checking, and it's hard to fully separate code into modules, for instance."
"We believe that having access to object oriented programming and a more expressive type system would help us manage growing complexity better and also improve the safety/robustness of YJIT. For instance we would like to add Windows support and a new backend to YJIT. That means we’ll have two separate backends (x86, arm64) and we’ll need to support two different calling conventions (Microsoft, SystemV), but currently, we have limited tools to build the abstractions needed, such as preprocessor macros and if-statements."
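To illustrate the kind of abstraction the quote is asking for, here is a hypothetical Rust sketch (not YJIT's actual code) where each calling convention is one implementation of a trait instead of a branch behind `#ifdef`. The register lists follow the System V AMD64 ABI (first six integer arguments in rdi, rsi, rdx, rcx, r8, r9) and the Microsoft x64 ABI (first four in rcx, rdx, r8, r9, plus 32 bytes of caller-reserved shadow space); the names `CallConv`, `SysV`, `Win64` and `arg_reg` are made up for illustration.

```rust
// Hypothetical sketch: names are illustrative, not taken from the YJIT codebase.

/// x86-64 general-purpose registers used for integer arguments below.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reg {
    Rdi,
    Rsi,
    Rdx,
    Rcx,
    R8,
    R9,
}

/// One calling convention = one implementation of this trait.
trait CallConv {
    /// Registers carrying the first integer/pointer arguments, in order.
    fn int_arg_regs(&self) -> &'static [Reg];
    /// Bytes of stack the caller must reserve before the call ("shadow space").
    fn shadow_space(&self) -> usize;
}

struct SysV;
struct Win64;

impl CallConv for SysV {
    fn int_arg_regs(&self) -> &'static [Reg] {
        &[Reg::Rdi, Reg::Rsi, Reg::Rdx, Reg::Rcx, Reg::R8, Reg::R9]
    }
    fn shadow_space(&self) -> usize {
        0
    }
}

impl CallConv for Win64 {
    fn int_arg_regs(&self) -> &'static [Reg] {
        &[Reg::Rcx, Reg::Rdx, Reg::R8, Reg::R9]
    }
    fn shadow_space(&self) -> usize {
        32
    }
}

/// Code generation is written once against the trait; `None` means the
/// argument is passed on the stack instead of in a register.
fn arg_reg(cc: &dyn CallConv, index: usize) -> Option<Reg> {
    cc.int_arg_regs().get(index).copied()
}
```

For example, `arg_reg(&SysV, 0)` gives `Some(Reg::Rdi)` while `arg_reg(&Win64, 0)` gives `Some(Reg::Rcx)`, and the fifth argument on Windows (`arg_reg(&Win64, 4)`) is `None`. A second backend such as arm64 would need its own register type, which is where further traits or generics would come in.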
This is so cool! If new contributions to Ruby could be written in Rust, I'd be a lot more inclined to contribute. I don't think I'm alone here. Andy Kelley noted that the new Zig compiler has significantly more contributors, likely due to it being written in Zig and not C++.
Some people may roll their eyes at this, but it is a lot more enticing to work on a Rust codebase than a C/C++ one. I'm less likely to screw up and create a serious bug; I get a lot more help from the compiler; the build system is standardized and simple; and it's just plain fun.
True, although Matz has changed his mind in the past (type annotations come to mind). I wouldn't be surprised if Matz reconsiders once people notice that YJIT contributions are far more common.
Here's a benchmark [1] run in January 2022 against many Ruby implementations; TruffleRuby [2] seems to be far ahead in most and at least ahead in all. Why isn't TruffleRuby talked about much here?
I must note that TruffleRuby is a fantastic and genius bit of work with similarly genius folks working on it, but to answer your question: there's a big cultural aspect. Things involving Oracle, the JVM, and Java generally don't click particularly well with the Ruby world as a whole (though they're just fine in specific niches). Consider that JRuby was significantly faster than CRuby for a long time, yet its level of usage never really reflected that. Also consider just how poor Windows support was in Ruby for many years; this also wasn't purely a technical issue.
When I read about its performance, I had the same thoughts; however, I was surprised to read this in the GitHub project README:
> TruffleRuby might not be fast yet on Rails applications and large programs. Notably, large programs currently take a long time to warmup on TruffleRuby and this is something the TruffleRuby team is currently working on. Large programs often involve more performance-critical code so there is a higher chance of hitting an area of TruffleRuby which has not been optimized yet.
I guess they have a high-performing JIT that is optimized for small programs but not yet for large ones. I'm curious, though: what, technically, makes such a difference?
This is always a problem for any JIT. Large codebases, especially ones as heavy on dynamic code paths as Rails, run individual pieces of code less frequently than smaller ones do (because they're just doing more work in general, and in Rails' case are constantly spawning new code to deal with).
Then you have to instrument the code while it's running under a VM to decide what (and how) to JIT, and then you have to compile and assemble it. You also probably have to deal with some quasi-locking around the call sites as you switch code from using the VM to using the JIT.
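To make the "instrument, then compile, then switch" sequence concrete, here is a minimal sketch of counter-based tier-up in Rust. Everything in it is an assumption for illustration (the threshold of 30, the `Vm`, `Method` and `Entry` names); it is not how YJIT or TruffleRuby actually decide what to compile.

```rust
// Hypothetical sketch of counter-based tier-up; the threshold and names are
// illustrative assumptions, not any real VM's values or API.
use std::collections::HashMap;

const HOT_THRESHOLD: u32 = 30; // assumed; real JITs tune this carefully

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Entry {
    Interpreter,
    Jitted,
}

struct Method {
    calls: u32,   // instrumentation: how often this method has run
    entry: Entry, // what the call site currently dispatches to
}

struct Vm {
    methods: HashMap<&'static str, Method>,
    compiles: u32, // how many times we paid the compile cost
}

impl Vm {
    fn new() -> Self {
        Vm { methods: HashMap::new(), compiles: 0 }
    }

    /// Called on every invocation; returns which tier ran this call.
    fn call(&mut self, name: &'static str) -> Entry {
        let m = self
            .methods
            .entry(name)
            .or_insert(Method { calls: 0, entry: Entry::Interpreter });
        m.calls += 1;
        if m.entry == Entry::Interpreter && m.calls >= HOT_THRESHOLD {
            // "Compile", then swap the entry point. A real VM must make this
            // swap safe while other code may be using the same call site.
            self.compiles += 1;
            m.entry = Entry::Jitted;
        }
        m.entry
    }
}
```

In a big Rails app, calls are spread over many more methods, so more of them sit below the threshold for longer and keep running in the interpreter while still paying for the counters. That is one way to picture the warmup problem described above.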
So, basically by the laws of thermodynamics, all else being equal, a JITing VM will be slower than a non-JITing one, and the benefits of the JIT won't kick in until enough code has been instrumented and compiled to make up for the performance lost to that extra work.
And then, the cleverer your JIT and the more you optimize the code being compiled, the more off-balance this gets, because doing those things gets more expensive.
> If YJIT is built in dev mode, then cargo is used to fetch development dependencies, but when building in release, cargo is not required, only rustc
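The trade-off can be put as rough arithmetic. With a deliberately simple, made-up cost model where a method costs `t_interp` per call in the interpreter, `t_jit` per call once compiled, and `compile_cost` once, compiling pays off only after more than `compile_cost / (t_interp - t_jit)` calls. A small Rust sketch (all numbers are illustrative, not measurements):

```rust
/// Smallest number of calls after which JIT-compiling a method pays off,
/// under a simple cost model (all costs in the same time unit):
///   interpreter total: n * t_interp
///   JIT total:         compile_cost + n * t_jit
/// Returns None if the JITted code is not faster, so it never pays off.
fn break_even_calls(compile_cost: u64, t_interp: u64, t_jit: u64) -> Option<u64> {
    if t_jit >= t_interp {
        return None;
    }
    let saving_per_call = t_interp - t_jit;
    // Need n * saving_per_call > compile_cost, i.e. n = floor(cost / saving) + 1.
    Some(compile_cost / saving_per_call + 1)
}
```

With made-up costs of 1,000 ns to compile, 100 ns per interpreted call and 20 ns per JITted call, `break_even_calls(1_000, 100, 20)` gives 13 calls. A cleverer JIT that spends 10,000 ns compiling to get down to 10 ns per call needs 112 calls (`break_even_calls(10_000, 100, 10)`), which is the "more off-balance" effect in the comment above.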
I'm not finding any information on why they bypass cargo and build directly with rustc. I'm curious what requirements led to this.
My guess is that this is a tradeoff made to stay compatible with downstream distro packaging systems like Debian and Red Hat, which generally take a dim view of builds that require external package managers like Cargo. Keeping the dependency set very small and removing Cargo from the build process greatly reduces how complex it is to compile the Ruby codebase.
Both Debian and Red Hat package Cargo, and they also package various Rust crates as their own native packages, and then have Rust programs use Cargo to build against them.
(Okay actually I’m unsure about Red Hat, but this is how Fedora does it…)
Cargo is... not a pleasant build tool to integrate into other build systems. If you have a project (like mainline Ruby) where the dominant language is C and you need to integrate some Rust into it, you will eventually decide it's worth your time to bypass Cargo and use the compiler directly.
Cargo is a fantastic tool, easily one of the best of its ilk, but real talk: it needs to be normalized that sometimes you don't want or need to use it. It is suited to a very specific set of tasks (mostly producing stand-alone binaries), and that set of tasks is a subset of the tasks Rust as a whole is suited for.
Hilariously, years ago I did some Rust/Ruby integration, for fun, and had Ruby’s makefiles just call Cargo to build the rust code. It worked just fine.
It doesn’t work for all things in all cases, of course, but it can be workable. At work we built a build system on top of Cargo to paper over some of its deficiencies. It’s not ideal but IMHO it’s still better than dealing with rustc directly. In this case it’s easier for them since they have no external dependencies.
I mean, calling rustc isn't so bad other than managing dependencies. I don't think it's really all that much more fraught than the compilers of other complex languages (including C++) that people manage to interact with directly. But cargo is simultaneously so good at dealing with dependencies, and (for lack of a better word) parasitic in its integration with nearly every crate in existence, that the moment you want to pull something else in it gets Hard.
Where it gets real messy is if you want to go back and forth (C->rust->C or rust->C->rust where the bookends are in the same codebase). This was a thing we wanted to do at the job I just left, but we never managed to make it work in a way that wasn't very janky. This was in a very mature and large C codebase managed by cmake, where we were gradually eating parts of it with rust, though.
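For anyone who hasn't done it, the C->Rust->C direction looks roughly like this at the boundary: Rust exports an unmangled `extern "C"` function, and C hands it a function pointer to call back into. This is a hypothetical sketch (`rust_map_sum` and `double_it` are made-up names); in a real mixed build the callback would be a C function, and the Rust side would typically be compiled as a static library (for example with `rustc --crate-type=staticlib`) and linked by CMake or make. The boundary code is the easy part; the pain described above is in the build plumbing around it.

```rust
// Hypothetical sketch of a C -> Rust -> C round trip at the FFI boundary.
// `rust_map_sum` and `double_it` are made-up names, not from YJIT.

/// A C callback of type `int64_t (*)(int64_t)`.
type CCallback = extern "C" fn(i64) -> i64;

/// Exported with an unmangled name so C can declare and call it as:
///   int64_t rust_map_sum(const int64_t *xs, size_t len, int64_t (*f)(int64_t));
/// (`#[unsafe(no_mangle)]` is Rust 1.82+ syntax; older toolchains write `#[no_mangle]`.)
///
/// # Safety
/// `xs` must be null or point to `len` valid `i64` values.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn rust_map_sum(xs: *const i64, len: usize, f: CCallback) -> i64 {
    if xs.is_null() {
        return 0;
    }
    // SAFETY: guaranteed by the caller, per the contract above.
    let values = unsafe { std::slice::from_raw_parts(xs, len) };
    // Call back into "C" for each element, then sum the results in Rust.
    values.iter().map(|&x| f(x)).sum()
}

/// Stand-in for a C function; in a mixed build this would live in a .c file.
extern "C" fn double_it(x: i64) -> i64 {
    x * 2
}
```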
Yes, absolutely, the end of your first paragraph is really what I mean; you end up having to basically rebuild cargo anyway. If you have a self contained code base, it’s not like rustc is inherently bad to call directly, for sure.
"YJIT code ported from C99 to Rust"
Beyond passing the test suite, are there more numbers to compare both versions? (e.g., compilation time, lines of code, size of binaries, performance, etc.)
> The new Rust version of YJIT has reached parity with the C version, in that it passes all the CRuby tests, is able to run all of the YJIT benchmarks, and performs similarly to the C version (because it works the same way and largely generates the same machine code). We've even incorporated some design improvements, such as a more fine-grained constant invalidation mechanism which we expect will make a big difference in Ruby on Rails applications.
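The quote doesn't describe the mechanism, so here is only a hedged sketch of what "fine-grained" constant invalidation can mean: remember which compiled blocks assumed which constant, and on a write throw away only those blocks instead of everything. The coarse variant (invalidate every constant-dependent block on any constant write) is an assumption included for contrast, and none of these names come from YJIT.

```rust
// Hypothetical sketch contrasting coarse and fine-grained invalidation of JIT
// code that baked in a constant's value. Names are illustrative, not YJIT's.
use std::collections::{HashMap, HashSet};

type BlockId = u32;

#[derive(Default)]
struct ConstDeps {
    // constant name -> compiled blocks that assumed its current value
    by_name: HashMap<String, HashSet<BlockId>>,
    // blocks whose machine code is still usable
    valid: HashSet<BlockId>,
}

impl ConstDeps {
    /// Record that `block` was compiled assuming `name` won't change.
    fn assume(&mut self, block: BlockId, name: &str) {
        self.by_name.entry(name.to_string()).or_default().insert(block);
        self.valid.insert(block);
    }

    /// Coarse: any constant write invalidates every block that assumed any constant.
    fn on_const_write_coarse(&mut self) {
        for blocks in self.by_name.values() {
            for b in blocks {
                self.valid.remove(b);
            }
        }
        self.by_name.clear();
    }

    /// Fine-grained: only blocks that depend on this particular name are invalidated.
    fn on_const_write_fine(&mut self, name: &str) {
        if let Some(blocks) = self.by_name.remove(name) {
            for b in blocks {
                self.valid.remove(&b);
            }
        }
    }
}
```

With blocks 1 and 3 depending on `Foo` and block 2 on `Bar`, writing `Foo` under the fine-grained scheme leaves block 2 untouched, whereas the coarse scheme would have discarded it too.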
I think the goal of this right now is just to match the C version.
The C implementation of YJIT supported x86 Unix/Linux platforms, and it sounds like adding Windows and arm64 support, plus other improvements was a daunting task with the tools C provides.
Now that it’s in Rust, we’ll hopefully see further improvements more quickly.
Rust will eclipse C++. C is a harder nut to crack, particularly for the embedded space where ease of implementing and maintaining a compiler back-end/code-emitter for your new weird 8-bit architecture is important. C is pretty close to a macro assembler and it's barely updated, which is great for that use-case. But for use cases like interpreters Rust is perfectly suitable.
This position is like saying C or C++ won't eat ASM's lunch. While technically true since there's a lot of ASM code still being written, especially for extremely low-level or high performance code, the vast majority of C and C++ developers don't actually touch ASM (i.e. C/C++ dominate ASM in terms of number of developer hours spent).
I think you may also be overlooking the GCC backend for rustc and gccrs, a ground-up standalone reimplementation of the Rust language frontend for GCC. Both of those should drastically improve the coverage and availability of Rust to all the same platforms you would be using GCC to compile C code for.
Depending on the compiler support, you might get that architecture for free unless the vendor is providing their own C compiler. The harder part is that your new weird 8-bit architecture probably won't benefit as much from the strong no_std ecosystem of libraries, so the overhead of writing Rust won't be counterbalanced. Still, like I said at the outset, this is an extremely niche use-case. Rust doesn't have to wipe C or C++ from the map for it to crack that nut.
The harder nut for Rust to crack I think is actually C++. There are extremely large C++ codebases. Industry would love for there to be a significantly easier/cheaper story to tell in terms of integrating Rust with those codebases. That way you could set metrics around converting the codebase, new code has to be written in Rust etc. However, the challenge is that Rust can only replace components with very well-defined boundaries. Those boundaries are less clearly defined in C++ codebases than they are in C codebases (linkage + templates in particular are challenging). To truly crack the C++ nut probably requires solving this problem, unless Rust codebases just start eating C++ codebases commercially through development velocity (which is a much longer and harder path).
It's extremely important for meaningful commercial Rust adoption that legacy codebases be able to adopt it incrementally (i.e. all new code is Rust). I think you're underestimating how much C/C++ code there is out there (the Linux kernel, Chrome, all of Google's internal infrastructure, all of Amazon's internal infrastructure, etc.). We're talking about many billions of dollars' worth of code that is never going to get rewritten, and the line count keeps growing. Competitors starting today may make other choices, but there's enormous value in cracking the nut of seamless, progressive migration (i.e. so that you can say "no more new C++ code"). The failure to learn this lesson is seen in banks that continue to run on COBOL at best, and at worst in other businesses that continue to run on old, unsupported languages and technologies. Thankfully, I think the tech companies are engineering-led and understand this, so I suspect they're paying people to figure out this problem.
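A small illustration of why generics make the boundary harder: a generic Rust function (like a C++ template) is instantiated per type at compile time, so there is no single symbol for the other language to link against. Each instantiation the other side needs has to be flattened into a concrete C-ABI wrapper by hand. The names below are hypothetical, and the same limitation applies in reverse to C++ templates called from Rust.

```rust
// Hypothetical sketch: generic code cannot cross a C ABI boundary as-is.

/// Generic Rust: monomorphized per type at compile time, so there is no single
/// symbol with a stable C ABI for C or C++ to link against.
fn clamp_all<T: PartialOrd + Copy>(xs: &mut [T], lo: T, hi: T) {
    for x in xs.iter_mut() {
        if *x < lo {
            *x = lo;
        } else if *x > hi {
            *x = hi;
        }
    }
}

/// Concrete wrapper for one instantiation (made-up name).
/// (`#[unsafe(no_mangle)]` is Rust 1.82+ syntax; older toolchains write `#[no_mangle]`.)
///
/// # Safety
/// `xs` must be null or point to `len` valid, writable `i32` values.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn clamp_all_i32(xs: *mut i32, len: usize, lo: i32, hi: i32) {
    if xs.is_null() {
        return;
    }
    clamp_all(unsafe { std::slice::from_raw_parts_mut(xs, len) }, lo, hi);
}

/// Every other instantiation needs its own hand-written wrapper.
///
/// # Safety
/// `xs` must be null or point to `len` valid, writable `f64` values.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn clamp_all_f64(xs: *mut f64, len: usize, lo: f64, hi: f64) {
    if xs.is_null() {
        return;
    }
    clamp_all(unsafe { std::slice::from_raw_parts_mut(xs, len) }, lo, hi);
}
```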
I've been around for a long time and I haven't seen a PL with this much momentum since Java was launched. Inertia is real, but the benefits over C++ are undeniable.
Bootstrapping seems a silly thing to obsess over, since C++ will still be around forever, but Rust obviously can be bootstrapped if that becomes important.
Ada/SPARK already provided such benefits, and NVIDIA has chosen it over Rust for automotive firmware.
Rust momentum is meaningless for GPUs unless NVIDIA decides it gets to play in CUDA, and they are now one of the companies with the most ISO C++ people on their payroll.
It is also meaningless for PlayStation, Nintendo and Xbox, unless the respective SDKs integrate Rust.
Bootstrapping isn't silly, because LLVM and GCC are written in C++, so there isn't any "Rust will eclipse C++" when Rust depends on C++ for its existence.
SPARK doesn't provide the same feature set as Rust. If you want safe heap allocation in SPARK, then you get a garbage collector (unless you're talking really recent experimental extensions IIRC). If you want to forgo the GC and remain memory-safe, then you also forgo heap allocation. This might work for avionics code, but not for most apps.
Besides, the post you're replying to is talking about "momentum", and it's obvious in 2022 that Ada doesn't have the momentum that Rust does (however you define "momentum"). NVIDIA is not the entire industry.
Much of the rest of your post concerns video games, which are only a small portion of the total C++ code in existence. (And in any case it's not accurate to say that languages are "meaningless" unless the platform vendor officially supports them—console vendors don't maintain C# VMs either and yet Unity titles work just fine.)
What garbage collector? Ada never had one, apart from the optional one allowed by early standards, which was never implemented in any commercial compiler and was therefore removed in Ada 2012.
I wasn't the one asserting momentum, and I can point to plenty of other industries where Rust isn't even on the radar.
Going back to the Ada example, Rust certainly doesn't have any momentum over Ada in high-integrity computing.
Console vendors do happen to collaborate with Unity and make it first-party on their SDKs, so that's yet another gap in your information.
WebRender is certainly "GPU related" and is shipping to millions of happy Firefox users.
And yes, LLVM is written in C++. So what? C++ compilers depend on C code in libc. Portions of libc are written in assembler. Some assembly instructions are decomposed into microcode. Yet nobody doubts that C++ has eclipsed assembly language in terms of importance to the industry nowadays. We'll always need a way for humans to read the actual instructions that the silicon interprets, but relatively few people need to be able to do that nowadays. That dynamic is what the parent post means by one language "eclipsing" another.
I started in C and generally when people say 'rewrite it in Rust' I just roll my eyes, because I know how hard that is. But seeing it happen on a sophisticated project has made me take another look.
Obviously for the embedded world everything is pitched at C currently and I don't think that will change, but for larger projects this is proof that my intuition was wrong.
I suppose that's a long-winded way of saying that it might be time for me to learn Rust.
Yes, but YJIT in Rust is the same ~33.4% faster than vanilla CRuby as YJIT in C was. The rewrite into Rust is expected to make YJIT easier to maintain, and that may in turn make further improvements to code generation possible, but the rewrite generates the same machine code (and therefore the same speedup) as before.
AFAIK, the Rust YJIT doesn't change the generated code (they explicitly say it is approximately the same), so no significant difference in performance should be expected.
> ... it works the same way and largely generates the same machine code
How can they make this determination? Do they just eyeball a few sections of the machine code from each output? Is there some tool that can compare binaries?
Is this just a very literal, function by function, translation from C to Rust?
I'm not familiar with what YJIT generates; however, in general terms, if the ASM code for a given bytecode is small enough (which I think it is), one can just compare the outputs side by side, or log them and compare them separately. I'd expect a JIT for Ruby to compile relatively small chunks of ASM, not big walls of code (but again, this is my guess).
If YJIT is successful, Ruby will be faster, which is good™. The rationale for the Rust rewrite is that Rust may be better suited for writing a JIT than C is.
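If each build can log its generated code per block, keyed by something stable such as the bytecode position that started the block (an assumption; I don't know what YJIT actually logs), the comparison is mostly mechanical. A minimal Rust sketch with a made-up `diff_blocks` helper:

```rust
// Hypothetical sketch: compare per-block machine code logged by two JIT builds.
use std::collections::BTreeMap;

/// Returns the ids of blocks whose bytes differ or that exist in only one log.
fn diff_blocks(a: &BTreeMap<u32, Vec<u8>>, b: &BTreeMap<u32, Vec<u8>>) -> Vec<u32> {
    // Union of block ids from both logs, sorted and deduplicated.
    let mut ids: Vec<u32> = a.keys().chain(b.keys()).copied().collect();
    ids.sort_unstable();
    ids.dedup();
    // A block matches only if both logs have it with identical bytes.
    ids.into_iter().filter(|id| a.get(id) != b.get(id)).collect()
}
```

In practice you'd expect some benign differences (embedded addresses, for instance), which fits the "largely the same" wording: the list of mismatching blocks is where you start reading disassembly, not a verdict on its own.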
Yes you can run Ruby in the browser if you want, but not because of this PR. Ruby-in-WASM was merged a few weeks ago.
This PR rewrites the YJIT just-in-time compiler code from C into Rust, because the dev team likes Rust better and expects that it will make development of new features easier.
Why not C++, for better portability? If I want to design my own CPU, I will have to add it to GCC. But Rust uses LLVM, so if I want to support the Ruby JIT on my CPU, I will also have to support LLVM.
The fact that a language is memory safe doesn't imply that the underlying virtual machine/interpreter is.
On the other hand, it's definitely true that the ASM generated is as unsafe as it gets, but the first point still stands. The memory unsafety of the VM is simply an additional attack vector.
A Ruby program can delete all of the files on a computer, insert arbitrary rows into a database, drop a table, send email with attachments, etc. Am I correct that you're concerned the Ruby JIT itself will have a security vulnerability in the act of JIT compiling Ruby code? This seems extremely myopic.
JS engines have had many serious vulnerabilities in their JIT optimizers, it’s not myopic at all and is a well known technique in the industry.
I agree that some folks aren’t executing untrusted ruby code so they wouldn’t have to worry about this - but how many PaaS/SaaS products out there are? Or how about third party dev tools that are blindly downloaded and executed on local workstations or CI pipelines?
> JS engines have had many serious vulnerabilities in their JIT optimizers, it’s not myopic at all and is a well known technique in the industry.
HotSpot and V8 are both written in C++ and get more use than any other JIT on Earth.
Can you provide a link to a CVE caused by JIT miscompilation and explain how Rust would have been able to prevent the bug in a way that C++ wouldn't?
> I agree that some folks aren’t executing untrusted ruby code so they wouldn’t have to worry about this - but how many PaaS/SaaS products out there are?
This is what Xen, KVM, and Hyper-V do.
> Or how about third party dev tools that are blindly downloaded and executed on local workstations or CI pipelines?
Are you suggesting a Ruby JIT shouldn't generate machine code that corresponds to the Ruby program, but somehow magically prevent stupid developers from doing stupid things?
It's a bad look if a malicious HTTP request to your Rails app can trigger RCE on your server. It's not about running code that's malicious, it's about bad data triggering a code path in the VM that is able to change the function of the application.
JITs write instructions to memory in a manner that's only slightly different than writing bytes to a file. The generation of those instructions can either be correct or incorrect and happens regardless of programming language.
A JIT written in Python is equally capable of generating bad code as a JIT written in C or Rust or Lisp. A perfect port of a buggy JIT written in language A will generate the same buggy code even after being ported to language B.
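To make "writing instructions to memory" concrete: at its core a JIT backend appends encoded bytes to a buffer. This minimal Rust sketch encodes two real x86-64 instructions, `mov rax, imm64` and `ret`; the function names are made up, and making the buffer executable (mmap/mprotect or VirtualAlloc) is left out.

```rust
// Minimal sketch: a JIT "writes instructions" by appending encoded bytes to a buffer.
// Function names are hypothetical; mapping the buffer as executable is omitted.

/// x86-64 `mov rax, imm64`: REX.W prefix (0x48), opcode 0xB8 + register number
/// (rax = 0), then the 64-bit immediate in little-endian byte order.
fn emit_mov_rax_imm64(buf: &mut Vec<u8>, imm: u64) {
    buf.extend_from_slice(&[0x48, 0xB8]);
    buf.extend_from_slice(&imm.to_le_bytes());
}

/// x86-64 near `ret`: opcode 0xC3.
fn emit_ret(buf: &mut Vec<u8>) {
    buf.push(0xC3);
}
```

Nothing in the type system checks that `0x48 0xB8` means what the author intended. If the encoder had the wrong opcode, a faithful port to any other language would emit the same wrong bytes, which is the point being made here.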
Rust's type system is enough to get rid of memory unsafety and UB, but it does that by enforcing more invariants, invariants which you can also use to encode properties you care about. 70% of vulnerabilities are memory unsafety, which is impossible in safe Rust, etc. etc., but a better type system, a language that doesn't declare commonly found code unsupported (as C does with undefined behavior), more productive errors, and lower cognitive load also tend to help with the rest of the bugs.
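As an example of using the type system to encode a property you care about beyond memory safety, here is a hypothetical typestate sketch for the common W^X rule in JITs: a code buffer is either writable or executable, never both. The names are made up and no real page protection happens here; a real implementation would flip permissions (mprotect or VirtualProtect) inside `finalize`.

```rust
// Hypothetical typestate sketch: the compiler enforces "writable XOR executable".
// No real memory protection is changed here.
use std::marker::PhantomData;

struct Writable;
struct Executable;

struct CodeBuf<State> {
    bytes: Vec<u8>,
    _state: PhantomData<State>,
}

impl CodeBuf<Writable> {
    fn new() -> Self {
        CodeBuf { bytes: Vec::new(), _state: PhantomData }
    }

    /// Only a writable buffer can be emitted into.
    fn emit(&mut self, b: &[u8]) {
        self.bytes.extend_from_slice(b);
    }

    /// Consumes the writable buffer; afterwards there is no handle left to `emit` with.
    fn finalize(self) -> CodeBuf<Executable> {
        CodeBuf { bytes: self.bytes, _state: PhantomData }
    }
}

impl CodeBuf<Executable> {
    /// Only an executable buffer hands out its code (here just the bytes).
    fn code(&self) -> &[u8] {
        &self.bytes
    }
}
```

After `let x = w.finalize();`, something like `x.emit(&[0x90])` is a compile error because `CodeBuf<Executable>` has no `emit` method, and `w` itself has been moved. In C, the same rule usually lives in comments and code review.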
I'm not sure I understand why some people really hate Rust, but when the argument feels like "But can't we be miserable forever?" I just have to laugh.
FYI, my technical thinking on why: because Rust is a nicer language for the people who have to work with it. Full stop.
Rust offers substantial memory safety guarantees, but that isn't the only thing it offers. People who don't know this are those that haven't tried it. Others have focused on security in this thread, and I think that's wrong-headed. That's obviously not the reason for choosing Rust here. It's that it makes things that are important now and in the future, like say concurrency, easier and more likely to be correct. Yes, ergonomics and a nice dev experience actually matter even for the people writing your compiler!
Moreover, Rust GCC support is far closer to being a thing than YJIT is to being a thing. So, let the kids play.
This is a non-issue. YJIT only targets x86-64. After all, this is a JIT. If you designed a new architecture X, you need to port YJIT itself to target X, in addition to GCC, LLVM, etc.
It’s not that it’s highly coupled, just that it’s still the early days and only x86_64 was on the roadmap. Arm64 is planned, and will hopefully make it into Ruby 3.2.
It's not like new architectures appear very quickly, much less get adopted very quickly. The benefits of reduced maintenance overhead and faster development far outweigh the theoretical downside of having to port LLVM to that new architecture.
If you want to design your own CPU, supporting LLVM is going to give you much greater benefits than supporting Ruby. Never mind the fact that you don't even need this to support Ruby.
To add to your point, following Woodruff's "Weird architectures weren't supported to begin with", Robert O'Callahan pointed out[1] that for one definition of the open-source platform (looking at the requirements of Linux distributions), a new architecture would need to support at least: LLVM and GCC targets, a port of the Linux kernel, a V8 backend, and acceleration for various codecs.
And while at this point a platform needs to have support from both compilers, I can see the GCC/glibc ecosystem being made redundant; LLVM is more adaptable and has found its way into so many specialized compiler stacks.
C++ is a nearly 40-year-old language that's been showing its age for at least a decade or two. If we want the software world to progress, we need to move on from such languages.