How to write slow Rust code (athaydes.com)
107 points by Reventlov on Aug 2, 2021 | 75 comments


Implementing a Lisp cons algorithm as-is using a Vec is so far from being apples-to-apples that it makes me question the results of the entire article, despite the fact that things were eventually "optimized" to not do this.

Gathering a list by calling cons over and over is only remotely advisable when you're working with a linked list (which you are, in Lisp). Blindly replacing it with a Vec demonstrates either a total ignorance of CS fundamentals, or a desire for the benchmarks to come out a certain way.

A real "apples-to-apples" comparison would at least have Rust using an actual linked-list for this algorithm. A mutable Vec is more idiomatic and probably more efficient, but cloning an immutable Vec at every step turns O(N) into O(N^2).
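
To make the cost concrete, here's a toy sketch (not the article's code) of the clone-per-step shape:

    // Toy sketch, not the article's code: cloning the accumulator at
    // every recursive step copies all of its elements, so building a
    // result of length N costs O(N^2) overall.
    fn gather(n: usize, acc: &[usize]) {
        if n == 0 {
            return;
        }
        let mut next = acc.to_vec(); // copies all of acc: O(len) per step
        next.push(n);
        gather(n - 1, &next);
    }

    fn main() {
        gather(10, &[]);
    }

A cons onto a shared linked list, by contrast, allocates one new cell and touches nothing else.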

Note that this isn't Rust apologism: the same would be true if the author did this with a C++ std::vector, or a Java ArrayList, or a Python list, or a JavaScript array. Most languages' go-to structures for holding a sequence have roughly the performance characteristics of a vector.

I guess the article at least fulfills its title: "How to write slow Rust code"


The more general view to take from this is that when you program a lot in one language, you get so used to certain performance patterns that you simply write your code in a way that takes advantage of them first try[1]. Peter Norvig's Lisp solution is an optimized solution for Lisp in a lot of little ways.

Directly translating an optimized-for-Lisp program into any other language is silly, because you miss out on the optimizations available in that language to an experienced programmer and you lose out doing pointless things which were essential in Lisp.


I'm arguing that this case stands above that. This isn't some nuance of either language; it's CS 101.


I was going to argue the same: This isn't Java vs Rust, it's the author's familiarity with Lisps not translating very well to less purely functional languages.

Writing an O(N^2) algorithm should give you pause in any imperative language, if you can spot the copy.

The author is clearly not a beginner dev, but the mistake here isn't really Rust-specific; it's a matter of "vaguely imperative systems language" programming patterns.


That's not quite it; it's not really about functional vs imperative (though there's some loose correlation).

Common Lisp isn't purely functional (it's arguably the least-functional of the major Lisps), and in fact it has a vector data structure of its own, so you could make the same mistake there if you went out of your way to. Similarly, Rust has a LinkedList in its standard library. This algorithm is simply O(N) if you do it with a linked-list, and O(N^2) if you do it with a naive vector, regardless of language or programming style.


Fair enough, though I would argue Rust's Vec being the 'default' data structure plays a big role in non-beginners avoiding that mistake, versus Lisp where you more rarely see people reaching for vectors (though they exist).

I have, in fact, never seen Rust code using a linked list in the wild, though obviously there are many valid use cases where linked lists beat vectors (and I probably don't need to tell you =]), and surely there are many examples. The same goes for C++: you don't reach for std::list without a specific reason. Those de-facto 'defaults' in traditional 'vaguely imperative/systems' languages steer people towards different code styles.

Of course both have access to all mainstream data structures, but I'd like to think the author does know about the complexity of linked lists versus vectors. They only made that mistake because they probably don't reach for vectors as often in Common Lisp as someone in Rust would; otherwise they would presumably have quickly gained an innate eye for expensive copies in recursive functions.


> the same would be true if the author did this with a C++ std::vector, or a Java ArrayList, or a Python list, or a JavaScript array.

...or, to be fully honest, a Common Lisp VECTOR that is then CONCATENATE'd with a new element at its beginning.

As I said in another comment, this isn't the problem of using a given language - it's the problem of using a wrong data structure for the job.


Exactly. And the fact that it's framed as "well I just wanted a straight comparison, I didn't want to wade too far into idioms and Rust-isms in the first pass" feels disingenuous.


It seems true to me that working with the Rust borrow checker is eventually fruitful, but this post shows a micro-case where working with lifetimes seems like a lot of mental overhead with little gain, compared to the built-in garbage collector available in Lisp.

There is a part over there that compares:

    (cons word words)
to:

    let mut partial_solution = words.clone();
    partial_solution.push(word);
In this particular case this isn't Rust being slow; this is an attempt to compare an O(1) cons operation to an O(n) process of copying a list. Doing this N times results in accidentally quadratic complexity for Rust - no doubt it's slow.
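
(For what it's worth, the usual O(N) shape on the Rust side is to share one mutable Vec and undo the push when backtracking; a toy sketch, not the author's code:)

    // Toy sketch: one shared Vec, mutated in place and restored on the
    // way back out, so each step is amortized O(1) instead of O(N).
    fn gather(n: usize, acc: &mut Vec<usize>) {
        if n == 0 {
            println!("{:?}", acc); // consume the partial solution here
            return;
        }
        acc.push(n);
        gather(n - 1, acc);
        acc.pop(); // restore the caller's view before returning
    }

    fn main() {
        gather(5, &mut Vec::new());
    }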

I cannot say much more about Rust syntax since I am even more of a Rust newbie than the author - perhaps someone can have some further comments on the quality of the resultant Rust code.

Still, the Lisp version can be pushed even further with regard to performance - it still lacks explicit type declarations, and I am unsure if the author asked the Lisp compiler to optimize for speed and resolved any notes that the compiler might have produced.


> seems like a lot mental overhead with little gain

Now, I have bias, of course, and I do think this is the relationship at the start, but for me personally, it's the opposite. I never need to think about lifetimes. That's what the compiler is for. I don't think about it. If I make a mistake, the compiler tells me where I made a mistake. I fix it, and then I move on.

Huuuuuge YMMV here, of course. But when I'm reading C code, the mental overhead of keeping track of all of this is massive. But it's generally nonexistent in Rust. That's why we pass this thinking off to the machine! Humans get tired, they get sloppy, they may even be intoxicated in some form. Compilers do not. (They do have bugs though...)


> It seems true to me that working with the Rust borrow checker is eventually fruitful, but there is a micro-case where working with lifetimes seems like a lot mental overhead with little gain, compared to a built-in garbage collector available in Lisp.

I fully agree. In the past year I have worked in Rust a lot, and while quarrels with the borrow checker have definitely gotten rarer, they still happen occasionally. It's not that the borrow checker raises an objection I think is incorrect or invalid; it's rather that it's sometimes difficult or just tedious to encode the proof that what you're doing is actually fine.

That said, there have been situations in which I was absolutely confident my code would have been fine if it weren't for the meddling borrow checker, but it actually wasn't and it helped me catch bugs.

Fortunately there is a bit of an escape hatch: if you want to, you can just start reference counting and get close to the experience you'd have with a proper GC. For example, there's nothing preventing you from implementing a reference counted linked list and writing

    LinkedList::Cons(word, words)
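
A minimal sketch of what such a type could look like (names assumed, not taken from the article):

    use std::rc::Rc;

    // A reference-counted cons list: prepending shares the existing
    // tail through an Rc, so "cons" is O(1), as in Lisp.
    enum List<'a> {
        Cons(&'a str, Rc<List<'a>>),
        Nil,
    }

    fn main() {
        let words = Rc::new(List::Cons("foo", Rc::new(List::Nil)));
        // Rc::clone copies a pointer and bumps a counter; the list
        // cells themselves are shared, never copied.
        let longer = List::Cons("bar", Rc::clone(&words));
        if let List::Cons(head, _) = longer {
            println!("{}", head);
        }
    }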


In this case the linked list is just tracking "breadcrumbs" which mirror the path back up the call chain, so you don't even need reference counting; a simple borrowed reference will do. I put together a version in Rust based on stack-allocated linked lists[0], and the results were promising: on 10 million inputs the Rust version completes in about 40 seconds on my machine, while the Java port (Main2.java in the code; "Java1" from the article IIUC) takes 82 seconds. This Rust version doesn't allocate anything from the heap, apart from loading the dictionary at startup; all the bookkeeping, including the cons cells, is kept on the stack.

[0] https://github.com/nybble41/prechelt-phone-number-encoding
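
The core of the approach looks roughly like this (a simplified sketch, not the repo's exact code):

    // Each recursive call holds one cons cell on its own stack frame
    // and borrows its caller's cell, so the partial solution requires
    // no heap allocation at all.
    struct Cons<'a> {
        word: &'a str,
        prev: Option<&'a Cons<'a>>,
    }

    fn walk(depth: usize, tail: Option<&Cons>) {
        if depth == 0 {
            // Follow the breadcrumbs back up the call chain.
            let mut cur = tail;
            while let Some(cell) = cur {
                print!("{} ", cell.word);
                cur = cell.prev;
            }
            println!();
            return;
        }
        let cell = Cons { word: "word", prev: tail };
        walk(depth - 1, Some(&cell));
    }

    fn main() {
        walk(3, None);
    }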


> This Rust version doesn't allocate anything from the heap

I lied—the BigUint values used as keys were still being allocated from the heap. Replacing these with Vec<u8> in the HashMap and &[u8] for the lookups brought the time down to about 31 seconds while also simplifying the code and eliminating a dependency. Bypassing the fmt library in the output code shaved off another second.
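
(This works because a HashMap keyed by Vec<u8> can be queried with a plain &[u8] through the Borrow trait, so lookups allocate nothing; a minimal sketch, not the repo's code:)

    use std::collections::HashMap;

    fn main() {
        let mut index: HashMap<Vec<u8>, &str> = HashMap::new();
        index.insert(vec![2, 2, 8], "cat");
        // Vec<u8>: Borrow<[u8]>, so a borrowed slice works as the key.
        let digits: &[u8] = &[2, 2, 8];
        assert_eq!(index.get(digits), Some(&"cat"));
    }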


I'm confused as to why the author is using static mutable variables, which, while sometimes required in Rust, carry a high risk of design issues.

In addition to the design issues, they are also implemented via mutexes, which have overhead - and I don't see threads in the file.

EDIT: Even more skeptical. By replacing the ONE/TEN pseudo-static variables with standard/idiomatic local variables (and borrowing them), I get a 10% increase in performance.
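
The general shape of that change, as a hedged sketch (the real ONE/TEN definitions live in the author's code; these are not them):

    use std::sync::Mutex;

    // A global behind a Mutex pays for a lock on every access...
    static TABLE: Mutex<Vec<u32>> = Mutex::new(Vec::new());

    // ...while a local built once in main and passed down by reference
    // is just a plain borrow.
    fn sum_local(table: &[u32]) -> u32 {
        table.iter().sum()
    }

    fn main() {
        TABLE.lock().unwrap().extend([1, 2, 3]);
        let table = vec![1, 2, 3];
        let global_sum: u32 = TABLE.lock().unwrap().iter().sum();
        println!("{} {}", sum_local(&table), global_sum);
    }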


Um, the article mentions that somebody explained to him that was wrong, and he replaced them with something better.

This whole article is about "here's a bunch of things I got wrong because I'm not very good at Rust, and people were kind enough to help me fix them, and I'm going to show you the fixes". I'm not sure what there is to be skeptical about here?


I don't see much (if any) value in this article.

The code was very unidiomatic to start with, and the issues are not just a matter of style - they're very substantial, and the code is still unidiomatic.

At least one of the conclusions through the article is wrong - the author uses a vector of String references, concluding that the logic can't be easily simplified (because lifetimes are required). This data type is wrong, as owned Strings should be used, which don't require any maintenance.

This wrong conclusion actually has the side effect of describing a language complexity that doesn't exist (I'm certainly not denying that Rust is complex, but credit should be given in cases where such complexity doesn't apply).

Reconnecting to the beginning, I'm not sure what the value is supposed to be. An unidiomatically implemented benchmark, updated to a still-unidiomatic implementation with a flawed algorithm and at least one improper conclusion, gives results that are not meaningful?


> It seems true to me that working with the Rust borrow checker is eventually fruitful, but there is a micro-case where working with lifetimes seems like a lot mental overhead with little gain, compared to a built-in garbage collector available in Lisp.

Fwiw, my shop uses Rust[1] in some basic web applications, and lifetimes are not used in day-to-day programming. It heavily depends on what you're doing. Are there some applications/libraries/etc where we're using lifetimes? Definitely. But for us they're very niche. AND, perhaps most importantly, even in those you _still_ don't have to. If you're okay with the performance trade-off, which we most certainly are (we have some Python usage, for example), adding in basic reference counting can be a no-brainer tradeoff.

I actually quite like lifetimes. I have some side projects where I use them extensively. But just like standard Generics they're a tool you can use, but that doesn't mean you always should. Just like accepting an `s: &str` instead of a generic `s: S where S: AsRef<str>` can make your life cleaner and simpler, using `Rc<Foo>` can equally simplify.
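
To illustrate the trade-off (a hedged sketch with made-up types, not from the article):

    use std::rc::Rc;

    // Borrowing ties the struct to a lifetime parameter...
    struct Borrowing<'a> {
        config: &'a str, // the caller must keep the data alive
    }

    // ...while Rc trades a little runtime bookkeeping for plain types.
    struct Counted {
        config: Rc<str>, // shared ownership, no lifetime parameter
    }

    fn main() {
        let cfg = String::from("value");
        let borrowed = Borrowing { config: &cfg };
        let counted = Counted { config: Rc::from(cfg.as_str()) };
        println!("{} {}", borrowed.config, counted.config);
    }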

Just my 2c :)

[1]: because we like it, no other reason.. really, a team of ~30.


One thing I love about Rust is that, while the code is accidentally quadratic, it is explicit when looking at the code and seeing the call to `clone`, which makes it much harder to produce that behaviour than in languages where copies can happen behind the scenes (C++ is often guilty of that).


Posted in a sibling comment, but I repeat it because I think it's a very important point.

> I cannot say much more about Rust syntax since I am even more of a Rust newbie than the author - perhaps someone can have some further comments on the quality of the resultant Rust code.

The author is using an improper data type - a vector of borrowed strings (`Vec<&String>`), which forces the dev to use lifetimes.

The owned version, `Vec<String>` doesn't suffer from this problem.

More generally, this is a lack-of-experience problem - designing ownership is difficult at the beginning. And there's no problem with that! But don't publish a (micro)benchmark, please :)

In particular, the author's conclusion on the necessity of lifetimes:

> This means that, yes, writing efficient code in Rust requires a lot of forethought. In this case, our tiny modification to use a mutable Vec actually requires us to only deal with strings whose life times match or exceed that of our Vec.

assigns a complexity to Rust that in this case doesn't exist. Rust is complex, but this is not one of those cases.


You keep repeating your wrong argument, so please allow me to correct you.

If you knew any Rust, you would immediately see that the owner of the words is the HashMap, which represents the dictionary of valid words to use. If the author had used a `Vec<String>` to represent the possible solutions, every single String would need to be either cloned or removed from the dictionary. A String cannot have two owners. This is basic Rust.

If you disagree, please show me the magic code that can get around that.


Here's the diff in Pastebin: https://pastebin.com/wT8gXtBJ

It just clones the single word, rather than cloning the whole words vector.

Something that can also be improved in the code above (I've done the most minimal change) is to move the Vector instantiation outside the for loop, since print_translations() guarantees that the vector has the same content before/after invocation.


You cloned every word, which is much worse than cloning the vector (which only clones a couple of pointers).

Here's the performance of the original code:

    ===> ./phone_encoder
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,24584192,198
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,24584192,461
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,24612864,1735
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,24612864,3207

After your change:

    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,24608768,205
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,24592384,490
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,24600576,1745
    Proc,Memory(bytes),Time(ms)
    ./phone_encoder,24608768,3388

It's actually surprising that your change didn't make it much worse, but compared with the version where the lifetimes are used to avoid cloning anything at all, both of these are extremely slow.


Also regarding Lisp: the language might have been born with just list-processing capabilities in its early days, however since the '70s any proper Lisp has supported all common data structures, including value types, stack allocation and deterministic resource management.


We can actually go a little deeper. In a famous pair of papers, "CONS should not actually CONS its arguments", Henry Baker argued that (cons) is emblematic of a class of primitive/simple functions which vanish under continuation-passing style, and that LISPs should deliberately be structured around this fact. These papers directly inspired CHICKEN, the first modern fast LISP runtime.

This is ignoring many Common Lisp specifics which are relevant to the blog post's original code, but I figured that on this example, the historical microscope is appropriate.


I don't recommend drawing too much of a conclusion from these tests, about anything other than printing in <your language>. Once the algorithm is somewhat optimized, the benchmark is completely dominated by printing to stdout, not the algorithm in question. Pulling from the near the conclusion here, bugaevc finds: "And the measured time reflects how long it take to actually compute the solutions (≈1.6s), not how much time it takes to output/buffer them (≈59s)." source: https://github.com/renatoathaydes/prechelt-phone-number-enco...
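
(On the Rust side, for instance, the standard first fix is to lock stdout once and buffer writes instead of calling println! per solution; a hedged sketch:)

    use std::io::{self, BufWriter, Write};

    fn main() -> io::Result<()> {
        // println! re-locks stdout on every call and the default handle
        // is line-buffered; a BufWriter around the locked handle batches
        // the underlying writes instead.
        let stdout = io::stdout();
        let mut out = BufWriter::new(stdout.lock());
        for i in 0..1_000 {
            writeln!(out, "solution {}", i)?;
        }
        out.flush()
    }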


If you look at the memory data https://docs.google.com/spreadsheets/d/14MFvpFaJ49XIA8K1coFL..., Rust seems like the only one that uses constant memory, and it uses way less than the others. The other thing that I get from this article is that performance is really complex, and never just a function of the language you chose. I'm glad to read articles like that as a junior, I think this will give me a good mental model for performance.


>and never just a function of the language you chose.

I understand your point, but it's important to know how NUANCED this is.

Performance is MOSTLY driven by the inherent design choices of each language (+ your own!). If you are on a GC, that choice has an impact on everything. Rust has vectors as its base, and Lisp has GC-backed linked lists. That is another deeeeep choice.

But IF you know (or luck into!) what those designs are, and understand that, no matter the language (everyone wants to go fast), the most common operations (the ones THAT lang was designed for!) are fast enough... then all is OK, on average.

It's when you go against that design that you see the illusion crack a little. Then it's all about how much you can work around it to recover most of the performance.


That is very true; for example Java, while being as fast as Rust and Lisp, takes way more memory than both of them.

My point was more for myself (and maybe other people that haven't heard a lot about performance), as I had very strict ideas about it, but thanks to articles like that I can understand how it's more nuanced. Software engineering and computer science are both really vast and deep fields, but a lot of information is here online, shared freely by awesome people. That means that for a lot of things, being aware of something is enough to make good decisions. These types of articles allows me to be more aware of the nuances of performance (and how complex it is as a topic).

For example, if I'm just trying to optimize the speed of my algorithm, and only know Java, Java might be enough. But if I'm trying to optimize my cloud spendings, considering the memory consumption of Java, it might not be the best choice.

Comments like yours are also a great way to learn more. I know that linked lists are really good in Lisp (and OCaml too!), but in other languages they're often not the best choice.


We don't really know how much memory Java needs on this benchmark. The JVM will use as much memory as you give it. Its viewpoint is, if you tell me I can use 300mb then I'll just not do any GC work until I've used 300mb. Why work harder?

To know how much memory you really need, you'd need to keep shrinking the Xmx max heap size setting until performance becomes unacceptably poor. However, that isn't done here.


> We don't really know how much memory Java needs on this benchmark. The JVM will use as much memory as you give it.

  ## /usr/bin/time java -Xmx48M -cp build/java Main2 ...
  90.14user 1.36system 1:29.07elapsed 102%CPU (0avgtext+0avgdata 147396maxresident)k
  0inputs+288outputs (36major+37251minor)pagefaults 0swaps


  ## /usr/bin/time ./phone_encoder ...
  29.91user 0.08system 0:30.01elapsed 99%CPU (0avgtext+0avgdata 16704maxresident)k
  0inputs+0outputs (0major+4940minor)pagefaults 0swaps
I used the default dictionary, and the input file with 10 million numbers was generated with "java -cp build/util util.GeneratePhoneNumbers 10000000 true".

So… I gave the Java version over 2.25x as much heap memory as the maximum resident size of the Rust version (./phone_encoder), but it actually used almost 100 MB more than that, and still took 3x as long to finish. I also tried with -Xmx40M:

  ## /usr/bin/time java -Xmx40M -cp build/java Main2 ...
  270.91user 4.30system 1:40.23elapsed 274%CPU (0avgtext+0avgdata 150588maxresident)k
  0inputs+480outputs (60major+35615minor)pagefaults 0swaps
Here we start to see a substantial performance hit due to excessive GC, and maximum resident size has actually increased slightly. Elapsed time is only about 11 seconds longer than with -Xmx48M due to parallelism, but total CPU time (user + system) has ballooned to 274 seconds—nine times the Rust version.

This is comparing against my optimized Rust implementation[0], not the one from the article. It employs the same algorithm but eschews heap allocation. I used /usr/bin/time rather than the included benchmark_runner because benchmark_runner makes use of what appear to be MacOS (Darwin) APIs (libproc::libproc::pid_rusage) to measure memory use and consequently doesn't build on my Linux system with this functionality intact.

[0] https://github.com/nybble41/prechelt-phone-number-encoding


Well, that answers that then!


That's a fair point; however, the other programs use less memory out of the box without any need for tuning, so that's a win for them.


Optimizing for the sake of optimizing is wasted money.

If the customer has given you a 300MB budget and the application is able to perform its job with 180 MB, and keep the customer happy, there is no point paying for development costs to bring it down to 50 MB.

There are other projects to move into, instead of burning money with non-existent project delivery acceptance criteria.


The whole premise of this article was to optimize programs for speed, which is also pointless if your program is already fast enough. They also don't take into account programmer time spent to optimize the project, how easy it is to find programmers that work with that particular language and that can deliver performant software, etc.

Your criticism is valid, but I feel like it should be applied to this whole article rather than this particular point. I like how stuff like that is described on the Computer Language Benchmark Game: "… a pretty solid study on the boredom of performance-oriented software engineers grouped by programming language.". You could also probably substitute "boredom" with "ego".


Indeed, it applies to any kind of optimization in general.


Not using memory that's lying empty isn't a win. There's nothing better about "I didn't use a reusable resource that was lying around unused."


The same could apply to CPU usage, which is what this whole article is about.


IIRC it really depends on the chosen GC. ZGC will return memory back to the OS but older GCs will just hold on to the memory forever (up until -Xmx)


That's true but even modern GCs like G1 or ZGC won't do background GC work unless the program is idle. So for a benchmark like this it doesn't matter.


Or when one fails to understand what is actually available to them in the toolbox and only looks at what is visible at the top level.

For example, yes, Lisps have GC-backed linked lists, but in the case of Common Lisp, they also have arrays, stack allocation and deterministic resource management macros.


It might be just the write up, but the article reads like he was playing with the benchmarks until he got a result where Java was faster.

I don’t think anything malicious has happened, just pointing out that benchmarking is hard, and changing parameters just to see what happens is not benchmarking.

I haven’t gone deep enough into the code and the problem, but it looks like this:

   What if we swap the Rust HashMap implementation, as they say the standard one I was using applies a cryptographically-safe hash that the JVM doesn’t
That is quite a distinction, especially for the later cases with lengthier phone numbers.
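
For context, swapping the hasher in Rust is a small type-level change; a sketch assuming the widely used rustc-hash crate (the article may have used a different hasher):

    use std::collections::HashMap;
    use std::hash::BuildHasherDefault;
    use rustc_hash::FxHasher; // assumption: the rustc-hash crate

    // std's HashMap defaults to DoS-resistant SipHash; a faster,
    // non-cryptographic hasher plugs in via the third type parameter.
    type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<FxHasher>>;

    fn main() {
        let mut m: FastMap<Vec<u8>, Vec<&str>> = FastMap::default();
        m.entry(vec![2, 2, 8]).or_default().push("cat");
        println!("{:?}", m.get(&vec![2, 2, 8]));
    }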


Conclusion is sound. Java/Rust/C/C++... are all fast enough that the bottleneck does not reside with the choice of the language.

I wonder how Common Lisp can be this fast, however. I would like comparisons with languages like Python, JS and Lua.


Common Lisp has always had pretty good support for compiling to fast machine code. Most people underestimate it, for at least two reasons. One is that it doesn’t have good standing in popular benchmarks relative to C, even with speedy implementations like SBCL, and the common takeaway from such benchmarks is that a language is either as fast as C or infinitely slow. The second reason is that the process of building standalone executables is less accessible than in other languages; even to people familiar with Lisp’s strengths, it’s kind of arcane, giving off the impression that deploying CL code in production is an unsolved problem. Also, Lisp is an old language family, and the quirks of Common Lisp in particular are weird and unfamiliar when coming from any modern language, even Clojure.


> the bottleneck does not reside with the choice of the language.

I think that's too bold a claim. Your code might be too slow in any of these languages because you made poor algorithmic or data structure choices, but once you have made the best possible algorithmic and data structure choices, you may reach a point where, in some languages, you can't express the program the way you need to and still get the desired performance.

At that point, in the immediate present, you have to choose the tool that actually gets the job done. It doesn't matter if C++ could totally hit your numbers "once that LLVM optimiser bug gets fixed" or if Java could do it "when they port the new JIT to our platform", you need the thing that works now.

Over the medium term, language evolution makes an impact here. Once upon a time C++ didn't have move semantics. Obviously in some cases your compiler can move as an optimisation, but in other cases if you can't ask for one explicitly you're often out of luck. However C++ has a specific performance goal ("Leave no room for a lower-level language") that guides them to find a way to get great performance, and they changed the C++ language to fix this. If your ideal solution is slow in C++ then the C++ committee ostensibly wants to fix that. Just maybe not this year. In contrast for Java performance is not a primary goal and sometimes Java is OK with worse performance. In this particular essay it happens for RAM consumption. This problem shouldn't run out of RAM on your mid-1990s server at the scale demonstrated, but the Java solution would either run out of RAM or spin up the GC mid-solution (and thus ruin speed).

No argument that "I'll rewrite it in Rust/C++" is just as much the wrong first instinct as "I'll hand roll the assembler". But if you're under-performing with the right algorithm you need to either spend money on more (faster/ bigger) hardware or on rewriting in the language that might buy you that last drop of performance.


Lisp was used to write full OSes, naturally it is fast.

Python, JS and Lua are not in the same league as the AOT and JIT compiler technology available to Lisp. Even with all the money spent on stuff like V8, a JS JIT cannot compensate for the fact that the language lacks the ability to provide the semantics required to AOT-compile code with good enough performance to write a Xerox PARC graphical workstation (see Interlisp-D).

It is no wonder that Julia, a Lisp-like language with Algol syntax, is the only modern dynamic language in the HPC club.

https://www.hpcwire.com/off-the-wire/julia-joins-petaflop-cl...


I think Julia being an HPC lang despite its dynamicity has nothing to do with being close to Lisp in some regards.

I didn't use the language, but the overwhelming community consensus I get is that performance goes out the window once you truly embrace dynamic (type-unstable, in Julia parlance) code. Julia performance depends heavily on the compiler being able to infer a single type for variables in the hot loop, so that it can specialize the functions used on it and then get out of the way. Julia is not a Fortran-speed Python; it's a language that can be _either_ Fortran _or_ Python, depending on how you write it.

Lisp influences on julia are more for semantic power than performance.


Depends on how one sees it,

https://www.researchgate.net/publication/267008293_Fast_floa...

The point being what dynamic homoiconic languages are able to achieve when the community actually embraces the push for performance instead of hand waving and rewriting everything in C.

If the JavaScript community had the same attitude as the Python one, surely we would still be doing HTML 4 without pesky SPAs.


SBCL is fast. People seem to lump together dynamically typed languages, but the reality is that python and ruby forgot all the things learned during the 80s and early 90s.

My experience is that small programs like this are usually at least an order of magnitude faster in SBCL compared to python. At least if the python code spends some time in actual python instead of offloading to C.


>ruby forgot all the things learned during the 80s and early 90s

Out of curiosity, what things are you referring to?


From the top of my head before I get off the bus:

Things like proper static modules, so that you know what is being called and can elide lookups at call time. That is a pretty big omission, TBH.

For a long time (still?) python had a horribly inefficient value representation in memory. Now, integer tagging is probably no fun to add as an afterthought, but it shouldn't have been an afterthought.

Exception handling is sloooow. And iterators use exceptions to signal that you have exhausted the collection. Because iterating over a complete collection is exceptional.

I understand the need to keep cpython simple, but some of these things ARE simple and have been well understood since a decade before python 1.0.


Static modules with elided lookups is completely nonsensical in Python.

StopIteration is special-cased and has negligible overhead. See here (https://www.python.org/dev/peps/pep-0234/) for why exceptions are used.


On the contrary. Static modules would mean static classes with no possibility of changing a class. That would be another pretty nice optimization you could do, making class lookups a lot faster. I am not alone in thinking this would be a good idea. The Instagram folks wrote about it, iirc.

That rationale for using an exception is only valid because CPython procedure calls are expensive. A proper optimizing compiler will have a hard time rewriting that implicit try/catch into a goto.

I think Dart nailed iterators in this regard: no special values, and a protocol that is transparent to the compiler, which has the benefit of requiring no special cases at all. In the current state of CPython, of course, that falls into the "2 expensive procedure calls that can't be inlined because everything can be mutated at any time" type of problem.
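
(Rust's iterator protocol has the same property, for comparison: exhaustion is an ordinary Option value rather than an exception, so the compiler can see through it; a sketch:)

    // The whole protocol is one method returning Option: no special
    // control flow, so `for` desugars to a plain match the optimizer
    // can inline.
    struct Countdown(u32);

    impl Iterator for Countdown {
        type Item = u32;
        fn next(&mut self) -> Option<u32> {
            if self.0 == 0 {
                None
            } else {
                self.0 -= 1;
                Some(self.0 + 1)
            }
        }
    }

    fn main() {
        for n in Countdown(3) {
            println!("{}", n); // prints 3, 2, 1
        }
    }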


>Static modules would mean static classes with no possibility of changing a class. That would be another pretty nice optimization you could do, making class lookups a lot faster.

I think the whole point of python is to make everything dynamic. It seems like this trade-off is voluntary, i.e. it is a feature, not a bug.


There’s a distinct difference between Java and Rust/C++, which is that Java has a stop-the-world GC and Rust/C++ don’t. And in large-scale systems, Java’s GC is usually the limiting factor.


Newer versions of Java have more interesting GC implementations that reduce those "stop the world, swab the decks" pauses. The fine engineers working on the JVM keep finding ways to incrementally garbage collect. It's getting to a point where it's imperceptible in some applications. Granted, if you're writing high-speed/high-frequency trading apps for Wall Street, Java still isn't there. Rust could be the logical next step in that arena. (Although most C++ devs I know in that space don't hate C++.)


> if you're writing high-speed/high-frequency trading apps for Wall Street, Java still isn't there.

Stop-the-world GC doesn't stop you from using Java for high-speed apps, unpredictable stalls do. It doesn't matter if your GC takes 2 seconds to run fully as long as it doesn't kick in during your hot loop. And the opposite is true: there will never be a "magic" GC with low enough overhead that it can run in the background of your finely tuned trading algorithm without causing a microsecond stutter; it has to be explicit.


> if you're writing high-speed/high-frequency trading apps for Wall Street, Java still isn't there

A fair few trading system job ads seem to be for Java. I also wrote code that faced a Java server at various banks. Seems to be fast enough for certain market related applications.


The use of Java in trading systems is widely misunderstood. Java is typically used to implement the high-level business logic where microsecond latency is not required. The output of the Java service is fed to another system which handles the actual trade execution. That part is typically done with C++, or FPGAs driven by C++.


This was the method used by my former employer. Our entire backend was Kotlin/Java/Scala... and a couple of NodeJS apps. Right up until the very edge, where C++ handled the interface with the exchanges. We also did the typical thing of having our servers as physically close as possible (in New Jersey).


From my understanding of previous talks on this, a fair number of HFTs and trading firms use Java. The trick is that the trading day is a well-defined window with clear starts and stops. So if you load up your server with huge amounts of memory, you can get away with never running the GC during the trading day. GC can be deferred until after hours, or you can just kill the program and start it up again before market open the next day.


One solution I had been dying to implement is the Erlang style of letting a Kubernetes pod OOM and just respawning it. If the app performs well right up until it runs out of memory, I see no reason to have a GC turned on at all. (Obviously there ARE reasons, I just wanted to try this approach.)


Yeah, like 1 ms when handling 3 TB heaps, while most people are still complaining about 16 GB on laptops. I can live with that.


Firstly, 1ms for a 3TB heap size is very unrealistic even for modern JVMs.

Secondly, the figures that are often quoted for how long the program must be stopped are for the "happy path" where enough memory can be freed that the program can continue without issue.

In practice, what often happens with large heap sizes is that the GC starts lagging behind because there's not enough memory head-room. In these cases, the GC stops the world for much longer, because the program hangs on a memory allocation that cannot be satisfied until the GC completes.

This primarily affects programs that have high rates of allocation. You can avoid this in Java with effort, but you end up with very non-idiomatic code.


Enjoy, https://malloc.se/blog/zgc-jdk16

I guess Oracle, IBM, Azul, PTC, Microsoft can provide the necessary data about their pauseless GC implementations to any relevant customer.


Modern JVMs have a GC that has O(1) pause times, often of less than 1msec regardless of heap size. In ZGC and Shenandoah, GC is fully concurrent and pauses are basically imperceptible. Even walking the stacks to locate pointers into the heap is done whilst the application is still running.


If I understand correctly, modern Java has multiple GC engines available (that can be chosen with a command-line switch), some of which are either fully concurrent or with sub-millisecond stop times.
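
(For reference, the collector is chosen with a startup flag, e.g.:)

    java -XX:+UseG1GC ...          # default collector in modern JDKs
    java -XX:+UseZGC ...           # concurrent, sub-millisecond pauses
    java -XX:+UseShenandoahGC ...  # concurrent low-pause collector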


SBCL and Chez Scheme are probably the most advanced dynamic language environments around.

Unfortunately, CL has started to show its age (case insensitive, different namespaces for variables and functions, weirdly named stuff, etc.) and Chez Scheme doesn't have a lot of libraries.

In my experience, Chez Scheme is very small, blazing fast, and VERY high level. It is a great platform to use as a target language for compilers / interpreters. Racket has chosen to go this route, for example.


Can't say that different namespaces are a sign of Lisp showing its age. At least in my mental model, it is convenient to know that a single symbol can name several different things in different namespaces at the same time - e.g. a variable, a function, a type, and a few more that are possibly defined by the user.

Enforcing the "strictly one meaning per symbol" paradigm seems pretty self-defeating to me, since even in a Lisp-1 users can - and eventually will - create their own namespaces, implementing a Lisp-N themselves.


Kind of, SBCL doesn't have the tooling of Allegro or LispWorks.


Fast enough for this workload. If you were trying to maintain 144 fps with some kind of 3D interface with no frame drops, any garbage collector would be a challenge.


Not for embedded environments


I got a lot out of this article, thanks! One of my problems with learning Rust has been the “draw the f**ing owl” problem. There are lots of tutorials on getting started, and while advanced code is easy to find, I have not found a lot of intermediate or advanced code with good explanatory annotations. Great seeing someone take beginner code and rustify it!


Right, I really enjoyed it on that basis too.

I'm quite amazed how many of the responses are criticising his original code when the whole point of the article is that his original code was n00btastic and he's very grateful to've been helped to improve it and wanted to share.


> … his original code was n00btastic…

Sorry, but this is at least the second blog post in the last week, and previously —

"I've discussed the Rust and Julia versions with experts in those languages (—— I know Rust well enough ——, but very little Julia) and no obvious mistakes were found."

https://www.reddit.com/r/lisp/comments/osqgqe/common_lisp_st...

Before that, on finding the output from his Java program was different from the output from Peter Norvig's Lisp program —

"It looks like Norvig's Lisp solution must have a bug: it can't find all valid solutions!"

https://github.com/renatoathaydes/prechelt-phone-number-enco...


By writing anything :^)



