
We don't really know how much memory Java needs on this benchmark. The JVM will use as much memory as you give it. Its attitude is: if you tell me I can use 300 MB, then I just won't do any GC work until I've used 300 MB. Why work harder?

To know how much memory you really need, you'd have to keep shrinking the -Xmx max heap size until performance became unacceptably poor. That wasn't done here.
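A sketch of that shrink-until-it-hurts approach, reusing the class and build layout mentioned later in this thread; the heap sizes and file names are illustrative placeholders, not measurements:

```shell
# Sweep -Xmx downward and watch elapsed time and max resident size.
# Main2, build/java, and the input files are placeholders from this thread.
for mx in 128M 64M 48M 40M 32M 24M; do
  printf '== -Xmx%s ==\n' "$mx"
  /usr/bin/time java -Xmx"$mx" -cp build/java Main2 dictionary.txt phones.txt \
    > /dev/null || echo "heap too small (likely OutOfMemoryError)"
done
```

The point where user+system time balloons (or the run dies with OutOfMemoryError) is a rough lower bound on what the program actually needs.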



> We don't really know how much memory Java needs on this benchmark. The JVM will use as much memory as you give it.

  ## /usr/bin/time java -Xmx48M -cp build/java Main2 ...
  90.14user 1.36system 1:29.07elapsed 102%CPU (0avgtext+0avgdata 147396maxresident)k
  0inputs+288outputs (36major+37251minor)pagefaults 0swaps


  ## /usr/bin/time ./phone_encoder ...
  29.91user 0.08system 0:30.01elapsed 99%CPU (0avgtext+0avgdata 16704maxresident)k
  0inputs+0outputs (0major+4940minor)pagefaults 0swaps
I used the default dictionary, and the input file with 10 million numbers was generated with "java -cp build/util util.GeneratePhoneNumbers 10000000 true".

So… I gave the Java version over 2.25x as much heap memory as the maximum resident size of the Rust version (./phone_encoder), but it actually used almost 100 MB more than that, and still took 3x as long to finish. I also tried with -Xmx40M:

  ## /usr/bin/time java -Xmx40M -cp build/java Main2 ...
  270.91user 4.30system 1:40.23elapsed 274%CPU (0avgtext+0avgdata 150588maxresident)k
  0inputs+480outputs (60major+35615minor)pagefaults 0swaps
Here we start to see a substantial performance hit from excessive GC, and maximum resident size has actually increased slightly. Elapsed time is only about 11 seconds longer than with -Xmx48M thanks to parallelism, but total CPU time (user + system) has ballooned to about 275 seconds, roughly nine times the Rust version's 30.

This is comparing against my optimized Rust implementation[0], not the one from the article. It employs the same algorithm but eschews heap allocation. I used /usr/bin/time rather than the included benchmark_runner because benchmark_runner uses what appear to be macOS (Darwin) APIs (libproc::libproc::pid_rusage) to measure memory use, and consequently doesn't build on my Linux system with that functionality intact.

[0] https://github.com/nybble41/prechelt-phone-number-encoding


Well, that answers that then!


That's a fair point. However, the other programs use less memory out of the box, without any need for tuning, so that's a win for them.


Optimizing for the sake of optimizing is wasted money.

If the customer has given you a 300 MB budget and the application can do its job with 180 MB while keeping the customer happy, there is no point paying development costs to bring it down to 50 MB.

There are other projects to move on to, instead of burning money on acceptance criteria that don't exist.


The whole premise of this article was to optimize programs for speed, which is also pointless if your program is already fast enough. It also doesn't take into account programmer time spent optimizing, how easy it is to find programmers who work with that particular language and can deliver performant software, etc.

Your criticism is valid, but I feel like it should be applied to the whole article rather than this particular point. I like how stuff like this is described on the Computer Language Benchmarks Game: "… a pretty solid study on the boredom of performance-oriented software engineers grouped by programming language." You could also probably replace "boredom" with "ego".


Indeed, it applies to any kind of optimization in general.


Not using memory that's sitting idle isn't a win. There's nothing better about saying "I didn't use a reusable resource that was sitting around unused."


The same could apply to CPU usage, which is what this whole article is about.


IIRC it really depends on the chosen GC. ZGC will return memory to the OS, but older collectors will just hold on to the memory forever (up to -Xmx).
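For what it's worth, the uncommit behavior is tunable via JVM flags. A hedged sketch (the class name and heap sizes are placeholders from this thread; the flags themselves are real HotSpot options):

```shell
# ZGC uncommits unused heap by default; ZUncommitDelay (seconds) controls
# how long memory must sit unused before being returned to the OS.
java -XX:+UseZGC -XX:ZUncommitDelay=30 -Xms16M -Xmx300M \
  -cp build/java Main2 dictionary.txt phones.txt

# G1 can also return memory when the process is idle (JEP 346, JDK 12+),
# driven by periodic GCs at the given interval in milliseconds.
java -XX:+UseG1GC -XX:G1PeriodicGCInterval=5000 -Xmx300M \
  -cp build/java Main2 dictionary.txt phones.txt
```

Note that -Xms matters too: heap committed at startup is never uncommitted below that floor.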


That's true, but even modern GCs like G1 or ZGC won't do background GC work unless the program is idle, so for a benchmark like this it doesn't matter.

