Yes, definitely: this is what we already do with CPU simulation. But many people had benchmarks using syscalls for network, filesystem, and other resources, and we found the only workable solution for those is to actually measure wall time.
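For a syscall-heavy benchmark, the measurement essentially boils down to something like this (a minimal wall-time sketch of my own, not the actual CodSpeed harness):

```python
import os
import tempfile
import time

def bench_wall_time(fn, warmup: int = 3, iterations: int = 30) -> int:
    """Median wall-clock time per call, in nanoseconds (illustrative harness)."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return sorted(samples)[len(samples) // 2]

def write_and_read_file():
    # Syscall-heavy work (open/write/fsync/read) that a CPU simulator can't model.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * 1_000_000)
        f.flush()
        os.fsync(f.fileno())
    with open(f.name, "rb") as g:
        g.read()
    os.unlink(f.name)

print(bench_wall_time(write_and_read_file))
```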
I hadn't considered this, but it would be really interesting to take into account. The size of the benchmark suite directly affects the false positive rate: counterintuitively, the more benchmarks in the suite, the higher the chance of a false positive, even with super steady benchmarks. (Thanks, this could also make an interesting follow-up article!)
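To make that concrete, here's a rough illustration of my own (not CodSpeed's model): if each benchmark is an independent check with a fixed per-benchmark false positive rate alpha, the chance of at least one spurious regression per run grows quickly with suite size:

```python
# Illustrative only: family-wise false positive rate for a suite of
# independent benchmark comparisons, each with per-benchmark rate `alpha`.
def family_wise_false_positive_rate(alpha: float, num_benchmarks: int) -> float:
    # P(at least one false alarm) = 1 - P(no false alarm in any benchmark)
    return 1.0 - (1.0 - alpha) ** num_benchmarks

for n in (1, 10, 100, 500):
    print(n, round(family_wise_false_positive_rate(0.01, n), 3))
# With alpha = 1%, a 100-benchmark suite already has a ~63% chance of
# flagging at least one spurious regression per run.
```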
We're building performance tooling to measure and optimize code performance before it is deployed to production. We help avoid regressions that impact UX and help developers solve their performance issues faster. We're already live and trusted by top-tier open-source teams such as Pydantic, Ruff, and Prisma.
We’re at an exciting early stage and looking for talented engineers who share our passion for improving the performance of software used by billions, enhancing the software development lifecycle, and building tools we love to use ourselves.
System Trace is pretty comprehensive. As long as what you're looking for is in there, you don't need a crazy strong profiler, and you're okay with occasional hangs in the profiler UI, it's pretty good.
I thought about this quite a lot and feel like it's super important as well. Getting the proper infrastructure/tools/resources to automate performance testing is often hard. I created codspeed.io to try and fix that.
>This is a neat approach! I'm curious how well it maps to actual perf degradations though. Valgrind models an old CPU with a more naive branch predictor. For low-level branch-y code (say a native JSON parser), I'd be curious how well valgrind's simulated numbers map to real world measurements?
We didn't try it on this specific case, but we found that on branchy code, Valgrind does exaggerate the branching cost. We could probably mitigate this by collecting more data about branching and incorporating it into our reported numbers, so they map more accurately to reality and to more recent CPUs.
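As a rough illustration (not CodSpeed's actual formula), the idea would be to fold the branch-miss counters that Callgrind emits with `--cache-sim=yes --branch-sim=yes` into a single synthetic cost, similar in spirit to KCachegrind's cycle estimation; the weights below are made up and would need tuning against real hardware:

```python
# Illustrative sketch only: combine Callgrind event counters into one
# synthetic cost. Counter names (Ir, Bcm, Bim, D1mr, ...) come from
# callgrind's cache/branch simulation; the weights are assumptions.
ILLUSTRATIVE_WEIGHTS = {
    "Ir":   1,    # instructions executed
    "Bcm":  10,   # conditional branches mispredicted
    "Bim":  10,   # indirect branches mispredicted
    "D1mr": 10,   # L1 data read misses
    "D1mw": 10,   # L1 data write misses
    "DLmr": 100,  # last-level data read misses
    "DLmw": 100,  # last-level data write misses
}

def synthetic_cost(counters: dict[str, int]) -> int:
    """Weighted sum of simulated events, standing in for 'estimated cycles'."""
    return sum(ILLUSTRATIVE_WEIGHTS.get(name, 0) * value
               for name, value in counters.items())
```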
>My probably naive intuition guesses that some low-level branchy code that valgrind thinks may be slower may run fine on a modern CPU (better branch predictor, deeper cache hierarchy). I'd expect false negatives to be rarer though - if valgrind thinks it's faster it probably is? What's your experience been like here?
Totally! We haven't encountered a false positive in the reports yet, but as you mentioned, since Valgrind models an old CPU, it's likely to happen. Even though the simulated cache is quite old, it still improves the relevance of our measurements. When we have some time, we'd really like to refresh Valgrind's cache simulation, since it would probably eliminate some edge cases and reflect memory accesses more accurately.
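In the meantime, Cachegrind/Callgrind already let you override the simulated cache geometry. Here's a sketch of driving that from Python, with cache sizes chosen as rough assumptions for a recent x86 core (not CodSpeed's configuration, and no substitute for modernizing the simulator itself):

```python
import subprocess

# Illustrative only: override Cachegrind's simulated cache geometry.
# Flag format is --I1/--D1/--LL=<size>,<associativity>,<line size>;
# the values below are assumptions, not CodSpeed's configuration.
cmd = [
    "valgrind", "--tool=cachegrind",
    "--I1=32768,8,64",      # 32 KiB, 8-way L1i
    "--D1=49152,12,64",     # 48 KiB, 12-way L1d
    "--LL=33554432,16,64",  # 32 MiB, 16-way last-level cache
    "./my_benchmark",       # hypothetical benchmark binary
]
subprocess.run(cmd, check=True)
```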