Show HN: CodSpeed – Continuous Performance Measurement (codspeed.io)
32 points by art049 on July 13, 2023 | 3 comments
Hi HN! We’re Arthur and Adrien from CodSpeed. We’re building a tool that measures software performance before any production deployment, catches performance regressions before they hit production environments, and reports performance changes directly in pull request comments. It’s kind of like Codecov, but for performance measurement.

Today, the go-to solution for measuring performance is probably an APM (Datadog, Sentry, …) continuously analyzing your production environment. However, since those solutions operate on real environments, they need real users to experience poor performance before they can report an issue. As a result, performance remains an afterthought, appearing only at the end of the development cycle.

Another option is to write benchmarks while developing and run them regularly to track the performance trend of your project. However, with this approach, the variance in the results creates a lot of noise, and it’s rarely possible to compare your results with a co-worker’s or with a production environment.
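
To make the noise concrete, here is a tiny, purely illustrative Python example (nothing CodSpeed-specific): even repeated wall-clock timings of the same function on the same machine show a noticeable spread, and it only gets worse across machines and CI runners.

    import statistics
    import timeit

    def fib(n):
        # Deliberately naive recursive function, just something to measure.
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    # Time the same workload ten times and look at the spread between runs.
    runs = [timeit.timeit(lambda: fib(20), number=200) for _ in range(10)]
    spread = (max(runs) - min(runs)) / statistics.mean(runs)
    print(f"relative spread across identical runs: {spread:.1%}")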

To make consistent performance measurement as easy as unit testing and fully integrated into CI workflows, we chose a benchmark-based solution. To eliminate the variance usually associated with running benchmarks, we measure the number of instructions and memory/cache accesses through CPU instrumentation performed with Valgrind. This approach gives repeatable and consistent results that couldn’t be obtained with a time-based statistical approach, especially in extremely noisy CI and cloud environments.
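
To give a rough idea of the mechanism, here is a simplified sketch (not our actual pipeline; "bench.py" is a placeholder benchmark script): run the workload under Cachegrind and read the instruction count from its summary instead of timing it.

    import re
    import subprocess

    def count_instructions(cmd):
        # Cachegrind prints a summary such as "==PID== I refs: 1,234,567"
        # on stderr; unlike wall-clock time, the instruction count is
        # stable from run to run.
        proc = subprocess.run(
            ["valgrind", "--tool=cachegrind", *cmd],
            capture_output=True,
            text=True,
        )
        match = re.search(r"I\s+refs:\s+([\d,]+)", proc.stderr)
        return int(match.group(1).replace(",", ""))

    print(count_instructions(["python", "bench.py"]))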

We have been in closed beta for a few months and are already used by popular open-source projects such as Prisma and Pydantic. Notably, CodSpeed helped Pydantic through their Rust migration, enabling them to make the library 17x faster: https://docs.pydantic.dev/latest/blog/pydantic-v2/#performan...

Today, we’re super excited to finally make the product available to everyone. We currently support Python, Node.js and Rust, and we’re looking forward to integrating with more languages soon.

The product is, and will remain, free for open-source projects. For private repositories, we have per-seat pricing. We have a lot of exciting features planned around additional integrations, such as database and GPU integrations, which should land in the coming months.

Don’t hesitate to try out the product and give your honest feedback. We’re looking forward to your comments!



> we measure the number of instructions and memory/cache accesses through CPU instrumentation performed with Valgrind. This approach gives repeatable and consistent results that couldn’t be obtained with a time-based statistical approach, especially in extremely noisy CI and cloud environments.

This is a neat approach! I'm curious how well it maps to actual perf degradations though. Valgrind models an old CPU with a more naive branch predictor. For low-level branch-y code (say a native JSON parser), I'd be curious how well valgrind's simulated numbers map to real world measurements?

My probably naive intuition guesses that some low-level branchy code that valgrind thinks may be slower may run fine on a modern CPU (better branch predictor, deeper cache hierarchy). I'd expect false negatives to be rarer though - if valgrind thinks it's faster it probably is? What's your experience been like here?


>This is a neat approach! I'm curious how well it maps to actual perf degradations though. Valgrind models an old CPU with a more naive branch predictor. For low-level branch-y code (say a native JSON parser), I'd be curious how well valgrind's simulated numbers map to real world measurements?

We didn't try it on this specific case, but we found that on branchy code valgrind does exaggerate the branching cost. We could probably mitigate this by collecting more branch-related data and incorporating it into our reported numbers, to map more closely to reality and to more recent CPUs.
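
For instance, Cachegrind can already report branch counts and mispredictions with --branch-sim=yes; something like the sketch below (not what we ship today, and "bench.py" is a placeholder) would let us weight branchy code more realistically.

    import re
    import subprocess

    # Run the benchmark with Valgrind's branch simulator enabled and pull
    # the branch/misprediction totals out of the stderr summary.
    out = subprocess.run(
        ["valgrind", "--tool=cachegrind", "--branch-sim=yes", "python", "bench.py"],
        capture_output=True,
        text=True,
    ).stderr
    branches = int(re.search(r"Branches:\s+([\d,]+)", out).group(1).replace(",", ""))
    mispredicts = int(re.search(r"Mispredicts:\s+([\d,]+)", out).group(1).replace(",", ""))
    print(branches, mispredicts)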

>My probably naive intuition guesses that some low-level branchy code that valgrind thinks may be slower may run fine on a modern CPU (better branch predictor, deeper cache hierarchy). I'd expect false negatives to be rarer though - if valgrind thinks it's faster it probably is? What's your experience been like here?

Totally! We haven't encountered a false positive in the reports yet, but as you mentioned, since valgrind models an old CPU, it's likely to happen eventually. Even though the simulated cache is quite dated, it still improves the relevance of our measurements. When we have some time, we'd really enjoy refreshing valgrind's cache simulation, since it would probably eliminate some edge cases and reflect memory accesses more accurately.
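
In the meantime, Cachegrind already lets you override the simulated cache geometry from the command line, so a sketch like this can get closer to modern hardware without patching valgrind (the sizes below are illustrative, not tuned to any specific recent CPU, and "bench.py" is a placeholder):

    import subprocess

    # Each flag takes size,associativity,line-size (all in bytes).
    subprocess.run([
        "valgrind", "--tool=cachegrind",
        "--I1=32768,8,64",     # 32 KiB L1 instruction cache, 8-way
        "--D1=32768,8,64",     # 32 KiB L1 data cache, 8-way
        "--LL=8388608,16,64",  # 8 MiB last-level cache, 16-way
        "python", "bench.py",
    ])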


Seems interesting... how did you land on valgrind vs. some other means of simulation? Looking at valgrind, it sounds like cachegrind? IMO the biggest gap is non-instruction, non-cache sources of latency, like mutex contention or kernel slowness? (Or does it capture kernel delays?)

Our most recent performance issues have come from someone accidentally creating a new thread pool in a request, from generating tons of stack traces in an error-handling path, and from some thread-hop delays. Sounds like the first two would probably be caught, but maybe not the third?





