
Parsing overhead is sensitive to the amount of work the service is doing. If the service does relatively little work and there's a lot of junk in your request, you are forced to parse a lot of data which you then throw away. JSON is particularly nasty because there's no way to skip over data you don't care about: to parse a string, you have to look at every byte. This can be parallelized (via SIMD), but there are limits. In contrast, protobuf sprays length-encoded tags all over the place, which allows you to skip quickly over large sequences of strings and bytes.
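
To make that concrete, here is a minimal sketch (in Python, with the set of wanted field numbers as a hypothetical parameter) of how a protobuf-style parser jumps past a length-delimited field in O(1), where a JSON parser would have to scan every byte of the string:

    def read_varint(buf, pos):
        # Decode a base-128 varint starting at pos; return (value, new_pos).
        result, shift = 0, 0
        while True:
            b = buf[pos]
            pos += 1
            result |= (b & 0x7F) << shift
            if not (b & 0x80):
                return result, pos
            shift += 7

    def extract_fields(buf, wanted):
        # Walk a protobuf-encoded buffer; keep length-delimited fields
        # whose field numbers are in `wanted`, and skip everything else
        # without touching the payload bytes.
        out, pos = {}, 0
        while pos < len(buf):
            tag, pos = read_varint(buf, pos)
            field_num, wire_type = tag >> 3, tag & 0x7
            if wire_type == 2:  # length-delimited: string/bytes/submessage
                length, pos = read_varint(buf, pos)
                if field_num in wanted:
                    out[field_num] = buf[pos:pos + length]
                pos += length  # the skip: one jump past the whole payload
            elif wire_type == 0:  # varint
                _, pos = read_varint(buf, pos)
            else:
                raise ValueError("wire type %d not handled in this sketch" % wire_type)
        return out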

If your cache is hot and has a good hit rate, the majority of your overhead is likely parsing. If you microbatch 100 requests, you have to parse all 100 before you can ship them to the database for lookup (or to the machine-learning inference service). If the service is good at batch processing, then parsing becomes the latency-sensitive part.
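
As a rough illustration of that shape (batch_lookup is a hypothetical stand-in for the backend call):

    import json

    def batch_lookup(keys):
        # Stand-in for one round-trip to the database/cache/inference service.
        return {k: "value-for-%s" % k for k in keys}

    def handle_microbatch(raw_requests):
        # Every request must be parsed before the single batched backend
        # call can be issued, so parsing sits squarely on the latency path
        # whenever the backend itself is fast.
        keys = [json.loads(r)["key"] for r in raw_requests]
        return batch_lookup(keys)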

Note the caveat: the 60% is for large payloads. JSON contains a lot of repetition in the data, so you often see people add compression to JSON unknowingly, because their webserver is doing it behind their back. A fairly small request on the wire inflates to a large request in memory, and takes far more processing time.
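
A quick way to see the effect, with made-up but suitably repetitive data:

    import gzip, json

    doc = json.dumps([{"user_id": i, "status": "active"} for i in range(1000)])
    wire = gzip.compress(doc.encode())
    print(len(wire), "bytes on the wire ->", len(doc), "bytes to parse in memory")
    # Often an order of magnitude smaller on the wire for data this repetitive.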

That said, the statistician in me would like to have a distribution or interval rather than a single number like "60%", because it is likely to vary. It's entirely possible that 60% is on the better end of what they are seeing (it's plausible in my book), but there are likely services where the improvement in latency is far more modest. If you want to reduce latency in a system, you should sample the distribution of processing latency. At least track the maximum latency over the last minute or so, and preferably a couple of percentiles as well (95, 99, 99.9, ...).
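
A minimal sketch of that kind of tracking (a production system would use a streaming structure such as HDRHistogram or a t-digest rather than storing raw samples):

    import time
    from collections import deque

    class LatencyWindow:
        def __init__(self, window_seconds=60.0):
            self.window = window_seconds
            self.samples = deque()  # (timestamp, latency_ms) pairs

        def record(self, latency_ms):
            now = time.monotonic()
            self.samples.append((now, latency_ms))
            # Evict samples older than the window.
            while self.samples and now - self.samples[0][0] > self.window:
                self.samples.popleft()

        def report(self):
            values = sorted(v for _, v in self.samples)
            if not values:
                return {}
            def pct(p):
                return values[min(len(values) - 1, int(p / 100.0 * len(values)))]
            return {"max": values[-1], "p95": pct(95), "p99": pct(99), "p99.9": pct(99.9)}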



Either way, the conclusion drawn in the headline is cherry-picked and misleading for anyone concerned with the average case.

From the article:

> The result of Protocol Buffers adoption was an average increase in throughput by 6.25% for responses and 1.77% for requests. The team also observed up to 60% latency reduction for large payloads.

It's very sneaky to describe the throughput improvements using the average request/response (which is what most people are interested in), but then switch to the 'worst case' request/response when describing latency... And doubly sneaky to then use that worst case as the headline of the article.


I agree.

There are also a lot of alarm bells ringing when you have reports of averages without reports of medians (quartiles, percentiles) and variance. Or even better: some kind of analysis of the distribution. A lot of data will be closer to a Poisson process or have multi-modality, and the average generally hides that detail.

What can happen is that you typically process requests in around 10ms, but you have a few outliers at 2500ms. Now the average is going to land somewhere between 10ms and 2500ms. If you have two modes, the average can often end up in the middle of nowhere, say at 50ms. Yet you have zero requests taking 50ms; they take either 10ms or 2500ms.
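
Concretely:

    latencies = [10] * 984 + [2500] * 16    # two modes, nothing in between
    print(sum(latencies) / len(latencies))  # 49.84 -- a latency no request has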


Tail latency is really important. Usually more so than the average, because it's what drives timeouts in your system.
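
One reason the tail dominates (the classic fan-out argument): if a request fans out to 100 backends and each exceeds its p99 latency 1% of the time, most user requests hit at least one slow backend:

    p_at_least_one_slow = 1 - 0.99 ** 100
    print("%.0f%%" % (100 * p_at_least_one_slow))  # ~63%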



