
Except gzip is tragically slow, so crippling protobuf by running it through gzip could indeed slow it down to JSON speeds.


"gzipped json" vs "protobuf"


Then something is very wrong.


Protobufs have a massive data dependency baked into the wire format, which turns parsing into an inherently serial, core-bound problem.

Interestingly, they're not usually smaller than gzipped JSON either (the compression built into the format is pretty rudimentary), so if you don't compress them and don't have a stellar network, the total transfer+decode cost can actually be worse than gzipped JSON's, despite protos usually being somewhat faster to parse.


Got any references to share?


The docs [0] are fairly straightforward. I'll spit out a little extra data and a few other links in case it's helpful. If this is too much or not enough text, feel free to ask followup questions.

As far as data dependencies are concerned, you simply can't parse a byte till you've parsed all the preceding bytes at the same level in a message.

A naive implementation would (a) varint decode at an offset, (b) extract the tag type and field index, (c) use that to parse the remaining data for that field, (c1) the exact point in time you recurse for submessages doesn't matter much, but you'll have to eventually, (d) skip forward the length of the field you parsed, (e) if not done then go back to (a).
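
To make that loop concrete, here's a minimal sketch in Rust (names like decode_varint/walk_message are mine, and it only handles the common wire types -- a real parser also validates field numbers, recursion depth, UTF-8, and so on):

    fn decode_varint(buf: &[u8], mut pos: usize) -> Option<(u64, usize)> {
        let mut value = 0u64;
        let mut shift = 0u32;
        loop {
            let byte = *buf.get(pos)?;
            pos += 1;
            value |= u64::from(byte & 0x7f) << shift;
            if byte & 0x80 == 0 {
                return Some((value, pos));
            }
            shift += 7;
            if shift >= 64 {
                return None; // more than 10 wire bytes: malformed
            }
        }
    }

    fn walk_message(buf: &[u8]) -> Option<()> {
        let mut pos = 0;
        while pos < buf.len() {
            // (a) varint-decode the tag; nothing after it can start until this finishes
            let (tag, next) = decode_varint(buf, pos)?;
            pos = next;
            // (b) split the tag into wire type and field number
            let wire_type = tag & 7;
            let _field_number = tag >> 3;
            // (c)/(d) determine the payload length and skip it;
            // the skip distance is data-dependent every single time
            pos = match wire_type {
                0 => decode_varint(buf, pos)?.1, // varint payload
                1 => pos + 8,                    // fixed 64-bit
                2 => {
                    // length-delimited (LEN); (c1) recurse here for submessages
                    let (len, next) = decode_varint(buf, pos)?;
                    next.checked_add(usize::try_from(len).ok()?)?
                }
                5 => pos + 4,                    // fixed 32-bit
                _ => return None,                // groups/invalid: omitted here
            };
            if pos > buf.len() {
                return None; // truncated field
            }
            // (e) not done? go back to (a)
        }
        Some(())
    }

    fn main() {
        // From the encoding docs [0]: field 1 (varint) = 150 encodes as 08 96 01
        assert!(walk_message(&[0x08, 0x96, 0x01]).is_some());
        println!("parsed ok");
    }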

You can do better, but not much better, because the varints in question encode 64-bit values, requiring up to 10 bytes on the wire, meaning AVX2 SIMD shenanigans can only guarantee that you parse 3 varints at a time. That's fine and dandy, except most fields look like 2 varints followed by some binary data, so all you're really saying is that you can parse only one field at a time and still have to skip forward an unpredictable amount after a handful of bytes/instructions.
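
To see where those numbers come from: each wire byte carries 7 payload bits plus a continuation bit, so a full 64-bit value needs ceil(64/7) = 10 bytes, and a 32-byte AVX2 register is therefore only guaranteed to hold 3 complete varints (3 * 10 = 30 <= 32). A quick check, with a hypothetical helper:

    // Each byte carries 7 payload bits plus a continuation bit.
    fn varint_wire_len(mut v: u64) -> usize {
        let mut n = 1;
        while v >= 0x80 {
            v >>= 7;
            n += 1;
        }
        n
    }

    fn main() {
        assert_eq!(varint_wire_len(150), 2);       // typical small field values
        assert_eq!(varint_wire_len(u64::MAX), 10); // worst case: ceil(64 / 7)
        println!("ok");
    }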

If you have more specialized data (e.g., you predict that all field indexes are under 32 and all fields are of type "LENGTH"), then there are some tricks you can do to speed it up a bit further. Doing so adds branches to code which is already very branchy and data-dependent though, so it's pretty easy to accidentally slow down parsing in the process.

Something close to the SOTA for varint decoding (a sub-component of protobuf parsing) is here [1]. It's quite fast (5-10 GB/s), but it relies on several properties that don't hold for the protobuf wire format, including that its varints are much smaller and all consecutively concatenated. The SOTA for protobuf parsing is much slower (except for the sub-portions that are straight memcpys -- giant slices of raw data are fairly efficient in protos and not in JSON).

This isn't the best resource [2], but it's one of many similar examples of people not finding protos substantially faster in the wild, partly because their protos were bigger than their JSON objects (and they weren't even gzipping -- the difference there likely comes from the tag+length prefix structure being more expensive than delimiters, combined with fixed-width types favoring JSON when the inputs are small). AFAICT, their JSON library isn't even simdjson (or similar), which ought to skew things against protos even further if you're comparing optimal implementations.

In terms of protos being larger than gzipped JSON, that's just the expected result for almost all real-world data. Protobuf adds overhead to every field, byte-compresses some integers, and doesn't compress or bit-pack anything else. Even if your devs know not to use varint fields for data you expect to be negative any fraction of the time, know to use packed arrays, and so on, the ceiling on the format (from a compression standpoint) is very low unless your data is mostly large binary blobs that you can compress before storing them in the protobuf itself.
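
As a concrete illustration of that per-field overhead (byte values taken straight from the encoding docs [0]):

    fn main() {
        // Field 1 (varint) = 150: one tag byte (0x08) + two payload bytes.
        let int_field: &[u8] = &[0x08, 0x96, 0x01];
        // Field 2 (string) = "testing": tag byte (0x12) + length byte (0x07) + raw bytes.
        let str_field: &[u8] = &[0x12, 0x07, b't', b'e', b's', b't', b'i', b'n', b'g'];
        // Every field pays at least a tag byte, LEN fields pay a length varint
        // too, and none of the payload bytes are entropy-coded -- so gzip still
        // finds plenty to remove, and gzipped JSON often ends up smaller overall.
        assert_eq!(int_field.len(), 3);
        assert_eq!(str_field.len(), 2 + "testing".len());
        println!("ok");
    }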

For a few other random interblags comparisons, see [3], [4]. The first finds protos 3x-6x faster (better for deserializing than serializing) compared to JSON. The second finds that protos compress better than JSON, but also that compressed JSON is much smaller than ordinary protos for documents more than a few hundred bytes (so to achieve the size improvements you do have to "cripple" protos by compressing them).

If you start looking at the comparisons people have done between the two, you'll find results largely consistent with what I've been saying: (1) protos are 2x-10x faster for normal data, (2) protos are usually larger than gzipped JSON, (3) protos are sometimes slower than gzipped JSON, and (4) when you factor in sub-par networks, the total transfer+decode time can be much worse for protos because they're larger.

As a fun experiment, try optimizing two different programs, both operating on 1MB of pseudo-random bytes no greater than 10. Pick any cheap operation (to keep the compiler from optimizing the iteration away), like a rolling product mod 256, and apply it to the data. For the first program (simulating a simplified version of the protobuf wire format), treat the first byte as a length and the next "length" bytes as data, iterating till you're done. For the second, treat all bytes as data. Using a systems language on any modern CPU, you'll be hard-pressed to get even an optimized version of the length-prefixed code to within 10x of an un-optimized version of the raw-data program.
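
Here's one way that experiment could look in Rust -- a sketch, with a toy xorshift generator standing in for a real RNG and coarse wall-clock timing; build with --release to see the gap:

    use std::time::Instant;

    // Cheap xorshift PRNG so the example is self-contained; any byte
    // source with values in 1..=10 would do (zero lengths are avoided).
    fn make_data(n: usize) -> Vec<u8> {
        let mut state: u64 = 0x9e3779b97f4a7c15;
        (0..n)
            .map(|_| {
                state ^= state << 13;
                state ^= state >> 7;
                state ^= state << 17;
                (state % 10) as u8 + 1
            })
            .collect()
    }

    // Program 1: first byte is a length, the next `length` bytes are data.
    // Each chunk's position depends on the previous byte: a serial chain.
    fn rolling_product_prefixed(buf: &[u8]) -> u8 {
        let mut acc: u8 = 1;
        let mut pos = 0;
        while pos < buf.len() {
            let len = buf[pos] as usize;
            pos += 1;
            let end = (pos + len).min(buf.len());
            for &b in &buf[pos..end] {
                acc = acc.wrapping_mul(b);
            }
            pos = end;
        }
        acc
    }

    // Program 2: every byte is data; the compiler can vectorize this freely.
    fn rolling_product_raw(buf: &[u8]) -> u8 {
        buf.iter().fold(1u8, |acc, &b| acc.wrapping_mul(b))
    }

    fn main() {
        let data = make_data(1 << 20); // ~1MB
        let t = Instant::now();
        let a = rolling_product_prefixed(&data);
        let prefixed = t.elapsed();
        let t = Instant::now();
        let b = rolling_product_raw(&data);
        let raw = t.elapsed();
        println!("prefixed: {a} ({prefixed:?})  raw: {b} ({raw:?})");
    }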

Cap'n Proto and FlatBuffers (whether gzipped or not), as examples, are usually much faster than both JSON and protobufs -- especially for serialization, and to a lesser extent deserialization -- even when you're parsing the entire message (they shine comparatively even more if you're only extracting sub-components of a message). Cap'n Proto was made by a former lead developer of protobuf, and he learned from some of his mistakes. "Proper" binary formats (like those, though they're by no means the only options) take into account data dependencies and other features of real hardware, and are much closer to being limited by RAM bandwidth than by CPU cycles.

[0] https://protobuf.dev/programming-guides/encoding/

[1] https://www.bazhenov.me/posts/rust-stream-vbyte-varint-decod...

[2] https://medium.com/@kn2414e/is-protocol-buffers-protobuf-rea...

[3] https://medium.com/streamdal/protobuf-vs-json-for-your-event...

[4] https://nilsmagnus.github.io/post/proto-json-sizes/



