> twenty five thousand lines of pure C not counting CMake files. ... Keep in min...

bboozzoo · 2026-03-27T15:46:49 1774626409

You forgot to include libflate (https://github.com/9front/9front/tree/front/sys/src/libflate), which gzip is built around; that brings it closer to 10k lines.

commandlinefan · 2026-03-27T20:09:50 1774642190

I wrote a standalone gzip decompressor in about 500 lines of code (including comments, with braces on the next line), with no dependencies at all: https://commandlinefanatic.com/cgi-bin/showarticle.cgi?artic...

carlos256 · 2026-03-27T17:10:33 1774631433

Interesting, the decompressor in Jdeflate is around 4k LoC. https://github.com/Jpn666/jdeflate

tyingq · 2026-03-27T14:57:27 1774623447

His also omits CRC (which is part of the 25k lines), has no --fast/--best/etc. options, is missing some output formats, and so on. I'm sure the 25k includes a lot of bloat, but the comparison is odd. Comparing to your list would make much more sense.

kibwen · 2026-03-27T15:08:12 1774624092

I would expect a CRC to add a negligible number of lines of code. The reason that production-grade decompressors are tens of thousands of LOC is likely attributable to extreme manual optimization. For example, I wouldn't be surprised if a measurable fraction of those lines are actually inline assembly.

nayuki · 2026-03-27T15:45:07 1774626307

True. The most basic CRC implementation is about 7 lines of code (presented in Java to avoid some C/C++ footguns):

    int crc32(byte[] data) {
        int crc = ~0;
        for (byte b : data) {
            crc ^= b & 0xFF;
            for (int i = 0; i < 8; i++)
                crc = (crc >>> 1) ^ ((crc & 1) * 0xEDB88320);
        }
        return ~crc;
    }

Or smooshed down slightly (with caveats):

    int crc32(byte[] data) {
        int crc = ~0;
        for (int i = 0; i < data.length * 8; i++) {
            crc ^= (data[i / 8] >> (i % 8)) & 1;
            crc = (crc >>> 1) ^ ((crc & 1) * 0xEDB88320);
        }
        return ~crc;
    }

But one reason that many CRC implementations are large is that they include a pre-computed table of 256 32-bit constants so that one byte can be processed at a time. For example: https://github.com/madler/zlib/blob/7cdaaa09095e9266dee21314...
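
To make the table idea concrete, here is a minimal sketch of a byte-at-a-time version. It builds the 256 entries in a static initializer instead of pasting in pre-computed constants, and the class name TableCrc32 is just made up for this example. It is not how zlib lays out its tables, only the simplest variant of the same technique.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class name; a sketch of the table-driven approach, not zlib's code.
final class TableCrc32 {
    // TABLE[n] is the result of running the 8 bitwise shift/xor steps on the byte value n.
    private static final int[] TABLE = new int[256];

    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++)
                c = (c >>> 1) ^ ((c & 1) * 0xEDB88320);
            TABLE[n] = c;
        }
    }

    static int crc32(byte[] data) {
        int crc = ~0;
        for (byte b : data)
            // One table lookup replaces the 8-iteration inner loop.
            crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
        return ~crc;
    }

    public static void main(String[] args) {
        byte[] msg = "123456789".getBytes(StandardCharsets.US_ASCII);
        // The standard CRC-32 check value for "123456789" is cbf43926.
        System.out.printf("%08x%n", crc32(msg));
    }
}
```

The trade-off is 1 KiB of table for roughly one lookup per byte instead of eight shift/xor steps per byte.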

xxs · 2026-03-27T15:53:08 1774626788

That's Java code, though... a bit weird, esp. i % 8 (which is just i & 7). The compiler should be able to optimize it since 'i' is guaranteed to be non-negative, but it's still awkward.

Java's CRC32 (java.util.zip.CRC32) nowadays uses intrinsics and 128-bit AVX instructions.
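
For what it's worth, a quick sketch of checking a hand-written bitwise CRC against the standard library's java.util.zip.CRC32. The class and method names (CrcCompare, bitwiseCrc32, libraryCrc32) are made up for the example, and whether the library call actually gets intrinsified depends on the JVM and CPU, so treat that part as an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical class name; compares a hand-written CRC-32 with java.util.zip.CRC32.
final class CrcCompare {
    // The plain bit-at-a-time loop, 8 shift/xor steps per input byte.
    static int bitwiseCrc32(byte[] data) {
        int crc = ~0;
        for (byte b : data) {
            crc ^= b & 0xFF;
            for (int i = 0; i < 8; i++)
                crc = (crc >>> 1) ^ ((crc & 1) * 0xEDB88320);
        }
        return ~crc;
    }

    // The library version; getValue() returns the CRC in the low 32 bits of a long.
    static int libraryCrc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        byte[] msg = "hello, gzip".getBytes(StandardCharsets.UTF_8);
        System.out.println(bitwiseCrc32(msg) == libraryCrc32(msg));
    }
}
```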

kevin_thibedeau · 2026-03-27T20:12:17 1774642337

With C++20 you can use consteval to compute the table(s) at compile time from template parameters.

ack_complete · 2026-03-27T15:50:20 1774626620

It doesn't need to be inline assembly; pre-encoded lookup tables and intrinsics-based vectorized CRC alone will add quite a lot of code. Most multi-platform CRC algorithms tend to have at least a few paths for byte/word/dword at a time, hardware CRC, and hardware GF(2) multiply. It's not really extreme optimization, just better algorithms to match better hardware capabilities.

The Huffman decoding implementation is also bigger in production implementations for both speed and error checking. Two Huffman trees need to be exactly complete except in the special case of a single code, and in most cases they are flattened to two-level tables for speed (though the latest desktop CPUs have enough L1 cache to use single-level).
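
As a rough illustration of the completeness check, here is a sketch that classifies a set of code lengths by summing the code space each one takes up (the Kraft sum). The names HuffmanCheck, Status, and classify are made up for the example, the 15-bit limit is the Deflate maximum, and treating "exactly one code of length 1" as the only tolerated incomplete case is my simplification rather than what any particular decoder does.

```java
// Hypothetical class name; a sketch of a Huffman code-length sanity check.
final class HuffmanCheck {
    static final int MAX_BITS = 15; // longest code length Deflate allows

    enum Status { COMPLETE, SINGLE_CODE, INCOMPLETE, OVERSUBSCRIBED }

    // lengths[i] is the code length of symbol i; 0 means the symbol is unused.
    static Status classify(int[] lengths) {
        long used = 0;  // code space consumed, in units of 2^-MAX_BITS
        int count = 0;  // number of symbols that actually have a code
        for (int len : lengths) {
            if (len < 0 || len > MAX_BITS)
                throw new IllegalArgumentException("bad code length: " + len);
            if (len == 0) continue;
            count++;
            used += 1L << (MAX_BITS - len); // a code of length len covers 2^-len of the space
        }
        long full = 1L << MAX_BITS;
        if (used > full) return Status.OVERSUBSCRIBED;
        if (used == full) return Status.COMPLETE;
        // Assumed special case: a single code of length 1 that leaves half the space unused.
        if (count == 1 && used == full / 2) return Status.SINGLE_CODE;
        return Status.INCOMPLETE;
    }

    public static void main(String[] args) {
        System.out.println(classify(new int[] {1, 2, 3, 3}));
        System.out.println(classify(new int[] {2, 2, 2}));
    }
}
```

A decoder would reject the OVERSUBSCRIBED and INCOMPLETE cases before building its lookup tables, which is part of where the extra lines go.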

Finally, the LZ copy typically has special cases added for using wider than byte copies for non-overlapping, non-wrapping runs. This is a significant decoding speed optimization.
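
A minimal sketch of that special-casing, assuming a flat output buffer rather than a circular window (so the non-wrapping part is trivially true here). LzCopy and copyMatch are made-up names for the example, and System.arraycopy stands in for the wider-than-byte copy a C implementation would do by hand.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class name; a sketch of an LZ77 match copy with a fast path.
final class LzCopy {
    // Copies `length` bytes starting `distance` bytes behind `pos` to out[pos..].
    // Returns the new write position.
    static int copyMatch(byte[] out, int pos, int distance, int length) {
        if (distance <= 0 || distance > pos || length < 0 || length > out.length - pos)
            throw new IllegalArgumentException("bad match");
        int src = pos - distance;
        if (distance >= length) {
            // Source and destination do not overlap: one bulk copy is safe.
            System.arraycopy(out, src, out, pos, length);
        } else {
            // Overlapping run: bytes written earlier in this same copy are read
            // again, so it has to go front to back, one byte at a time.
            for (int i = 0; i < length; i++)
                out[pos + i] = out[src + i];
        }
        return pos + length;
    }

    public static void main(String[] args) {
        byte[] out = new byte[8];
        out[0] = 'a';
        out[1] = 'b';
        int pos = copyMatch(out, 2, 2, 6); // overlapping: repeats "ab" three more times
        System.out.println(new String(out, 0, pos, StandardCharsets.US_ASCII)); // abababab
    }
}
```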

tyingq · 2026-03-27T15:30:51 1774625451

Yes, there are subdirs with language bindings for many non-C langs, an examples folder with example C code, win32-specific C code, test code, etc.

More reasons it's an odd comparison.

fullstop · 2026-03-27T15:17:59 1774624679

gzip also contains a significant amount of compatibility code for different platforms.

xxs · 2026-03-27T15:51:17 1774626677

CRC32 can be written in a handful of lines of code, although it'd be better to use vector instructions (e.g. AVX) when available.