I would expect a CRC to add a negligible number of lines of code. The reason that production-grade decompressors are tens of thousands of LOC is likely attributable to extreme manual optimization. For example, I wouldn't be surprised if a measurable fraction of those lines are actually inline assembly.
That's java code, though... bit weird, esp. i % 8 (which is just i & 7). The compiler should be able to optimize it since 'i' is guaranteed to be non-negative, still awkward.
Java CRC32 nowadays uses intrinsics and avx128 for crc32.
Doesn't need to be inline assembly, just pre-encoded lookup tables and intrinsics-based vectorized CRC alone will add quite a lot of code. Most multi-platform CRC algorithms tend to have at least a few paths for byte/word/dword at a time, hardware CRC, and hardware GF(2) multiply. It's not really extreme optimization, just better algorithms to match better hardware capabilities.
The Huffman decoding implementation is also bigger in production implementations for both speed and error checking. Two Huffman trees need to be exactly complete except in the special case of a single code, and in most cases they are flattened to two-level tables for speed (though the latest desktop CPUs have enough L1 cache to use single-level).
Finally, the LZ copy typically has special cases added for using wider than byte copies for non-overlapping, non-wrapping runs. This is a significant decoding speed optimization.