Blosc, an extremely fast, multi-threaded, meta-compressor library

jasode · on Sept 22, 2014

>faster than a memcpy() OS call

I usually don't nitpick terminology, but memcpy() is a C runtime library function, not a Linux/Win32 OS call.

oofabz · on Sept 22, 2014

> Blosc comes with a pre-filter (also called pre-conditioner) called shuffle which rearranges bytes in a clever way for the compression stage.

This sounds like the Burrows-Wheeler transform, which bzip2 uses:

https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transf...

pkhuong · on Sept 22, 2014

Fixed-width binary data (e.g., a sequence of double floats) often benefits from a simpler transform: just transpose the bits/bytes so that, e.g., the least significant bytes form a contiguous region, followed by all the second least significant bytes, etc.

> Meant for binary data: can take advantage of the type size meta-information for improved compression ratio (using the integrated shuffle pre-conditioner).

makes it sound like that's what Blosc is doing.
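
To make the byte transpose concrete, here is a minimal sketch in Python. It is not Blosc's actual code; the names `shuffle`, `unshuffle`, and `typesize` are just illustrative, and zlib only stands in for whatever compressor runs after the filter. It assumes the input length is a multiple of the element size.

```python
import struct
import zlib


def shuffle(data: bytes, typesize: int) -> bytes:
    """Group byte 0 of every element, then byte 1 of every element, etc."""
    if len(data) % typesize:
        raise ValueError("data length must be a multiple of typesize")
    # data[i::typesize] picks the i-th byte of each fixed-width element.
    return b"".join(data[i::typesize] for i in range(typesize))


def unshuffle(data: bytes, typesize: int) -> bytes:
    """Inverse of shuffle: interleave the byte planes back into elements."""
    if len(data) % typesize:
        raise ValueError("data length must be a multiple of typesize")
    n = len(data) // typesize  # number of elements
    out = bytearray(len(data))
    for i in range(typesize):
        # Plane i holds the i-th byte of all n elements, stored contiguously.
        out[i::typesize] = data[i * n:(i + 1) * n]
    return bytes(out)


if __name__ == "__main__":
    # Slowly increasing 32-bit little-endian integers (made-up sample data):
    # the high bytes barely change, so they form long runs after the shuffle.
    values = list(range(10_000))
    raw = struct.pack(f"<{len(values)}I", *values)
    shuffled = shuffle(raw, 4)

    assert unshuffle(shuffled, 4) == raw
    print("zlib, raw:     ", len(zlib.compress(raw)))
    print("zlib, shuffled:", len(zlib.compress(shuffled)))
```

With little-endian data, the first plane holds the least significant bytes, as described above. How much this helps depends entirely on the data.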

shiningmuppet · on Sept 23, 2014

Yes, exactly, like so:

http://www.blosc.org/images/shuffle.png

rdc12 · on Sept 23, 2014

Sounds interesting. Is there a name for that technique? Or, more to the point, something that can be searched for?

shiningmuppet · on Sept 26, 2014

We call it the 'shuffle filter', but that is all.

DanBC · on Sept 22, 2014

I wish they'd done some benchmarking to demonstrate how quick it is across different data.

faltet · on Sept 23, 2014

Here is a benchmark based on the MovieLens database:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blo...

The results are explained here:

http://www.blosc.org/docs/bcolz-EuroPython-2014.pdf

and, in more depth, here:

https://python.g-node.org/wiki/starving_cpu

_ondq · on Sept 22, 2014

Pretty cool. Faster and better compression ratios, according to the tutorial.

shiningmuppet · on Sept 23, 2014

Depends on the data.

kolev · on Sept 23, 2014

No GitHub? No Bitbucket? Just a source code dump? Weird!

shiningmuppet · on Sept 23, 2014

https://github.com/Blosc/

kolev · on Sept 23, 2014

Thanks!

fenollp · on Sept 22, 2014

Yes, but does it achieve optimal tip-to-tip efficiency?

adwilk · on Sept 22, 2014

Yeah, it's fast, but what's its Weissman score?

shiningmuppet · on Sept 23, 2014

It's off the charts...