Blosc, an extremely fast, multi-threaded, meta-compressor library (blosc.org)
35 points by 0x1997 on Sept 22, 2014 | 16 comments


>faster than a memcpy() OS call

I usually don't nitpick terminology, but memcpy() is a C runtime library function, not a Linux/Win32 OS call.


> Blosc comes with a pre-filter (also called pre-conditioner) called shuffle which rearranges bytes in a clever way for the compression stage.

This sounds like the Burrows-Wheeler transform, which bzip2 uses:

https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transf...


Fixed-width binary data (e.g., a sequence of double floats) often benefit from a simpler transform: just transpose the bits/bytes so that, e.g., the least significant bytes form a contiguous region, followed by all the second least significant bytes, etc.

> Meant for binary data: can take advantage of the type size meta-information for improved compression ratio (using the integrated shuffle pre-conditioner).

makes it sound like that's what Blosc is doing.
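
To make the byte transpose concrete, here is a minimal Python sketch of the idea. It is not Blosc's API or its actual (C, SIMD-optimized) implementation; the names `shuffle` and `unshuffle` and the sample data are my own, chosen only for illustration. It regroups a buffer of fixed-width elements so that all first bytes come first, then all second bytes, and so on, and then compares zlib output sizes with and without the transform.

```python
import struct
import zlib


def shuffle(buf: bytes, typesize: int) -> bytes:
    """Byte-transpose: byte 0 of every element, then byte 1 of every element, etc."""
    if len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    # buf[i::typesize] picks the i-th byte of each element.
    return b"".join(buf[i::typesize] for i in range(typesize))


def unshuffle(buf: bytes, typesize: int) -> bytes:
    """Inverse of shuffle(): scatter each byte plane back into element order."""
    if len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    n = len(buf) // typesize  # number of elements
    out = bytearray(len(buf))
    for i in range(typesize):
        out[i::typesize] = buf[i * n:(i + 1) * n]
    return bytes(out)


# Illustrative data (an assumption, not a benchmark): 1000 slowly varying doubles.
values = [1000.0 + 0.25 * k for k in range(1000)]
raw = struct.pack("<1000d", *values)
shuffled = shuffle(raw, 8)

assert unshuffle(shuffled, 8) == raw  # the transform is lossless

print("zlib, raw bytes:     ", len(zlib.compress(raw)))
print("zlib, shuffled bytes:", len(zlib.compress(shuffled)))
```

Whether the shuffled buffer compresses better depends on the data; the transform tends to help when neighboring values share their high-order bytes, as they do in this sample.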



Sounds interesting. Is there a name for that technique? Or, more to the point, something that can be searched for?


We call it the 'shuffle filter', but that is all.


I wish they'd done some benchmarking to demonstrate how quick it is across different data.


Here you have a benchmark based on the MovieLens database:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blo...

The results are explained here:

http://www.blosc.org/docs/bcolz-EuroPython-2014.pdf

and, in more depth, here:

https://python.g-node.org/wiki/starving_cpu


Pretty cool. Faster and better compression ratios, according to the tutorial.


Depends on the data.


No GitHub? No Bitbucket? Just a source code dump? Weird!



Thanks!


Yes but does it achieve optimal tip-to-tip efficiency?


Yeah, it's fast, but what's its Weissman score?


It's off the charts...



