Memory compression is a generalization of swap, which only applies to dynamic (anonymous) memory; file-backed pages don't need it because they can simply be re-read from disk.
The problem is that GPUs don't support virtual memory paging, so they can't page in files, decompress, or swap anything unless you implement it yourself, which is a lot slower.
Also, ML models (probably) can't be compressed because they already are compressed; learning and compression are the same thing!
Wait. This comment just blew my mind. Does that imply you could measure the efficiency of a model by its compressibility? Note that I recognize efficient and accurate are not the same thing. One could imagine evaluating a model on a 2D performance-versus-compression map somehow.
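Not sure this is exactly what you mean, but here's a rough sketch of how one point on that map could be computed. The weights and accuracy number are placeholders, and zlib over the raw float bytes is just one crude compressibility proxy:

```python
import zlib
import numpy as np

def compressibility(weights: np.ndarray, level: int = 6) -> float:
    """Crude proxy: raw byte size over zlib-compressed size (higher = more redundancy left)."""
    raw = weights.astype(np.float32).tobytes()
    return len(raw) / len(zlib.compress(raw, level))

# Placeholder inputs: random weights and a made-up accuracy number.
model_weights = np.random.randn(1_000_000).astype(np.float32)
accuracy = 0.87  # whatever your eval harness reports
print(f"accuracy={accuracy:.2f}, compressibility={compressibility(model_weights):.2f}x")
```

You'd then plot (accuracy, compressibility) for a family of models and look at the frontier.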
> Also, ML models (probably) can't be compressed because they already are compressed; learning and compression are the same thing!
I feel like they're kind of two sides of the same coin: learning is about putting more information in the same data, while compression is about putting the same information in less data.
I'm wondering if some lossy floating-point compressor (such as zfp) would work.
> I'm wondering if some lossy floating-point compressor (such as zfp) would work.
Well apparently this can work; Stable Diffusion comes in 32-bit and 16-bit float versions. I'm kind of surprised they both work, but that's lossy compression for you.
Sure, but 16-bit float is pretty primitive compression, as it does not exploit any redundancy in the input. zfp groups numbers together in chunks, which means that correlated numbers can be represented more precisely. Its algorithm is described here: https://zfp.readthedocs.io/en/release1.0.0/algorithm.html#lo...
I would like to see whether zfp can be applied to something like Stable Diffusion (or other ML models) and gives better results than plain floats at the same size.
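Here's a minimal sketch of what that comparison might look like, assuming the zfpy Python bindings that ship with the zfp project; the weight tensor is a random stand-in, so whether zfp actually beats a plain float16 cast depends on how much correlation real model weights have:

```python
import numpy as np
import zfpy  # Python bindings from the zfp project

# Placeholder for real model weights.
weights = np.random.randn(1024, 1024).astype(np.float32)

# Baseline: plain float16 cast -- 16 bits/value, no redundancy exploited.
fp16 = weights.astype(np.float16).astype(np.float32)
fp16_err = np.max(np.abs(weights - fp16))

# zfp fixed-rate mode at the same 16 bits/value, encoding 4x4 blocks jointly.
compressed = zfpy.compress_numpy(weights, rate=16)
restored = zfpy.decompress_numpy(compressed)
zfp_err = np.max(np.abs(weights - restored))

print(f"float16   max abs error: {fp16_err:.3e}")
print(f"zfp@16bpv max abs error: {zfp_err:.3e}")
```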
Memory compression? I can't find any good resources to read about it; any hints? I'm having trouble imagining how it could possibly work without totally destroying performance.
It doesn't destroy performance for the simple reason that memory access is slower than pure compute these days. If you can spend compute to shrink the data that has to move through memory, your overall throughput can very well be higher than without compression.
There has been a lot of innovation in fast compression and decompression in recent years. Traditional compression tools like gzip or xz are geared towards high compression ratios, but memory compression favors speed. Check out fast algorithms like LZ4 and Zstandard.
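To get a feel for the numbers, here's a tiny userspace sketch using the python-lz4 package; real implementations like Linux's zram do this per 4 KiB page inside the kernel, which this only approximates:

```python
import time
import lz4.frame  # pip install lz4

# One 4 KiB "page" of moderately repetitive data, standing in for an inactive memory page.
page = (b"some moderately repetitive page contents " * 200)[:4096]

start = time.perf_counter()
for _ in range(100_000):
    blob = lz4.frame.compress(page)
    assert lz4.frame.decompress(blob) == page
elapsed = time.perf_counter() - start

print(f"compression ratio: {len(page) / len(blob):.1f}x")
print(f"compress+decompress round trips per second: {100_000 / elapsed:,.0f}")
```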
It is not compressed swap: the compressed data stays in RAM. The OS just compresses inactive memory, with a couple of criteria to define "inactive".
iOS uses memory compression but not swap. iOS devices actually have special CPU instructions to speed up compression in page-size increments, specifically to aid this model. [1]