A buddy of mine and I are working on a weekend project together. We recently realized that we don't need all that much precision, just a little, and switching from doubles or floats to 16-bit fixed point in our main data structure actually makes it small enough to fit an instance in a typical cache line (< 64 bytes).
Completely unnecessary for our target platform but deeply satisfying.
For performance sensitive code memory bandwidth is very often the limiting factor, thus compressing values tends to make a lot of sense. The number of CPU cores is increasing much faster than memory bandwidth.
I expect it will have some effect given that we anticipate having ~1000 instances of that data structure alive at worst case, at least dozens at any given time.
Completely unnecessary for our target platform but deeply satisfying.