Atomic refcounting consumes a global, system-wide resource: inter-core and inter-socket (e.g. QPI) communication. 20M-50M of those events per second can render your 4-socket, 128-core system frozen.
Also, just freeing memory can create a surprisingly long pause. Depends on the allocator, of course.
> Atomic refcounting consumes a global, system-wide resource: inter-core and inter-socket (e.g. QPI) communication.
Forgive me if I'm wrong, but wouldn't reference counters consume these global resources only when there's contention for the cache line? Once the cache line is in an exclusive state, there's no need to go outside the core at all.
Yes, and refcounting creates false contention. For example, a global config object is constantly having its refcount bumped up and down every time someone dereferences it, even though the actual contents are read-only.
There are ways to mitigate this, but they’re complex.
This is actually an excellent case where Rust can help you completely remove the need to lock while keeping your memory safe. If you can make it clear that your config object will live longer than all threads referencing it (which shouldn't be hard if it is truly global), then Rust will let you share it among all threads with no further overhead at all, for either the programmer or the processor.
One interesting thing about Rust here is that you can mitigate refcount traffic, since you can also take a regular reference to the data. You only need to increment/decrement on actual ownership transfers.
The point still stands, though. Inter-core contention resources are scarce, so if high system performance is desired, one should look for alternatives that avoid creating chatter between cores and CPU sockets.