OP specifically mentioned contention, though -- not marginally higher cost of atomic inc/dec vs plain inc/dec.
> For our use case, we in fact do not use std::shared_ptr in our implementation, but instead a single-threaded shared_ptr-like class that has no atomics (to avoid cross-core contention).
A single-threaded program will not have cross-core contention whether it uses std::atomic<> refcounts or plain integer refcounts, period. You're right that non-atomic refcounts can be anywhere from somewhat cheaper to a lot cheaper than atomic refcounts, depending on the platform. But that is orthogonal to cross-core contention.
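To make the distinction concrete, here is a minimal sketch of the kind of class the quoted comment describes -- my own invention (including the name local_shared_ptr), not their code. The refcount is a plain integer, so bumping it is an ordinary add with no lock prefix and no cross-core cache traffic, but the class is only safe if every owner of a given object lives on the same thread.

```cpp
#include <cstddef>
#include <utility>

// Minimal sketch of a single-threaded, non-atomic refcounted pointer.
// The count is a plain std::size_t, so inc/dec compile to ordinary adds
// with no lock prefix and no memory-ordering constraints.
template <typename T>
class local_shared_ptr {
    struct control_block {
        T value;
        std::size_t refs;
    };
    control_block* cb_ = nullptr;

    void release() {
        if (cb_ && --cb_->refs == 0) delete cb_;  // plain decrement
        cb_ = nullptr;
    }

public:
    local_shared_ptr() = default;

    template <typename... Args>
    static local_shared_ptr make(Args&&... args) {
        local_shared_ptr p;
        p.cb_ = new control_block{T(std::forward<Args>(args)...), 1};
        return p;
    }

    local_shared_ptr(const local_shared_ptr& other) : cb_(other.cb_) {
        if (cb_) ++cb_->refs;                     // plain increment, no std::atomic
    }

    local_shared_ptr& operator=(const local_shared_ptr& other) {
        if (this != &other) {
            release();
            cb_ = other.cb_;
            if (cb_) ++cb_->refs;
        }
        return *this;
    }

    ~local_shared_ptr() { release(); }

    T* get() const { return cb_ ? &cb_->value : nullptr; }
    T& operator*() const { return cb_->value; }
    T* operator->() const { return &cb_->value; }
};
```

Copying one of these costs a single plain increment; the win over std::shared_ptr is the cheaper instruction and the extra compiler freedom, not the absence of contention, since a single thread never contends with itself.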
> not marginally higher cost of atomic inc/dec vs plain inc/dec.
Note that the difference is not so marginal, and it is not just in the hardware instructions: non-atomic operations generally allow for more optimizations by the compiler.
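A hedged illustration of the compiler-optimization point (my example, not from the thread): with a plain integer the compiler can coalesce refcount updates or keep the count in a register, while with std::atomic it generally emits one locked read-modify-write per call, even with memory_order_relaxed.

```cpp
#include <atomic>

// Plain int: the three increments typically fold into a single `refs += 3`,
// and the count can stay in a register across surrounding code.
void bump_plain(int& refs) {
    ++refs;
    ++refs;
    ++refs;
}

// std::atomic: current GCC/Clang emit three separate locked adds on x86-64;
// in practice they do not coalesce atomic RMWs, even with memory_order_relaxed.
void bump_atomic(std::atomic<int>& refs) {
    refs.fetch_add(1, std::memory_order_relaxed);
    refs.fetch_add(1, std::memory_order_relaxed);
    refs.fetch_add(1, std::memory_order_relaxed);
}
```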
The actual intrinsic is like 8-9 cycles on Zen4 or Ice Lake (vs 1 for plain add). It's something if you're banging on it in a hot loop, but otherwise not a ton. (If refcounting is hot in your design, your design is bad.)
It's comparable to, like, two integer multiplies, or a single integer division. Yes, there is also some effect on ordering, since the atomic constrains how surrounding memory operations can be reordered.
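If you want to see the per-operation gap on your own machine, here is a rough microbenchmark sketch -- mine, not the commenter's numbers -- assuming GCC or Clang on x86-64, where the empty inline asm acts as an optimization barrier. The exact figures vary by microarchitecture and clock speed.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>

int main() {
    constexpr long N = 100'000'000;

    // Plain increment in a hot loop. The empty asm statement stops the
    // compiler from folding the loop into a single `plain += N`, so the
    // comparison stays apples-to-apples.
    long plain = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i) {
        ++plain;
        asm volatile("" : "+r"(plain));
    }
    auto t1 = std::chrono::steady_clock::now();

    // Relaxed atomic increment: one locked add per iteration, uncontended
    // (single thread), so this measures the instruction cost only.
    std::atomic<long> shared{0};
    auto t2 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i) {
        shared.fetch_add(1, std::memory_order_relaxed);
    }
    auto t3 = std::chrono::steady_clock::now();

    auto ns = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count();
    };
    std::printf("plain : %.2f ns/op (count %ld)\n", double(ns(t0, t1)) / N, plain);
    std::printf("atomic: %.2f ns/op (count %ld)\n", double(ns(t2, t3)) / N, shared.load());
}
```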
Can’t you have cross-core contention purely because of other processes doing atomics that happen to collide on a cache-line address in the lock broadcast?
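Not an answer from the thread, but a hedged sketch of the underlying effect: unrelated processes normally don't share cache lines unless they map shared memory, and on modern x86 an aligned locked operation typically locks only the owning cache line rather than broadcasting a bus lock, so the collision you would actually observe is two cores hammering atomics on the same line. The sketch below (my code, assuming 64-byte cache lines) shows that line-level contention within one process by comparing counters that share a line against counters padded onto separate lines.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

struct SameLine {
    std::atomic<long> a{0};
    std::atomic<long> b{0};              // typically lands on a's cache line
};

struct SeparateLines {
    alignas(64) std::atomic<long> a{0};  // each counter gets its own 64-byte line
    alignas(64) std::atomic<long> b{0};
};

// Two threads each hammer their own counter; only the cache-line placement
// differs between the two runs. Returns a rough ns-per-increment figure.
template <typename Counters>
double run(long iters) {
    Counters c;
    auto start = std::chrono::steady_clock::now();
    std::thread w1([&] { for (long i = 0; i < iters; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread w2([&] { for (long i = 0; i < iters; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    w1.join();
    w2.join();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iters;
}

int main() {
    const long iters = 50'000'000;
    std::printf("same cache line     : %.2f ns/op\n", run<SameLine>(iters));
    std::printf("separate cache lines: %.2f ns/op\n", run<SeparateLines>(iters));
}
```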