I should correct and clarify: I meant 3-4x more expensive in relative terms, i.e. as a share of total CPU cycles:
- For C++ programs, the allocator (allocating+freeing) consumes roughly 5% of cycles.
- For Go programs, the allocator (runtime.mallocgc) used to consume ~20% of cycles (this is the data I referenced). I checked, and more recently it has come down to roughly 15% thanks to optimizations.
I have not tested the performance differential on a per-byte level (though that will also differ with object structure in Go).
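
For anyone who wants to check the per-byte question on the Go side, a rough sketch might look like the following. The type names `flat` and `linked` and the helper `nsPerByte` are made up for illustration; the two structs are the same size on 64-bit platforms, but only one contains a pointer. This only times the allocation call itself, so it does not capture GC work that happens later, and it says nothing about C++.

```go
package main

import (
	"fmt"
	"testing"
	"unsafe"
)

// flat is a hypothetical pointer-free object (32 bytes).
type flat struct{ a, b, c, d uint64 }

// linked is a hypothetical object of the same size on 64-bit
// platforms, but it contains a pointer the GC has to know about.
type linked struct {
	next    *linked
	a, b, c uint64
}

// Package-level sinks force the allocations onto the heap;
// otherwise escape analysis could keep them on the stack.
var (
	sinkFlat   *flat
	sinkLinked *linked
)

// nsPerByte runs alloc in a benchmark loop and returns the average
// wall-clock nanoseconds per allocated byte, given the object size.
func nsPerByte(size uintptr, alloc func()) float64 {
	r := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			alloc()
		}
	})
	nsPerOp := float64(r.T.Nanoseconds()) / float64(r.N)
	return nsPerOp / float64(size)
}

func main() {
	f := nsPerByte(unsafe.Sizeof(flat{}), func() { sinkFlat = new(flat) })
	l := nsPerByte(unsafe.Sizeof(linked{}), func() { sinkLinked = new(linked) })
	fmt.Printf("flat:   %.3f ns/byte\n", f)
	fmt.Printf("linked: %.3f ns/byte\n", l)
}
```

The numbers will vary by Go version, machine, and object size, so treat any result as a starting point rather than a general figure.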