> If you have a garbage collector that runs in 200us
The problem is GCs for popular languages are nowhere near this good. People will claim their GC runs in 200us, but it's misleading.
For example, they'll say they have a 200us "stop the world" time, but then individual threads can still be blocked for 10ms+. Or they'll quote an average or median GC time, when what matters is the 99.9th percentile time. If you run GC on every frame at 120 Hz, that's 7,200 collections a minute, so you hit the 99.9th percentile time about seven times a minute, or roughly once every 8 seconds.
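To put rough numbers on the tail-latency point, here's a small Python sketch. It assumes exactly one collection per frame and treats each collection as an independent draw from the pause-time distribution, which is a simplification. The function name `tail_hits` is made up for illustration.

```python
def tail_hits(frame_rate_hz: float, percentile: float, window_s: float = 60.0):
    """Estimate how often a per-frame GC lands at or beyond a given percentile.

    Assumes one collection per frame and independent pause times.
    """
    exceed_prob = 1.0 - percentile / 100.0       # chance a single GC is in the tail
    runs = frame_rate_hz * window_s              # collections inside the window
    expected_hits = runs * exceed_prob           # mean number of tail pauses
    seconds_between = 1.0 / (frame_rate_hz * exceed_prob)
    p_at_least_one = 1.0 - (1.0 - exceed_prob) ** runs
    return expected_hits, seconds_between, p_at_least_one


hits, gap, p_any = tail_hits(120, 99.9)
print(f"{hits:.1f} tail pauses per minute, one every {gap:.1f} s on average")
print(f"chance of at least one in a minute: {p_any:.2%}")
```

Under those assumptions the 99.9th percentile is not a rare event at game frame rates: it comes out to about seven tail pauses a minute.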
Finally, even if your GC runs in parallel and doesn't block your game threads, it still takes an unpredictable amount of CPU time and memory bandwidth while it's running, and it can have other costs like write barriers.
Benchmarking a full sweep with 0 objects to free in Julia:
    julia> @benchmark GC.gc()
    BenchmarkTools.Trial:
      memory estimate:  0 bytes
      allocs estimate:  0
      --------------
      minimum time:     64.959 ms (100.00% GC)
      median time:      66.848 ms (100.00% GC)
      mean time:        67.062 ms (100.00% GC)
      maximum time:     73.149 ms (100.00% GC)
      --------------
      samples:          75
      evals/sample:     1
Julia's not a language normally used for real-time programs (and it's common to work around the GC or avoid allocating), but it is the language I'm most familiar with.
Julia's GC is generational; relatively few sweeps will be full. But seeing that 65ms -- more than 300 times slower than 200us -- makes me wonder.
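For scale, here's the arithmetic behind that comparison as a quick Python sketch. It uses the minimum time from the benchmark above and assumes the 120 Hz frame rate mentioned earlier in the thread; the variable names are only for illustration.

```python
full_sweep_ms = 64.959   # minimum time from the @benchmark output above
target_ms = 0.200        # the 200us figure under discussion
frame_rate_hz = 120      # assumed frame rate, taken from earlier in the thread

frame_budget_ms = 1000.0 / frame_rate_hz          # time available per frame
slowdown = full_sweep_ms / target_ms              # full sweep vs. 200us
frames_spanned = full_sweep_ms / frame_budget_ms  # frames one sweep covers

print(f"frame budget: {frame_budget_ms:.2f} ms")
print(f"full sweep is {slowdown:.0f}x the 200us target")
print(f"one full sweep spans {frames_spanned:.1f} frames at {frame_rate_hz} Hz")
```

So a single full sweep, even with nothing to free, would cover several consecutive frames at that rate.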
Yep. (You know this, but) just as another data point, an incremental pass takes more like 75 microseconds, and a 'lightly' allocating program probably won't trigger a full sweep (no guarantees though).
These aren't theoretical numbers; they're the numbers that people are hitting in production. See this thread on GC pause times on a production service at Twitter: https://twitter.com/brianhatfield/status/804355831080751104 It's also referenced in this post, which talks at length about GC pause time distributions and pause times at the 99.99th percentile: https://blog.golang.org/ismmkeynote
> they'll say they have a 200us "stop the world" time, but then individual threads can still be blocked for 10ms+
As far as I know, this is still true of the Go GC. Write barriers are also there, and they cost performance compared with the fixed-size arena allocators that games often use, which have basically zero cost.
It's also important to consider how often the GC runs. A GC that runs in 200us but does so ten times within a frame deadline might as well be a GC that runs in 2ms. Then there are issues of contention, cache thrashing, GC-associated data structure overhead, etc. The impact of GC is a lot more than how long one pass takes, and focusing on that alone ignores many of the other reasons why many kinds of systems might avoid GC.
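A sketch of that multiplication in Python, assuming a 120 Hz frame deadline (the figure used earlier in the thread). It only models pause time and says nothing about contention or cache effects. `gc_share_of_frame` is a hypothetical helper, not something from a real library.

```python
def gc_share_of_frame(pause_us: float, runs_per_frame: int, frame_rate_hz: float):
    """Return (total GC time per frame in ms, fraction of the frame budget)."""
    frame_budget_ms = 1000.0 / frame_rate_hz
    total_ms = pause_us * runs_per_frame / 1000.0   # microseconds -> milliseconds
    return total_ms, total_ms / frame_budget_ms


total_ms, share = gc_share_of_frame(pause_us=200, runs_per_frame=10, frame_rate_hz=120)
print(f"{total_ms:.1f} ms of GC per frame, {share:.0%} of the budget")
```

With those inputs the ten 200us passes add up to 2 ms, which is about a quarter of the frame budget before any of the indirect costs are counted.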