The good thing about this use case is that they do not need to solve the general problem of forking a multithreaded program.
The child process is doing a very specific job that amounts to serializing the contents of memory and writing it to a file. Once that is done it can simply exit, and all the orphans are cleaned up.
While this is admittedly more complicated, it is the same general idea behind fork+exec. Sure, your serialization logic cannot call malloc, but how often do you really need dynamic memory allocation? And if you really do need it, you can use a simple mmap-backed bump allocator (or any other custom allocator you want).
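To make the bump allocator idea concrete, here is a minimal sketch, assuming Linux/POSIX `mmap` with `MAP_ANONYMOUS`. The class name `BumpAllocator` and its interface are made up for illustration; nothing here is taken from the game's code. It grabs one region up front, hands out aligned slices by advancing a cursor, and never frees individual allocations, which is fine for a child that exits when it is done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Hypothetical bump allocator: one anonymous mmap region plus a cursor.
// It never touches malloc, so it cannot block on a malloc lock after fork().
class BumpAllocator {
public:
    explicit BumpAllocator(std::size_t capacity) : capacity_(capacity) {
        void* p = mmap(nullptr, capacity, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        base_ = (p == MAP_FAILED) ? nullptr : static_cast<std::uint8_t*>(p);
    }
    ~BumpAllocator() {
        if (base_) munmap(base_, capacity_);
    }
    BumpAllocator(const BumpAllocator&) = delete;
    BumpAllocator& operator=(const BumpAllocator&) = delete;

    // Returns nullptr when the region is exhausted. `align` must be a power
    // of two (debug-only check; the region itself is page-aligned).
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        if (!base_) return nullptr;
        std::size_t start = (offset_ + align - 1) & ~(align - 1);
        if (start > capacity_ || size > capacity_ - start) return nullptr;
        offset_ = start + size;
        return base_ + start;
    }

    std::size_t used() const { return offset_; }

private:
    std::uint8_t* base_ = nullptr;
    std::size_t capacity_ = 0;
    std::size_t offset_ = 0;
};
```

Usage would be something like `BumpAllocator arena(64 << 20);` in the child, followed by `arena.allocate(n)` wherever the serializer needs scratch space. There is deliberately no `free`: the whole region goes away when the child exits.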
For data structure locks protecting game state, just wait until the game state is consistent before forking.
Of course, if you try reusing existing code, or forget about the limitations while writing new code, it would be easy to introduce non-deterministic bugs.
It is very difficult to write general purpose C++ code that does not call malloc somewhere.
Something fails and you want to display a message `"Error: " + reason`? Bam, malloc right there, and the serialisation process may hang forever on a malloc lock.
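For comparison, this is roughly what the allocation-free version of that error message has to look like: a sketch only, with the hypothetical helpers `format_error` and `report_error` building the text in a fixed stack buffer and emitting it with `write(2)`. It illustrates how far the code must stray from ordinary `std::string` habits; it does not claim the rest of a real serializer is as easy to keep malloc-free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <unistd.h>

// Hypothetical helper: builds "Error: <reason>" in a caller-supplied buffer,
// truncating if necessary. Returns the message length, excluding the NUL.
std::size_t format_error(char* out, std::size_t cap, const char* reason) {
    // Debug-only precondition check; compile with NDEBUG in a real child.
    assert(out != nullptr && reason != nullptr && cap > 0);
    static const char prefix[] = "Error: ";
    const std::size_t room = cap - 1;  // keep one byte for the terminator
    const std::size_t plen = sizeof prefix - 1;
    const std::size_t a = plen < room ? plen : room;
    std::memcpy(out, prefix, a);
    const std::size_t rlen = std::strlen(reason);
    const std::size_t b = rlen < room - a ? rlen : room - a;
    std::memcpy(out + a, reason, b);
    out[a + b] = '\0';
    return a + b;
}

// Writes the message to stderr using only the stack and write(2).
void report_error(const char* reason) {
    char buf[256];
    const std::size_t len = format_error(buf, sizeof buf, reason);
    ssize_t ignored = write(STDERR_FILENO, buf, len);
    (void)ignored;  // nothing useful to do if stderr itself fails
}
```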
You quit the game, the parent process exits, and the serialisation process gets reparented to init, invisibly using up your RAM until you reboot.
fork()+exec() works in C because C has no invisible memory allocation, and even there you'd usually try to not call any function in between the two to be very sure.
Using fork() without being 100% sure there can be no malloc usually means inviting years of rare, hard-to-reproduce random weird bug reports.
Beyond that, as the post mentions, fork() "requires a significant amount of RAM to work" if many pages are touched due to copy-on-write, and copy-on-write also slows down the main game.
It seems much safer to use a thread for saving the game state.
In principle, using a thread for saving the game state is a much more difficult problem, since the game state itself will mutate. You need to be extremely careful to ensure that the state you serialize is consistent and is at least a plausible game state (even if there is a bit of wiggle room to allow for game states that technically never existed). This complexity invades every aspect of game logic, and bugs here tend to be subtle corruption of your save data that takes a while to notice, which obscures the relationship between the corruption and the save/reload cycle.
In contrast, forking with its COW semantics is conceptually easy. You just fork. The main process can continue running, and the child process gets a frozen snapshot. There is a bunch of overhead from the copy part of copy-on-write. However, most of that overhead will likely be spent in the first frame, which is still a significant improvement over the pause time associated with stop-the-world saving. In practice, coding for the child process is tricky. However, it is self-contained and responsible only for a relatively simple problem. No complex problems to solve, just a relatively small amount of code that needs to be written carefully.
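A minimal sketch of the "just fork" shape, assuming for illustration that the game state is a flat byte buffer (a real game's state certainly is not). The names `save_snapshot`, `wait_for_save` and `write_all` are hypothetical. The child restricts itself to `open`, `write`, `close` and `_exit`, and the parent returns immediately after the fork.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: loop until the whole buffer has been written.
static bool write_all(int fd, const char* data, std::size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry
            return false;
        }
        data += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Forks a child that writes `state` to `path`. Returns the child's pid in
// the parent, or -1 if fork() failed. The parent does not wait here.
pid_t save_snapshot(const char* path, const char* state, std::size_t len) {
    assert(path != nullptr && state != nullptr);
    pid_t pid = fork();
    if (pid != 0) return pid;  // parent, or fork failure

    // Child: its view of `state` is frozen as of the fork() call, no matter
    // what the parent writes afterwards (copy-on-write).
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    bool ok = fd >= 0 && write_all(fd, state, len);
    if (fd >= 0) close(fd);
    _exit(ok ? 0 : 1);  // _exit skips atexit handlers and stdio flushing
}

// Blocks until the child finishes and reports whether the save succeeded.
bool wait_for_save(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A game loop would more likely poll with `waitpid(pid, &status, WNOHANG)` once per frame rather than block in `wait_for_save`; the blocking version is only here to keep the sketch short.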
The RAM usage is a real trade-off inherent in the approach.
> You quit the game, the parent process exits, and the serialisation process gets reparented to init, invisibly using up your RAM until you reboot.
Or until the short-lived child process finishes its work and exits on its own.
If they have the ability to pause the entire game in a consistent state while its state is being saved (in the foreground save case), then they certainly have the ability to pause the entire game in a consistent state so they can fork. The latter pause will just be much, much shorter.
Ah, I see. I was assuming that the consistent state must somehow be achieved in the child.
If the parent can achieve consistent state (e.g. doing the equivalent of pressing "pause"), why not do the following instead:
While paused, memcpy the current memory to a buffer, then simultaneously {resume game, spawn thread to write the buffer to disk}. In C++ the memcpy might be even more convenient with the copy constructor.
This will introduce a short delay for the copy, at the speed of RAM bandwidth.
But that copy will need to be done anyway, straight away, as the parent poster says:
> that [copy] overhead will likely be spent in the first frame
With fork() it just happens in the kernel instead of in userspace, thus likely slower (1000s of individual sequential page faults, instead of a single contiguous allocation).
So if the fork() approach can somehow do it faster, I'd be curious via what mechanics.
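For reference, the pause, copy, resume-and-write alternative could look roughly like this. It is a sketch that assumes the state fits in a `std::vector<char>`, which a real game state would not; `save_in_background` is a made-up name. The copy constructor does the contiguous userspace copy while the game is paused, and the thread owns the snapshot afterwards.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical: call while the game is paused in a consistent state.
// The copy happens on the calling thread; only the disk write is deferred.
std::thread save_in_background(const std::vector<char>& live_state,
                               std::string path) {
    assert(!path.empty());
    std::vector<char> snapshot(live_state);  // one contiguous copy

    // The worker owns the snapshot, so the game may mutate live_state
    // as soon as this function returns.
    return std::thread(
        [snapshot = std::move(snapshot), path = std::move(path)] {
            std::ofstream out(path, std::ios::binary | std::ios::trunc);
            out.write(snapshot.data(),
                      static_cast<std::streamsize>(snapshot.size()));
        });
}
```

The caller resumes the game right after the call and joins the returned thread later, for example before starting the next save. Unlike the fork version, this pays for a full copy up front even for pages the game never writes to again.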
The post was already getting long so I didn't mention this, but I already found and fixed a potential deadlock due to each process not closing the redundant ends of the pipe, which was specifically touted as being a potential cause for freezing. However, someone reported the bug again after that change went out, so there's still SOMETHING else causing it to hang that I haven't tracked down yet.
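For readers who haven't hit the pipe issue: a reader only sees EOF once every copy of the write end is closed, and after fork() both processes hold both ends. The sketch below shows the general pattern with a made-up function `read_from_child`; it is not the game's actual code.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: the child sends `msg` through a pipe and the parent
// reads until EOF. Returns an empty string if pipe() or fork() fails.
std::string read_from_child(const char* msg) {
    assert(msg != nullptr);
    int fds[2];  // fds[0] = read end, fds[1] = write end
    if (pipe(fds) != 0) return {};

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return {};
    }
    if (pid == 0) {
        close(fds[0]);  // child never reads
        ssize_t ignored = write(fds[1], msg, std::strlen(msg));
        (void)ignored;
        close(fds[1]);
        _exit(0);
    }

    // Parent never writes. If this close is skipped, the parent itself still
    // holds a write end open and the read loop below never sees EOF.
    close(fds[1]);

    std::string out;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return out;
}
```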
Is that relevant if the forked process only has to save the game-state to disk and then die? I'd think reading the game-state wouldn't require some mutex lock.