You can see that there's a common backend ("configerator") that a lot of other systems ("sitevars", "gatekeeper", ...) build on top of.
Just imagine that these systems have been further developed over the last decade :)
In general, there are 'configuration change at runtime' systems that the deployed code usually has access to and that can switch things on and off in a very short time (or roll them out slowly). Most of these are coupled with a variety of health checks.
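To make the "switch off quickly or roll out slowly" idea concrete, here's a minimal sketch of a percentage rollout gated by a health signal. It is not how configerator or gatekeeper actually work; the names `bucket`, `is_enabled`, `rollout_percent` and `healthy` are made up for illustration, and a real system would fetch the percentage and health state from the config backend at runtime.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across processes.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int,
               healthy: bool = True) -> bool:
    """Return True if the flag is on for this user.

    rollout_percent: hypothetical value read from the runtime config system.
    healthy: hypothetical health-check result; a failing check acts as a kill switch.
    """
    if not healthy:
        return False
    return bucket(flag_name, user_id) < rollout_percent
```

Because the bucket is derived from a hash rather than a random draw, a user who gets the feature at 10% keeps it as the percentage is raised, and setting the percentage to 0 turns it off for everyone.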
More seriously, at my old company they just never got removed. So it wasn’t really about control. You just forgot about the ones that didn’t matter after a while.
If that sounds horrible, that’s probably the correct reaction. But it’s also common.
Namespacing helps too. It’s easier to forget a bunch of flags when they all start with foofeature-.
I’ve seen those old flags come in handy once. Someone accidentally deleted a production database (typo) and we needed to stop all writes to restore from a backup. For most of the write paths, that just meant turning off the original feature flag, even though the feature was several years old.
At a previous workplace we managed flags with LaunchDarkly. We asked developers not to create flags in LD directly; instead, Jira webhooks generated flags from any Jira issue of type Feature Flag. This issue type had a workflow that ensured you couldn't close off an epic without having rolled out and then removed every feature flag. Flags should not significantly outlast their 100% rollout.
I work at a different company. Typically feature flags are short-lived (on the order of days or weeks), and only control one feature. When I deploy, I only care about my one feature flag because that is the only thing gating the new functionality being deployed.
There may be other feature flags, owned by other teams, but it's rare to have flags that cross team/service boundaries in a fashion that they need to be coordinated for rollout.
You have automated tools that yell at you to clean up feature flags and you force people to include sensible expiration dates as part of your PR process. Flags past the date result in increased yelling. If your team has too much crap in the codebase, eventually someone politely tells you to clean it up.
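A rough sketch of what the expiration check could look like, assuming each flag is registered with an owner and an expiry date. The `Flag` record, `expired_flags` and `nag_message` are hypothetical names for illustration, not any particular company's tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Flag:
    name: str
    owner: str      # team or person who gets yelled at
    expires: date   # expiration date agreed on in the PR


def expired_flags(flags, today):
    """Return flags whose expiration date is strictly before today, oldest first."""
    return sorted((f for f in flags if f.expires < today), key=lambda f: f.expires)


def nag_message(flag, today):
    """Build the reminder text for a single expired flag."""
    days = (today - flag.expires).days
    return f"{flag.owner}: flag '{flag.name}' expired {days} day(s) ago, please clean it up"
```

A scheduled job could run `expired_flags` against the flag registry and post one `nag_message` per result. Sorting oldest first puts the worst offenders at the top.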
You also have tooling that measures how many times a flag was encountered vs. how many times it actually triggered etc. Once it looks like it's at 100% of traffic, again you have automations that tell people to clean up their crap.
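For the encountered-vs-triggered measurement, something like this in-memory counter is a minimal sketch. `FlagStats`, `record`, `ready_for_cleanup` and the `min_samples` threshold are assumptions made for illustration; real tooling would aggregate these counts across hosts in a metrics system.

```python
from collections import Counter


class FlagStats:
    """Count how often each flag was checked and how often it was on."""

    def __init__(self):
        self.encountered = Counter()
        self.triggered = Counter()

    def record(self, flag_name, enabled):
        """Call this at every flag check with the result of the check."""
        self.encountered[flag_name] += 1
        if enabled:
            self.triggered[flag_name] += 1

    def ready_for_cleanup(self, min_samples=1000):
        """Flags that were on for every check, with enough samples to trust it."""
        return sorted(
            name
            for name, seen in self.encountered.items()
            if seen >= min_samples and self.triggered[name] == seen
        )
```

The `min_samples` cutoff is there so a flag checked three times on one host doesn't get reported as fully rolled out.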