> An infinite loop made its way onto a majority(? all?) of production servers, and the immediate response is more or less 'we shouldn't have deployed to as many customers, failure should have only happened to a small subset'?

All server software has one or more "infinite loops." The loop is a fundamental construct in every listener.

Plus when they say infinite loop, I assumed they meant it continuously entered a crash/restart cycle rather than a while(true) {} in a line of code.
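To make the distinction concrete, here is a minimal sketch of both kinds of loop. It is not anyone's actual server code: `listen`, `supervise`, the in-memory queue standing in for a socket, and the `None` shutdown sentinel are all made up for illustration.

```python
import queue


def listen(requests: queue.Queue, handle) -> None:
    """The intentional endless loop: wait for a request, handle it, repeat."""
    while True:
        request = requests.get()
        if request is None:  # hypothetical sentinel so the sketch can stop cleanly
            return
        handle(request)


def supervise(start_worker, max_restarts: int) -> int:
    """A crash/restart cycle: restart a failing worker, return the crash count."""
    crashes = 0
    while crashes < max_restarts:
        try:
            start_worker()
            return crashes  # worker exited normally, nothing left to restart
        except Exception:
            crashes += 1  # a real supervisor would usually back off before retrying
    return crashes
```

The first loop is what every listener looks like and is harmless. The second is the one that hurts: with no restart cap, a worker that crashes on startup keeps getting restarted forever.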

I think the reality on the ground is that bug-free software is a myth. All you can do is have processes (like gradual deployment) to mitigate the damage it can do, rather than making it your goal to write the mythical perfect code.
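For what gradual deployment can look like in practice, here is a rough sketch, assuming customers are assigned to stable percentage buckets by hashing their ID and that the rollout halts when an error-rate check fails. The names `bucket`, `in_rollout`, `staged_rollout`, and the `error_rate_at` callback are hypothetical, not any particular vendor's process.

```python
import hashlib


def bucket(customer_id: str) -> int:
    """Map a customer to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def in_rollout(customer_id: str, percent: int) -> bool:
    """True if this customer receives the new build at the given rollout percentage."""
    return bucket(customer_id) < percent


def staged_rollout(stages, error_rate_at, max_error_rate: float) -> int:
    """Walk through the stages; return the last percentage that stayed healthy."""
    reached = 0
    for percent in stages:
        if error_rate_at(percent) > max_error_rate:
            break  # halt here; customers in later stages never see the bad build
        reached = percent
    return reached
```

Because a customer's bucket never changes, anyone included at 10% is still included at 50%, and a bad build caught at the first stage only ever reaches that first slice.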

> But this exemplifies a major problem our industry suffers from, in that it is just taken as a given that critical errors will sometimes make their way into production servers and the best we can do is reduce the impact. I find this absolutely unacceptable

It is a major problem. It costs billions every year. But what can be done? If there were a magic-wand solution, I'm sure people would be scrambling to deploy it, since it would save them money.

> How about we short circuit the process and identify ways to stop that from happening? Were there enough code reviews? Did automated testing fail here?

This seems like a somewhat naive view of software development in general. It is what I'd call a "mathematician's view," in the sense that it assumes large complex systems can be reduced to a simple quantifiable process.

Code reviews and, more importantly, unit tests can help find bugs. But the interconnections between large complex systems are harder to test against and harder to code review, because the bugs don't exist on any single line of code, or even in any single block.
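A toy illustration of a bug that lives between components rather than in either one. Both functions are invented for this example, and so is the seconds-versus-milliseconds mismatch:

```python
def read_timeout(config: dict) -> float:
    """Component A: returns the configured timeout in seconds."""
    return float(config.get("timeout", 30))


def retry_deadline(now_ms: int, timeout: float) -> int:
    """Component B: expects the timeout in milliseconds."""
    return now_ms + int(timeout)


# Each component passes its own unit test.
assert read_timeout({"timeout": 5}) == 5.0
assert retry_deadline(1_000, 5_000) == 6_000

# Wired together, the 5-second timeout is silently treated as 5 milliseconds.
deadline = retry_deadline(1_000, read_timeout({"timeout": 5}))
```

Neither function has a wrong line in it, and a reviewer looking at either one alone would approve it. The defect only exists in the combination.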