Worst-case scenario: your service is not available for a couple of hours. In 99% of businesses, customers are totally okay with that (as long as it's not every week). IRL shops are also occasionally closed due to incidents; heck, even ATMs and banks don't work 100% of the time. And that's the worst case: because your setup is so simple, restoring a backup or even doing a full setup of a new machine is quite easy. Just make sure you test your backup restore system regularly.
Simple systems also tend to fail much less: I've run a service (with customers paying top euro) that was offline for ~two hours due to an error maybe once or twice in 5 years. Both occurrences had a non-technical cause (one was a bill that wasn't paid - yes, this happened; the other I don't recall).
We were offline for a couple of minutes daily for updates, or for the occasional server crash (a Go monolith; crashes were mostly due to an unrecovered panic). However, the reverse proxy was configured to show a nice static image with text along the lines of "The system is being upgraded, great new features are on the way - this will take a couple of minutes". I set this up in the first week after we started the company, with the idea that we would build a live-upgrade system once customers started complaining. Nobody ever complained - in fact, customers loved to see that we did an upgrade once in a while (although most customers never mentioned having seen the image).
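And it really doesn't need anything fancy. A minimal sketch of that kind of fallback, written as a tiny Go reverse proxy purely for illustration (the real setup was presumably a stock proxy like nginx or HAProxy; the backend address, port, and page filename below are made up):

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
        "os"
    )

    func main() {
        // Backend address and maintenance page path are assumptions for the example.
        backend, err := url.Parse("http://127.0.0.1:8080")
        if err != nil {
            log.Fatal(err)
        }
        proxy := httputil.NewSingleHostReverseProxy(backend)

        // If the monolith is unreachable (upgrade, crash), serve a friendly
        // static page with a 503 instead of a bare 502.
        proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, _ error) {
            page, readErr := os.ReadFile("maintenance.html")
            if readErr != nil {
                http.Error(w, "We're upgrading - back in a few minutes.", http.StatusServiceUnavailable)
                return
            }
            w.Header().Set("Content-Type", "text/html; charset=utf-8")
            w.WriteHeader(http.StatusServiceUnavailable)
            w.Write(page)
        }

        log.Fatal(http.ListenAndServe(":80", proxy))
    }

Once something like this sits in front of the monolith, "deploy" can just mean restarting the binary; visitors during that window see the upgrade page instead of an error.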
Depending on your product, this could mean tens of thousands to millions of dollars' worth of revenue loss. I don't really see how we've gone backwards here.
You could just distribute your workloads using...a queue, and not have this problem, or have to pay for and maintain backup equipment, etc.
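At its simplest, that's just producer/consumer. A hedged in-process sketch of queue-based work distribution in Go (a real setup would put a durable broker like RabbitMQ or SQS in the middle so jobs simply wait out any downtime; the worker count and job payload here are invented):

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        // The queue: producers push jobs in, workers pull them out.
        jobs := make(chan int, 100)
        var wg sync.WaitGroup

        // A few workers drain the same queue; adding capacity is just
        // starting more of them, on this machine or another.
        for w := 1; w <= 3; w++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                for job := range jobs {
                    fmt.Printf("worker %d processed job %d\n", id, job)
                }
            }(w)
        }

        for j := 1; j <= 9; j++ {
            jobs <- j
        }
        close(jobs)
        wg.Wait()
    }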
If your product going down for an hour will lead to the loss of millions of dollars, then you should absolutely be investing a lot of money in expensive distributed and redundant solutions. That's appropriate in that case.
The point here is that 99% of companies are not in that scenario, so they should not emulate the very expensive distributed architectures used by Google and a few other companies that ARE in that scenario.
For almost all companies on the smaller side, the correct move is to take the occasional downtime, because the tiny revenue loss will be much smaller than the large and ongoing costs of building and maintaining a complex distributed system.
> The point here is that 99% of companies are not in that scenario
I'd argue that is wrong for any decently sized e-commerce platform or production facility. Maybe not millions per hour, but enough to warrant redundancy. There are many revenue and redundancy levels between Google and your mom-and-pop restaurant menu.
From the original post: “Your business is not Google and will never be Google”
From the post directly above: “Most businesses…”
The thread above is specifically discussing businesses which won’t lose a significant amount of money if they go down for a few minutes. They also postulate that most businesses fall into this category, which I’m inclined to agree with.
I understand it in practice, but I also think it's weird to be working on something that isn't aiming to grow. Maybe not to Google scale, but building systems which are "distributable" from an early stage seems wise to me.
It could. In those cases, you set up the guardrails to minimize the loss.
In your typical seed, Series A, or Series B SaaS startup, this is most often not the case. At the same time, these are the companies that fueled the proliferation of microservice-based architectures, often with a single point of failure in the message queue or in the cluster orchestration. They shifted easy-to-fix problems into hard-to-fix problems.