
We also have much, much bigger single machines available for reasonable money, so a lot of reasonable workloads that used to require a small cluster can now fit on one machine.


It's kind of mind boggling just how powerful mundane desktop computers have gotten, let alone server hardware.

Think about it: That 20 core CPU (eg: i7 14700K) you can buy for just a couple hundred dollars today would have been supercomputer hardware costing tens or hundreds of thousands of dollars just a decade ago.


According to Geekbench, an i7-4790 (released a decade ago) is ~5 times slower than an i7-14700. 4790s go for about $30 on eBay vs. $300 for a 14700, so price/performance seems to be in favor of the older hardware :)
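Spelled out as arithmetic (using the rough figures above; the ~5x ratio and the $30/$300 prices are the ballpark estimates from the comment, not measured values):

```python
# Rough price/performance comparison between a used 4790 and a new 14700.
old_price, new_price = 30.0, 300.0               # USD, approximate used/retail prices
relative_perf_old, relative_perf_new = 1.0, 5.0  # normalized Geekbench multi-core

perf_per_dollar_old = relative_perf_old / old_price
perf_per_dollar_new = relative_perf_new / new_price

# 10x the price for only 5x the performance: the old chip wins on perf/$.
print(f"old: {perf_per_dollar_old:.4f} perf/$, new: {perf_per_dollar_new:.4f} perf/$")
```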


What about power consumption? When running a server 24/7, power is likely to be a bigger cost concern than the one-off cost of purchasing the processor.


Under full load, roughly 100W for the 4790[0], and 350W for the 14700[1]. Note that both links are for the K variant, and also, both were achieved running Prime95. More normal workloads are probably around 2/3 those peak values.
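As a back-of-the-envelope check of the 24/7 power cost, using the 2/3-of-peak figures above (the wattages are rough, and the $0.15/kWh rate is an assumption that varies a lot by region):

```python
HOURS_PER_YEAR = 24 * 365
price_per_kwh = 0.15  # USD/kWh; assumed rate, varies widely by region

def annual_cost(watts: float) -> float:
    """Yearly electricity cost for a machine drawing `watts` continuously."""
    return watts / 1000 * HOURS_PER_YEAR * price_per_kwh

cost_4790 = annual_cost(100 * 2 / 3)    # ~67 W typical load
cost_14700 = annual_cost(350 * 2 / 3)   # ~233 W typical load

# The newer chip's extra draw costs a few hundred dollars per year,
# which erodes the used chip's ~$270 purchase-price advantage over time.
print(f"4790: ${cost_4790:.0f}/yr, 14700: ${cost_14700:.0f}/yr")
```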

For a desktop, yeah, you’re generally better off buying newer from a performance/$ standpoint. For servers, the calculus can shift a bit depending on your company’s size and workloads. Most smaller companies (small is relative, but let’s go with “monthly cloud bill is < $1MM”) could run on surprisingly old hardware and not care.

I have three Dell R620s, which are over a decade old. They have distributed storage via Ceph on NVMe over Mellanox ConnectX3-PRO. I’ve run DB benchmarks (with realistic schema and queries, not synthetic), and they nearly always outclass similarly-sized RDS and Aurora instances, despite the latter having multiple generations of hardware advancements. Local NVMe over Infiniband means near-zero latency.

Similarly, between the three of them, I have 384 GiB of RAM, and 36C/72T. Both of those could go significantly higher.

Those three, plus various networking gear, plus two Supermicro servers stuffed with spinning disks pulls down around 700W on average under mild load. Even if I loaded the compute up, I sincerely doubt I’d hit 1 kW. Even then, it doesn’t really matter for a business, because you’re going to colo them, and you’re generally granted a flat power budget per U.

The downside, of course, is that you need someone(s) on staff who knows how to provision and maintain servers, but it's honestly not that hard to learn.

[0]: https://www.guru3d.com/review/core-i7-4790k-processor-review...

[1]: https://www.tomshardware.com/news/intel-core-i9-14900k-cpu-r...


You might like https://labgopher.com/


I think for server-type workloads, a reasonable way to estimate the performance improvement is to compare single-core performance and multiply by the ratio of core counts.
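A minimal sketch of that estimation approach (the scores below are illustrative placeholders, not real benchmark numbers):

```python
# Approximate a CPU's server throughput as single-core score x core count,
# then compare two parts. Ignores memory bandwidth, turbo limits under
# all-core load, and imperfect scaling, so treat it as an upper bound.
def estimated_throughput(single_core_score: float, cores: int) -> float:
    return single_core_score * cores

old = estimated_throughput(1100, 4)    # e.g. an older 4-core desktop part
new = estimated_throughput(2800, 20)   # e.g. a modern 20-core part

print(f"estimated speedup: {new / old:.1f}x")
```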


On the other hand, the E7-8890 v3 (the closest equivalent to a 14700K in core count at the time from a quick glance) had an MSRP of $7174.00[1].

So maybe I was a bit too high on the pricing earlier, but my point still stands that the computing horsepower we have such easy access to today was literal big time magic just a decade ago.

[1]: https://ark.intel.com/content/www/us/en/ark/products/84685/i...


RAM has also gotten much larger and cheaper, and it is now possible to have several terabytes (TB) of RAM (not storage) in a single PC or workstation. The i7-14700K can support 192 GB of RAM, but a lower-end Xeon W workstation CPU, for example the w3-2423 at around USD 350, can support 2 TB of RAM, albeit with only 6 cores [1]. And with not much more budget you can scale the machine to your heart's content [2].

[1] Intel Xeon w3-2423 Processor 15M Cache, 2.10 GHz:

https://www.intel.com/content/www/us/en/products/sku/233484/...

[2] Intel Launches Xeon W-3400 and W-2400 Processors For Workstations: Up to 56 Cores and 112 PCIe 5.0 Lanes:

https://www.anandtech.com/show/18741/intel-launches-xeon-w-3...


Good point. And going back to the start of this thread, you can fit a whole lot of Postgres into a machine with even a few hundred gigs of RAM.


This was true in the 2000's and 2010's as well. A lot of the work could be handled by a single monolithic app running on one or a small handful of servers. However, because of the microservices fad, people often created complicated microservices distributed across auto-scaling kubernetes clusters, just for the art of it. It was unneeded complexity then, as it is now, in the majority of cases.


Hardware has certainly progressed significantly over the years, but the size of workloads has also grown.

The big question is: does a 'reasonable' workload today fit on a single machine better than a 'reasonable' workload did 20 years ago?


This! You NEEDED to scale horizontally because machines were just doing too much. I remember when our Apache boxes couldn’t even cope doing SSL so we had a hardware box doing it on ingress!


I used to administrate a small fleet of Sun SPARC hosts with SSL accelerators. They were so much money $$$$.

I proposed dumping them all for a smaller set of x86 hosts running Linux; it took 2-3 years before the old admins believed in the performance and cost savings. They refused to believe it would even work.


I lived through that era too, it was wild to see how quickly x86 dethroned sparc (even Intel's big misses like Itanium were only minor bumps in the road).

Those days, you had to carefully architect your infrastructure and design your workload to deal with it, and every hardware improvement required you to reevaluate what you were doing. Hence novel architectural choices.

Everything is way easier for normal sized organizations now, and that level of optimization is just no longer required outside of companies doing huge scale.


I have the same memories, trying to convince people to dump slow-as-hell SPARC processors for database workloads in favor of x86 machines costing a tenth of the price.

To this day I still argue with ex Solaris sysadmins.


15 years ago I ran a website (Django+Postgres+memcached) serving 50k unique daily visitors on a dirt cheap vps. Even back then the scalability issues were overstated.


As the stock market now prices in expected future growth, architecture had to justify rising stock prices by promising future scalability.

It was never about the actual workloads, much more about growth projections. And a whole lot of cargo cult behavior.


What happens when the single machine fails?


Worst case scenario, your service is not available for a couple of hours. In 99% of businesses, customers are totally okay with that (as long as it's not every week). IRL shops are also occasionally closed due to incidents; heck, even ATMs and banks don't work 100% of the time.

And that's the worst case: because your setup is so simple, restoring a backup or even doing a full setup of a new machine is quite easy. Just make sure you test your backup restore system regularly.

Simple systems also tend to fail much less. I've run a service (with customers paying top euro) that was offline for ~two hours due to an error maybe once or twice in 5 years. Both occurrences were due to a non-technical cause (one was a bill that wasn't paid - yes, this happened; the other one I don't recall).

We were offline for a couple of minutes daily for updates or the occasional server crash (a Go monolith, crashes mostly due to an unrecovered panic), but the reverse proxy was configured to show a nice static image with text along the lines of "The system is being upgraded, great new features are on the way - this will take a couple of minutes". I installed this the first week when we started the company, with the idea that we would build a live-upgrade system when customers started complaining. Nobody ever complained - in fact, customers loved to see we did an upgrade once in a while (although most customers never mentioned having seen the image).


Depending on your product, this could mean tens of thousands to millions of dollars worth of revenue loss. I don't really see how we've gone backwards here.

You could just distribute your workloads using...a queue, and not have this problem, or have to pay for and pay to maintain backup equipment etc.


If your product going down for an hour will lead to the loss of millions of dollars, then you should absolutely be investing a lot of money in expensive distributed and redundant solutions. That's appropriate in that case.

The point here is that 99% of companies are not in that scenario, so they should not emulate the very expensive distributed architectures used by Google and a few other companies that ARE in that scenario.

For almost all companies on the smaller side, the correct move is to take the occasional downtime, because the tiny revenue loss will be much smaller than the large and ongoing costs of building and maintaining a complex distributed system.


> The point here is that 99% of companies are not in that scenario

I'd argue that is wrong for any decently sized e-commerce platform or production facility. Maybe not millions per hour, but enough to warrant redundancy. There are many revenue and redundancy levels between Google and your mom-and-pop restaurant menu.


From the original post: “Your business is not Google and will never be Google”

From the post directly above: “Most businesses…”

The thread above is specifically discussing business which won’t lose a significant amount of money if they go down for a few minutes. They also postulate that most businesses fall into this category, which I’m inclined to agree with.


I understand it in practice, but I also think it's weird to be working on something that isn't aiming to grow. Maybe not to Google scale, but building systems which are "distributable" from an early stage seems wise to me.


Hours, not minutes. That is relevant for most businesses.


It could. In those cases, you set up the guardrails to minimize the loss.

In your typical seed, series A, or series B SaaS startup, this is most often not the case. At the same time, these are the companies that fueled the proliferation of microservice-based architectures, often with a single-point of failure in the message queue or in the cluster orchestration. They shifted easy-to-fix problems into hard-to-fix problems.


Hellishly and endlessly optimising for profit is how we've gotten the world into its current state, lmao.


Machine failures are few and far between these days. Over the last four years I've had a cluster of perhaps 10 machines. Not a single hardware failure.

Loads of software issues, of course.

I know this is just an anecdote, but I'm pretty certain reliability has increased by one or two orders of magnitude since the 90s.


Also anecdotally, I’ve been running 12th gen Dells (over a decade old at this point) for several years. I’ve had some RAM sticks report ECC failures (reseat them), an HBA lose its mind and cause the ZFS pool to offline (reseat the HBA and its cables), and precisely one actual failure – a PSU. They’re redundant and hot-swappable, so I bought a new one and fixed it.


You didn't answer the question though. Your answer is "it won't", and that isn't a good strategy.


It is, in the sense that if something happens less often, you don't need to prepare for it as much, assuming the severity stays the same (cue Nassim Taleb entering the conversation).


I'm not sure what types of products you work on, but it's kind of rare at most companies I've worked at where having a backup like that is a workable solution.


Your monitoring system alerts you on your phone, and you fix the issue.

When I worked with small firms that used Kubernetes, we had more Kubernetes config issues than machine failures. The solution to the theoretical problem was the cause of real issues, and it was expensive to keep fixing them.


Depending on your requirements for uptime, you could have a stand-by machine ready or you spin up a new one from backups.



