Even if Moore's law is not dead, single-thread performance and clock frequency plateaued about 10 years ago. This is the key factor. Because of heat, even if you squeeze more transistors onto a chip you need to reduce the clock, so while you may get higher computational throughput, latency won't improve. And this is another argument for chiplets or any other alternative computational architectures.
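To spell out the heat argument with the standard first-order CMOS model (my gloss, not the parent's numbers): dynamic power scales roughly as

    $P_{\text{dyn}} \approx \alpha \, C \, V^2 \, f$

and since a higher f generally requires a higher V, power grows superlinearly with clock speed. Once supply voltage stopped shrinking with each process node (the end of Dennard scaling, around the mid-2000s), frequency had to flatten out to stay within the same power budget.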
It's amazing how often this is parroted. Anyone with a passing familiarity with the numbers knows this is actually not true at all.
Better caching, better branch prediction, plus vast amounts of SRAM: there's been a slow & steady increase across the wide variety of single-threaded workloads grouped together under "instructions per clock."
At the peak of the voltage/frequency curve for workstations & overclocking, at the apex of the efficiency curve for data centres, and especially at the bare minimum for mobile devices with idle workloads.
Yes, it's a small fraction of the old pace. It's still a doubling in 10 years.
And as anyone who's migrated from an Intel Mac to Apple Silicon knows, "merely doubling" is a LOT.
Sorry, doubling in 10 years vs doubling every 18 months effectively is plateauing! Especially since it isn't a steady ~7% growth per year, but decelerating growth over that decade. Furthermore, much of the purported single thread performance is taken from a small set of benchmark tests, and so chip makers just optimize them for those tests. Generic single-thread performance has undoubtedly not doubled in those 10 years.
In any field of any kind except probably silicon, 100% growth in a decade would be marvelous. I don't think anyone could call it a plateau.
> Furthermore, much of the purported single thread performance is taken from a small set of benchmark tests, and so chip makers just optimize them for those tests
"Single thread" is a notoriously difficult benchmark to quantify. Instruction queue depth, floating vs integer, branching vs linear, there are so many variables.
Passmark is fine. Workload simulation is state of the art.
Have clock speeds really plateaued? Sure it’s not “double every 18 months”, but in mid 2018 I bought an Intel 8700K that turbo’d to 4.7GHz and could (with liquid metal, dark magic and luck) overclock to exactly 5GHz. I remember people saying progress was slowing down, that we might not make it to 6GHz.
4.5 years later, Intel is bragging that their upcoming top-of-the-line CPU will run at 6GHz stock. I suppose one could call this a plateau compared to the good old days of the 80s and 90s, but it's definitely still progress.
In late 2000, Intel promised that Pentium 4s would hit 10GHz by 2005 (on a presumed 130W power budget), after the preceding 5 years had seen clock speeds increase from 150MHz to 1.4GHz for the P6 architecture (at a stable 30-40W power budget), while other vendors saw similar increases.
Over 20 years later, we're barely scratching the 6GHz barrier, and only via an opportunistic turbo mode that isn't guaranteed to kick in if your cooling isn't up to the task of dissipating a record-breaking 250W of peak power consumption.
Part of why that happens is Intel selling chips closer to the red line. You need cooling similar to what used to be exclusive to overclocking just to keep the stock CPU cool.
Yep. We're apparently finding out that it's mostly a waste of electricity to get an extra 5% performance due to how far outside the efficiency sweet spot chips are being pushed.
Not just Intel either. AMD has joined the game as of Zen 4, and NVIDIA's been playing it with their GPUs forever as well.
Zen 4 desktop CPUs appear to have (as expected) virtually unchanged single-core performance, and maybe 5% reduced multi-core performance, on CPU-bound workloads when reducing the power limit to cut total power consumption -- a reduction of over 100W in the case of the new 7950X! Granted, Intel's been doing that forever -- rein in Alder Lake and its power consumption also comes way down, again for barely a performance hit in CPU-bound multi-core tasks.
-----
Enthusiast-grade CPUs and GPUs are basically sold in the equivalent of a TV's retail "demo mode" now -- where a TV has max brightness, contrast and saturation that you'd NEVER use at home, but is intended to grab a customer's attention as they walk by. They're pushed so far outside their efficiency sweet spot just to get that extra 5% and "win benchmarks" that, outside of specific use cases (and even if you actually need that 5%!), you're consuming 50-100% more electricity for utterly marginal gains.
What a waste of resources! All so children (or people who still act like them) can feel better about a purchase as they argue on the internet over nothing worth arguing about.
If you truly wanted to maximize performance per watt, you'd pick a very different design, more reminiscent of GPUs. But then single-thread and lightly threaded performance would really suck. So it will always be a carefully tuned tradeoff.
Nope. Not even. Again, as the grandparent post stated, everything is being sold with its default configuration redlined.
You absolutely can rein it back in to sanity, get better performance per watt than the previous gen, and still be noticeably faster than the previous gen.
-----
With AMD, apparently we're going to see BIOS updates from board manufacturers to make doing that simple. Set it to a lower power limit in a few keystrokes, still have something blazing fast (faster than previous gen), but use 145W instead of 250W. Or go even lower, still be a bit faster than previous gen while using around 88W on a 7950X instead of the 118W a 5950X did.
Intel -- who has been redlining their CPUs for years now -- even noted Raptor Lake's efficiency at lower power levels. Again, cut power consumption by 40-50%, for only a tiny performance hit. They actually made entire slides for their recent presentation highlighting this!
NVIDIA is no different, and has been for years. Ampere stock voltages were well outside their sweet spot. Undervolt, cut power consumption by 20-25%, and performance is UNCHANGED.
-----
Sure, there's more efficient stuff out there. Take last generation's 8-core Ryzen 7 PRO 5750GE: about 80% of the performance of an Intel Core i5-12600K, but it only uses 38W flat out instead of 145W.
You don't even really need to rein it back; modern processors will throttle back automatically depending on how effective the cooling is. Anyway, the issue with manual undervolting is that it may adversely impact reliability if you ended up with a slightly substandard chip that still works fine at stock settings. That's why it can't just be the default.
>You don't even really need to rein it back; modern processors will throttle back automatically depending on how effective the cooling is
This isn't about thermals. This is about power consumption.
I'm not suggesting reining in a CPU's power limits because it's "too hot".
I'm suggesting getting 95% of the performance for 59% of the power consumption, because it's not worth spending 72% more on electricity for 5% more performance. Again, even the manufacturers themselves know this and are admitting it. Look at this slide from Intel: https://cdn.arstechnica.net/wp-content/uploads/2022/09/13th-... Purportedly identical performance to the previous gen at 25% of the power consumption. They KNOW the default configuration (which you can change in a few keystrokes) is total garbage in terms of power efficiency.
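(For the skeptical, here's the arithmetic behind those percentages, using the 250W-vs-145W figures from upthread; the ~95% performance retention is the claimed number, so treat this as a sanity check, not a measurement.)

    # Sanity check of the perf-per-watt claim, using figures quoted upthread:
    # 250W stock vs 145W power-limited, with ~95% of stock performance retained.
    # The performance numbers are claims from the thread, not measurements.
    stock_watts, limited_watts = 250.0, 145.0
    stock_perf, limited_perf = 1.00, 0.95   # normalized performance

    extra_power = stock_watts / limited_watts - 1   # ~0.72 -> "72% more power"
    perf_gain = stock_perf / limited_perf - 1       # ~0.05 -> "~5% more performance"
    ppw_ratio = (limited_perf / limited_watts) / (stock_perf / stock_watts)

    print(f"stock draws {extra_power:.0%} more power for {perf_gain:.1%} more performance")
    print(f"power-limited perf/W is {ppw_ratio:.2f}x stock")   # ~1.64x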
-----
I guarantee you server CPUs aren't going to be configured to be this idiotic out of the box. Because in the datacenter, perf/W matters and drives purchasing decisions.
Pentium 4 HT 3.8F, November 2004, 3.8GHz, 115W TDP
Core i9-13900KF, October 2022, 3.0GHz, 125W TDP
Of course, the latter does give you 8 performance cores and 16 efficiency cores, so performance-per-watt has clearly improved; and it has 'turbo boost'. But in terms of sustained single-core performance? It's clear Intel's attention has been elsewhere, such as on the laptop market, where power efficiency is king.
I'm curious if single-threaded* games/applications that were CPU-limited when the P4 originally came out run better on the 13900K with the same code.
That was my impression of games at the time: they were coded with the expectation that clock speeds would keep going up. But clock speeds didn't, and those games probably run just as badly now as they did then.
And even in the Pentium 4 days, AMD CPUs managed performance similar to a Pentium 4's while running at lower clock speeds. That was a problem for some games that tried to auto-tune their performance settings purely based on Pentium 4 clock speeds, even if you had an AMD CPU. Thankfully, at least in the Maxis/EA case, those settings were easily hackable and could be adjusted to better match non-Pentium 4 CPUs, too.
the frequency plateau has always been a power consumption/leakage thing, and power draw for recent intel cpus only reinforces that. it's probably too early to tell if 6ghz is the new normal
and fwiw i've had a 5ghz+ overclock on every cpu i've bought in the last ten years with a corsair 240mm aio, going back to the 3570k
I really hope there are breakthroughs that allow us to use other, trickier semiconductors like GaN for chips; IIUC their efficiency could allow us to hit much higher frequencies for the same heat output. That said, I doubt we'd see 3nm processes for something like that.
IPC is still improving, so single-thread performance is still increasing, even if clock speeds are not (at least not at the same rate as before). And new instructions (AVX etc.) also help, especially if you can optimize and recompile your code.
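A toy illustration of that last point (my sketch, not a rigorous benchmark -- NumPy dispatches to SIMD-capable compiled kernels, so the gap below mixes interpreter overhead with vectorization, but it shows why code written to exploit wide instructions keeps getting faster at flat clocks):

    # Toy comparison: scalar loop vs NumPy's vectorized sum. NumPy's reduction
    # kernels use SIMD instructions (SSE/AVX) where the CPU supports them; the
    # measured gap also includes interpreter overhead, so this is illustrative only.
    import time
    import numpy as np

    data = np.random.rand(2_000_000)

    start = time.perf_counter()
    total = 0.0
    for x in data:              # one element per iteration, no SIMD
        total += x
    scalar_secs = time.perf_counter() - start

    start = time.perf_counter()
    total_vec = data.sum()      # vectorized reduction in compiled code
    vector_secs = time.perf_counter() - start

    print(f"scalar: {scalar_secs:.3f}s  vectorized: {vector_secs:.5f}s  "
          f"speedup: {scalar_secs / vector_secs:.0f}x")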
It's at least enough that we have to take it into account:
We run our workloads across multiple Intel CPU generations, and to optimize utilization we have a "speedup factor", currently up to 1.7 for the latest generation we've tuned for. The baseline of 1.0 is Ivy Bridge, launched 2013.
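(For the curious, a minimal sketch of how such a factor gets used for capacity math; everything here is a made-up placeholder except the 1.0 Ivy Bridge baseline and the 1.7 top factor mentioned above.)

    # Minimal sketch of capacity planning with per-generation "speedup factors".
    # Baseline 1.0 = Ivy Bridge and top factor 1.7 are from the comment above;
    # all other generation names and factors are hypothetical placeholders.
    SPEEDUP = {
        "ivybridge": 1.0,   # baseline (launched 2013, per the comment)
        "broadwell": 1.2,   # hypothetical
        "skylake":   1.4,   # hypothetical
        "icelake":   1.7,   # latest tuned generation, per the comment
    }

    def normalized_capacity(hosts, cores_per_host=32):
        """Total fleet capacity in Ivy Bridge-equivalent cores."""
        return sum(n * cores_per_host * SPEEDUP[gen] for gen, n in hosts.items())

    fleet = {"ivybridge": 10, "icelake": 10}        # hypothetical fleet
    print(normalized_capacity(fleet))               # 320*1.0 + 320*1.7 = 864.0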