The comparisons are actually exciting me more than the M3 itself. Yay, more choice!
However I guess you can crank any core up to any performance if you're willing to throw power at it - performance/watt seems to be by far the most interesting metric. I suppose it's certain that the i9 can't keep up at that - but maybe the Qualcomm chip can? If it's 30% or so worse it would actually still be quite good ...
So a bit faster than the 7950X in eco mode (which has a much lower TDP), and the 7950X's process is two generations old. It's clear that Apple's advantage is mostly just one thing: "TSMC".
A laptop that can match the best of the competition's desktops is a pretty big advantage IMHO.
Don't forget that this M3 Max chip is measured inside a laptop that can run at the same performance on battery. Apple's desktop systems already match the greatest offerings of the competition with its previous-gen chip, and with constraints on size and cooling-system noise.
I wonder what Apple can do in its labs when pushing its silicon to the limits without constraints on cooling. Every now and then a fringe Geekbench score will pop up, maybe from those lab experiments.
Sure, if you define "a bit" to be 30% then you can say that.
Not everyone will agree with that definition of "a bit" though; how many years did it take AMD to hit a 30% improvement?
Also, the high-performance laptops with that chip appear to be 50% heavier than the 16" MacBook Pro. According to reviews, those laptops can do 4 hours of general usage (not gaming) at best - so terrible battery life. There's also a significant drop in performance when unplugged.
Overall, those machines are optimised for one thing only, and apparently they do it about 30% worse than a MacBook and everything else much, much worse.
For how long can the MacBook do that 30% more? Because in my experience as someone who renders animations for multiple days or even weeks on machines, the newer Apple designs are good for burst loads, but don't shine when it comes to continuous loads where the thermals start to kick in.
In theory at least. When rendering a Blender animation, my i7/rtx2040 notebook easily outperformed the then-strongest M1/Metal laptop by a factor of 2.
My suspicion is that during long workloads (more than an hour at max load) something like a MacBook tapers off into thermal throttling. In the studio we have a bunch of M2 Mac minis; they are good for the price, but with them too I am unsure about thermals under continuous load.
Not a hard achievement if you are focusing on non-3D chips. The M3 and AMD's X3D chips have way more cache available to the CPU, which really livens up the game. It's so impressive that AMD's mobile variant of the X3D chips is pretty much on par. The M3 still has the advantage of faster single-core perf, though.
As you can see in this list [1], there are several systems which exceed 21k. The 19k scores are mostly stock systems with no tuning (for example, RAM at 3200 MHz, stock fan, etc.). The Macs would have been carefully tuned, so it's only fair that you'd do the same on assembled kits.
I haven't tested Eco mode myself, but from what I've read performance drops by 10-20% depending on the configuration (105W vs 65W). Still in the ballpark.
If we take an overclocked 7950X – ignoring its additional power draw – in 65W eco mode, start from a rounded-up 22K GB6 score, and assume a 20% performance loss (~17,600 GB6), the 7950X, at a 40% higher TDP than the M3 Max, is still 10% slower than the M3 Max.
Which is especially insane considering the 7950X is a desktop-class CPU, and the M3 Max is sitting in a laptop.
Since we agree on the non-linear relationship between TDP and benchmark scores, can we also agree that a further reduction in power (to, say, 45W) would only cause a small drop in performance? For argument's sake, let's triple the 10% to 30% and also assume that TDP = average power draw during the benchmark. The 5nm 7950X is then 30% slower than the 3nm M3 Max.
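Putting rough numbers on that (just a back-of-the-envelope sketch; the eco-mode penalties and the M3 Max score are assumed figures from this thread, not measurements):

    # Back-of-the-envelope scaling check; all inputs are assumptions from this thread.
    oc_7950x_gb6 = 22_000                   # rounded-up overclocked 7950X multi-core score
    m3_max_gb6   = 20_600                   # M3 Max multi-core score quoted elsewhere in the thread
    eco_65w = oc_7950x_gb6 * (1 - 0.20)     # assumed ~20% drop in 65W eco mode -> ~17,600
    eco_45w = oc_7950x_gb6 * (1 - 0.30)     # tripling the drop for argument's sake -> ~15,400
    print(eco_65w / m3_max_gb6)             # ~0.85
    print(eco_45w / m3_max_gb6)             # ~0.75
    # The exact "X% slower" figure shifts depending on which M3 Max score you compare against.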
My original argument was that Apple's advantage is mostly just TSMC.
> Which is especially insane considering the 7950X is a desktop-class CPU, and the M3 Max is sitting in a laptop.
7945HX3D is a 5nm 55W laptop part, and it scores 15-16k on GB6.
> Since we agree on the non-linear relationship between TDP and benchmark scores, can we also agree that a further reduction in power (to, say, 45W) would only cause a small drop in performance?
You're confusing actual power with the system's configured power limit. When the default power limit is higher than the actual power draw during a given workload, then you obviously have headroom to lower that power limit quite a bit without severely reducing performance. That doesn't mean further reductions in the power limit will have similar impact, once you're working in the range where the power limit actually starts to kick in. And as you get to even lower power limits, a smaller fraction of that power budget is available for doing useful work as subsystems like the memory controller cannot reduce their power consumption as readily as the CPU cores.
The 7945HX3D is a mobile version of the 7950X3D (not 7900X3D), but the only way to consider it a 64W CPU is to ignore the power used by the IO die—and a CPU isn't much use without a memory controller.
Don't mistake a long-term sustained power limit that OEMs can freely adjust for an actual power consumption measurement, especially when discussing a benchmark that only does short bursts of work.
Yes, but not an M2 Pro, M2 Max, or M2 Ultra. Or an M3 Pro or M3 Max.
Sometime in early 2024 AMD is supposed to release a new APU (CPU+iGPU) called Strix. It doesn't seem particularly noteworthy, but in mid to late 2024 AMD is going to bring out a chip called Strix Halo that FINALLY brings more than a 128-bit memory system to an APU.
It baffles me that despite a huge GPU shortage that lasted years, and despite shipping a huge number of Xbox Series X and PS5 consoles with nice memory systems, they didn't bother to ship a decent APU and a decent memory system for the desktop.
At least Strix Halo should give the M3 Pro a run for its money, though that's still half the M3 Max's bus width and a quarter of the M2 Ultra's.
If they were going to announce Zen 5 high-end mobile parts at CES in January 2024, they would have launched Zen 5 desktop parts by now (because the high-end -HX mobile parts are literally the desktop silicon put into a BGA package instead of LGA). A successor to the 7945HX3D can't be much less than a year away, meaning the 7945HX3D is less than halfway through its product lifecycle.
Actually their standard mobile lineup comes first. Then desktop, or if the gains are just a small step they skip it and go to premium laptops. So it goes standard laptop -> desktop -> premium laptop.
That is how AMD has released chips for the past six years.
Don't look at the model numbers, look at the architectures.
Zen 1 desktop processors (branded Ryzen 1000 series) were released in spring 2017; Zen 1 mobile processors (branded Ryzen 2000 series) were released starting in fall 2017. Zen+ desktop processors (branded Ryzen 2000 series) were released in spring 2018; Zen+ mobile processors (branded Ryzen 3000 series) were released at the beginning of 2019. Zen 2 desktop processors (branded Ryzen 3000 series) were released mid-2019, followed by Zen 2 mobile (branded Ryzen 4000 series) in spring 2020.
For Zen 3 desktop, they skipped 4000 series branding to catch up with the mobile branding: Zen 3 desktop (branded Ryzen 5000 series) launched late 2020, followed by Zen 3 mobile (branded Ryzen 5000 series) at the beginning of 2021. Zen 3+ (branded Ryzen 6000 series) was a mobile-only update to Zen 3 (same CPU microarchitecture, minor die shrink, new memory controller) launched at the beginning of 2022. Zen 4 desktop (branded Ryzen 7000 series) launched fall 2022, followed by Zen 4 mobile (branded Ryzen 7000 series) at the beginning of 2023.
Their new architectures launch on desktop and server first, using the same CPU chiplets in both segments. The monolithic mobile processors come later. But every year, they increment the model numbers of their mobile parts whether or not they have a new architecture, and the mobile parts are almost always announced at CES in January; that's simply how the laptop market functions.
Zen 5 desktop and server parts aren't here yet, so whatever 8000 series mobile parts they introduce at CES in January 2024 either won't be using Zen 5, or they'll be announcing at the beginning of the year but not shipping until fall at the earliest. Recent rumors suggest that their high-end monolithic mobile chip (a new product segment for them) has been delayed from late 2024 to early 2025.
Sometimes they do launches at and during CES, so definitely keep an eye out. The biggest benefit of announcing and releasing at the same time is that it prevents competitors from course-correcting their designs in response. Intel gets rug-pulled, which seems to have been the strategy for the last few years.
Also, if the 8000 series is launched then, it is usually a small, limited run of their basic chips. Desktop chips will definitely come later in the year.
Does cache help some things? Sure, but it's not a replacement for bandwidth. I have seen cases where the Zen 3 X3D wins against Zen 4 without X3D, especially in simulations, emulators, flight sims, etc.
Sadly there seems to be a movement all over the industry to depend on caching over bandwidth; such designs generally win benchmarks, but often lose in real-world use. Intel's N100 and similar embedded-type chips for appliances/routers went from a 128-bit-wide memory bus to 64-bit. The M2 Pro has 256-bit-wide memory and the M3 Pro went to 192-bit. The Nvidia 3060 Ti has 256-bit-wide memory while the 4060 Ti went down to 128-bit.
Sure, the average performance is often higher (when cache-friendly), but the performance is much more variable as the ratio between in-cache and out-of-cache performance gets larger. GPUs with more cache and less bandwidth tend to get pickier about which games they run well in, and the 1% lows get slower, which makes stuttering worse.
It's sad that normal desktops, from a few hundred dollars to a few thousand, all have 128-bit-wide memory interfaces, unless you buy a Mac, where you can get 128, 256, 512, or 1024 bits wide.
> True, but then again you get 400GB/sec of bandwidth
For what, other than toy AI inference? More HEDT users would go with more RAM at DDR5 speeds than 400GB/sec of bandwidth. As you can see from Geekbench, the additional bandwidth has no real world implications for non-GPU use cases.
But again, just as with everything else it depends.
Simulations (games or HPC), emulation, and editing multiple streams of 8k videos.
Why do you say "toy AI inference"? Getting 800GB/sec to 128GB of memory (on a desktop) or 400GB/sec to 128GB of memory (on a laptop) is hard to beat. You have to spend crazy money to get a GPU with that much RAM, and normal desktops like the mentioned 7950X are going to be 4-8x slower.
I've seen people get over 5 tokens/sec inference with 180B models, not what I'd call "toy AI inference".
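For a rough sense of why that's plausible (a sketch assuming the model is quantized to ~4 bits and generation is memory-bandwidth-bound, so each token streams roughly the whole model from memory):

    # Rough upper bound for memory-bound LLM token generation.
    params          = 180e9            # 180B-parameter model
    bytes_per_param = 0.5              # assumed ~4-bit quantization
    model_bytes     = params * bytes_per_param    # ~90 GB of weights
    for bw_gb_s in (800, 400):         # M2 Ultra / M3 Max class bandwidth figures
        ceiling = bw_gb_s * 1e9 / model_bytes
        print(f"{bw_gb_s} GB/s -> ~{ceiling:.1f} tokens/sec ceiling")
    # ~8.9 and ~4.4 tokens/sec; real throughput lands below the ceiling,
    # so 5+ tokens/sec on the 800GB/s parts is in the right ballpark.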
Most Intel processors with dual-channel memory will easily get 90GB/sec of bandwidth just for the CPU; if you build a performance-oriented machine you are supposed to pair it with a dedicated GPU.
The Apple Silicon bandwidth is nothing special considering it must be shared with the GPU. Dedicated GPUs have as much or more bandwidth than Apple Silicon; and better than that, at the low end they are highly likely to have more RAM available.
When you compare prices, even on laptops, it is likely that the competing windows PC will have both more bandwidth and more RAM available. Because 8GB of RAM in a dedicated GPU is not very rare at the price MacBook Pros are sold at, and those systems have loads of CPU RAM on top of it.
Even with the update, a base M3 Pro only has 18GB of RAM to share between CPU and GPU. If you let the GPU use 8GB, like it would be able to in the Windows laptop, suddenly you only have 10GB of system RAM, which is going to be very limiting for many things.
In the end, every other OEM just gave up on bandwidth because it does not matter for most things. Even Apple somewhat acknowledged that by lowering the bandwidth available for most SKUs; so, there is that.
And stop quoting the higher-end bandwidth that is only available in extremely expensive SKUs:
- 800GB/s is only available in the Ultra version of the Mac Studio, and the cost equivalent would be a PC with two Nvidia 4090s that would completely crush it, both bandwidth- and speed-wise.
- 400GB/s is only available in a Mac Studio desktop starting at around 2K, or a laptop starting at around 3K. And it is nothing special compared to the bandwidth dedicated GPUs have at this price.
The bandwidth on the lower SKUs is whatever, because it is barely better than what Intel has always provided in their CPUs with integrated graphics. It is a bit better, but nothing special, and there are competing products with just as much bandwidth; not that this fact is particularly relevant for the type of tasks this low/mid-range hardware is supposed to carry out.
Apple fanboys are so delusional it's bordering on insanity.
Pretty much. And the "advantage" does not even matter in the end since it cannot run anything worthwhile with those speeds.
The current Apple marketing material hilariously showcases old/bad ports of software that really aren't the leaders in their fields, and that generally just work better/faster on a Windows PC anyway (sometimes even on Linux).
Which is why the quoted battery life only holds up for Apple fanboys who never go out of the walled garden. If you start using the software other people use and need, the battery life becomes a lot less impressive, because that software actually makes use of the hardware.
It was OK when Apple wasn't too bad a deal from a hardware-pricing standpoint (considering build quality and potential longevity thanks to ease of repair/upgrade), but now it is plain stupid.
As far as I'm concerned, unless you are developing for Apple platforms or really have to use Logic/Final Cut, it doesn't make sense to invest in an Apple computer at the moment, considering the price. They just don't offer anything worth the pricing delta, no matter what the marketing and fanboys say.
Entry-level machines are barely relevant, but with the upgrade pricing on RAM/storage, high-end machines just don't matter in the grand scheme of things. I could buy two fully loaded PCs for the price of one high-end Mac Studio; there is no way it can make up that difference in productivity, no matter how little power it consumes (power that is cheap, and that conveniently only gets used when actual work is being done...).
Doing a few sketchy conversions puts Metal compute performance closer in capability to an RTX 4070.
The M3 Max is ~170% of the M1 Max per some graphic Apple released. That puts it at ~95% of the performance of a 6800 XT on the Metal charts. The RTX 4070 is ~95% of the performance of the 6800 XT on the Vulkan charts.
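Spelling out those sketchy conversions (all ratios are the rough ones above, and they mix Metal and Vulkan numbers, so treat this as hand-waving):

    # Chain the approximate ratios; everything is relative to the M1 Max = 1.0.
    m1_max   = 1.0
    m3_max   = 1.70 * m1_max           # Apple's "~170% of M1 Max" graphic
    rx6800xt = m3_max / 0.95           # M3 Max is ~95% of a 6800 XT on the Metal charts
    rtx4070  = 0.95 * rx6800xt         # RTX 4070 is ~95% of the 6800 XT on the Vulkan charts
    print(m3_max / rtx4070)            # ~1.0, i.e. roughly RTX 4070-class compute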
As a developer my Go builds take ~5 seconds. I invested in very fast NVMe drives, and coming from ~5-minute Java builds this is great. Though what I want is <1 sec builds.
What I want is AMD/Intel moving to RISC-V, without the 50 years of x86 baggage and the 30 years of ARM baggage.
Give me cache on the dies instead of legacy transistors, give me faster NVMe directly to memory.
OT: Yesterday, an Apple forum thread [1] complaining about the performance of the current M3 chips vs older generations got to the homepage of HN [2].
It was flagged and killed, I assume because it was somehow misleading.
As someone who doesn't understand these issues, could some of the knowledgeable fellows here ELI5 why that thread is junk, how current GPUs compare to those from Intel's day, and how important the M3 GPU is for LLM-related tasks?
And maybe also - I see that the M3 is lower than the M2 in... "P-cores"? I don't understand what this means. In what case would the M2 be better than the M3?
Roughly 20% faster than the original M1 Max, which scored ~2400 on single core.
As mentioned elsewhere it’s 10% faster than the M2 Max. This is really an impressive step up each year compared to what came before. If Apple continues this trend I’ll probably upgrade every 2 years.
Hmm, I was of the opposite impression. Not much of an upgrade. The big change was going to Mx, mainly in battery time / power usage. But no big need to upgrade if you have M1, for instance.
Anyone have insight as to whether these will be useful for ML tasks - mostly on pre-trained models?
I have bulk jobs that I like to do for things like transcription and I'd like to add summarization as well (Whisper + Llama basically). My GTX 1070 is fine for proof-of-concept, but I was planning to build a 4090 box for this stuff.
Nvidia is notorious for nerfing the RTX cards for ML tasks, specifically on RAM. Is there any universe in which it makes sense to put my 4090 box budget ($4000+ all in) towards an updated MacBook? I can get a laptop with 128GB RAM for $4600, and the convenience of not having to run things remotely and deal with another OS would be a big win.
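For context, the bulk job is roughly this kind of pipeline (a minimal sketch assuming openai-whisper and llama-cpp-python, the latter of which can use Metal on Apple silicon; the model names and file paths here are just placeholders):

    import whisper                      # openai-whisper
    from llama_cpp import Llama         # llama-cpp-python

    # Placeholder model choices/paths for illustration only.
    asr = whisper.load_model("large")
    llm = Llama(model_path="llama-2-70b.Q4_K_M.gguf",
                n_gpu_layers=-1,        # offload all layers to the GPU
                n_ctx=4096)             # long transcripts would need chunking

    def transcribe_and_summarize(audio_path: str) -> str:
        transcript = asr.transcribe(audio_path)["text"]
        prompt = f"Summarize the following transcript:\n\n{transcript}\n\nSummary:"
        out = llm(prompt, max_tokens=512)
        return out["choices"][0]["text"]

    print(transcribe_and_summarize("some_recording.mp3"))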
If you want to run ML tasks, a 4090 will smoke any laptop plain and simple.
You get to decide about the convenience of having a laptop. I wouldn't want to deal with the inconvenience of a non-Nvidia stack.
Have you tried running remotely via Parsec (if you want the full desktop experience)? With a proper internet connection you won't realize you are on a remote connection.
I will make a note to try Parsec. I've been using RDP with both devices connected to the same LAN over ethernet. The experience is very good, the biggest annoyance is that I have to deal with Windows.
Seems like the winning formula is to have some performance cores and some efficiency cores, but not too many, and only 2 kinds. Meanwhile Qualcomm Snapdragon mobile processors have 3-4 different kinds of cores and a third more of them in total. No wonder Android phones with much larger batteries are getting so much worse battery life. And what's funny is they end up being slower despite all that, because their performance cores are forced to be clocked lower to make up for the higher power density, and most things on a phone won't take advantage of those extra cores.
In terms of multi-core:
- M3 (8-core): 11500, 10% higher than the M2, in the middle between the M2 and the 10-core M2 Pro
- M3 Max (16-core): 20600, 50% higher than M2 Max (12-core), same as i9-13900KS
For comparison, the Snapdragon X Elite (at 23W) announced recently scores 2800 in single-core and 14000 in multi-core, same as the M2 Max.
In terms of graphics (OpenCL):
- M3: 30000, 10% higher than M2, same as AMD Radeon 780M (currently best iGPU in x86 land)
- M3 Max: 94000, 10% higher than M2 Max, a bit above the RTX 2070.