Which is why I suggested comparing to the EPYC 73F3, which is essentially a 5950X clocked at 3.5-4 GHz, with 4x the L3$, 4x the memory bandwidth (if you don't overclock it), and 5~6x the IO bandwidth.
We know a 5950X is roughly on-par with an M1 Max (at least ignoring the latter's 2 efficiency cores).
If the occasional wins of the M1 Max are due to memory bandwidth, this should more-or-less turn the tables.
HBM's downside is that it requires many, many, many pins. Each stack is 1024 pins of communication (and more pins for power). In practice, the only thing that can make HBM work is a silicon interposer or similar packaging substrate. (Typical chips have 4 to 6 HBM stacks, for well over 4096 pins to communicate, plus more pins for power / other purposes)
But HBM is among the lowest power technologies available. Turns out that clocking every pin at like 500MHz (while LPDDR5 is probably a 3200 MHz clock) saves a lot on power. Because DRAM has such high latency, the channel speed is there for parallelism more than anything else. (DDR4 splits RAM into 4 bank groups, each with 4 banks. All 16 banks can be accessed in parallel across the channel).
HBM just does this parallel access thing at a lower clock rate to save on power, but spends way more pins to do so.
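To put rough numbers on the wide-and-slow vs. narrow-and-fast tradeoff, here's a back-of-the-envelope sketch. The 500 MHz and 3200 MHz clocks are the ballpark figures from above; double data rate (2 transfers per clock) and the 64-pin LPDDR5 width are my assumptions for illustration, not a spec for any particular chip, and the function names are made up.

```python
import math

def peak_bandwidth_gb_s(data_pins, clock_mhz, transfers_per_clock=2):
    """Theoretical peak bandwidth in GB/s, ignoring refresh and protocol overhead.

    Assumes each pin moves one bit per transfer (double data rate by default).
    """
    bits_per_second = data_pins * clock_mhz * 1e6 * transfers_per_clock
    return bits_per_second / 8 / 1e9

def pins_needed(target_gb_s, clock_mhz, transfers_per_clock=2):
    """Data pins required to reach target_gb_s at the given clock."""
    target_mbit_s = target_gb_s * 8000              # GB/s -> Mbit/s
    per_pin_mbit_s = clock_mhz * transfers_per_clock
    return math.ceil(target_mbit_s / per_pin_mbit_s)

# One HBM stack: 1024 data pins at a slow ~500 MHz clock.
hbm_stack = peak_bandwidth_gb_s(1024, 500)

# Hypothetical 64-pin LPDDR5 interface at a ~3200 MHz clock.
lpddr5_64 = peak_bandwidth_gb_s(64, 3200)

print(f"HBM stack, 1024 pins @ 500 MHz: {hbm_stack:.1f} GB/s")   # 128.0 GB/s
print(f"LPDDR5,      64 pins @ 3200 MHz: {lpddr5_64:.1f} GB/s")  # 51.2 GB/s
print(f"Fast pins needed to match one stack: {pins_needed(hbm_stack, 3200)}")  # 160
```

So under these assumptions the fast interface matches one HBM stack with about 160 data pins instead of 1024. The slow clock costs roughly 6.4x the pins, and what you get for that is the power savings.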
I'm guessing Apple will go with HBM3/4 before too long due to the lower power consumption and great performance.