TLDR: For now, everyone is sold out of tokens: a ridiculous percentage of Nvidia cards are selling every token they generate, every token generated by Google's TPUs sells, likewise Amazon's Trainium and Groq's silicon giants (they don't really name their chips, and the chips are something like 30 cm across, so let's go with giants) ... And Nvidia B200s are by far the cheapest way to generate tokens, and are being sold at something like double the speed they can be produced.
Once the AI craze slows, the most surprising thing is going to happen: Nvidia sales will go up. Why? Because older cards will get priced out first, and it will become a matter of survival for datacenter companies to replace the older hardware in their datacenters with the newest Nvidia hardware ...
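The "older cards get priced out first" claim is really just a margin calculation: a card stays racked only while its tokens sell for more than the power and hosting it burns. A toy sketch, with every number invented purely for illustration:

```python
# Toy break-even model: a card stays racked only while its hourly token
# revenue beats its hourly operating cost. All numbers are hypothetical.

def margin_per_hour(tokens_per_sec, price_per_million_tokens,
                    watts, power_cost_per_kwh, hosting_cost_per_hour):
    """Hourly profit of keeping one card serving tokens."""
    revenue = tokens_per_sec * 3600 / 1e6 * price_per_million_tokens
    power = watts / 1000 * power_cost_per_kwh
    return revenue - power - hosting_cost_per_hour

# As the market price per million tokens falls, the slow old card goes
# underwater while the fast new one stays profitable.
for price in (0.50, 0.10, 0.05):
    old = margin_per_hour(2_000, price, 700, 0.10, 0.50)     # older card
    new = margin_per_hour(12_000, price, 1_000, 0.10, 0.50)  # newer card
    print(f"${price:.2f}/Mtok -> old: {old:+.2f}/h, new: {new:+.2f}/h")
```

The specific figures (tokens/s, $/Mtok, power and hosting cost) are made up; the shape of the argument is the point: as token prices fall, the slowest cards cross zero margin first, which is what forces the hardware refresh.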
That's the bull case. Under unlimited token demand, Nvidia wins big. Under slowing token demand, Nvidia actually wins bigger, for a while, and only then slows. For now, everything certainly seems to indicate demand is not slowing. Ironically, under slowing demand, it's China that will suffer in this market.
And the threat? Well, it is possible to beat Nvidia's best cards in intelligence and in usefulness, because the human mind is doing it, on 20 W per head (200 W for the "full machine"). Long story short: we don't know how, but it's obviously possible. Someone might figure it out.
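The 20 W figure can be turned into a rough joules-per-"token" comparison. Both rates below are assumptions (an inner-speech-like 2 "tokens"/s for a brain, 2,000 tokens/s for one ~700 W card), not measurements:

```python
# Back-of-envelope energy per "token". Both rates are assumed for
# illustration, not measured.

def joules_per_token(watts, tokens_per_sec):
    return watts / tokens_per_sec

brain = joules_per_token(watts=20, tokens_per_sec=2)       # ~inner-speech rate
gpu = joules_per_token(watts=700, tokens_per_sec=2_000)    # one modern card

print(f"brain: {brain:.2f} J/token, GPU: {gpu:.3f} J/token")
```

Under these assumptions a modern card is already cheaper per raw token; the comment's point is that each of the brain's "tokens" carries vastly more intelligence per joule, so there is clearly headroom somewhere in the design space.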
“Nvidia wins either way” assumes the game stays the same — but Google, Amazon, and Meta aren’t building custom silicon to beat Nvidia on price, they’re building it to never need Nvidia at all. The moat isn’t the chips, it’s CUDA lock-in, and every major player is racing to break it.
I would argue it just means the game doesn't change all at once. If the game changes slowly, the short term is good for Nvidia; it will take quite a while before the shift actually hurts them.
Google, Amazon and Meta are to some extent solving the wrong problem, or not solving the whole problem. They're designing chips ... which they can't build at scale, because they don't have the manufacturing infrastructure and don't have the long-running contracts Nvidia does. They can't match Nvidia's capacity even at 3nm, or at 10nm ... Now, maybe they can go with Intel (though several have tried and given up), but ...
Nvidia GPUs are still, at their core, reliant on the PC architecture. Inferencing on Nvidia cores will soon be like encoding an h265 stream on CPU: it works, but purpose-built hardware does it far more efficiently.
I expect custom-built TPUs will gain progressively more advanced hardware acceleration, while legacy aspects of the CUDA architecture (PCI-e, the NVMe bus, CPU interrupts, reliance on system RAM for index tables, etc.) will eventually limit Nvidia's innovation without architecture changes. That fills in their moat and levels the playing field for Google/Amazon/eventually Apple.
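The bus-bottleneck argument is really bandwidth arithmetic: for batch-1 generation of a dense model, every token requires reading all the weights once, so whatever link the weights sit behind caps tokens per second. Using ballpark published bandwidths (PCIe 5.0 x16 around 64 GB/s per direction, H100 HBM3 around 3.35 TB/s) and an assumed 70 GB of weights:

```python
# Time to stream one full pass over ~70 GB of model weights across
# different links. Bandwidths are ballpark published figures.

GB = 1e9
links = {
    "PCIe 5.0 x16": 64 * GB,   # host <-> device, per direction
    "HBM3 (H100)": 3350 * GB,  # on-package memory
}
weights_bytes = 70 * GB  # e.g. a ~70B-parameter model at 1 byte/param

for name, bandwidth in links.items():
    seconds = weights_bytes / bandwidth
    print(f"{name}: {seconds:.3f} s/pass -> ~{1/seconds:.1f} tokens/s ceiling")
```

Anything that forces weight or KV traffic through the host buses, interrupts, or system RAM lands on the slow side of that roughly 50x ratio, which is the legacy cost the parent comment is pointing at.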