The scale they are quoting, 100,000-chip clusters and 65 exaflops, seems impossible. At 800W per chip, that's 80MW of power! Unless they literally built an entire DC of these things, nobody is training anything on the entire cluster at once. It's probably 10-20 separate datacenters being combined for marketing reasons here.
That's about what I thought the H100 was; it's actually 700W. But even at, say, 400W, that's 40MW of power. From some quick googling, I guess some datacenters are built in the 40-100MW range, but I really doubt they can actually network 100,000 chips together in any sort of performant way; that's supercomputer-level interconnect. I don't think most datacenters support the kind of highly interlinked interconnect this would need either.
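Just to lay the arithmetic out, here's the back-of-envelope in Python. The per-chip wattages are the guesses thrown around above, not published specs for whatever chip this actually is:

```python
# Back-of-envelope cluster power draw: 100,000 chips at a few
# guessed per-chip wattages (none of these are published specs).
chips = 100_000

for watts_per_chip in (800, 700, 400):
    total_mw = chips * watts_per_chip / 1_000_000  # W -> MW
    print(f"{watts_per_chip} W/chip x {chips:,} chips = {total_mw:.0f} MW")
```

So 800W/chip is 80MW, 700W is 70MW, and even the optimistic 400W is still 40MW.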
They have instances with 16 chips, so I presume there are at least 16 chips per server. I'd also expect the power consumption to be more like 100-200W, given they seem more like Google's TPUs than an H100.
As for the interconnect, I doubt this is their typical setup, but it doesn't seem completely unreasonable. Even when not running massive clusters, they'll still need the interconnect to pair up the random collections of machines that people are using.
Well, think of it this way -- individual 1U servers can easily consume 1000W, or 1kW. Put about forty of those in a single rack, and that's 40kW. Divide 80MW for the datacenter by 40kW per rack and you get about 2,000 racks -- that's not very many racks to comprise the entire datacenter, right?
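A quick sketch of that division, assuming the 1kW-per-1U and 40-servers-per-rack figures above (both ballpark guesses):

```python
# Rough rack count for an 80 MW datacenter, assuming ~1 kW per 1U
# server and ~40 servers per rack (both ballpark guesses from above).
server_kw = 1.0
servers_per_rack = 40
rack_kw = server_kw * servers_per_rack   # 40 kW per rack
racks = 80_000 / rack_kw                 # 80 MW = 80,000 kW
print(f"{racks:.0f} racks")              # -> 2000 racks
```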
The footprint for 2,000 racks would be over 1,000 m²; when you add the necessary spacing as well as supplementary utilities (power/networking), that probably means double that footprint.
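Continuing the sketch, and assuming a cabinet footprint of roughly 0.6 m x 1.2 m (an assumption on my part; actual cabinet depths vary):

```python
# Floor area for ~2,000 racks, assuming a ~0.6 m x 1.2 m cabinet
# footprint (an assumption; real cabinets vary), then doubled for
# aisles, power, and networking as suggested above.
racks = 2000
rack_area_m2 = 0.6 * 1.2            # ~0.72 m^2 per cabinet
raw_area = racks * rack_area_m2     # ~1,440 m^2 of cabinets alone
with_overhead = raw_area * 2        # ~2,880 m^2 with spacing/utilities
print(f"~{raw_area:.0f} m^2 of cabinets, ~{with_overhead:.0f} m^2 overall")
```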
I guess at the scale those companies are operating at, it's not that big, but that's still quite a large building!