The scale they are quoting, 100,000-chip clusters and 65 exaflops, seems impossible. At 800W per chip, that's 80MW of power! Unless they literally built an entire DC of these things, nobody is training anything on the entire cluster at once. It's probably 10-20 separate datacenters being combined for marketing reasons here.
That's about what I thought the H100 was; it's actually 700W. But even at, say, 400W, that's 40MW of power. From some quick googling, I guess some datacenters are built in the 40-100MW range, but I really doubt they can actually network 100,000 chips together in any sort of performant way; that's supercomputer-level interconnect. I don't think most datacenters support the kind of highly interlinked interconnect this would need either.
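Just to lay the arithmetic out, here's the back-of-envelope in Python. The per-chip wattages are the guesses thrown around above, not published specs for whatever chip this actually is:

```python
# Back-of-envelope cluster power draw: 100,000 chips at a few
# guessed per-chip wattages (none of these are published specs).
chips = 100_000

for watts_per_chip in (800, 700, 400):
    total_mw = chips * watts_per_chip / 1_000_000  # W -> MW
    print(f"{watts_per_chip} W/chip x {chips:,} chips = {total_mw:.0f} MW")
```

So 800W/chip is 80MW, 700W is 70MW, and even the optimistic 400W is still 40MW.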
They have instances with 16 chips, so I presume there are at least 16 chips per server. I'd also expect the power consumption to be more like 100-200W, given they seem more like Google's TPUs than an H100.
As for the interconnect, I doubt this is their typical setup, but it doesn't seem completely unreasonable. Even when not running massive clusters, they'll still need the interconnect to pair up the random collections of machines that people are using.
Well, think of it this way -- individual 1U servers can easily consume 1000W, or 1kW. Put about forty of those in a single rack, and that's 40kW. Divide 80MW for the datacenter by 40kW per rack and you get about 2,000 racks -- that's not very many racks to comprise the entire datacenter, right?
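A quick sketch of that division, assuming the 1kW-per-1U and 40-servers-per-rack figures above (both ballpark guesses):

```python
# Rough rack count for an 80 MW datacenter, assuming ~1 kW per 1U
# server and ~40 servers per rack (both ballpark guesses from above).
server_kw = 1.0
servers_per_rack = 40
rack_kw = server_kw * servers_per_rack   # 40 kW per rack
racks = 80_000 / rack_kw                 # 80 MW = 80,000 kW
print(f"{racks:.0f} racks")              # -> 2000 racks
```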
The footprint for 2,000 racks would be over 1,000 m²; when you add the necessary spacing as well as supplementary utilities (power/networking), that probably means double that footprint.
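Continuing the sketch, and assuming a cabinet footprint of roughly 0.6 m x 1.2 m (an assumption on my part; actual cabinet depths vary):

```python
# Floor area for ~2,000 racks, assuming a ~0.6 m x 1.2 m cabinet
# footprint (an assumption; real cabinets vary), then doubled for
# aisles, power, and networking as suggested above.
racks = 2000
rack_area_m2 = 0.6 * 1.2            # ~0.72 m^2 per cabinet
raw_area = racks * rack_area_m2     # ~1,440 m^2 of cabinets alone
with_overhead = raw_area * 2        # ~2,880 m^2 with spacing/utilities
print(f"~{raw_area:.0f} m^2 of cabinets, ~{with_overhead:.0f} m^2 overall")
```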
I guess at the scale those companies are operating at, it's not that big, but that's still quite a large building!