
I haven’t heard about CPUs failing that often, though. Usually it’s some other part of the server that dies, like the motherboard. In that light, the grandparent’s question is still valid — normally these servers that “died” would be torn apart and the non-broken parts refurbished and resold on the aftermarket.

Is AWS doing that?



No, but I spat out a vague answer rather quickly and was too flippant ("maybe you could do something"), so it's a fair question. Realistically, even the motherboard design, including the landing pads on the PCB and the chip's boot sequence, from the root of trust to initial firmware bringup, is going to be custom on systems like Graviton4. For example, these use the Nitro system, which exists as hardware and is a key point of the whole design. AWS also designs its services to resist some level of operator compromise, e.g. an operator trying to exfiltrate secrets from the Nitro system, so the number of people who can exert influence there is extremely limited. Individual parts like the CPU are as good as useless without the chassis (and power supply, and attached switch equipment) they belong to. Even if you had the whole thing, you might very well not be able to do anything with it, making it as good as a brick.

Even if Nitro were out of the picture or whatever, and you just had the raw package -- it's not like you can magically make a motherboard from thin air for these devices based on just the CPU pinout. The tolerances just for power delivery and memory buses are pretty tight, not to mention a gazillion other things.

More broadly, designing compute that is used purely in-house versus large-scale, high-volume COTS designs sold through e.g. OEM partners is literally a difference of years and tens or hundreds of millions of dollars. Support, documentation, supply chain relationships, etc. all take a lot of money to do right, and when you buy servers, part of the purchase price goes to fund those departments. For that reason, most places are better off just talking to Supermicro if they actually need servers. But hyperscalers save ridiculous amounts of money by doing it themselves and skipping the other things Supermicro does, like OEM work, support, and NRE on generalist designs that are useful to third parties.



