
It's certainly not ideal. But it's not unusual to spend far more on making the runtime redundant than on the dashboards and configuration-change machinery underneath it. That tradeoff works out badly in this case, since it left customers unable to invalidate cached items.

The comments seem to imply that a redundant way to refresh the page cache, even one operating per domain or globally rather than per page, would be an acceptable backup for many.
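One way to sketch that granularity trade (a hedged example, not Cloudflare's actual recovery path: the zone ID, token, and URLs are placeholders, and it assumes the public v4 purge_cache endpoint, which accepts both a per-URL "files" list and a zone-wide "purge_everything" flag):

    import requests

    API = "https://api.cloudflare.com/client/v4"

    def purge(zone_id: str, token: str, urls: list[str]) -> None:
        """Try a fine-grained per-URL purge; fall back to a zone-wide purge."""
        headers = {"Authorization": f"Bearer {token}"}
        endpoint = f"{API}/zones/{zone_id}/purge_cache"

        # Preferred path: invalidate only the affected URLs.
        resp = requests.post(endpoint, headers=headers, json={"files": urls})
        if resp.ok and resp.json().get("success"):
            return

        # Coarse fallback: drop the zone's entire cache. Expensive, since
        # the origin absorbs the refill traffic, but better than serving
        # stale data with no way to invalidate it.
        resp = requests.post(endpoint, headers=headers,
                             json={"purge_everything": True})
        resp.raise_for_status()

The coarse path costs origin traffic, which is exactly the trade being accepted here as "an okay backup".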



I agree that the first priority would be data integrity (which lives in the runtime). But a large part of the experience for a CF customer is the availability of the management APIs and dashboards, so that would be another thing to optimize for.

I'm really surprised that they hosted all those non-vital but still quite critical services in a single DC, or somehow had one DC as a single point of failure. Network issues happen regularly enough that you'd want to protect against them, or at least have mitigations available.


To be fair, you find these latent single points of failure even in the most resilient distributed systems.

Take S3: bucket names are globally unique, which means their source of truth lives in a single location. (Virginia, IIRC.)
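The global namespace is visible from any region. A hedged boto3 sketch (the bucket name and region are made up) showing that creating a bucket anywhere has to be checked against names registered everywhere:

    import boto3
    from botocore.exceptions import ClientError

    # Bucket names share one global namespace, so a create in eu-west-1
    # must be checked against names registered from any other region.
    s3 = boto3.client("s3", region_name="eu-west-1")

    try:
        s3.create_bucket(
            Bucket="some-name-already-taken",  # hypothetical name
            CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
        )
    except ClientError as err:
        # "BucketAlreadyExists": another account holds the name.
        # "BucketAlreadyOwnedByYou": you created it, possibly in a
        # different region. Either way the uniqueness check is global.
        print(err.response["Error"]["Code"])

That single authoritative registry is the kind of latent single point of failure being pointed at here.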

Now... a small thought exercise. If I wanted to take down a Cloudflare datacenter and had access to a few suitably careless remote hands, I'd take out the power supplies to the core routers, and while the external network was out of commission, power down the racks holding their PXE servers. That should keep anything within the DC from recovering on its own.



