Hacker News

I know there's a lot of anecdotal evidence and some fairly clear explanations for why `us-east-1` can be less reliable. But are there any empirical studies that demonstrate this? Like if I wanted to back up this assumption/claim with data, is there a good link for that, showing that us-east-1 is down a lot more often?


The unreliability claim is driven by two factors.

1. When AWS deploys changes, they run through a pipeline which pushes change to regions one at a time. Most services start with us-east-1 first.

2. us-east-1 is MASSIVE and considerably larger than the next largest region. There are no public numbers, but I wouldn't be surprised if it were 50% of their global capacity. An outage in any other region never hits the news.


> a pipeline which pushes change to regions one at a time

> When AWS deploys updates to its services, deployments to Availability Zones in the same Region are separated in time to prevent correlated failure.

https://docs.aws.amazon.com/whitepapers/latest/aws-fault-iso...


> 1. When aws deploys changes they run through a pipeline which pushes change to regions one at a time.

This is true.

> Most services start with us-east-1 first.

This is absolutely false. Almost every service will FINISH with the largest and most impactful regions.


Agreed. Most services start deployments on a small number of hosts in single AZs in small, lesser-known regions, ramping up from there. In all my years there I don’t recall “us-east-1 first”.
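
To make "ramping up from there" concrete, here is a rough sketch of splitting a host list into waves that grow in size. The doubling factor, the starting wave size, and the host names are all assumptions for illustration; nothing here reflects AWS's actual tooling or rollout schedule.

```python
def ramp_waves(hosts, first_wave=1, growth=2):
    """Split hosts into consecutive waves whose sizes grow by `growth` each step.

    `first_wave` and `growth` are illustrative knobs, not real AWS settings.
    """
    if first_wave < 1 or growth < 1:
        raise ValueError("first_wave and growth must be at least 1")
    waves = []
    start = 0
    size = first_wave
    while start < len(hosts):
        # The last wave simply takes whatever hosts remain.
        waves.append(hosts[start:start + size])
        start += size
        size *= growth
    return waves


# Hypothetical host names in one AZ of a small region.
hosts = [f"host-{i}" for i in range(10)]
for number, wave in enumerate(ramp_waves(hosts), start=1):
    print(number, wave)
```

With ten hosts this yields waves of 1, 2, 4, and 3 hosts, so a bad change is seen by a single host before it reaches most of the fleet.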


Each AWS service may choose different pipeline ordering based on the risks specific to their architecture.

In general:

You don't deploy to the largest region first because of the large blast radius.

You may not want to deploy to the largest region last because then if there's an issue that only shows up at that scale you may need to roll every single region back (divergent code across regions is generally avoided as much as possible).

A middle ground is to deploy to the largest region second or third.
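
As a rough illustration of that middle ground, the sketch below orders regions smallest first and then moves the largest one into the second or third slot. The region names and capacity numbers are made up, and `order_regions` is a hypothetical helper, not anything from a real AWS pipeline.

```python
def order_regions(capacity, largest_position=2):
    """Return a deploy order: smallest regions first, with the largest
    region moved to the 1-based slot `largest_position`.

    `capacity` maps region name -> relative size (illustrative values only).
    """
    if not capacity:
        return []
    # Sort ascending by size; break ties by name so the order is stable.
    ordered = sorted(capacity, key=lambda region: (capacity[region], region))
    largest = ordered.pop()
    # Clamp the slot so it always falls inside the list.
    slot = min(max(largest_position - 1, 0), len(ordered))
    ordered.insert(slot, largest)
    return ordered


# Hypothetical relative sizes, not real AWS figures.
capacity = {"region-a": 1, "region-b": 3, "region-c": 10, "region-d": 2}
print(order_regions(capacity))     # largest region goes second
print(order_regions(capacity, 3))  # largest region goes third
```

The smallest region still goes first to keep the initial blast radius small, while the largest region is exercised early enough that a scale-only issue shows up before every other region has the change.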


I don't think it's fair to dismiss a lot of anecdotal evidence; much of human experience is based on it, and being anecdotal doesn't make it incorrect. For those of us who have used AWS for the last decade, there have been a handful of outages that are pretty hard to forget. Many of us also run services in other regions, so we see things go down more frequently in us-east-1. Now, can I say definitively that us-east-1 goes down the most? Nope. Have I had 4 outages in us-east-1 that I can remember and only 1-2 in us-west-2? Yep.


Where are you getting the sense that anecdotal evidence is being dismissed?



