Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> it means you can't run autoscaling, can't depend on spot instances which in an outage you can lose and can't replace

Yes, this is part of designing for reliability. If you use spot or autoscaling, you can't assume you will have high availability in those components. They're optimizations, like a cache. A cache can disappear, and this can have a destabilizing effect on your architecture if you don't plan for it.

This lack of planning is pretty common, unfortunately. Whether it's in a software component or system architecture, people often use a thing without understanding the implications of it. Then when AWS API calls become unavailable, half the internet falls over... because nobody planned for "what happens when the control plane disappears". (This is actually a critical safety consideration in other systems)



Sure, you can only use EC2, not use autoscaling or spot and instead just provision to your highest capacity needs, and not use any other AWS service that relies on dynamo as a dependency.

We still take some steps to mitigate control plane issues in what I consider a reasonable AWS setup (attempt to lock ASGs to prevent scale-down) but I place the control plane disappearing on the same level as the entire region going dark, and just run multi-region.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: