
We run all our EC2 instances on ephemeral disks. Historically, running on EBS was a great way to ensure application downtime.

Ephemeral drives mean you need to design your application to withstand the full loss of machines. But it's really not that hard. A good replicated database (Riak, Cassandra) spanning multiple availability zones gets you 95% of the way there.



I wouldn't really trust doing so with fewer than, say, 40-50 servers. C* can handle that quite well, but it is very much overkill as an expense for most people.


I'm not really sure what you mean. The number of nodes you need depends on the capacity you need and your replication factor.

Most C* deployments are in the 9-15 node range. You could safely deploy with 6 nodes and RF3 if 6 nodes gives you enough capacity in terms of both disk space and IOPS.
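
To make the RF3 point concrete, here is a minimal sketch of the quorum arithmetic, assuming reads and writes use QUORUM consistency (the thread doesn't say which level is used). The function names are made up for illustration.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return rf // 2 + 1


def tolerable_replica_losses(rf: int) -> int:
    """Replicas of one range that can be down while QUORUM still succeeds."""
    return rf - quorum(rf)


# Assumption: QUORUM consistency; other levels change the numbers.
for rf in (3, 10):
    print(rf, quorum(rf), tolerable_replica_losses(rf))
```

With RF3, each range stays available at QUORUM with one of its three replicas down, regardless of how many nodes the cluster has in total.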


True... but given that you can't control how many nodes come down at once (up to a third, typically, in Azure's case) or what data is on which node, it's less predictable. If I have 24 nodes each across 2 data centers, with a replication factor of 10, then I would consider the data relatively safe... short of that, you have a significant chance of downtime should an outage occur.

Let alone in the case of a more significant issue. And again, if a server goes down, its data is effectively lost, since that individual node will no longer be there. You have to have multiple sites, higher replication factors, and a good backup system.

That said, Azure storage actually works very well, aside from the relatively recent Azure outage.
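
The worry about how many nodes fail at once can be put into rough numbers. This is a sketch under loud assumptions, not a model of any real cluster: a single ring with no vnodes, each range stored on `rf` consecutive nodes, every set of failed nodes equally likely, and no rack- or zone-aware placement. The function name and parameters are invented for illustration.

```python
from itertools import combinations


def fraction_of_failures_hitting_a_range(n_nodes, rf, n_down, replicas_lost):
    """Fraction of failure sets of size n_down in which at least one range
    has replicas_lost or more of its rf replicas down."""
    # Assumed placement: range i lives on nodes i, i+1, ..., i+rf-1 (mod n_nodes).
    replica_sets = [
        {(start + offset) % n_nodes for offset in range(rf)}
        for start in range(n_nodes)
    ]
    hits = 0
    total = 0
    # Enumerate every possible set of failed nodes (fine for small clusters).
    for down in combinations(range(n_nodes), n_down):
        down_set = set(down)
        total += 1
        if any(len(rs & down_set) >= replicas_lost for rs in replica_sets):
            hits += 1
    return hits / total


# 6 nodes, RF3: does some range lose all 3 replicas?
print(fraction_of_failures_hitting_a_range(6, 3, 2, 3))  # 2 nodes down
print(fraction_of_failures_hitting_a_range(6, 3, 3, 3))  # 3 nodes down
```

Under these assumptions, losing any 2 of 6 nodes never takes out all three copies of a range, while losing 3 random nodes does so in 6 of the 20 possible cases. Passing `replicas_lost=2` instead asks how often some range drops below quorum. Zone-aware placement changes these numbers, since it spreads the replicas of a range across zones.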


> ...with a replication factor of 10

RF 10 is insane overkill. There is probably someone out there doing it, but it is completely unnecessary for 99.999% of real-world deployments.

Getting back to your original comment, Azure persistent storage (or AWS, or Google) isn't giving you anything near RF10 * 2 DCs.


With Azure storage, each write goes to two local locations and a third (optionally geo-redundant) location... This is data that will still be there if my VM reboots... the local/scratch disk is gone if my VM reboots... if a significant number of those VMs reboot, you will lose data. I do hope that you are at least backing up those nodes regularly to persistent storage.


I don't know how Azure works, but AWS reboots are fine with ephemeral disks.

We back up all sstables to S3. We've never needed to restore a node from an S3 backup in 5 years of running in production.



