We run all our EC2 instances on ephemeral disks. Historically, running on EBS was a great way to ensure application downtime.
Ephemeral drives mean you need to change the design of your application so it can withstand the full loss of machines. But it's really not that hard. A good replicated database (Riak, Cassandra) spanning multiple availability zones gets you 95% of the way there.
I wouldn't really trust doing so with fewer than, say, 40-50 servers. C* can handle that quite well, but it's very much an overkill expense for most people.
I'm not really sure what you mean. The number of nodes you need depends on the capacity you need and your replication factor.
Most C* deployments are in the 9-15 node range. You could safely deploy with 6 nodes and RF3 if 6 nodes gives you enough capacity in terms of both disk space and IOPS.
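To make the capacity point concrete, here is a rough back-of-the-envelope sketch. The function names and the 1 TB per node figure are assumptions for illustration only, and it ignores compaction headroom and IOPS entirely.

```python
def usable_capacity_tb(nodes, disk_per_node_tb, rf):
    """Approximate unique data the cluster can hold: raw disk divided by RF."""
    return nodes * disk_per_node_tb / rf


def replicas_down_tolerated(rf):
    """Replicas of one partition that can be down while a quorum still answers."""
    quorum = rf // 2 + 1
    return rf - quorum


# Hypothetical cluster: 6 nodes, 1 TB each, RF 3.
print(usable_capacity_tb(6, 1.0, 3))
print(replicas_down_tolerated(3))
```

With those made-up numbers, 6 nodes at RF 3 hold about a third of the raw disk as unique data, and each partition can lose one replica and still serve quorum reads and writes.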
True... but given that you can't control how many nodes come down at once (up to a third, typically, in Azure's case), or what data is on which node, it's less predictable. If I have 24 nodes in each of 2 data centers, with a replication factor of 10, then I would consider the data relatively safe... short of that, you have a significant chance of downtime should an outage occur.
And that's before you consider a more significant issue. Again, if a server goes down, its data is effectively lost, since that individual node will no longer be there. You need multiple sites, higher replication factors, and a good backup system.
That said, Azure storage actually works very well, aside from the relatively recent Azure outage.
With Azure storage, each write goes to two local locations and a third (optionally geo-redundant) location... This is data that will still be there if my VM reboots... the local/scratch disk is gone if my VM reboots... if a significant number of those VMs reboot, you will lose data. I do hope that you are at least backing up those nodes regularly to persistent storage.
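As a rough way to put numbers on that risk, here is a minimal sketch. It assumes each partition's replicas sit on RF distinct nodes chosen uniformly at random and that a fixed number of nodes fail together. Real Cassandra placement is ring-based and can be rack-aware, so treat this as a toy model; the function names are made up for the example.

```python
from math import comb


def p_all_replicas_down(n, failed, rf):
    """Chance that every replica of one partition is on a failed node."""
    if failed < rf:
        return 0.0
    return comb(failed, rf) / comb(n, rf)


def p_quorum_lost(n, failed, rf):
    """Chance that too many replicas of one partition are down for a quorum."""
    quorum = rf // 2 + 1
    max_down = rf - quorum  # replicas that may be down with quorum intact
    total = comb(n, rf)
    # Hypergeometric sum over d = number of this partition's replicas that failed.
    bad = sum(
        comb(failed, d) * comb(n - failed, rf - d)
        for d in range(max_down + 1, rf + 1)
    )
    return bad / total


# Hypothetical: 24 nodes, a third (8) go down at once, RF 3.
print(p_all_replicas_down(24, 8, 3))
print(p_quorum_lost(24, 8, 3))
```

Under this toy model the per-partition risk drops quickly as RF rises, and rack-aware or zone-aware placement changes the picture again, because correlated failures then hit at most one replica of each partition.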