You generally feed a Storm topology with a queueing system at the source. For example, we use Kestrel to feed Storm topologies. There's a setting called "TOPOLOGY_MAX_SPOUT_PENDING" which controls how many tuples can be pending on any given spout task at a time. This is more than sufficient for controlling bursts of data. If things back up, they will queue on the queueing server(s). In the future (not for a few months, at least), Storm will have auto-scaling, where it will automatically scale the topology to the data.
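
To make that concrete, here's a minimal sketch of capping pending tuples via the backtype.storm Java API; setMaxSpoutPending just sets TOPOLOGY_MAX_SPOUT_PENDING in the config, and the cap of 1000 is an illustrative number, not a recommendation:

    import backtype.storm.Config;

    public class PendingCapExample {
        public static Config makeConf() {
            Config conf = new Config();
            // Limit un-acked tuples in flight per spout task; anything
            // beyond this stays queued upstream (e.g. on the Kestrel
            // servers) until earlier tuples are acked or failed.
            conf.setMaxSpoutPending(1000);
            return conf;
        }
    }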

If a tuple fails to be processed, the tuple(s) that triggered it are replayed from the spout. See https://github.com/nathanmarz/storm/wiki/Guaranteeing-messag... for more info on that. In that sense, Storm is an at-least-once delivery system: messages are sent more than once only in failure scenarios. It is up to you to architect your systems to handle this. The approach I take is to build systems using a hybrid of batch processing (Hadoop) and realtime processing (Storm). With batch processing, you can run idempotent functions even over duplicated data, which lets you correct whatever the realtime layer got wrong. Hadoop and Storm are extremely complementary in that respect. Here are some slides from a presentation I gave about this technique: http://www.slideshare.net/nathanmarz/the-secrets-of-building...
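
Concretely, replay works through anchoring: a bolt emits its output tuples anchored to the input tuple, then acks the input once it's done with it. If any tuple in that tree later fails, the spout tuple at the root is replayed. A minimal sketch against the backtype.storm Java API (the SplitBolt name and the space-splitting logic are just illustrative):

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;
    import java.util.Map;

    public class SplitBolt extends BaseRichBolt {
        private OutputCollector collector;

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        public void execute(Tuple input) {
            for (String word : input.getString(0).split(" ")) {
                // Anchoring: passing the input tuple ties each emitted
                // tuple into the same tuple tree, so a downstream failure
                // causes the spout to replay the original message.
                collector.emit(input, new Values(word));
            }
            // Ack only after all anchored emits; failing to ack would
            // eventually time the tuple out and trigger a replay too.
            collector.ack(input);
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }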

Fields grouping uses consistent hashing underneath. So if you redeploy with more parallelism, it scales naturally and easily.
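
Declaring the grouping is a one-liner when you build the topology. A sketch reusing the SplitBolt from above (SentenceSpout, WordCountBolt, the component names, and the parallelism numbers are hypothetical):

    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    public class GroupingExample {
        public static TopologyBuilder build() {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("sentences", new SentenceSpout(), 4);  // hypothetical spout
            builder.setBolt("split", new SplitBolt(), 8)
                   .shuffleGrouping("sentences");
            // Fields grouping hashes on the "word" value, so the same
            // word always lands on the same "count" task and per-word
            // state stays consistent.
            builder.setBolt("count", new WordCountBolt(), 12)       // hypothetical bolt
                   .fieldsGrouping("split", new Fields("word"));
            return builder;
        }
    }

Bumping the parallelism of "count" and redeploying redistributes the keys without any code changes.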

If a supervisor dies, it starts back up like nothing happened. Most notably, nothing happens to the worker processes. All state is kept either on disk or in Zookeeper. All daemons in Storm are fail-fast, and the supervisor uses kill -9 to kill workers, which makes things extremely robust.

I hope that answered your questions!

The supervisor/worker thing sounds quite a bit like OTP.