Given past experience with other MPI-based software, I'm not sure that Watson would scale without extensive retooling. MPI (Message Passing Interface) tends to be extremely chatty; we ran ours on a switched InfiniBand setup with a hypercube network topology. MPI leans heavily on broadcast/scatter/gather semantics. This lands squarely in the 'what if' category for now.
InfiniBand sucks. IBM's or Cray's proprietary interconnect will scale your code because it removes the chatter. On a Cray our MPI latency is 20x better and shows almost no variation between PEs (processing elements).
Interesting. I went to a supercomputer conference in Germany last week and was curious to see many papers using 10GbE and vendors talking about InfiniBand. Can you point me to any resources? Also, do you have any insight into the current thinking about Hadoop clusters: are people making the move to 40GbE to try to get good throughput, or is it pointless? Our tiny 8-node cluster has recently got into a fluster because of its 1GbE switch. The obvious fix is to get a 10GbE one, but will this help?
MPI programs and Hadoop will scale until the MPI latency becomes too high to effectively perform a data swap. When you look at the profiler on a good machine you see that 25% of the time is spent in MPI_Recv. When the time spent in MPI_Recv goes up you are done: the task simply can't scale. Vendors like Cray and IBM sell machines that have low MPI latency. This happens in both hardware and software. The interconnect is fast (I think Intel bought it) and the MPI layer optimizes all the traffic. Open MPI doesn't even come close to optimizing the traffic. Ethernet doesn't do direct DMA like the proprietary interconnects do, and this adds latency and jitter. I don't believe the Ethernet strategy can scale for such applications, primarily because the time to complete a step is set by the highest latency on the network. If one node jitters and takes 100 µs, it doesn't matter much that the rest of the guys took 20 µs. Anyway, benchmarks would help.
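To make the jitter point concrete, here is a toy sketch in Python. It assumes a bulk-synchronous step where every node must finish its exchange before anyone moves on, so the step costs whatever the slowest node costs. The function names are made up for illustration, and the 20 µs and 100 µs figures are just the ones quoted above, not measurements.

```python
# Toy model (assumption: every node waits for the slowest one each step).

def step_time_us(latencies_us):
    """Time for one synchronized step: the slowest node's latency."""
    return max(latencies_us)

def total_time_us(steps):
    """Total time for a run made of several synchronized steps."""
    return sum(step_time_us(latencies) for latencies in steps)

# 8 nodes that all answer in 20 us.
quiet = [20] * 8
# Same 8 nodes, but one of them jitters to 100 us.
jittery = [20] * 7 + [100]

print(step_time_us(quiet))    # slowest node is 20 us
print(step_time_us(jittery))  # slowest node is 100 us, the other 7 just wait
```

In this model the average latency barely matters. One slow node per step sets the pace, which is why low jitter counts for as much as low latency.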
The smallest machines Cray sells cost about $500,000. If you want scalability you gotta pay. 8 nodes isn't a real machine.