Agreed, but of course the massive write parallelism and fault tolerance of DBMS like Cassandra comes at the cost of dropping ACID, which may cause a lot of complexity elsewhere in the system. It also comes at the cost of limiting the types of queries you can perform without resorting to procedural code (at least in the case of column family based architectures). In other words, it comes at the cost of productivity.
So, even if sharding is not easy, you can do quite a lot of it in the time you save by not having to code around the limitations of most noSQL DBMS.
The day will come when RDBMS are secret productivity weapon of smaller companies that don't feel they have to act like they were Google.
That said, there are very good reasons not to use RDBMS in cases where the data model or specific access patterns just don't fit. But in most of those cases, I find that using in memory data structures combined with file system storage or something like BerkeleyDB is a much better fit than any server based DBMS.
Do remember though that, as discussed here recently, most RDBMSs do not act in a fully ACID compliant way by default. IIRC it is providing a complete isolation guarantee often also provides a hefty performance hit so compromises are made in this area unless you explicitly tell it to be as careful as it can. I imagine this can cause quite a nightmare for master<->master replication.
There are a lot of people using "noSQL" options for the wrong reasons (such as to be buzzword compliant, or because they don't understand SQL), but there are issues that traditional RDBMSs have that stick-in-the-muds like me (who cringe at the phrase "eventual consistency" should be more aware of than we generally are.
Making the right choices about your data storage can be hard.
That said, there are very good reasons not to use RDBMS in cases where the data model or specific access patterns just don't fit. But in most of those cases, I find that using in memory data structures combined with file system storage or something like BerkeleyDB is a much better fit than any server based DBMS.
I want to go into this just a little. The issue here is just that there are tradeoffs. Understanding these tradeoffs is what good design is about.
RDBMS's excel at one thing: presenting stored data for multiple purposes. This is what relational algebra gets you and while SQL is not relational algebra it is an attempt to bridge relational algebra with a programming language.
An RDBMS will never be as fast performance-wise as a basic object storage system. However, what it buys you is flexibility for that part of the data that needs to be presented for multiple uses.
So the sort of access pattern that doesn't fit is something like an LDAP server. Here you have a well-defined method for integrating the software, and the use cases for ad hoc reporting usually aren't there.
On the other hand, you have something like an ERP that allows file attachments to ERP objects. Even though the data model doesn't fit exactly, you can't do this gracefully with a NoSQL solution.
So I suggest people think about the need for flexibility in reporting. the more flexibility required, the more important an RDBMS becomes.
Additionally if you have multiple applications hitting the same database, an RDBMS is pretty hard to replace with any other possible solution.
Codd emphasised the relational model as being able to change the underlying storage representation without breaking apps.
Will this eventually be a problem for NoSQL? Or, is the scalability worth the sacrifice?
Or, does NoSQL typically have only one (main) app, so making it work with a specific storage representation is not a big deal? The relational use-case was many different apps, with different versions, needing different access paths to the data. But if you just have one known set of access paths (like a REST URI), you can just design the DB for that. Hierarchical databases worked well when just one DB, one app; they just weren't very flexible.
there's a project I have recently come across called Postgres-XC which would solve the problem you describe quite nicely. Basically it's Teradata-style clustering based on PostgreSQL. It isn't full-featured yet (version 0.9.6 is the current version, and it doesn't support things like windowing functions), but it looks extremely interesting in this space.
So, even if sharding is not easy, you can do quite a lot of it in the time you save by not having to code around the limitations of most noSQL DBMS.
The day will come when RDBMS are secret productivity weapon of smaller companies that don't feel they have to act like they were Google.
That said, there are very good reasons not to use RDBMS in cases where the data model or specific access patterns just don't fit. But in most of those cases, I find that using in memory data structures combined with file system storage or something like BerkeleyDB is a much better fit than any server based DBMS.