I don't think you really read the article, since the author never said that parallelism is inefficient. He's saying, among other things, that decomposition (which you seem to favor) is not the right way to design parallel programs. In other words, just as sequential objects are used to compose sequential programs, parallel objects should be used to compose parallel programs. Sequential order then becomes a consequence of signaling, i.e., of communication between parallel objects.
The problem is coming up with these "parallel objects" through anything other than decomposition. For example, in the quicksort demo you linked to, his first step is to decompose quicksort into five sections that can run in parallel. Yet in this article he claims that decomposition is somehow "fake".
Message passing is a great idea; check out Erlang if you want more of it. Reading this guy's articles, it seems like he is halfway toward something interesting but has blown it all out of proportion. He's at war with his demons; he even lists some of them on his "Enemies" page.
Very well put; I get the same impression about decomposition. The bottom line is that each CPU core executes some sequence of instructions, and if there's more than one core, there can be more than one sequence. Threads are the best current abstraction for that, yet this guy goes on about how threads are evil. At the heart of it there are no objects, just data flowing from disk to RAM to registers, calculations being performed in the cores, and data flowing back.
There's always Haskell, whose monads (IO, STM) really put concurrency in the forefront...