I think the dream of parallelism in 1999 was that CPUs weren't going to get much faster in clock speed, so the question became how algorithms would scale to more data. This is a bit dumbed-down, but think things like sorting in parallel.
It turns out local data didn't quite scale like that: we got more RAM and SSDs instead, and coordination overhead usually makes small-scale parallel algorithms prohibitively expensive. Where parallelism does work today is some flavor of SIMD: vector instructions, parallel matrix multiplication on a GPU, or map-reduce.
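To make the map-reduce flavor concrete, here's a minimal sketch in Python using multiprocessing; the chunk sizes and worker count are arbitrary assumptions. It also shows the coordination problem directly: for small inputs, spawning processes and pickling data costs far more than the work itself.

```python
# Minimal map-reduce sketch: parallel sum of squares.
# Illustrative only -- worker count and chunking are assumptions,
# and for small inputs the coordination (process startup, pickling)
# easily swamps the actual computation, which is the point above.
from functools import reduce
from multiprocessing import Pool

def map_chunk(chunk):
    # "Map" phase: each worker independently processes its own slice.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Split the data into one contiguous chunk per worker.
    n = max(1, len(data) // workers)
    chunks = [data[i:i + n] for i in range(0, len(data), n)]
    with Pool(workers) as pool:
        partials = pool.map(map_chunk, chunks)   # map phase
    return reduce(lambda a, b: a + b, partials)  # reduce phase

if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```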