My intuition tells me that awk and other text-processing tools won’t scale well to a GPGPU. I might be wrong, though. Is there any example of grep or a similar tool working well on a GPU?
That's very surprising, assuming they didn't doctor the results by choosing the workload all too carefully.
I would have expected GPU regex matching to perform much worse, given that it is probably very branchy code. And even if it weren't, computation is generally way faster than I/O, so faster matching alone shouldn't buy much.
What specifically are you referring to? Branching on GPUs has not substantially changed for a decade. If all threads on a warp skip a branch, it's free. If even one thread takes it, the whole warp executes the branch with the other threads' lanes masked out, so they all pay the penalty.
What's at play here is that a needle-in-a-haystack regex search spends almost all of its time zero or one states deep in the state machine, so the threads mostly skip the same branches together and the divergence penalty is small.
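One way to check the state-depth argument: simulate 32 "lanes" in lockstep, each running a DFA for a literal needle over its own slice of text, and count how often every lane is still at depth 0 or 1 (the case where a warp would skip a "partial match" branch uniformly). This is a CPU-side Python model, not GPU code; the 32-lane warp width, the KMP-style DFA (a literal needle rather than a full regex), the helper names, and the random-text input are all assumptions for illustration.

```python
import random


def build_dfa(needle: str, alphabet: set[str]) -> list[dict[str, int]]:
    """KMP-style DFA for a literal needle.

    State k means "the last k characters read equal needle[:k]";
    state len(needle) is the accepting state.
    """
    m = len(needle)
    dfa: list[dict[str, int]] = [{} for _ in range(m + 1)]
    for c in alphabet:
        dfa[0][c] = 1 if c == needle[0] else 0
    restart = 0  # state reached by feeding needle[1:k] from the start state
    for k in range(1, m + 1):
        for c in alphabet:
            dfa[k][c] = dfa[restart][c]  # mismatch: fall back as KMP would
        if k < m:
            dfa[k][needle[k]] = k + 1  # match: advance one state
            restart = dfa[restart][needle[k]]
    return dfa


def count_matches(text: str, needle: str) -> int:
    """Count (possibly overlapping) occurrences of needle using the DFA."""
    dfa = build_dfa(needle, set(text) | set(needle))
    state, hits = 0, 0
    for c in text:
        state = dfa[state][c]
        if state == len(needle):
            hits += 1
    return hits


def shallow_warp_fraction(haystack: str, needle: str, lanes: int = 32) -> float:
    """Fraction of lockstep steps in which every lane is at depth 0 or 1.

    Each lane scans its own contiguous slice; matches spanning slice
    boundaries are ignored because only state depths are measured.
    """
    dfa = build_dfa(needle, set(haystack) | set(needle))
    size = len(haystack) // lanes
    slices = [haystack[i * size:(i + 1) * size] for i in range(lanes)]
    states = [0] * lanes
    shallow_steps = 0
    for step in range(size):
        for lane in range(lanes):
            states[lane] = dfa[states[lane]][slices[lane][step]]
        if max(states) <= 1:  # no lane is inside a partial match
            shallow_steps += 1
    return shallow_steps / size


if __name__ == "__main__":
    rng = random.Random(0)
    text = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(32 * 4000))
    print(f"all lanes shallow on {shallow_warp_fraction(text, 'needle'):.1%} of steps")
```

As a back-of-envelope check for the needle `needle` on uniformly random lowercase text, a lane is two or more states deep only right after reading "ne" (probability about 1/676), so all 32 lanes are shallow on roughly (1 - 1/676)^32, or about 95%, of steps. Real text and real regexes will differ.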
When this concept was previously posted on HN, the top comment pointed out that it's the pipes that are inefficient when working on GPUs, because data is copied from the CPU to the GPU and back for every command-and-pipe pair. Even if we don't get pipes per se, I think we could expose GPGPU resources in a Unix-like way, though I suppose that depends on driver support.
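A toy cost model makes the pipe argument concrete: if each command in a pipeline is a separate process, every stage copies its input to the GPU and its output back, whereas a design that keeps data resident on the GPU pays for only one copy each way. In this Python sketch the `Link` class, the 16 GB/s bandwidth (roughly PCIe 3.0 x16 class), the 10 µs per-copy overhead, and the per-stage compute times are all assumptions, not measurements; it also assumes every stage outputs as many bytes as it reads.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Assumed host<->device link; both numbers are illustrative, not measured."""
    bandwidth_gb_s: float  # sustained copy bandwidth in GB/s
    latency_s: float       # fixed overhead per copy in seconds


def transfer_time(nbytes: int, link: Link) -> float:
    """Time for one host->device or device->host copy."""
    return link.latency_s + nbytes / (link.bandwidth_gb_s * 1e9)


def piped_time(nbytes: int, stage_compute_s: list[float], link: Link) -> float:
    """Each pipeline stage is its own process: copy in, compute, copy out."""
    return sum(2 * transfer_time(nbytes, link) + t for t in stage_compute_s)


def resident_time(nbytes: int, stage_compute_s: list[float], link: Link) -> float:
    """Data stays on the GPU between stages: one copy in, one copy out."""
    return 2 * transfer_time(nbytes, link) + sum(stage_compute_s)


if __name__ == "__main__":
    link = Link(bandwidth_gb_s=16.0, latency_s=10e-6)  # hypothetical figures
    stages = [0.01, 0.01, 0.01]  # three fast GPU stages, 10 ms each (assumed)
    one_gb = 10**9
    print(f"piped:    {piped_time(one_gb, stages, link):.3f} s")
    print(f"resident: {resident_time(one_gb, stages, link):.3f} s")
```

With these assumed numbers the piped version takes about 0.405 s versus 0.155 s for the resident one, and in both cases the copies, not the 30 ms of compute, dominate.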
But as to your question, a lot of traditional tools like grep and sed aren't really suited to the GPU unless you are running them on a lot of files at once.
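"Lots of files at once" works because each file is an independent job. A single large input can be turned into many independent jobs too, by splitting it into chunks that overlap by `len(needle) - 1` bytes so no match is lost at a boundary, and counting each match only in the chunk where it begins. This Python sketch assumes a literal (non-regex) needle; the function names and chunk size are hypothetical, and the thread pool only illustrates that the chunks are independent, it is not a claim of speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_with_overlap(data: bytes, chunk_size: int, overlap: int) -> list[tuple[int, bytes]]:
    """Split data into (start_offset, chunk) pieces, each extended by `overlap`
    bytes so a match straddling a boundary is still fully visible."""
    return [(start, data[start:start + chunk_size + overlap])
            for start in range(0, len(data), chunk_size)]


def find_in_chunk(start: int, chunk: bytes, needle: bytes, chunk_size: int) -> list[int]:
    """Absolute offsets of matches that *begin* in [start, start + chunk_size),
    so matches inside the overlap region are not double-counted."""
    hits = []
    i = chunk.find(needle)
    while i != -1 and i < chunk_size:
        hits.append(start + i)
        i = chunk.find(needle, i + 1)
    return hits


def parallel_find(data: bytes, needle: bytes, chunk_size: int = 1 << 16) -> list[int]:
    """Search every chunk independently, then merge the sorted offsets."""
    if not needle:
        raise ValueError("needle must be non-empty")
    pieces = split_with_overlap(data, chunk_size, len(needle) - 1)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda p: find_in_chunk(p[0], p[1], needle, chunk_size), pieces)
    return sorted(off for hits in results for off in hits)


if __name__ == "__main__":
    print(parallel_find(b"xx needle yy needle", b"needle", chunk_size=5))
```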
I agree. I get the sense that a lot of these old text-munging tools are seriously fast and mostly I/O-bound. It would surprise me if moving chunks of text back and forth to a GPU were very efficient.
Just think of counterparts to all the old favourites that utilise the GPU... imagine GPU awk.