Here's a use case: I was training a neutral net, and wanted to do some preprocessing (similar to image resizing, but without an existing C function). Inputs are batched, so the preprocessing is trivially parallelizable. I tried to multithread it in python, and got no speedup at all.
That was a really sad moment, and I've never felt good about python since.
That was a really sad moment, and I've never felt good about python since.