I find it's much easier to understand why 3 CNN layers applied on the raw image can learn convolutions that are relevant to solving my task... than it is to understand what is being done by "artisanal hand-engineered authentic patches" with a couple obscure dimensionality reduction algorithms thrown in, and an SVM on top.
Expert systems were also incredibly hard to build and debug, and weren't nearly as useful as ML systems are nowadays.
While performance might be important, at least the features were easy to understand by non experts. :)