I would also broadly agree that the overuse of statistical language and explanations is probably driven more by historical trends in NLP. I was always more interested in computer vision (including segmentation) and even deep regression. Especially in the case of deep regression, with the absence of a softmax and the ease of constructing task-specific custom loss functions (or, as you say, the hinge loss example), it always seemed pretty clear to me that none of this was ever particularly statistical in the first place.
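
To make the loss-function point concrete, here is a minimal sketch in plain Python. The function names, the `under_weight` parameter, and the toy numbers are all my own illustration, not anything from a particular library or paper. It only shows that both losses can be written down directly from what the task cares about, with no softmax and no explicit likelihood anywhere in the definition.

```python
def hinge_loss(scores, labels, margin=1.0):
    """Mean hinge loss on raw scores; labels are -1 or +1 (no softmax involved)."""
    return sum(max(0.0, margin - y * s) for s, y in zip(scores, labels)) / len(scores)


def asymmetric_l1(preds, targets, under_weight=3.0):
    """Hypothetical task-specific regression loss: under-predicting costs
    under_weight times more than over-predicting by the same amount."""
    total = 0.0
    for p, t in zip(preds, targets):
        err = t - p  # positive when the model under-predicts
        total += under_weight * err if err > 0 else -err
    return total / len(preds)


# Toy values, chosen only for illustration.
scores = [2.0, -0.5, 0.3]
labels = [1.0, -1.0, -1.0]
preds = [1.0, 2.0, 3.0]
targets = [2.0, 2.0, 2.0]

print(hinge_loss(scores, labels))
print(asymmetric_l1(preds, targets))
```

In a real setup these would be written in whatever autodiff framework the model uses, but the definitions would look essentially the same.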

I will definitely check out those RAAM and LRAAM papers; thanks for the references. You seem to have a richer historical knowledge of these topics than I do.