I think this is more about mechanistic understanding vs fundamental insight kind of situation. The linear algebra picture is currently very mechanistic since it only tells us what the computations are. There are research groups trying to go beyond that but the insight from these efforts are currently very limited.
However, the probabilistic view is very much clearer. You can have many explorable insights, both potentially true and false, by jıst understanding the loss functions, what the model is sampling from, what is the marginal or conditional distributions are and so on. Generative AI models are beautiful at that level. It is truly mind blowing that in 2025, we are able to sample from the megapixel image distributions conditioned on the NLP text prompts.
If you dig ml/vision papers from old, you will see that formulation-wise they actually did, but they lacked the data, compute, and the mechanistic machinery provided by the transformer architecture. The wheels of progress are slow and requires many rotations to finally reach somewhere.