The new large model uses DeepseekV2 architecture. 0 mention on the page lol. It'...

Jackson__ · 2025-12-02T21:01:54 1764709314

So they spent all of their R&D on copying DeepSeek, leaving none for the single novel feature they added: vision.

To quote the hf page:

>Behind vision-first models in multimodal tasks: Mistral Large 3 can lag behind models optimized for vision tasks and use cases.

Ey7NFZ3P0nzAe · 2025-12-02T21:12:29 1764709949

Well, behind "models", not "language models".

Of course models made purely for image stuff will completely wipe it out. The vision language models are useful for their generalist capabilities.

make3 · 2025-12-02T19:50:03 1764705003

Architecture differences wrt vanilla transformers and between modern transformers are a tiny part of what makes a model nowadays.

halJordan · 2025-12-02T21:43:22 1764711802

I don't think it's fair to demand everything be open and then get mad when the openness is used. It's an obsessive and harmful double standard.