The new large model uses the DeepseekV2 architecture. 0 mentions of that on the page lol.
It's a good thing that open source models use the best arch available. K2 does the same but at least mentions "Kimi K2 was designed to further scale up Moonlight, which employs an architecture similar to DeepSeek-V3".
---
vllm/model_executor/models/mistral_large_3.py
```
from vllm.model_executor.models.deepseek_v2 import DeepseekV3ForCausalLM

class MistralLarge3ForCausalLM(DeepseekV3ForCausalLM):
    ...  # rest of the class omitted here
```
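For anyone who doesn't read Python: a subclass declared like this inherits every method it doesn't override, so the model code that runs is the parent's. Here is a toy sketch of that, using made-up stand-in classes (`BaseForCausalLM`, `DerivedForCausalLM`, `architecture()` are all hypothetical, not vLLM code), and I'm not claiming the real class overrides nothing:

```python
# Toy stand-ins only; these are NOT the real vLLM classes.
class BaseForCausalLM:
    def __init__(self, hidden_size: int) -> None:
        self.hidden_size = hidden_size

    def architecture(self) -> str:
        # Stands in for the parent's model definition.
        return "base-moe"


class DerivedForCausalLM(BaseForCausalLM):
    # New name, nothing overridden: all behavior comes from the parent.
    pass


model = DerivedForCausalLM(hidden_size=8)
inherited = model.architecture()
# The method looked up on the subclass is the very same function object.
same_method = DerivedForCausalLM.architecture is BaseForCausalLM.architecture
```

In the toy, `inherited` is whatever the parent returns and `same_method` is true, which is the whole point: the subclass name is new, the architecture is not.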
"Science has always thrived on openness and shared discovery." btw
Okay I'll stop being snarky now and try the 14B model at home. Vision is a nice addition on Large.