1) code + weights Apache 2.0 licensed (enough to run locally, enough to train, not enough to reproduce this version)
2) Low latency, meaning 11ms per token (so ~90 tokens/sec on 4xH100)
3) Performance, according to Mistral, lands somewhere between Qwen 2.5 32B and Llama 3.3 70B, roughly on par with GPT-4o mini
4) ollama run mistral-small (14 GB download) gets me ~9 tokens/sec on the question "who is the president of the US?" (also enjoyable that the answer ISN'T the orange idiot); see the sketch below this list for calling it from code
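If you'd rather hit the local model from code than from the terminal, here's a minimal sketch using the ollama Python client, asking the same question as above. This assumes you've installed the client (pip install ollama), the ollama server is running, and mistral-small has already been pulled:

    # Minimal sketch: query a locally running mistral-small via the
    # official ollama Python client. Assumes `pip install ollama`,
    # a running ollama server, and the model already pulled.
    import ollama

    response = ollama.chat(
        model="mistral-small",
        messages=[{"role": "user", "content": "Who is the president of the US?"}],
    )
    # The response supports dict-style access to the generated message
    print(response["message"]["content"])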