
How large a model can you use with your 128GB M3? Anything you can share would be great to hear: number of parameters, quantization, which model, etc.


I'm running 123B parameter Mistral Large with no issues. Larger models will run, too, but slowly. I wish Ollama had support for speculative decoding.
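A rough back-of-envelope (my own sketch, not from the thread) for why a 123B-parameter model can fit in 128GB once quantized — weight memory is roughly parameters times bits per weight, and the quantization labels below are common llama.cpp-style formats used for illustration:

```python
# Rough weight-memory estimate for a quantized LLM.
# Ignores KV cache and runtime overhead, which add several GB more.

def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Mistral Large: ~123B parameters at a few common precisions.
for bits, label in [(16, "fp16"), (8, "q8_0"), (4, "q4_K_M")]:
    print(f"{label}: ~{model_size_gb(123e9, bits):.0f} GB")
```

At fp16 the weights alone (~246 GB) would not fit, while a 4-bit quantization (~62 GB) leaves comfortable headroom on a 128GB machine.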


Thanks for the reply. Is that quantized? And what's the bit size of the floating point values in that model (apologies if I'm not asking the question correctly).



