
How does one effectively use something like this locally with consumer-grade hardware?


Once the MLX community gets its teeth into it, you might be able to run it on two 512GB M3 Ultra Mac Studios wired together - those are about $10,000 each, though, so that would be $20,000 total.

Update: https://huggingface.co/mlx-community/Kimi-K2-Thinking - and here it is running on two M3 Ultras: https://x.com/awnihannun/status/1986601104130646266
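For anyone wanting to try the MLX conversion, loading it follows the usual mlx-lm pattern. A minimal sketch, assuming a single machine with enough unified memory; the two-M3-Ultra run linked above shards the model across hosts, which this omits:

    # pip install mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Kimi-K2-Thinking")

    # Chat models generally want the chat template applied first.
    messages = [{"role": "user", "content": "Explain MoE inference briefly."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
    print(text)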


Epyc Genoa CPU/mobo + 700GB of DDR5 RAM. The model is a MoE, so you don't need to stuff it all into VRAM: you can use a single 3090/5090 to hold the activated weights and keep the remaining expert weights in DDR5 RAM. See their deployment guide for reference here: https://github.com/kvcache-ai/ktransformers/blob/main/doc/en...
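Back-of-envelope numbers for why that split works. A rough sketch, assuming Kimi K2's advertised shape of ~1T total parameters with ~32B active per token (verify against the model card) and 4-bit weights:

    # Rough memory budget for MoE CPU/GPU offload.
    GiB = 1024**3
    total_params = 1.0e12    # all experts + shared weights (assumed)
    active_params = 32e9     # weights touched per token (assumed)
    bytes_per_param = 0.5    # 4-bit quantization

    print(f"All weights (DDR5):    {total_params * bytes_per_param / GiB:6.1f} GiB")   # ~465.7
    print(f"Active weights (VRAM): {active_params * bytes_per_param / GiB:6.1f} GiB")  # ~14.9

So ~700GB of DDR5 holds the full model with headroom, and the per-token hot path fits inside a 24GB 3090.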


Consumer-grade hardware? Even at 4 bits per param you would need 500GB of GPU VRAM just to load the weights. You also need VRAM for the KV cache.
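For the KV-cache term, the usual estimate is 2 (K and V) x layers x kv_heads x head_dim x seq_len x bytes per element. A sketch with hypothetical dimensions, not Kimi K2's actual config:

    # Generic KV-cache size estimate; all dims below are placeholders.
    GB = 1e9
    layers, kv_heads, head_dim = 61, 8, 128  # hypothetical
    seq_len, bytes_per_elem = 128_000, 2     # 128k context, fp16

    kv_gb = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / GB
    print(f"KV cache @ {seq_len} tokens: {kv_gb:.0f} GB")  # ~32 GB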


It's MoE-based, so you don't need that much VRAM.

Nice if you can get it, of course.



