> run locally for agentic coding. Nowadays I mostly use GPT-OSS-120b for this Wh...

embedding-shape · 2025-12-09T19:11:25 1765307485

RTX Pro 6000, ends up taking ~66GB when running the MXFP4 native quant with llama-server/llama.cpp and max context, as an example. Guess you could do it with two 5090s with slightly less context, or different software aimed at memory usage efficiency.

kristianp · 2025-12-09T22:06:57 1765318017

That has 96GB GDDR7 ECC, to save people looking it up.

fgonzag · 2025-12-09T19:04:50 1765307090

The model is 64GB (int4 native), add 20GB or so for context.

There are many platforms out there that can run it decently.

AMD strix halo, Mac platforms. Two (or three without extra ram) of the new AMD AI Pro R9700 (32GB of RAM, $1200), multi consumer gpu setups, etc.

FuckButtons · 2025-12-09T21:01:51 1765314111

Mbp 128gb.