Is this going to need one or two of those RTX PRO 6000s to leave room for a decent KV cache at an active context length of 64-100k tokens?
It's one thing to run the model with no context, but coding agents build it up close to the max, and that slows down generation massively in my experience.
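For a rough estimate, KV cache size scales linearly with context length: 2 (K and V) x layers x KV heads x head dim x bytes per element x tokens. A minimal sketch, using hypothetical model parameters (48 layers, 8 KV heads under GQA, head dim 128, fp16) just to illustrate the arithmetic, not any specific model:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   ctx_tokens: int, dtype_bytes: int = 2) -> int:
    # One K and one V tensor per layer, per KV head, per token.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * ctx_tokens

# Hypothetical mid-size model at 100k context, fp16 cache:
gb = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128,
                    ctx_tokens=100_000) / 1e9
print(f"{gb:.1f} GB")  # ~19.7 GB
```

So at 100k tokens the cache alone can run into the tens of GB on top of the weights, which is why a single 96 GB card may or may not be enough depending on the model size and quantization.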