Is this going to need one or two of those RTX PRO 6000s to leave room for a decent KV cache at an active context length of 64-100k tokens?
It's one thing to run the model with no context, but coding agents build it up close to the max, and that slows down generation massively in my experience.
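For a rough estimate, KV cache size scales linearly with context length: 2 (K and V) x layers x KV heads x head dim x bytes per element x tokens. A minimal sketch, using hypothetical model parameters (48 layers, 8 KV heads under GQA, head dim 128, fp16) just to illustrate the arithmetic, not any specific model:

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   ctx_tokens: int, dtype_bytes: int = 2) -> int:
    # One K and one V tensor per layer, per KV head, per token.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * ctx_tokens

# Hypothetical mid-size model at 100k context, fp16 cache:
gb = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128,
                    ctx_tokens=100_000) / 1e9
print(f"{gb:.1f} GB")  # ~19.7 GB
```

So at 100k tokens the cache alone can run into the tens of GB on top of the weights, which is why a single 96 GB card may or may not be enough depending on the model size and quantization.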