I'm trying to find that out as well, as I'm considering a local LLM for some heavy prototyping. I don't mind which HW I buy, but I'm on a relative budget, and energy efficiency isn't a bad thing either. It seems the Ultra can do 40 tokens/sec on DeepSeek, and nothing even comes close at that price point.
The DeepSeek R1 distills onto Llama and Qwen base models are also, unfortunately, called “DeepSeek” by some. Are you sure you’re looking at the right thing?
The OG DeepSeek models are hundreds of GB even when quantized; nobody is running them on RTX GPUs anyway…