I'm trying to find that out as well, as I'm considering a local LLM for some heavy prototyping. I don't mind which HW I buy, but I'm on a relative budget, and energy efficiency isn't a bad thing either. It seems the Ultra can do 40 tokens/sec on DeepSeek, and nothing even comes close at that price point.
The DeepSeek R1 distills onto Llama and Qwen base models are also, unfortunately, called “DeepSeek” by some. Are you sure you’re looking at the right thing?
The OG DeepSeek models are hundreds of GB even when quantized; nobody is running them on RTX GPUs anyway…