> If OpenAI or Anthropic could squeeze the same output out of smaller GPUs and s...

cubefox · 2025-11-07T02:30:04

> > If OpenAI or Anthropic could squeeze the same output out of smaller GPUs and servers they'd be doing it for themselves.

> First, they do this; that's why they release models at different price points.

No, those don't deliver the same output. The cheaper models are worse.

> It's also why GPT-5 tries auto-routing requests to the most cost-effective model.

These are likely the same size; one just uses reasoning and the other doesn't. Skipping reasoning is cheaper, but not because the model is smaller.

gunalx · 2025-11-07T09:12:46

But they also squeezed out an 80% price cut on o3 at some point, supposedly purely through inference or infrastructure optimization.

anabis · 2025-11-11T00:48:19

> delivering 97% of the performance at 10% of the cost is a distraction.

Not if you are running RL on that model, and need to do many roll-outs.
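
To put rough numbers on that, here is a minimal sketch, assuming a fixed budget and a per-roll-out cost that scales linearly with the model's price. The budget and unit costs are hypothetical and chosen only for illustration; only the 10% cost ratio comes from the quote above.

```python
def rollouts_for_budget(budget_cents: int, cost_per_rollout_cents: int) -> int:
    """Number of whole roll-outs a fixed budget pays for."""
    return budget_cents // cost_per_rollout_cents


# Hypothetical figures, kept in integer cents to avoid float rounding.
BUDGET_CENTS = 100_000                  # assumed total spend
FULL_COST_CENTS = 100                   # assumed cost of one roll-out on the large model
CHEAP_COST_CENTS = FULL_COST_CENTS // 10  # "10% of the cost" from the quote

full = rollouts_for_budget(BUDGET_CENTS, FULL_COST_CENTS)
cheap = rollouts_for_budget(BUDGET_CENTS, CHEAP_COST_CENTS)

print(f"large model: {full} roll-outs")
print(f"cheap model: {cheap} roll-outs ({cheap // full}x as many)")
```

Under those assumptions the same budget buys ten times as many roll-outs, which is the whole point when sample count is the bottleneck. Whether the 3% quality gap matters for the RL signal is a separate question this sketch doesn't address.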