Seeing OpenAI and Anthropic go different routes here is interesting. It's worth moving past the initial knee-jerk reaction that this model is unimpressive, and past comments like "they spent a massive amount of money and had to ship something for it..."
* Anthropic appears to be making a bet that a single paradigm (reasoning) can create a model which is excellent for all use cases.
* OpenAI seems to be betting that you'll need an ensemble of models with different capabilities, working as a single system, to jump beyond what the reasoning models today can do.
Based on all of the comments from OpenAI, GPT-4.5 is absolutely massive, and with that size comes the ability to store far more factual data. The scores on ability-oriented things, like coding, don't show the kind of gains you get from reasoning models, but the fact-based test, SimpleQA, shows a pretty large jump and a dramatic reduction in hallucinations. You can imagine a scenario where GPT-4.5 coordinates multiple, smaller reasoning agents and uses its factual accuracy to enhance their reasoning, kind of like how ruminating on an idea "feels" like a different process than having a chat with someone.
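To make that "coordinator plus reasoning agents" picture concrete, here's a minimal sketch of what the loop could look like, assuming the standard OpenAI Python client; the model names, the prompts, and the three-step split are my own illustration, not anything OpenAI has described.

```python
# Hypothetical sketch: a large, knowledge-heavy model acts as coordinator, farms the
# hard reasoning core out to a smaller reasoning model, then fact-checks the draft.
# Model names and prompts are illustrative assumptions, not a documented OpenAI pattern.
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    """Single-turn helper around the Chat Completions API."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def coordinated_answer(question: str) -> str:
    # 1. The big model decomposes the question into a sub-problem for a reasoner.
    subtask = ask("gpt-4.5-preview",
                  f"Restate the hard reasoning core of this question:\n{question}")
    # 2. A smaller reasoning model grinds on the sub-problem.
    draft = ask("o3-mini", f"Think step by step and solve:\n{subtask}")
    # 3. The big model uses its broader factual knowledge to check and polish the draft.
    return ask("gpt-4.5-preview",
               f"Question: {question}\nDraft answer: {draft}\n"
               "Correct any factual errors and reply conversationally.")

print(coordinated_answer("Why did the Library of Alexandria decline?"))
```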
I'm really curious whether they're currently combining two things that could also be split: EQ/communication and factual knowledge storage. This could all be a bust, but it's an interesting difference in approaches nonetheless, and worth considering that OpenAI could be right.
> * OpenAI seems to be betting that you'll need an ensemble of models with different capabilities, working as a single system, to jump beyond what the reasoning models today can do.
Seems inaccurate, as the most recent claim I've seen is that they expect this to be their last non-reasoning model and are aiming to provide all capabilities together in future model releases (unifying the GPT-x and o-x lines).
See this claim in TFA:
> We believe reasoning will be a core capability of future models, and that the two approaches to scaling—pre-training and reasoning—will complement each other.
> After that, a top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks.
> In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.
You could read this as unifying the models, or as building a unified system which coordinates multiple models. The second sentence, to me, implies that o3 will still exist; it just won't be standalone, which matches the idea I shared above.
Ah, great point. Yes, the wording here would imply that they're basically planning on building scaffolding around multiple models instead of having one more capable Swiss Army Knife model.
I would feel a bit bummed if GPT-5 turned out not to be a model, but rather a "product".
> know when to think for a long time or not, and generally be useful for a very wide range of tasks.
I'm going to call it now: no customer is actually going to use this. It'll be a cute little bonus for their chatbot god-oracle, but virtually all of their B2B clients are going to demand "minimum latency at all times" or "maximum accuracy at all times."
> OpenAI seems to be betting that you'll need an ensemble of models with different capabilities, working as a single system, to jump beyond what the reasoning models today can do.
The high-level block diagrams for tech always end up converging to those found in biological systems.
Yeah, I don't know enough real neuroscience to argue either side. What I can say is that this path feels more like the way I observe myself thinking: it feels like there are different modes of thinking and different processes in the brain, and it seems like transformers are able to emulate at least two versions of that.
Once we figure out the frontal cortex and corpus callosum part of this, where we aren't calling other models over APIs but instead have them all working in the same shared space, I have a feeling we'll be on to something pretty exciting.
> Anthropic appears to be making a bet that a single paradigm (reasoning) can create a model which is excellent for all use cases.
I don't think that is their primary motivation. The announcement post for Claude 3.7 was all about code, which doesn't seem to imply "all use cases": code this, new code tool that, telling customers they look forward to what they build, etc. There is very little mention of other use cases in the announcement at all. The usage stats they published are telling: 80% or more of queries to Claude are about code. In other words, while they are thinking about other use cases, I think they see code specifically as the major thing to optimize for.
OpenAI, given its different customer base and reach, is probably aiming for something more general.
IMO they all think that you need an "ensemble" of models with different capabilities to optimise for different use cases. It's more about how much compute each company has and what they target with those resources. Anthropic, I'm assuming, has fewer compute resources and a narrower customer base, so it may make economic sense to optimise just for that.
That's possible. My counterpoint would be that if that were the case, Anthropic would have built a smaller reasoning model instead of doing a "full" Claude. Instead, they built something which seems to be flexible across different types of responses.
It can never be just reasoning, right? Reasoning is the multiplier on some base model, and surely no amount of reasoning on top of something like gpt-2 will get you o1.
This model is too expensive right now, but as compute gets cheaper (and we have to keep in mind that it will), having a better base to multiply with will enable things that just more thinking won't.
You can try this for yourself with the distilled R1s that DeepSeek released. The Qwen-7B-based model is quite impressive for its size, and it can do a lot with additional context provided. I imagine for some domains you can provide enough context and let inference-time compute eventually solve the problem; for others you can't.
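If you want to try that locally, a rough setup might look like the sketch below; the Hugging Face repo id and the raw-prompt format are assumptions on my part, and you may want to use the model's chat template instead of plain text completion.

```python
# Rough sketch: run a distilled R1 locally and stuff domain context into the prompt.
# Assumes the repo id "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" and enough GPU memory
# for a 7B model; "domain_notes.txt" is a hypothetical file of your own context.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    device_map="auto",
)

context = open("domain_notes.txt").read()  # whatever domain context you can supply
question = "Given the notes above, which configuration minimizes latency?"

out = generator(
    f"{context}\n\n{question}",
    max_new_tokens=1024,   # leave room for the model's long chain of thought
    do_sample=False,
)
print(out[0]["generated_text"])
```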
Ever since those kids demoed their fact-checking engine here, which was just Input -> LLM -> Fact Database -> LLM -> LLM -> Output, I have been betting that it will be advantageous to move in this general direction.
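For what it's worth, the shape of that pipeline would be roughly the sketch below; every function and name here is a placeholder of mine to show the data flow, since the demo's internals weren't published.

```python
# Sketch of the "Input -> LLM -> Fact Database -> LLM -> LLM -> Output" shape.
# `llm` is any callable that maps a prompt string to a completion string, and
# `fact_db` is any object with a .search(query) method; both are stand-ins.

def extract_claims(llm, text: str) -> list[str]:
    """First LLM pass: pull out checkable factual claims."""
    return llm(f"List the factual claims in:\n{text}").splitlines()

def lookup(fact_db, claim: str) -> str:
    """Retrieve supporting or contradicting evidence from a fact database."""
    return fact_db.search(claim)

def judge(llm, claim: str, evidence: str) -> str:
    """Second LLM pass: compare each claim against the retrieved evidence."""
    return llm(f"Claim: {claim}\nEvidence: {evidence}\nSupported, refuted, or unclear?")

def summarize(llm, verdicts: list[str]) -> str:
    """Third LLM pass: turn per-claim verdicts into a readable report."""
    return llm("Summarize these fact-check verdicts:\n" + "\n".join(verdicts))

def fact_check(llm, fact_db, text: str) -> str:
    claims = extract_claims(llm, text)
    verdicts = [judge(llm, c, lookup(fact_db, c)) for c in claims]
    return summarize(llm, verdicts)
```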
Maybe. I'm inclined to think OpenAI sees it the way I laid it out, though, specifically because of their focus on communication and EQ in 4.5. It seems like they believe the large, non-reasoning model will be "front of house."
Or they'll use some kind of trained router which sends each request to the model it thinks should handle it first.
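Something like the sketch below, where a cheap classifier-style call picks the route before the real request is sent; the model ids and the two routing labels are assumptions of mine, just to show the shape.

```python
# Minimal sketch of a router in front of two models: a fast conversational model and a
# slower reasoning model. Model ids and labels are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

ROUTES = {
    "chat": "gpt-4.5-preview",   # EQ / conversation / factual recall
    "reason": "o3-mini",         # multi-step reasoning, math, code
}

def route(prompt: str) -> str:
    """Stand-in for a trained router; here a small model just picks a label."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Answer with exactly 'chat' or 'reason'. Which does this need?\n{prompt}",
        }],
    )
    label = resp.choices[0].message.content.strip().lower()
    return ROUTES.get(label, ROUTES["chat"])

def answer(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=route(prompt),
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```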