> consider the fact that both Gemini and OpenAI got gold medal level performance
Yet ChatGPT 5 imagines API functions that are not there and cannot figure out basic solutions even when pointed to the original source code of libraries on GitHub.
Which is why you run it in a coding agent loop using something like Codex CLI - then it doesn't matter if it imagines a non-existent function because it will correct itself when it tries to run the code.
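Roughly the shape of that loop, as a sketch (model name, prompts, and the retry count are illustrative; this is not how Codex CLI is actually implemented):

```python
# Minimal "generate -> run -> feed error back" loop. Assumes the openai package
# and an API key; "gpt-5" is just an example model name.
import subprocess
import tempfile
from openai import OpenAI

client = OpenAI()

def ask(messages):
    resp = client.chat.completions.create(model="gpt-5", messages=messages)
    return resp.choices[0].message.content

def run_python(code: str) -> subprocess.CompletedProcess:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    return subprocess.run(["python", path], capture_output=True, text=True)

messages = [{"role": "user", "content":
             "Write a plain Python script (no markdown fences) that prints the 10th Fibonacci number."}]
for _ in range(5):
    code = ask(messages)
    result = run_python(code)
    if result.returncode == 0:
        print(result.stdout)
        break  # it ran; a hallucinated function would have raised an error here instead
    # feed the traceback back so the model can correct itself
    messages += [
        {"role": "assistant", "content": code},
        {"role": "user", "content": f"That raised an error:\n{result.stderr}\nPlease fix it."},
    ]
```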
Can you expand on "cannot figure out basic solutions even when pointed to the original source code of libraries on GitHub"? I have it do that all the time and it works really well for me (at least with modern "reasoning" models like GPT-5 and Claude 4.)
As a human, I sometimes write code that does not compile first try. This does not mean that I am stupid, only that I can make mistakes. And the development process has guardrails against me making mistakes, namely, running the compiler.
I feel they are mutually inclusive! I don’t think you can meaningfully create new things if you must always be 100% factually correct, because you might not know what correct is for the new thing.
True. But if the system is implemented by a country, this could be enforced through the legal system and insurance.
For example, when each transaction is done, both parties might keep a cryptographic proof which they are required to submit once they are online again.
Failing to submit could result in a small fine (to encourage submission), and double spending, which can then be detected, could result in a large fine (or even a prison sentence), for example.
There is, perhaps, a privacy issue, just like with blockchain. But it's not more of an issue than online transactions.
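A toy sketch of the proof-and-detection idea (not a real payment protocol; the note IDs, the Ed25519 signatures, the skipped signature verification, and the registry are all assumptions of mine):

```python
# Each offline spend is signed by the payer and submitted later; a registry can
# then detect double spending by spotting two signed spends of the same note.
from dataclasses import dataclass
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

@dataclass
class Transaction:
    note_id: str      # identifier of the digital note/coin being spent
    payer: bytes      # payer's public key (raw bytes)
    payee: bytes      # payee's public key (raw bytes)
    payer_sig: bytes  # payer's signature over (note_id, payee)

def sign_spend(payer_key: Ed25519PrivateKey, note_id: str, payee: bytes) -> Transaction:
    pub = payer_key.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw)
    sig = payer_key.sign(f"{note_id}|{payee.hex()}".encode())
    return Transaction(note_id, pub, payee, sig)

class Registry:
    """Run by the state or a bank: collects proofs once parties are back online."""
    def __init__(self):
        self.seen = {}  # note_id -> first Transaction submitted for that note

    def submit(self, tx: Transaction) -> str:
        prior = self.seen.get(tx.note_id)
        if prior is not None and prior.payee != tx.payee:
            # Two signed spends of the same note to different payees:
            # cryptographic evidence of double spending -> the large fine.
            return "double spend detected"
        self.seen[tx.note_id] = tx
        return "ok"
```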
I didn't say you need a blockchain.
I just said a cryptographic protocol (mostly offline and unrelated to blockchain) would help to automatically and quickly detect and prove fraud.
The offline credit card system does not prove fraud; it just relies on insurance.
I don't completely agree. Brand value is huge. Product culture matters.
But say you're correct, and follow the reasoning from there: posit "All frontier model companies are in a red queen's race."
If it's a true red queen's race, then some firms (those with the worst capital structure / costs) will drop out. The remaining firms will trend toward 10%-ish net income - just over cost of capital, basically.
Do you think inference demand and spend will stay stable, or grow? Raw profits could increase from here: if inference demand grows 8x, then oAI, even as margins go down from 80% to 10%, would keep making $10bn or so a year in FCF at current spend; they'd decide whether they wanted that to go into R&D, just enjoy it, or acquire smaller competitors.
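Back-of-the-envelope (the revenue figure is a placeholder I made up; only the 80% to 10% margin shift and the 8x demand growth come from the argument above):

```python
current_revenue = 12.5e9   # hypothetical annual inference revenue
profit_now   = current_revenue * 0.80        # ~$10bn at 80% margins
profit_later = current_revenue * 8 * 0.10    # ~$10bn at 8x demand, 10% margins
print(profit_now, profit_later)              # same ballpark: 8x demand offsets 1/8th the margin
```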
Things you'd have to believe for it to be a true red queen's race:
* There is no liftoff - AGI and ASI will not happen; instead we'll just incrementally get logarithmically better.
* There is no efficiency edge possible for R&D teams to create/discover that would make for a training / inference breakaway in terms of economics
* All product delivery will become truly commoditized, and customers will not care what brand AI they are delivered
* The world's inference demand will not be a case of Jevons paradox as competition and innovation drive inference costs down, and therefore we are close to peak inference demand.
Anyway, based on my answers to the above questions, oAI seems like a nice bet, and I'd make it if I could. Even the most "inference doomerish" scenario (capital markets dry up, inference demand stabilizes, R&D progress stops) still leaves oAI in a very, very good position in the US, in my opinion.
The moat, imo, is mostly the tooling on top of the model. ChatGPT's thinking and deep research modes are still superior to the competition. But as the models themselves get more and more efficient to run, you won't necessarily need to rent them or rent a data center to run them. Alibaba's Qwen mixture of experts models are living proof that you can have GPT levels of raw inference on a gaming computer right now. How are these AI firms going to adapt once someone is able to run about 90% of raw OpenAI capability on a quad core laptop at 250-300 watts max power consumption?
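For concreteness, this is roughly what running such a model locally looks like with a quantized GGUF build and llama-cpp-python (the file name and settings are placeholders, not a specific recommended configuration):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="Qwen3-30B-A3B-Q4_K_M.gguf",  # placeholder: any local quantized GGUF build
    n_ctx=8192,
    n_gpu_layers=-1,  # offload whatever fits onto the consumer GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain why MoE models are cheap to run."}]
)
print(out["choices"][0]["message"]["content"])
```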
I think one answer is that they'll have moved farther up the chain; agent training is this year, agent-managing-agents training is next year. The bottom of the chain inference could be Qwen or whatever for certain tasks, but you're going to have a hard and delayed time getting the open models to manage this stuff.
Futures like that are why Anthropic and oAI put out stats like how long the agents can code unattended. The dream is "infinite time".
Huge brand moat. Consumers around the world equate AI with ChatGPT. That kind of recognition is an extremely difficult thing to pull off, and also hard to unseat as long as they play their cards right.
"Brand moat" is not an actual economic concept. Moats indicate how easy/hard it is to switch to a competitor. If OpenAI does something user-adversarial, it takes two seconds to switch to Anthropic/Gemini (the exception being Enterprise contracts/lock-in, which is exactly why AI companies prioritize that). The entire reason that there are race-to-the-bottom price wars among LLM companies is that it's trivial for most people to switch to whatever's cheapest.
Brand loyalty and users not having sufficient incentive by default to switch to a competitor is something else. OpenAI has lost a lot of money to ensure no such incentive forms.
Moats, as noted in Google's "We Have No Moat, And Neither Does OpenAI" memo that made the discussion of moats relevant in AI circles, have a specific economic definition.
Switching costs only make sense to talk about for fully online businesses. The "switching cost" for McDonalds depends heavily on whether there's a Burger King nearby. If there isn't then your "switching cost" might now be a 30 minute drive, which is very much a moat.
That's not entirely true. They have an 'infinite' product moat - no one can reproduce a Big Mac. Essentially every AI model is now 'the same' (cue debate on this). The only way they can build a moat is by adding features beyond the model that lock people in.
The concept of ‘moat’ comes out of marketing - it was a concept in marketing for decades before Warren Buffett coined the term economic moat. Brand moat had been part of marketing for years and is a fully recognized and researched concept. It’s even been researched with fMRIs.
You may not see it, but OpenAI’s brand has value. To a large portion of the less technical world, ChatGPT is AI.
Nokia's global market share was ~50% in smartphones back in 2007. Remember that?
Comparing "brand moat" in real-world restaurant vs online services where there's no actual barrier to changing service is silly. Doubly silly when they're free users, so they're not customers. (And then there are also end-users when OpenAI is bundled or embedded, e.g. dating/chatbot services).
McDonald's has lock-in and inertia through its franchisees occupying key real-estate locations, media and film tie-ins, promotions etc. Those are physical moats, way beyond a conceptual "brand moat" (without being able to see how Hamilton Wright Helmer's book characterizes those).
I wouldn’t necessarily say so.
I guess that's why they are trying to « pulse » people and « learn » from you instead of just providing decent, unbiased answers.
In Europe, most companies and governments are pushing for either Mistral or open-source models.
Most devs, who, if I understand correctly, are pretty much the only customers willing to pay $100+ a month, will switch in a matter of minutes if a better model comes along.
And they lose money on pretty much all usage.
To me, a company like Anthropic, which mostly focuses on a target audience and does research on bias, equity, and such (very leading research, but still), has a much better moat.
I cannot speak for your work, but for classification steps and data structuring it's quite accurate, and this is with regular testing. For mine and for folks in adjacent industries, LLMs are fantastic and adding a lot of value to our workflows. You're honestly holding on to this dead idea that LLM outputs are full of hallucinations. Throwaway account with throwaway comment.
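For concreteness, the sort of pipeline I mean: ask the model for JSON, validate it, and reject anything that doesn't parse (the model name, schema, and prompt below are made up for illustration):

```python
import json
from openai import OpenAI
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    category: str  # e.g. "billing", "bug", "feature_request"
    urgency: int   # 1 (low) to 5 (high)

client = OpenAI()

def structure(text: str) -> Ticket | None:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Classify the support ticket as JSON with fields: category, urgency (1-5)."},
            {"role": "user", "content": text},
        ],
    )
    try:
        return Ticket(**json.loads(resp.choices[0].message.content))
    except (json.JSONDecodeError, ValidationError):
        return None  # reject rather than pass unchecked output downstream
```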
It's "quite accurate" which is not acceptable for almost all relevant tasks is a business context. Somebody needs to manually check everything. Almost no time is saved.
Talking as someone who has built many small OpenAI integrations aka wrappers in business apps.
So I know you are trolling, but let's be real. Why would I share my business measurements with you on here? Of course my experience is an anecdote, and of course the measurements I use to determine accuracy are more in depth than "quite accurate". What I can say is that for structuring we beat humans on accuracy and do it extremely quickly. That LLM system runs at a lower cost than some of our more traditional systems.
If all you have done is build wrappers, I see you have barely scratched the engineering surface, so that explains it.