I think it's worth double clicking here. Why did Google have significantly better search results for a long time?
1) There was a data flywheel effect, wherein Google was able to improve search results by analyzing the vast amount of user activity on its site.
2) There were real economies of scale in managing the cost of data centers and servers.
3) Their advertising business model benefited from network effects, wherein advertisers don't want to bother giving money to a search engine with a much smaller user base. This profitability funded R&D that competitors couldn't match.
There are probably more that I'm missing, but I think the primary takeaway is that Google's scale, in and of itself, led to a better product.
Can the same be said for OpenAI? I can't think of any strong economies of scale or network effects for them, but maybe I'm missing something. Put another way, how does OpenAI's product or business model get significantly better as more people use their service?
You are forgetting a bit; I worked in some of the large datacenters where both Google and Yahoo had cages.
1) Google copied the hotmail model of strapping commodity PC components to cheap boards and building software to deal with complexity.
2) Yahoo had a much larger cage, filled with very, very expensive and large DEC machines, with one poor guy sitting at a desk in there almost full time rebooting the systems etc....I hope he has some hearing left today.
3) Just before the .com crash, I was in a cage next to Google's, racking dozens of brand new Netra T1s, which were pretty slow and expensive...the company I was working for died in the crash.
Google grew to be profitable because they controlled costs, invested in software vs service contracts and enterprise gear, had a simple non-intrusive text based ad model etc...
Most of what you mention above came well after that user-focused, thrifty model allowed them to scale, and attributing their edge to scale alone is survivorship bias. Internal incentives that directed capital expenditures toward the mission, rather than toward protecting people's backs, were absolutely related to their survival.
Even though it was a metasearch engine, my personal preference was SavvySearch, until it was bought and killed, or whatever that story was.
In theory, the more people use the product, the more OpenAI knows about what they are asking and what they do after the first response, and the better it can align its model to deliver better results.
A similar dynamic occurred in the early days of search engines.
I call it the experience flywheel. Humans come with problems, the AI assistant generates some ideas, the human tries them out and comes back to iterate. The model gets feedback on its prior ideas, so you could say the AI tested an idea in the real world, using a human. This happens many times over for 300M users at OpenAI. They put a trillion tokens into human brains, and as many into their logs. The influence is bidirectional: people adapt to the model, and the model adapts to us. But that is in theory.
In practice, I have never heard OpenAI mention how they use chat logs to improve the model. They are either afraid to say, for privacy reasons, or want to keep it secret for technical advantage. But just think about the billions of sessions per month. A large number of them contain extensive problem solving, so the LLMs can collect experience and use it to improve problem solving. This makes them into a flywheel of human experience.
Brands are incredibly powerful when talking about consumer goods.