I did not invest in the majority of the companies listed, and in some cases these are areas I have not invested in at all.
I have been actively involved in AI for a few years, which gives me insights but also obvious conflicts of interest. My hope is to write things that are useful rather than just shilling, as I would lose credibility otherwise.
There is a decent (<50%, >20%) chance that frontier foundation models end up less oligopoly-like than it currently seems. The reason is that there are so many levers to pull and so much low-hanging fruit.
For example:
* Read the BloombergGPT paper - they train their own tokenizer. For specialized domains (finance, law, medicine, etc.) the vocabulary is very different, and there is likely a lot to do here: individual tokens really need to map to specific concepts, and having a concept split across several tokens makes it too hard to learn from limited domain data (see the tokenizer sketch after this list).
* Data - there are so many ways to do data differently: more or less of it, cleaner, or "better" on some dimension.
* Read the recent papers on different decoding strategies - there seems to be a lot to do here.
* Model architecture (SSMs, etc.). Even people who aren't researchers have ten ideas about architecture, and some of them sound decent - lots of low-hanging fruit.
* System architecture - i.e., we are likely to see more and more "models" served via API that are actually systems of several model calls, and there is a lot to do here.
* Hardware improvements, lower precision, etc. are likely to make training much cheaper.
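To make the tokenizer bullet concrete, here is a minimal sketch of training a domain-specific vocabulary with the Hugging Face `tokenizers` library. This is not the actual BloombergGPT setup, and the corpus files and vocab size are just placeholders:

```python
# Minimal sketch: train a BPE vocabulary on domain text so that recurring
# domain terms ("indemnification", "EBITDA", ...) end up as single tokens.
# File names and vocab_size are illustrative placeholders.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=32000, special_tokens=["[UNK]", "[PAD]"])

# Placeholder corpus files - swap in real filings, contracts, clinical notes, etc.
tokenizer.train(files=["contracts.txt", "filings.txt"], trainer=trainer)
tokenizer.save("domain_tokenizer.json")

print(tokenizer.encode("indemnification obligations survive closing").tokens)
```

The point is simply that frequent domain terms become single tokens instead of being split across several generic subwords, which matters when the domain data is limited.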
It's reasonably likely (again, guessing <50%, >20%) that this large set of levers becomes a way to see constant leapfrogging for years and years. Or, at the least, the levers become choices/trade-offs rather than anything strictly "better".
I agree this is a potential outcome. One big question is generalizable versus niche models. For example, is the best legal model a frontier model + a giant context window + RAG? Or is it a niche model trained or fine-tuned for law?
Right now, people seem to decouple measures of how smart the model is from its knowledge base, and at least for now the really big models seem smartest. So part of the question is how insightful / synthesis-centric the model needs to be versus effectively doing regressions...
Frontier model + RAG is good when you need cross-discipline abilities and general knowledge; niche models are best when the domain is somewhat self-contained (for instance, a model that is amazing at role-playing certain types of characters).
The future is model graphs with networked mixtures of experts, where models know about other models and can call them as part of recursive prompts, with some sort of online training to tune the weights of the model graph.
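To sketch what that could look like (purely illustrative - the node names and `ask()` routine are made up, and the online-training / edge-weighting part is not shown):

```python
# Toy sketch of a "model graph": each node is a model (or API-backed system),
# and edges say which other models a node may call recursively.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ModelNode:
    name: str
    answer: Callable[[str], str]                    # the model (or API call) itself
    peers: List[str] = field(default_factory=list)  # models this node may call

class ModelGraph:
    def __init__(self) -> None:
        self.nodes: Dict[str, ModelNode] = {}

    def add(self, node: ModelNode) -> None:
        self.nodes[node.name] = node

    def ask(self, name: str, prompt: str, depth: int = 0, max_depth: int = 2) -> str:
        """Answer with one node, then recursively let its peers refine the draft."""
        node = self.nodes[name]
        draft = node.answer(prompt)
        if depth < max_depth:
            for peer in node.peers:
                draft = self.ask(peer, f"Refine this answer: {draft}", depth + 1, max_depth)
        return draft

# Usage: a generalist node that defers to a niche legal node for refinement.
graph = ModelGraph()
graph.add(ModelNode("generalist", lambda p: f"[general take] {p}", peers=["legal"]))
graph.add(ModelNode("legal", lambda p: f"[legal take] {p}"))
print(graph.ask("generalist", "Can this clause be assigned?"))
```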
> The future is model graphs with networked mixtures of experts, where models know about other models and can call them as part of recursive prompts, with some sort of online training to tune the weights of the model graph.
What's the difference between that and combining all of the models into a single model? Aren't you just introducing limitations in communication and training between different parts of that über-model, limitations that may as well be encoded into the single model if they're useful? Are you just partitioning for training performance? Which is a big deal, of course, but it just seems like guessing the right partitioning and communication limitations is not going to be straightforward compared to the usual stupid "throw it all in one big pile and let it work itself out" approach.
One limitation is how much model you can fit on your hardware. Also, information about one domain can sometimes introduce biases into another that are very hard to fix, so training on one domain alone can produce much better results.
Yup, it's unclear. The current ~consensus for legal, as an example, is "general-purpose frontier model + very sophisticated RAG/system architecture". I'm building something here using this idea and think it's 50/50 (at best) that I'm on the right path. It's quite easy to build very clever-sounding but often wrong insights into various legal agreements (M&A docs, for example). Looking at the tokenization, training data, decoding, and architecture (lots of guesses) of the big models, there are a lot of knobs that seem turned slightly incorrectly for the domain.
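As a toy illustration of the "frontier model + RAG" shape: the retrieval below is naive keyword overlap and `complete()` is a hypothetical stand-in for any frontier-model API call, so treat it as a sketch of the architecture rather than the actual system:

```python
# Toy RAG sketch for a legal corpus: retrieve relevant excerpts, then ask a
# frontier model to answer grounded in them. A real system would use
# embeddings, contract-aware chunking, reranking, etc.
from typing import List

def complete(prompt: str) -> str:
    """Hypothetical stand-in for a frontier-model API call."""
    return f"[model answer grounded in]\n{prompt}"

def retrieve(query: str, documents: List[str], k: int = 3) -> List[str]:
    """Naive retrieval: rank documents by keyword overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(terms & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(query: str, documents: List[str]) -> str:
    context = "\n\n".join(retrieve(query, documents))
    prompt = ("Answer using only the excerpts below, and cite the clause you rely on.\n\n"
              f"Excerpts:\n{context}\n\nQuestion: {query}")
    return complete(prompt)

docs = [
    "Section 8.2: Indemnification obligations survive closing for eighteen months.",
    "Section 3.1: The purchase price is subject to a working capital adjustment.",
]
print(answer("How long do indemnification obligations survive?", docs))
```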
Some domains are so large that a specialized model might seem niche, but the value prop is potentially astronomical.
I haven't listened to your great podcasts, so it's hard to say what is not covered.
Architectures matter a lot less than data. "Knowledge" and "reasoning" in LLMs are a manifestation of instruct-style data. It won't matter how much cheaper training gets if there is limited instruct data for a use case.
How do you make 100k-context-window training data, for example? You still need thousands of people. The same goes for the so-called niches.
Maybe sharing the data turns out to be a complex coordination problem. That's bad for equity investors and giant companies. Anyway, all of this would cost less than the moon landing, so it's practicable: you don't need cheaper, you just need risk-taking.
The obviousness of the path from here to there means it's not about innovation. It's all about strategy.
If Google could marshal $10b for Stadia, it could spend $10b on generating 100k-context-window instruct-style data and have the best model. It could also synthesize videos from Unity/Unreal for Sora-style generation. It would just be very hard, in an org with 100,000+ people, to spend $10b on 10 developers and 10,000 writers instead of 400 developers and 3,600 product managers and other egos. At the end of the day you are revisiting the weaknesses that brought Google and other big companies to this mess in the first place.
Anyway, I personally think the biggest weakness with ChatGPT and the chat-style UX is that it feels like work. Netflix, TikTok, etc. don't feel like work. Nobody at Google (or OpenAI, for that matter) knows how to make stuff that doesn't feel like work. And you can't measure "fun." So the biggest thing to figure out is how much the technical stuff matters in a world where people can be poached here and there and walk out with the whole architecture in their heads, versus the non-technical stuff that takes decades of hard-won personal experience and strongly held opinions, like answers to questions such as "How do you make AI fun?"
People go for this level of obviousness and it doesn't work. I have no doubt that a meme text game will find some level of literal objective success. But it will still suck. Meme games are a terrible business, both in terms of profits and equity.
This also speaks to why OpenAI, Google, and the other developers will struggle to create anything that feels like fun: they will chase obvious stuff like this and think it's similar to every other problem. And in reality, you don't need any testing or data to know that people hate reading in video games, that the best video game writing is worse than the average movie's screenplay, and that most dialogue is extremely tedious - so why would you make it even worse by having an AI generate it?
Arguably one of the earliest consumer use cases to find footing was the AI girlfriend/boyfriend. Large amounts of revenue, spread across many small players, are generated here, but it's glossed over because of the category.
Given how widespread romance scams already are (the "market" is at least $0.5 billion/year), I would expect any reasonably functioning AI girlfriend/boyfriend model to also be massively (ab)used against unwilling/unwitting "partners".
I think one related area we'll start seeing more of in the future is "resurrected" companions. You have a terminally ill family member, so you train a model on a bunch of video recordings of them, then you can talk to "them" after they've shuffled off this mortal coil.
Do you think there is the possibility of consumer or end-user apps collecting enough specialized data to move downwards on your graph to infra and foundational models?
I forgot to list Bitcoin as another one that launched with no VC money :)
It is remarkable how many of the tech companies that have lasted the longest and grown the biggest started off so lean... (although BTC, of course, is not a company).
I think the difference this time is that the capabilities provided by transformers, versus prior waves of AI, are sufficiently different to allow many more types of startups to emerge, as well as big changes in some types of enterprise software by incumbents - in ways that were not enabled by pre-existing ML approaches.
The jury is still out on how useful these additional capabilities provided by transformers are. The question is the degree to which it's possible to reduce the frequency and severity of hallucinations. If that degree is limited without major changes in architecture or major new breakthroughs, then the usefulness of GPT-4-style models will be limited. And we just don't know the answer yet. So far, the usefulness of GPT-4 is real but extremely limited. Another issue is that this approach means models are costly to train and don't easily incorporate the latest information about the world. In short, it's way too early to hype this up.
100% agree the theory behind AI is old and actually dates back to the early days of "cybernetics". But the real question is at what point we consider it sufficiently reduced to practice. I chose GPT-3, but undoubtedly people can point to earlier examples as glimpses of what was coming.
The bar for what counts as "AI" keeps moving. For example, plane autopilots would have been "AI" in the 1980s, as would the ability of a machine to win at chess, Go, and other games.
As a non-expert in the field I was hesitant at the time to disagree with the legions of experts who last year denounced Blake Lemoine and his claims. I know enough to know, though, of the AI effect <https://en.wikipedia.org/wiki/AI_effect>, a longstanding tradition/bad habit of advances being dismissed by those in the field itself as "not real AI". Anyone, expert or not, in 1950, 1960, or even 1970 who was told that before the turn of the century a computer would defeat the world chess champion would conclude that said feat must have come as part of a breakthrough in AGI. Same if told that by 2015 many people would have in their homes, and carry around in their pockets, devices that can respond to spoken queries on a variety of topics.
To put it another way, I was hesitant to be as self-assuredly certain about how to define consciousness, intelligence, and sentience - and what it takes for them to emerge - as the experts who denounced Lemoine. The recent GPT breakthroughs have made me more so.
You should check out the WaPo article that originally published his concerns. He makes many errors out loud with a reporter who is trying rather hard to see his point of view. I'm not trying to be rude, but he came off as kind of a sucker who would fall for a lot of scammer tactics. There was usually some form of strangeness, such as him deciding where the content of the conversation began and ended. Further, he asks only leading questions, which would be fine if transformers weren't specifically trained to output the maximum-likelihood text tokens from the distribution of their training set, which was internet text created by humans.
He was frequently cited as an engineer, but I don't think he actually had a strong background in engineering; it was rather in philosophy.
Chess is featured in Russell and Norvig's "Artificial Intelligence: A Modern Approach" from the 1st edition (1995) through at least the 3rd edition (2009). Algorithms such as alpha-beta pruning were definitely considered AI at the time.
The MIT AI Group, including Marvin Minsky, were the mainstream of AI more than 50 years ago, and begat the MIT AI Lab. They and everyone else at the time called their work AI.
Definitely not my intention to forget or denigrate the past. Obviously all of this exists due to deep learning and prior architectures. What I have been running into is that many people and companies are interpreting this as "just more of the same" relative to prior ML waves, when really this is an entirely new capability set.
To use a (bad) analogy of cars versus planes: both have wheels and can drive on the ground, but planes open up an entirely new dimension / capability set that can transform transportation, logistics, defense, and other areas where cars mattered, but in a different enough way.
That's the big difference this round. Before, you had to have ML expertise and the expertise to understand the implications of, say, an MNIST classifier example. Now anyone can "get" it, because you're prompting and getting inference back in English. Underneath, the fundamentals aren't all that different, though; it has the same novelty factor and the same limitations apply.
I think the fundamentals are radically different, just due to the ease of applying this stuff.
I used to be able to train and deploy an ML model to help solve a problem... if I set aside a full week to get it done.
Now I tinker with LLMs five minutes at a time, or maybe for a full hour if I have something harder - and get useful results. I use them on a daily basis.
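For what it's worth, the "five minutes of tinkering" loop can literally be a few lines. Here's a sketch using the OpenAI Python SDK as one example; any hosted LLM API works similarly, the model name is illustrative, and an API key is assumed to be set in the environment:

```python
# Minimal sketch of quick LLM tinkering via a hosted API.
# Assumes OPENAI_API_KEY is set; the model name is illustrative.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Classify this support ticket as billing, bug, or other: "
                   "'I was charged twice this month.'",
    }],
)
print(response.choices[0].message.content)
```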
I agree that most startups fail. Only a small handful of companies at any given moment are the "breakouts" that are clearly working, and those are the most de-risked to join (although certainly not risk-free).
I think the entire industry forgot about startup risk during COVID, and unfortunately it is now rushing back with the changing environment...