Isn’t it obvious that the confidently wrong problem will never go away because all of this is effectively built on a statistical next token matcher? Yeah, sure, you can throw on hacks like RAG or bigger context windows, but it’s still built on the same foundation.
It’s like saying you built a 3D scene on a 2D plane. You can employ clever tricks to make 2D look 3D at the right angle, but it’s fundamentally not 3D, which obviously shows when you take the 2D thing and turn it.
It seems like the effectiveness plateau of these hacks will soon be (has been?) reached and the smoke and mirrors snake oil sales booths cluttering Main Street will start to go away. Still a useful piece of tech, just, not for every-fucking-thing.
As a layman, this too strikes me as the problem underlying the "confidently wrong" problem.
The author proposes ways for an AI to signal when it is wrong and to learn from its mistakes. But that mechanism feeds back to the core next token matcher. Isn't this just replicating the problem with extra steps?
I feel like this is a framing problem. It's not that an LLM is mostly correct and just sometimes confabulates or is "confidently wrong". It's that an LLM is confabulating all the time, and all the techniques thrown at it do is increase the measured incidence of LLM confabulations matching expected benchmark answers.
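To make that framing concrete, here's a toy sketch (pure Python, invented numbers, not any real model's code) of what a single decoding step boils down to: the model emits a probability distribution over next tokens and you sample from it. The draw works exactly the same way whether the likeliest continuation happens to be true or not.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores after "The capital of Australia is".
# The numbers are made up for illustration; they come from no real model.
vocab = ["Canberra", "Sydney", "Melbourne", "Vienna"]
logits = [2.1, 1.9, 0.7, -3.0]

probs = softmax(logits)
choice = random.choices(vocab, weights=probs, k=1)[0]

print({tok: round(p, 3) for tok, p in zip(vocab, probs)})
print("sampled continuation:", choice)
# Whether the sample is "Canberra" or "Sydney", it came out of the same
# draw from the same distribution; nothing in the mechanism marks one as
# "known" and the other as a confabulation.
```

In that picture, RAG, fine-tuning, and benchmark tricks just reshape the distribution; they don't add a separate channel that says "I actually know this."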
It seems obvious to me, but there was a camp that thought, at least at one time, that probabilistic next-token prediction could effectively be what humans are doing anyway, just scaled up several more orders of magnitude. It always felt obvious to me that there was more to human cognition than just very sophisticated pattern matching, so I'm not surprised that these approaches are hitting walls.
There are people convinced that if we throw a sufficient amount of training data and VC money at more hardware, we'll overcome the gap.
Technically, I can't prove that they're wrong; novel solutions sometimes happen, and I guess the calculus is that it's likely enough to justify a trillion dollars down the hole.
Whenever I try to tell people about the myth of the objective they look at me like I'm insane. It's not very popular to tell people that their best laid plans are actually part of the problem.
I would suspect that any next step comes with a novel implementation though, not just trying to scale the same shit to infinity.
I guess the bitter lesson is gospel now, which doesn't sit right with me now that we're past the stage of Moore's Law being relevant, but I'm not the one with a trillion dollars, so I don't matter.
I’d say it was worth throwing down some cash for, because we get cool new things by full-assing new ideas. But… yeah… a TRILLION dollars is waaaay too far.
This is the best analogy I've read to explain what's going on and takes me back to the days of Doom and how it was so transformative at the time. Perhaps in time the current generation will be viewed as the Doom engine as we await the holy grail of full 3D in Quake.
It's easy to solve if they modify the training to remove some weight from Stack Overflow and add more weight to Yahoo! Answers :)
I remember a few years ago, we were planning to make some kind of math forum for students in the first year of university. My opinion was that it was too easy to do it wrong. On one side, you can end up like Math Overflow, where all the questions and all the answers are too technical for the first year of university. On the other side, you can end up like Yahoo! Answers, where more than half of the answers were "I don't know", with many "I don't know"s per question.
For the AI, you want to give it some room to generalize/bullshit. If one page says that "X was a few months before Z" and another page says that "Y was a few days before Z", then you want a hallucinated reply that says that "X happened before Y".
On the other hand, you want the AI to say "I don't know." They just gave too little weight to the questions that are still open. Do you know a good forum where people post questions that are still open?
> For the AI, you want to give it some room to generalize/bullshit.
Totally! In my mind I’ve been playing with the phrase: it’s good at _fuzzy_ things. For example, IMO voice synthesis before and after this wave of AI hype is night and day! In part, to continue the fuzzy idea, because voice synthesis isn’t factual: it’s billions of little data points coming together to emulate sound waves, which is incredibly fuzzy. Versus code, which is pointy: it has one or a few correct forms, and effectively infinite incorrect ones.
I mean, if it's trained on things like Reddit, then it's just reflecting its training data. I asked a question on Reddit just yesterday and the only response I got was confidently wrong. This is not the first time that has happened.