Isn’t it obvious that the confidently wrong problem will never go away because all of this is effectively built on a statistical next token matcher? Yeah, sure, you can throw on hacks like RAG or bigger context windows, but it’s still built on the same foundation.
It’s like saying you built a 3D scene on a 2D plane. You can employ clever tricks to make 2D look 3D at the right angle, but it’s fundamentally not 3D, which obviously shows when you take the 2D thing and turn it.
It seems like the effectiveness plateau of these hacks will soon be (has been?) reached and the smoke and mirrors snake oil sales booths cluttering Main Street will start to go away. Still a useful piece of tech, just, not for every-fucking-thing.
As a layman, this too strikes me as the problem underlying the "confidently wrong" problem.
The author proposes ways for an AI to signal when it is wrong and to learn from its mistakes. But that mechanism feeds back to the core next token matcher. Isn't this just replicating the problem with extra steps?
I feel like this is a framing problem. It's not that an LLM is mostly correct and just sometimes confabulates or is "confidently wrong". It's that an LLM is confabulating all the time, and all the techniques thrown at it do is increase the measured incidence of LLM confabulations matching expected benchmark answers.
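To make that framing concrete, here's a toy sketch (pure Python, invented numbers, not any real model's code) of what a single decoding step boils down to: the model emits a probability distribution over next tokens and you sample from it. The draw works exactly the same way whether the likeliest continuation happens to be true or not.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores after "The capital of Australia is".
# The numbers are made up for illustration; they come from no real model.
vocab = ["Canberra", "Sydney", "Melbourne", "Vienna"]
logits = [2.1, 1.9, 0.7, -3.0]

probs = softmax(logits)
choice = random.choices(vocab, weights=probs, k=1)[0]

print({tok: round(p, 3) for tok, p in zip(vocab, probs)})
print("sampled continuation:", choice)
# Whether the sample is "Canberra" or "Sydney", it came out of the same
# draw from the same distribution; nothing in the mechanism marks one as
# "known" and the other as a confabulation.
```

In that picture, RAG, fine-tuning, and benchmark tricks just reshape the distribution; they don't add a separate channel that says "I actually know this."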
It seems obvious to me, but there was a camp that thought, at least at one time, that probabilistic next-token prediction could effectively be what humans are doing anyway, just scaled up several more orders of magnitude. It always felt obvious to me that there was more to human cognition than just very sophisticated pattern matching, so I'm not surprised that these approaches are hitting walls.
There are people convinced that if we throw a sufficient amount of training data and VC money at more hardware, we'll overcome the gap.
Technically, I can't prove that they're wrong; novel solutions sometimes happen, and I guess the calculus is that it's likely enough to justify a trillion dollars down the hole.
Whenever I try to tell people about the myth of the objective they look at me like I'm insane. It's not very popular to tell people that their best laid plans are actually part of the problem.
I would suspect that any next step comes with a novel implementation though, not just trying to scale the same shit to infinity.
I guess the bitter lesson is gospel now, which doesn't sit right with me now that we're past the stage of Moore's Law being relevant, but I'm not the one with a trillion dollars, so I don't matter.
I’d say it was worth throwing down some cash for, because we get cool new things by full-assing new ideas. But… yeah… a TRILLION dollars is waaaay too far.
This is the best analogy I've read to explain what's going on and takes me back to the days of Doom and how it was so transformative at the time. Perhaps in time the current generation will be viewed as the Doom engine as we await the holy grail of full 3D in Quake.
It's easy to solve if they modify the training to remove some weight from Stack Overflow and add more weight to Yahoo! Answers :)
I remember a few years ago, we were planning to make some kind of math forum for students in the first year of university. My opinion was that it was too easy to do it wrong. On one side, you can end up like Math Overflow, where all the questions and all the answers are too technical for the first year of university. On the other side, you can end up like Yahoo! Answers, where more than half of the answers were "I don't know", with many "I don't know"s per question.
For the AI, you want to give it some room to generalize/bullshit. If one page says that "X was a few months before Z" and another page says that "Y was a few days before Z", then you want a hallucinated reply that says that "X happened before Y".
On the other hand, you want the AI to say "I don't know." They just gave too little weight to the questions that are still open. Do you know a good forum where people post questions that are still open?
> For the AI, you want to give it some room to generalize/bullshit.
Totally! In my mind I’ve been playing with the phrase: it’s good at _fuzzy_ things. For example, IMO voice synthesis before and after this wave of AI hype is night and day! In part, to continue the fuzzy idea, because voice synthesis isn’t factual: it’s billions of little data points coming together to emulate sound waves, which is incredibly fuzzy. Versus code, which is pointy: it has one or a few correct forms, and effectively infinite incorrect ones.
I mean, if it's trained on things like Reddit, then it's just reflecting its training data. I asked a question on Reddit just yesterday and the only response I got was confidently wrong. This is not the first time that has happened.