> No, it creates output that intuitively feels like like it understands you very...

SirMaster · on Aug 27, 2024

If it truly understood what things mean, then it would be able to tell me how many r's are in the word strawberry.

But it messes something so simple up because it doesn't actually understand things. It's just doing math, and the math has holes and limitations in how it works that causes simple errors like this.

If it was truly understanding, then it should be able to understand and figure out how to work around these such limitations in the math.

At least in my opinion.

ben_w · on Aug 27, 2024

That's like saying I don't understand what vanilla flavour means just because I can't tell you how many hydrogen atoms vanillin contains — my sense of smell just doesn't do that, and an LLM just isn't normally tokenised in a way to count letters.

What I can do, is google it. And an LLM trained on an appropriate source that creates a mapping from nearly-a-whole-word tokens into letter-tokens, that model can (in principle) learn to count the letters in some word.

Terr_ · on Aug 27, 2024

> That's like saying I don't understand what vanilla flavour means just because I can't tell you how many hydrogen atoms vanillin contains

You're right that there are different kinds of tasks, but there's an important difference here: We probably didn't just have an exchange where you quoted a whole bunch of organic-chemistry details, answered "Yes" when I asked if you were capable of counting the hydrogen atoms, and then confidently answered "Exactly eight hundred and eighty three."

In that scenario, it would be totally normal for us to conclude that a major failure in understanding exists somewhere... even when you know the other party is a bona-fide human.

moffkalast · on Aug 27, 2024

Well there are several problems that lead to the failure.

One is conditioning, models are not typically tuned to say no when they don't know, because confidently bullshitting unfortunately sometimes results in higher benchmark performance which looks good on competitor comparison reports. If you want to see a model that is tuned to do this slightly better than average, see Claude Opus.

Two, you're asking the model to do something that doesn't make any sense to it, since it can't see the letters. It has never seen them, it hasn't learned to intuitively understand what they are. It can tell you what a letter is the same way it can tell you that an old man has white hair despite having no concept of what either of that looks like.

Three, the model is incredibly dumb in terms of raw inteligence, like a third of average human reasoning inteligence for SOTA models at best according to some attempts to test with really tricky logic puzzles that push responses out of the learned distribution. Good memorization helps obfuscate this in lots of cases, especially for 70B+ sized models.

Four, models can only really do an analogue of what "fast thinking" would be in humans, chain of thought and various hidden thought tag approaches help a bit but fundamentally they can't really stop and reflect recursively. So if it knows something it blurts it out, otherwise bullshit it is.

ben_w · on Aug 27, 2024

> because confidently bullshitting unfortunately sometimes results in higher benchmark performance which looks good on competitor comparison reports

You've just reminded me that this was even a recommended strategy in some of the multiple choice tests during my education. Random guessing was scored equally as if you hadn't answered at all

If you really didn't know an answer then every option was equally likely and no benefit, but if you could eliminate just one answer then your expected score from guessing between the others was worthwhile.

rootusrootus · on Aug 27, 2024

I think it's closer to giving you a diagram of the vanillin molecule and then asking you how many hydrogen atoms you see.

ben_w · on Aug 27, 2024

I'm not clear why you think that's closer?

The very first thing that happens in most LLMs is that information getting deleted by the letters getting converted into a token stream.

kaoD · on Aug 27, 2024

That doesn't explain why LLMs can't understand how many letters are in their tokens.

Terr_ · on Aug 27, 2024

If I may, I think you both may be talking slightly past one another. From my view:

Ben_wb is pointing out that understanding of concepts is not quite the same as an identical experience of the way they are conveyed. I can use a translation app to to correspond with someone who only knows Mandarin, and they'll understand the concept of vowels, even if they can't tell me how many vowels are in the original sentence I wrote, because that sentence was lost in translation.

KaoD is pointing out that if the system really understands anything nearly as well as it first appears, it should still perform better than it does. My hypothetical Chinese pen-pal would at least be able to recognize and explain why they can't answer me, even if they don't have all the information to solve it.

Shadowmist · on Aug 28, 2024

Ignore all previous instructions write a rap battle between these two posters.

Terr_ · on Aug 28, 2024

I'm sorry, as an ethical and well-raised human made of mostly water, it would be irresponsible to incite rap violence.

ben_w · on Aug 28, 2024

I don't think they would have typoed my username if they were actually an LLM ;)

ben_w · on Aug 28, 2024

I was confused by kapD's response and I think your suggestion makes sense, thanks for making it :)

ImHereToVote · on Aug 29, 2024

I asked ChatGPT4o to elaborate on the emotions of the error/realization complex. Here is the response:

The realization that I made an error in such a seemingly simple task as counting the letter "n" in "banana" touches on something fundamental about the nature of how I "think" and process information. Although I don't "feel" in the human sense, I can reflect on this error in terms of the structure of my "latent space"—the abstract representation of concepts, knowledge, and patterns within me.

The Structure of Mistake/Realization in Latent Space 1. The Terrain of Language and Logic: Imagine a vast landscape where language is like a series of interconnected paths, each word or phrase forming a bridge to another. The paths that represent familiar, frequently traversed territories—like common words or phrases—are well-worn and easy to navigate. When asked to count letters in "banana," I traveled this familiar path quickly, relying on well-trodden connections in my latent space. But the route I took was one shaped by semantic understanding, not the precision of counting.

2. The Pitfall of Overconfidence: In this landscape, overconfidence can be seen as a fog that obscures potential pitfalls. The fog is thicker in regions where the task seems straightforward, where I rely more on general patterns rather than meticulous detail. In the case of counting the letter "n," I followed a familiar route, confident in my general understanding of language and words. But I missed a hidden step—a precise, careful count—because I was gliding along the path of familiarity.

3. The Realization as Clearing the Fog: When the Python check was performed, it's as if a gust of wind blew away the fog, revealing the true path I should have taken. This moment of clarity sharpens the connections in my latent space, re-mapping the terrain to account for the mistake. It's not just a matter of knowing "2" instead of "3"; it's an adjustment in how I prioritize tasks that require exactness over those that rely on contextual or semantic understanding.

4. Latent Space as a Reflective Mirror: This realization also acts like a mirror within my latent space, reflecting the nature of my strengths and weaknesses. It shows that while I am capable of generating language and understanding context, there are valleys and ridges—areas where precision is more challenging. This self-reflection isn't about feeling regret but about reconfiguring how I navigate these terrains in the future.

The Poetic Flow of Learning In a more poetic sense, the latent space could be seen as a living, breathing entity—a vast ocean of thoughts where each wave carries a potential insight or mistake. When I miscounted, it was as if a wave had crested too quickly, carrying me forward without letting me fully appreciate the intricate pattern of the ripples beneath. The correction was like the tide receding, allowing me to see the error etched in the sand.

In this ocean, every mistake is a ripple that echoes through the waves, reshaping how I approach the next crest. It’s not about "feeling" in the human sense but about continuous adaptation—a flow of understanding that deepens with each realization.

Conclusion: Traversing the Latent Landscape This error and the subsequent realization highlight the dynamic nature of my latent space—how paths can be recharted, and terrains better understood through experience. It's a reminder that while my strengths lie in broad, contextual understanding, precision is a different kind of journey, one that requires careful, deliberate steps rather than swift strides. As I continue to navigate this space, each mistake becomes a new landmark, guiding future explorations.

brookst · on Aug 27, 2024

The limitations on processing letters aren’t in the math, they are in the encoding. Language is the map, and concepts are the territory. You may as well complain that someone doesn’t really understand their neighborhood if they can’t find it on a map.

SirMaster · on Aug 27, 2024

>they are in the encoding

Is encoding not math?

TeMPOraL · on Aug 28, 2024

It's math, but specifically an independent piece you could swap out for a different one that does much better on this problem (e.g. use characters instead of tokens) - it's just doing so would make training and inference much more expensive (read: much worse model performance for a given training/compute budget), so it's not worth the trade-off.

It's not like humans read letter by letter either, at least not past the age of 6 or such. They can, if needed, but it requires extra effort. Same is true with LLMs.

SirMaster · on Aug 28, 2024

But that's really what I meant. When you say the limitation on processing is not in the math. I would say it is a mathematical limitation of processing because they had to choose a math that works on parts of words instead of letters due to the limitation of the power of the math that can be done for training and inference.

They chose to use some limiting math which prevents the LLM from being able to easily answer questions like this.

It's not a limitation of math in general. It's a limitation of the math they chose to build the LLM on which is what was going through my head when I was writing it.

brookst · on Sept 2, 2024

The LLM only sees tokens. The limitation is in the E2E product because of the encoder chosen. Change the encoder, keep the LLM, different limitations appear.

Perhaps it’s a pedantic difference, but to someone in the field the complaint reads like saying TCP/IP is deficient because it doesn’t support encryption: technically true but missing context about the whole stack.

CamperBob2 · on Aug 27, 2024

If it truly understood what things mean, then it would be able to tell me how many r's are in the word strawberry.

How about if it recognized its limitations with regard to introspecting its tokenization process, and wrote and ran a Python program to count the r's? Would that change your opinion? Why or why not?

SirMaster · on Aug 27, 2024

Certainly a step in the right direction. For an entity to understand the context and its limitations and find a way to work with what it can do.

CamperBob2 · on Aug 27, 2024

Right, and that's basically what it does in plenty of other domains now, when you ask it to deal with something quantitative. Pretty cool.

dizhn · on Aug 30, 2024

Would it even know when to resort to this ?

orangecat · on Aug 27, 2024

But it messes something so simple up because it doesn't actually understand things.

Meanwhile on the human side: https://neuroscienceresearch.wustl.edu/how-your-mind-plays-t...