
> That's like saying I don't understand what vanilla flavour means just because I can't tell you how many hydrogen atoms vanillin contains

You're right that there are different kinds of tasks, but there's an important difference here: we probably didn't just have an exchange where you quoted a whole bunch of organic-chemistry details, answered "Yes" when I asked whether you were capable of counting the hydrogen atoms, and then confidently answered "Exactly eight hundred and eighty-three."

In that scenario, it would be totally normal for us to conclude that a major failure in understanding exists somewhere... even when you know the other party is a bona-fide human.



Well, there are several problems that lead to this failure.

One is conditioning: models are not typically tuned to say "I don't know" when they don't know, because confidently bullshitting unfortunately sometimes results in higher benchmark performance which looks good on competitor comparison reports. If you want to see a model that is tuned to do this slightly better than average, see Claude Opus.

Two, you're asking the model to do something that doesn't make any sense to it, since it can't see the letters: it operates on tokens, has never seen the individual characters, and hasn't learned any intuitive sense of what they are. It can tell you what a letter is in the same way it can tell you that an old man has white hair, despite having no concept of what either of those looks like.
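
To make that concrete, here's a toy sketch in Python (the vocabulary is invented for the example and isn't any real model's tokenizer): once the text is tokenized, the model only ever receives opaque integer ids, so "count the r's in strawberry" asks about characters it never sees.

    # Toy demonstration: the model receives token ids, not characters.
    # Real tokenizers (BPE etc.) learn their vocabulary from data; this one is made up.
    toy_vocab = {"straw": 1001, "berry": 1002}

    def toy_tokenize(text, vocab):
        # Greedy longest-match lookup, purely for illustration.
        tokens = []
        while text:
            for piece in sorted(vocab, key=len, reverse=True):
                if text.startswith(piece):
                    tokens.append(vocab[piece])
                    text = text[len(piece):]
                    break
            else:
                raise ValueError("no matching piece for: " + text)
        return tokens

    print(toy_tokenize("strawberry", toy_vocab))  # [1001, 1002] -- no letter 'r' in sight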

Three, the model is incredibly dumb in terms of raw intelligence: maybe a third of average human reasoning ability for SOTA models at best, according to some attempts to test them with really tricky logic puzzles that push responses out of the learned distribution. Good memorization helps obfuscate this in lots of cases, especially for 70B+ sized models.

Four, models can really only do an analogue of what "fast thinking" would be in humans; chain of thought and various hidden-thought-tag approaches help a bit, but fundamentally they can't stop and reflect recursively. So if the model knows something, it blurts it out; otherwise, bullshit it is.
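
A minimal sketch of that last point, assuming a hypothetical generate() helper standing in for whatever completion API you use: the only lever available is asking the model to spend output tokens reasoning before it commits to an answer; there is no separate phase where it silently pauses and reflects.

    # Hypothetical helper standing in for a real completion API call.
    def generate(prompt: str) -> str:
        raise NotImplementedError("call your model of choice here")

    question = "How many times does the letter 'r' appear in 'strawberry'?"

    # Direct prompt: the model has to commit to an answer straight away.
    direct_answer = generate(question)

    # Chain-of-thought prompt: same mechanism, but it gets to "think out loud"
    # in its output tokens before the final answer, which helps somewhat.
    cot_answer = generate(question + "\nThink step by step, then give the final answer.")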


> because confidently bullshitting unfortunately sometimes results in higher benchmark performance which looks good on competitor comparison reports

You've just reminded me that this was even a recommended strategy in some of the multiple-choice tests during my education. Random guessing had the same expected score as not answering at all.

If you really didn't know the answer, then every option was equally likely and there was no benefit; but if you could eliminate just one option, then your expected score from guessing among the rest made it worthwhile.
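
For anyone who wants the arithmetic: under the classic guessing-penalty scheme (+1 for a correct answer, -1/(n-1) for a wrong one, 0 for a blank), a pure guess has expected value zero, and eliminating even one option makes guessing pay. A quick check in Python, assuming n = 4 options per question:

    from fractions import Fraction

    n = 4                            # options per question (assumed)
    penalty = Fraction(-1, n - 1)    # wrong answers cost 1/(n-1) points

    def expected_score(remaining):
        # Guess uniformly among `remaining` options, exactly one of which is correct.
        p_correct = Fraction(1, remaining)
        return p_correct * 1 + (1 - p_correct) * penalty

    print(expected_score(4))  # 0   -- pure guess, same as leaving it blank
    print(expected_score(3))  # 1/9 -- eliminate one option and guessing pays off
    print(expected_score(2))  # 1/3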



