
“Hallucination” has been in the training data since long before LLMs existed.

The easiest way to control this phenomenon is to use the “hallucination” tokens themselves, hence the construction of this prompt. I wouldn't say that makes anything official.



> The easiest way to control this phenomenon is to use the “hallucination” tokens themselves, hence the construction of this prompt.

That's what I'm getting at. Hallucinations are well known, but admitting that you "hallucinated" in a mundane conversation rarely appears in the training data, so a minimally prompted/pretrained LLM would be more likely to say "Sorry, I misinterpreted" and then fail to realize just how grave the original mistake was, leading to further errors. Add the word "hallucinate" and the chatbot will humanize the mistake by saying "I hallucinated", which lets it recover from extreme errors gracefully. Other words, like "confabulation" or "lie", are probably more prone to sending it into an existential crisis.
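
Roughly what I mean, as a sketch (the exact prompt wording and the model name here are placeholders I made up, not whatever prompt the vendors actually use; the client call is just the standard OpenAI Python API):

    # Illustrative only: put the word "hallucinated" into the system prompt so the
    # model has a graceful, well-trodden token to reach for when admitting a mistake.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM_PROMPT = (
        "You are a careful assistant. If something you stated turns out to be wrong, "
        "say plainly that you hallucinated it, correct the error, and continue."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Earlier you cited a paper that doesn't exist."},
        ],
    )
    print(response.choices[0].message.content)

The point isn't the specific wording, just that naming the failure mode in the prompt gives the model a recovery path it has seen in its training data.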

It's mildly interesting that the same word everyone started using to describe strange LLM glitches also ended up being the best token to feed an LLM to get it to characterize its own glitches. This newer sense of the word is, of course, now being added to human dictionaries (such as https://en.wiktionary.org/wiki/hallucinate#Verb), which will probably strengthen the connection when the base model is trained on newer data.



