I think there’s an accidental equivocation here between n-gram models and Markov processes, because “Markov chain” gets used to mean both things.
N-gram models are very limited: the prediction only ever depends on the last n-1 tokens, so they are not useful in many of the ways LLMs are.
On the other hand, basically any process can be considered a Markov process if you make the state include enough (in the extreme case, the entire history so far).
So calling both the very limited n-gram models and the nearly unlimited Markov processes by the same name, “Markov chain”, is just super confusing.
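To make the distinction concrete, here is a rough Python sketch. Everything in it is made up for illustration: `next_token` is a hypothetical toy rule (deterministic, which is just the degenerate case of a stochastic process), and `successors` is a helper I'm inventing to check whether a given choice of state pins down what comes next. With "state = last token" (a bigram-style view) the same state leads to different next tokens, so the process does not look Markov. With "state = entire history" every state has exactly one successor, so it trivially is.

```python
import math
from collections import defaultdict

def next_token(history):
    # Hypothetical toy rule: emit "a" when the history length is a perfect
    # square, otherwise "b". The next token depends on the whole history
    # (via its length), not just on the last few tokens.
    n = len(history)
    return "a" if math.isqrt(n) ** 2 == n else "b"

def generate(steps):
    # Run the process from an empty history for the given number of steps.
    history = ()
    for _ in range(steps):
        history += (next_token(history),)
    return history

def successors(sequence, state_of):
    # Map each observed state to the set of tokens seen right after it.
    # state_of turns a history prefix into whatever we choose to call "state".
    seen = defaultdict(set)
    for i in range(1, len(sequence)):
        seen[state_of(sequence[:i])].add(sequence[i])
    return dict(seen)

seq = generate(10)

# State = last token only (what a bigram model conditions on).
bigram_view = successors(seq, lambda h: h[-1:])

# State = the entire history so far.
full_view = successors(seq, lambda h: h)

print("".join(seq))
print({k: sorted(v) for k, v in bigram_view.items()})
print(all(len(v) == 1 for v in full_view.values()))
```

The catch, of course, is that the full-history state space grows without bound, which is exactly why "it's a Markov process" says almost nothing about how limited a model is, while "it's an n-gram model" says a lot.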