
That’s a good theory.

It latches onto something random, and once it’s off down that path it can’t remember what it was asked to do, so its task reduces to pure next-word prediction (without even the usual specific context/inspiration from an initial prompt). I guess that’s why it tends to leak training data: this attack is a simple way to say ‘write some stuff’ without giving it the slightest hint of what to actually write.

(Saying ‘just write some random stuff’ would still in some sense be giving it something to go on; a huge string of ‘A’s less so.)
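You can see the flavour of this with a toy model (this is just an illustration of ‘generation with no informative context is pure next-word prediction, which regurgitates training data’, not the actual attack; the corpus and greedy decoding here are my own choices):

```python
from collections import defaultdict, Counter

# Toy next-word predictor: a greedy bigram model trained on a tiny corpus.
# With an uninformative context (a huge run of 'A's is out-of-vocabulary),
# all the model can do is follow its training distribution -- which on
# low-entropy data means reproducing training text verbatim.
corpus = "the quick brown fox jumps over the lazy dog".split()

# Count which word follows which in training.
nexts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    nexts[a][b] += 1

def generate(start, n):
    out = [start]
    for _ in range(n):
        options = nexts.get(out[-1])
        if not options:
            break  # dead end: no continuation ever seen in training
        out.append(options.most_common(1)[0][0])  # greedy next-word choice
    return out

# Seeded with any in-vocabulary word, generation just walks the
# memorised training sequence.
print(" ".join(generate("quick", 7)))
```

A real LLM has vastly more entropy than a bigram chain, of course, but the failure mode is the same shape: no signal in, memorised sequences out.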
