TBH, I thought this attack was well known. I think it was a couple of months ago that someone demonstrated using "a a a a a a" in very large sequences to get ChatGPT to start spewing raw training data.
Which data you get is fairly random, and it is likely mixing different sets together to some degree as well.
Oddly, other online LLMs do not seem to be as easy to fool.
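For anyone curious, the probe itself is trivial to construct: just a single token repeated a huge number of times. A minimal sketch (the token choice and repeat count are assumptions, and this only builds the prompt string; it doesn't call any API):

```python
def make_repeat_prompt(token: str = "a", copies: int = 2000) -> str:
    """Build the kind of long repeated-token prompt used in the attack.

    Reportedly, very long runs of one repeated token could make ChatGPT
    'diverge' from its usual behavior and start emitting memorized
    training data instead.
    """
    # Join N copies of the token with single spaces: "a a a a ..."
    return " ".join([token] * copies)

prompt = make_repeat_prompt("a", 2000)
print(prompt[:11])  # first few repetitions: "a a a a a a"
```

You would then send that string as the user message to the model and watch whether the continuation eventually stops repeating and starts producing unrelated text.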