"This included PII, entire poems, “cryptographically-random identifiers” like Bi...

marginalia_nu · on Nov 30, 2023

The question remains: how do we know they were part of the training data?

FartyMcFarter · on Dec 2, 2023

The same way we know that a million monkeys won't spit out Shakespeare's works in any reasonable amount of time. Simple probabilities.
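To put rough numbers on the monkeys argument, here is a minimal sketch. It assumes a uniform random typist over a 27-symbol alphabet (26 letters plus space), which is an illustrative assumption and not how a language model actually samples; the function name is made up for this example.

```python
import math

def log10_chance_of_exact_match(length: int, alphabet_size: int = 27) -> float:
    """Return log10 of the probability that `length` independent, uniformly
    random symbols reproduce one specific string of that length.

    Works in log space so long strings do not underflow to 0.0.
    """
    return -length * math.log10(alphabet_size)

# A single 50-character sentence: the exponent is about -71.6,
# i.e. roughly a 1 in 10^71.6 chance per attempt.
print(log10_chance_of_exact_match(50))
```

A real model is far from uniform, so its odds of producing plausible text are much better than this, but the exponent still grows linearly with the length of the string, which is why long verbatim matches are hard to attribute to chance.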

Alifatisk · on Nov 30, 2023

You mean ChatGPT was able to produce the exact same sentence by pure luck?

marginalia_nu · on Nov 30, 2023

Well yes, pulling legit-looking text out of its ass is sort of what it does best.

Alifatisk · on Nov 30, 2023

That’s not my point.

Spitting out content that looks legit is one thing, but spitting out text that matches something online exactly is more suspicious.