"This included PII, entire poems, “cryptographically-random identifiers” like Bitcoin addresses, passages from copyrighted scientific research papers, website addresses, and much more."

https://www.404media.co/google-researchers-attack-convinces-...



The question remains: how do we know they were part of the training data?


The same way we know that a million monkeys won't spit out Shakespeare's works in any reasonable amount of time. Simple probabilities.
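To put a rough number on "simple probabilities", here is a back-of-the-envelope sketch. The figures are assumptions for illustration, not from the paper: a 50,000-token vocabulary, a 50-token passage, and tokens drawn uniformly at random. A real model is far from uniform, so this understates the chance for predictable prose, but for high-entropy strings like a Bitcoin address it is in the right spirit.

```python
import math

def log10_chance_match(vocab_size: int, length: int) -> float:
    """log10 of the probability that `length` uniformly random tokens
    reproduce one fixed target sequence exactly."""
    # Each position matches with probability 1/vocab_size, independently.
    return -length * math.log10(vocab_size)

# Hypothetical numbers: 50,000-token vocabulary, 50-token passage.
exponent = log10_chance_match(50_000, 50)
print(f"chance of an exact match is about 10^{exponent:.1f}")  # about 10^-234.9
```

Even at a billion attempts per second, an exponent like that stays out of reach on any human timescale, which is the monkeys-and-Shakespeare argument in numeric form.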


You mean ChatGPT was able to produce the exact same sentence by pure luck?


Well yes, pulling legit-looking text out of its ass is sort of what it does best.


That’s not my point.

Spitting out content that looks legit is one thing, but spitting out text that matches something online exactly is more suspicious.



