It will not come back to bite them, because they will argue that what was put in the dataset was "easy" and not worthy of copyright anyway. Then, they'll tout how smart their "AI" is. Two birds with one stone.
If you can induce ChatGPT to repeat a copyrighted work, I wonder if that’s definitive proof of copyright infringement.
It has no problem reproducing song lyrics for me, which I’m pretty sure are absolutely copyrighted. Interestingly, it’ll start to reproduce a famous book for me, and then cut off in such a way as to suggest there’s a hard filter to stop it proceeding.