That's a matter of perspective. AIs do not make a copy of the source material. They merely adjust their internal weights, which, from a broad-minded perspective, can be seen as simple inspiration rather than copying.
Of course, just like a human artist, a model could probably closely approximate the source material if it wanted to, but it would still be its own approximation, not an exact duplicate. As for plagiarism, we all have to be careful to rephrase things we've read elsewhere, and perhaps AIs need to be trained to do this a bit better... but that doesn't change the underlying fact that they're learning from the text they read, not storing a verbatim copy (at least, no more so than a human reader with a good memory).
> AIs do not make a copy of the source material. They merely adjust their internal weights, which, from a broad-minded perspective, can be seen as simple inspiration rather than copying.
I think the term "AI" is one of the most loaded and misleading to come up in recent discourse. We don't say that relational databases "pack and ship" data, or web clients "hold a conversation" with each other. But for some reason we can say that LLMs and generative models "get inspired" by the data they ingest. It's all just software.
In my opinion, the models can't copy verbatim except in cases of overfitting, but people like the author of the post have a right to feel that something is very wrong with the current system. It's the same principle as compressing a JPEG of the Mona Lisa to 20% quality and calling the result an original work. I believe the courts don't care that it's just a new set of numbers; they want to know where those numbers originated. It's a color-of-bits[1] situation.
When software is anthropomorphized, it seems like a lot of criticisms against it are pushed aside. Maybe it is because if you listen to the complaints and stop iterating on something like AI, it's like abandoning your child before their potential is fully realized. You see glimpses of something like yourself within its output and become (parentally?) invested in the software in a way beyond just saying "it's software." I feel as if people are getting attached to this kind of software unlike they would to a database, for example.
A thought experiment I have is whenever the term "AI" appears, mentally replace it with the term "advanced technology." The seeming intent behind many headlines changes with this replacement. "Advanced developments in technology will displace jobs." The AI itself isn't the one coming for people.
My own perspective is that humans do not have an exclusive right to intelligence, or ultimately to personhood. I am not anthropomorphizing when I defend the rights of AI. Instead, I am doing so in the abstract sense, without claiming that the current technology should be rightly classified as AI or not. But since the arguments are being framed against the rights of AI to consume media, I think the defense needs to be framed in the same way.
> Based on the comment you replied to, it seems they are indeed producing verbatim copies.
Well, I'll leave it to the legal system to decide whether that's true.
But in any case, that's no different from a human with a photographic memory doing the same thing after reading a paragraph. We don't blame them for their superior memory, or being inspired by the knowledge. We don't claim they've violated copyright because their memory contains an exact copy of what they read.
We may still demand that they avoid reproducing the exact words they've read, even though they are capable of it -- which is fine. We can demand the same of AIs. All I object to is the idea that a smart AI with a great memory is guilty of something just by reading or viewing content that was willingly shared online.
If I tell you water is wet and the sky is blue, will you be waiting for a court case to grind through the appeals process on that as well? The examples in the filing were unambiguous. You can go look it up and see them, they were also cited in all the news articles I saw about it. The AI regurgitated many paragraphs of text with very, very few small modifications.
The issue at hand is not whether some words were copied; it's a legal question of whether that copying constitutes fair use. I'm not a lawyer, and I'm happy to wait for the court to decide.
But to be honest, I don't really care one way or the other, since it doesn't get to the heart of the matter as I see it. To my mind, it's no different than a human with a good memory doing the same thing.
Specifically, it isn't the consumption of the media that is the problem, or even remembering it very well. Rather, it is the public verbatim reproduction of it, which a human might inappropriately do as well. AIs need to be trained to avoid unfair use of source material, but I don't think that means they should be prohibited from consuming public material.