For anyone familiar with the legal landscape, except for scenarios where AI products make reproductions of their training material, why isn't this covered under fair use?
Don't humans basically do the same thing when attempting to create new music — they derive a lifetime of inspiration from the works of others?
One of the pillars of fair use is whether the use disrupts the market for the original work. The explicit goal of generative AI is to replace artists and their original work.
I thought the explicit goal of AI was to create systems that can do tasks that typically require human intelligence. That includes beneficial things like finding cures for diseases, technology innovation, etc … Wouldn’t it be a shame to limit this growth potential to protect friggin’ YouTubers?
Maybe go after the application, not the technology? Someone uses AI to explicitly plagiarize an artist’s content? Sure, go ahead & sue! But limiting the growth potential of a whole class of technology seems like a bad idea — a really bad idea, actually, if your military enemy has made that same technology a top priority for the coming years …
If I train a gen AI on the full works of Pablo Picasso, and ask it to create new works, have I disrupted the market for the original works of Pablo Picasso?
If I train people to draw anime from a book on how to draw anime, and ask them to start drawing work related to Bleach (e.g.), have I disrupted the market for the original works of Bleach?
No, humans are not Python programs running linear algebra libraries, sucking in mountains of copyrighted data to use at massive scale in corporate products. The fact that this question comes up in EVERY thread about this is honestly sad.
It’s like fishing. We have laws for that not because one person with a pole and a line can catch a handful of fish. It’s because that eventually evolved into a machine with many lines and nets harvesting too many fish. They’re both “fishing” but once you reach a scale beyond what one person can reasonably do the intent and effect becomes completely different.
I'm asking about the law. There's a continuously mounting discussion about how existing case law will apply to ML, and to what degree there is liability. I make it very clear that I am interested in hearing from people who are intimate with the legal landscape.
Is discussing the law so distasteful to you that you want to derail it by talking about how sad it is to ask?
And there’s no legislation or settled case law yet that says the output of a sufficiently complex computer program that rearranges copyrighted works can be treated like an original work from a human author.
There is a continuum between a human who has heard a lifetime of music and later wrote similar but different music inspired by what they have heard, and a copy machine that directly reproduces (with minor degradation) a copyrighted work.
The question is where on that spectrum current AI training lies, and where the cutoff is between fair use and unauthorized commercial use of copyrighted works.
Today's AIs are not the same as a human creating original work. Even humans have to be careful to not reproduce existing works too closely or they also get blamed for plagiarism.
Because AI often replicates things much more closely than fair use would allow, and doesn't label sources the way you do when quoting. And it's generally harmful for humanity, too.
I'm specifically interested in situations where ML products do not do simple reproductions of copyrighted material. I'm aware that it's difficult to even know the space of output and to "align" the model correctly.
Are we normally required to label sources when referencing other copyrighted materials, whether in songs or movies or otherwise?
Depends on how much of the source you used (which is in line with how fair use works, unless you're developing parody). Given that AI is using the entire source: yes.
As we know from scraping cases, the amount of data and time may also play a role in determining fair use (think of it in terms of buffet etiquette: "all you can eat" does not in fact mean "you can eat it all by yourself"). Funnily enough, LinkedIn (owned by Microsoft) did argue successfully in court against scraping a website.
There are lots of things that are not simple reproductions that are not fair use.
If I take ten of your copyrighted photographs and stack them on top of each other in Photoshop with transparency, the output is not a simple reproduction of your work. If I sold that for commercial purposes you would be upset with me and likely have a copyright case.
That's an obvious example, but my point is there aren't super clean-cut definitions for these things, and it's not settled case law yet which side current AI training and content generation falls under.
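The transparency-stacking thought experiment above can be sketched in a few lines. This is a toy illustration, not a real image pipeline: each "photo" is assumed to be a list of (R, G, B) tuples, and stacking layers at equal opacity reduces to a per-pixel average.

```python
def composite(images):
    """Stack same-sized 'images' (lists of (R, G, B) tuples) with equal
    transparency, i.e. average each pixel channel across all layers.
    A toy stand-in for the Photoshop layering described above."""
    n = len(images)
    num_pixels = len(images[0])
    return [
        tuple(sum(img[i][c] for img in images) // n for c in range(3))
        for i in range(num_pixels)
    ]

# Two 2-pixel "photos": one all-black, one all-white.
black = [(0, 0, 0), (0, 0, 0)]
white = [(255, 255, 255), (255, 255, 255)]
print(composite([black, white]))  # each pixel averages to mid-gray (127, 127, 127)
```

The point of the sketch is that the output contains no pixel copied verbatim from any input, yet it is plainly derived from all of them — which is exactly why "not a simple reproduction" doesn't settle the fair-use question.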
There is tons of human-made art in contemporary art museums that is copyrighted material re-arranged.
The most famous might be Andy Warhol's Campbell's Soup cans. You can find plenty more, though: product labels and magazine covers pasted together as "art", even though each part is copyrighted.
People do make the claim that this ought to be considered "fair use" under the law. There are a number of prominent cases where AI companies are being sued, and we'll see if this defense actually works.
If the question is about what is "fair", I don't see how you could be surprised that artists, journalists, musicians, and YouTube creators would object to huge tech companies using their work without permission to replace them. It is entirely to be expected that many people find this unfair.
If enough people find this unfair and outrageous, then even if the courts found the "fair use" argument cogent, the laws could be changed.
The answer is, we don't know yet, LLMs are too new. There are multiple lawsuits creaking to life on this topic and the defendants will no doubt claim fair use, but legal experts seem to think it could go either way. Training language models on publicly-available data for academic research has been successfully defended as fair use in court, but there is plenty of precedent in the law where something can be fair use in a research context but infringement if done for profit, which could very well be the case again here.
It seems like every autocompleter or recommendation feature was trained on data obtained this way. The form of the output is very different and perceived very differently. I imagine Pandora had to train their recommendations on recorded music, then use that entire body of knowledge to choose a song to stream and pay royalties on.
Every music service has something like this. Are they delivering just the value of the streamed music? Great, then they only owe those royalties. Or are they delivering the value of EVERY song they trained on every time a new song is chosen? I sure wasn't asking that question until generative results became the product.
Well, an “AI” has accessed those items because it was forced (strictly speaking) by a human to do so. Thus, the human doing that is not making “fair” use of those sources.