I don’t understand why this is controversial. I feel like I am missing something, but it seems that training should be thoroughly fair use, whereas the outputs may violate copyright.
I think I can see where some of the points made in the article come from outside an "obviously a shakedown" light. Specifically, it references Stability AI's Stable Diffusion, where training = distribution of the model afterwards (not just the ability to use it). Even without that though... I think there is still a decent amount of controversy to be had, no matter how clear I feel the answer should be. Ultimately I think it boils down to questions like "who decides when it stops being generic creation from input bytes and becomes decompression of copyrighted data from training" and "where is the line between copying material and deriving generic insights from it to be drawn or measured".
I don't trust anyone who claims to be absolutely certain on these types of questions. Same for "when does it stop being prediction and start being intelligence" or how many steps there might need to be in between. I do trust those who feel one way is overall more advantageous for society even though they aren't precisely sure about the details. Personally I've always had trouble believing existing copyright laws are at the right balance for what's good for society anyways, so I tend to lean towards the "let the data be used for training" side of things too. I just think the questions are even fuzzier than normal copyright questions rather than clearer.
The claim is that this fails fair use because a substantial portion of the work is redistributed for commercial gain at the expense of the rights holder.
The "trained on" part isn't the law being broken here; that's the scare campaign. If it wins, it's just the smoking gun that helps prove the other points more easily. You'll still be able to train on anything.
But if the portion of some work thought to be there is not actually in the final product (prompt engineering a reproduction by detailed description shouldn't count, as that is tracing with words) and it's just some style mix (as most AI engineers say), then I don't think anything will come of this.
This all feels like a shakedown.