If the material OpenAI is trained on is not itself subject to copyright protections, then other LLMs trained on OpenAI's output should also not be subject to any copyright restrictions.
You can't have it both ways... If OpenAI wants to claim that the AI is not repeating content but 'synthesizing' it in the same way a human student would... Then I think the same logic should extend to DeepSeek.
Now if OpenAI wants to claim that its own output is in fact copyright-protected, then it seems like it should owe royalty payments to everyone whose content was sourced upstream to build its training set. Synthetic content derived from real content should also be factored in.
TBH, this could make a strong case for taxing AI. Like some kind of fee for the use of human knowledge, distributed as UBI. The training data played a key part in this AI innovation.
As an open source coder, I know that my copyrighted code is being used by AI to help other people produce derived code, and by adapting it in this way, it's making my own code less relevant to some extent... In effect, it could be said that my code has been mixed in with the code of other open source developers and weaponized against us.
It feels like it could go either way TBH but there needs to be consistency.