- I don't think it's even reasonable to suggest that 1000 people all coming up w...

martin-t · 2025-12-10T01:47:37 1765331257

> I don't think it's even reasonable to suggest that 1000 people all coming up with variations of some arbitrary bit of code either deserve credit

There's 8B people on the planet, probably ~100M can code to some degree[0]. Something only 1k people write is actually pretty rare.

Where would you draw the line? How many out of how many?

If I take a leaked bit of Google or MS or, god forbid, Oracle code and manage to find a variation of each small block in a few other projects, does it mean I can legally take the leaked code and use it for free?

Do you even realize to what lengths the tech companies went just a few years ago to protect their IP? People who ever even glanced at leaked code were prohibited from working on open source reimplementations.

> That scenario is already today very well accepted legally and morally etc as public domain.

1) Public domain is a legal concept, it has 0 relevance to morality.

2) Can you explain how you think this works? Can a person's work just automatically become public domain somehow by being too common?

> Copyleft is not OSS, it's a tiny variation of it, which is both highly ideological and impractical.

This sentence seems highly ideological. Linux is GPL, in fact, probably most SW on my non-work computer is GPL. It is very practical and works much better than commercial alternatives for me.

> Less than 2% of OSS projects are copyleft.

Where did you get this number? Using search engines, I get 20-30%.

[0]: It's the number of github users, though there's reportedly only ~25M professional SW devs, many more people can code but don't professionaly.

sholain · 2025-12-10T07:12:09 1765350729

+ Once again: 1000 K people coming up with some arbitrary bit of content is already understood in basically every legal regime in the world as 'public domain'.

"Can you explain how you think this works? Can a person's work just automatically become public domain somehow by being too common?"

Please ask ChatGPT for the breakdown but start with this: if someone writes something and does not copyright it, it's already in the 'public domain' and what the other 999 people do does not matter. Moreover, a lot of things are not copyrightable in the first place.

FYI I've worked at Fortune 50 Tech Companies, with 'Legal' and I know how sensitive they are - this is not a concern for them.

It's not a concern for anyone.

'One Person' reproduction -> now that is definitely a concern. That's what this is all about.

+ For OSS I think 20% number may come from those that are explicitly licensed. Out of 'all repos' it's a very tiny amount, of those that have specific licensing details it's closer to 20%. You can verify this yourself just by cruising repos. The breakdown could be different for popular projects, but in the context of AI and IP rights we're more concerned about 'small entities' being overstepped as the more institutional entities may have recourse and protections.

I think the way this will play out is if LLMs are producing material that could be considered infringing, then they'll get sued. If they don't - they won't.

And that's it.

It's why they don't release the training data - it's fully of stuff that is in legal grey area.

martin-t · 2025-12-12T03:23:02 1765509782

I asked specifically how _you_ think it works because I suspected you understanding to be incomplete or wrong.

Telling people to use a statistical text generator is both rude and would not be a good way to learn anyway. But since you think it's OK, here's a text generator prompted with "Verify the factual statements in this conversation" and our conversation: https://chatgpt.com/share/693b56e9-f634-800f-b488-c9eae403b5...

You will see that you are wrong about a couple key points.

Here's a quote from a more trustworthy source: “a computer program shall be protected if it is original in the sense that it is the author’s own intellectual creation. No other criteria shall be applied to determine its eligibility for protection.”: https://fsfe.org/news/2025/news-20250515-01.en.html

> Out of 'all repos' it's a very tiny amount

And completely irrelevant, if you include people's homework, dotfiles, toy repos like AoC and whatnot, obviously you're gonna get a small number you seem to prefer and it's completely useless in evaluating the real impact of copyleft and working software with real users. I find 20-30% a very relevant segment.

You, BTW, did not answer the question where you got 2% from.