
Sorry I did not understand that :-).

My point was that it is different: when humans read a book, they don't train a machine learning model. They can't read as many books as a machine, at the same speed, and they can't remember nearly as much as a machine can.

Humans and computers are fundamentally different, and it matters. You can't conclude that because it works for one, it will work for the other.



> Sorry I did not understand that :-)

You seemed to be saying that the differences I listed (quicker and more specific feedback) were the only differences. Those are both positive.

I was saying that some people may think there are negative differences as well.


Right. Yeah, I did not express myself clearly, sorry :). You were asking "how is it different other than X and Y?", and I wanted to say that X and Y are already enough for me to consider them different.

I am actually on the side that LLMs are a big problem for copyright, and I don't want my code and blog posts to be used in their training dataset without my consent. To me, at this scale, it's not fair use. IMO it's a bit like if Facebook said that it is fair use to leverage metadata about their users, because "someone who sees you in a public space talking to a friend knows that you are talking with that person, and it is the same for Facebook on social media". My problem is not that Facebook knows that I sent a message to a friend now, but rather that they know who writes to whom and when, at scale.

Similarly my problem is not that somebody could read my blog post, learn from it, and write another blog post. My problem is that LLMs automatically train on all written material they want on the Internet, at scale, and without acknowledging that all that material has a lot of value (and is copyrighted).

I think fair use should somehow consider the scale.



