
So... it would be a lot cheaper to just buy all of the books?


Yes, much.

And they actually went and did that afterwards. They just pirated them first.


What is the HN term for this? "Bootstrapping" your startup? Or is it "growth-hacking" it?


The latter (I know you're joking, but...)

Bootstrapping in the startup world refers to starting a startup using only personal resources instead of using investors. Anthropic definitely had investors.


Bookstrapping


Where can I find a source that says Anthropic bought the pirated books afterwards? I haven't seen this in any official document.

Also, do we know if the newer models were trained without the pirated books?


> Where can I find a source that says Anthropic bought the pirated books afterwards? I haven't seen this in any official document.

https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

> Also, do we know if the newer models were trained without the pirated books?

I'm pretty sure we do but I couldn't swear to it or quickly locate a source.


Thanks for the link.

Among the several places where the judge mentions Anthropic buying legit copies of books it pirated, this sentence is probably the most relevant: "That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages."

But the document does not say Anthropic bought EVERY book it pirated. Other sections in the document also don't explicitly say that EVERY pirated book was later purchased.

I stopped using Claude when this case came to light. If the newer Claude models don't use pirated books, I can resume using it.

When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?


> But the document does not say Anthropic bought EVERY book it pirated

Yeah, I wouldn't make this exact claim either. For instance it's probably safe to assume that the pirate datasets contain some books that are out of circulation and which Anthropic happened not to get a used copy of.

They did happen to get every book published by any of the lead plaintiffs though, as a point towards them probably having pretty good coverage. And it does seem to have been an attempt to purchase "all" the books for reasonable approximate definitions of "all".

> When you say, "I'm pretty sure we do...", do you mean that pirated books were used, or were they not used?

I'm pretty sure pirated books were not used, but not certain, and I really don't remember when/why I formed that opinion.


That might be practically impossible given the number of rights holders worldwide


The permission to buy them was already settled by Google Books in the 00's.


They did, but only after they pirated the books to begin with.


Phew. This settlement potentially weakens all challenges to the use of copyrighted works in training LLMs. I'd be shocked if there wasn't some give and take on the matter between executives and investors behind closed doors.

A settlement means the claimants no longer have a claim, which means that if they're also part of, say, the New York Times affiliated lawsuit, they have to withdraw. A neat way of kneecapping a countrywide decision that LLM training on copyrighted material is subject to punitive measures, don't you think?


That's not even remotely true. Page 4 of the settlement describes released claims which only relate to the pirating of books. Again, the amount of misinformation and misunderstanding I see in copyright related threads here ASTOUNDS.


Did you miss the "also"? How about "adjacent"? I won't pretend to understand the legal minutiae, but reading the settlement doesn't mean you do either.

In my experience and training in a fintech corp, accepting a settlement in any suit weakens your defense, but prevents a judgement and future claims for the same claims from the same claimants (a la double jeopardy). So, again, at minimum this prevents an actual judgement, which likely would be positive for the NYT (and adjacent) cases.


I'm not sure how your confusion about what's going on is being projected onto me. What about "also"? What about "adjacent"?

> In my experience and training in a fintech corp, accepting a settlement in any suit weakens your defense, but prevents a judgement and future claims for the same claims from the same claimants (a la double jeopardy). So, again, at minimum this prevents an actual judgement, which likely would be positive for the NYT (and adjacent) cases.

Okay? I'm an IP litigator and you clearly have no idea what you're talking about. The only thing left to try in this case was the book library piracy. Alsup's fair use decision is just as relevant, is not mooted by the settlement, and will be cited by anyone who thinks it's favorable to them.


'sacrifice' what exactly lol?

Going the athlete route is generally easier and requires less time investment than going the academic route.


No, I don't think so.

Including practice squad players (who are paid peanuts), the NFL has about 2,240 roster spots. About a million US high school students play football any given year. The average NFL career is 3.3 years.

So from 3.3 graduating classes of a million high school football players each, you'd expect somewhere around 0.07% to make it into the NFL. Even fewer will have something resembling a successful career.
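As a rough sanity check of the arithmetic above, here is a minimal Python sketch. It uses only the figures quoted in this comment (2,240 roster spots, about a million high school players per year, a 3.3-year average career) and assumes, as the comment does, that the pool competing for those spots is 3.3 classes of players. The function name `nfl_share` is just an illustrative label.

```python
def nfl_share(roster_spots: int, hs_players_per_year: int, avg_career_years: float) -> float:
    """Fraction of the candidate pool that can hold a roster spot at any one time.

    Assumption (from the comment): rosters turn over roughly once per average
    career, so the pool is avg_career_years classes of high school players.
    """
    pool = hs_players_per_year * avg_career_years
    return roster_spots / pool


share = nfl_share(roster_spots=2240, hs_players_per_year=1_000_000, avg_career_years=3.3)
print(f"{share:.4%}")  # prints 0.0679%, i.e. roughly 0.07%
```

This ignores players who enter from outside US high school football and anything else that changes the pool, so treat it as a ballpark only.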

Then look at how many high paying "smart person" jobs there are. There are about a million doctors in the US, two million engineers, and four million computer professionals.


Your logic/math doesn't make any sense.

Almost every student studies math, yet almost none of them ever become mathematicians, who are compensated much less than NFL players.

> There are about a million doctors in the US

Comparing # of doctors to # of NFL players is a very false equivalence. Try comparing # of athletes in all sports combined to number of doctors - that would be more reasonable. Or compare the number of brain surgeons to the number of NFL players - and the difficulty/time in becoming either.

Being a doctor is a much more stressful and difficult job, which requires more years of training/education and provides far more real value for society.


What point are you arguing here? That it’s more difficult to be a doctor than an NFL player? That is highly subjective depending on what one finds “difficult”. For instance my dentist in SF and even primary care or dermatologists don’t have very stressful jobs and although they had many years of schooling, they didn’t have to subject their bodies to the intense and unrelenting physical rigor of first HS football, THEN college football and then the NFL. I would say for low level or intense stress maybe surgeons, oncologists, anesthesiologists or cardiologists have more stress day to day but that’s just my subjective opinion as well, as to them it may be as easy as flying a kite.

So what do you consider difficult? Having a linebacker smash into you at a full sprint over and over in practice and then not choking in a real game? Or studying relentlessly, writing grants, doing essentially free work as a resident for years while being on call. They are just stressful in different ways, but again, different humans have more fitness for one or the other.


One of my best friends is now in a wheelchair for life, thanks to high school football.

You are always taking a risk. Sometimes it's just that it won't work out financially. Sometimes it's more serious than that. It's a small risk, but non-zero. Even successful football players suffer from much higher rates of mental health issues, among other poor health outcomes.


Just because someone takes risks, doesn't mean they deserve exorbitant compensation for that. Lots of people take risks all of the time for much less. Lots of sports, and even non-sport jobs are much more dangerous than playing in the NFL, and compensated much less.

Complaining about NFL players not making enough money is just funny to me.


I didn't say they deserved anything. You said that they don't sacrifice anything. They absolutely do. This is completely independent from a compensation discussion.

I do happen to think that players should probably make more, for the same reason that I think all workers should probably make more, not because of any particular risk reason.


> You said that they don't sacrifice anything.

No, I clearly didn't say that.


Sorry, I didn't realize that you were different than the person I originally replied to, who said

> 'sacrifice' what exactly lol?


> Think about these fantasy sports gambling apps, your coworkers who distract themselves from the drudgery of work by discussing sports, the financial institutions that take a transaction fee from money being passed around, the infinite scroll of content on social media, all because of athletes.

I don't see how you can claim gambling, and pointless discussion about some arbitrary game you (they) don't even play are positive. Financial transactions for the sake of financial transactions are also completely pointless.


> I don't see how you can claim gambling, and pointless discussion about some arbitrary game you (they) don't even play are positive.

By choosing different basis vectors? Not everyone's values match yours.


Your argument amounts to a meaningless tautology - 'everything that exists is good and valuable'.

Yeah, maybe, but that's neither useful nor interesting.

'Heroin addiction is good and valuable to society - if you disagree it's because the addict's values just don't match yours'


> Your argument amounts to a meaningless tautology - 'everything that exists is good and valuable'.

It's unclear how this is related to what I said.

> 'Heroin addiction is good and valuable to society - if you disagree it's because the addict's values just don't match yours'

What does it mean for something to be "good and valuable to society"? What is the "society" that is passing absolute judgement here? I think of society as a collection of people, and collections don't have values, individuals do.

Is it surprising that the values of someone choosing to take actions you consider repulsive are different from yours?


The main discussion point of this comment chain is around the practical benefit to society of the NFL.

Coming in and saying 'we can't judge the practical societal value of anything because groups of people don't have values' is both incorrect and does not argue either for or against NFL as having a practical value, or introduce any new argument or data into the discussion.

> repulsive

Spare me the poetics, you're the only one to talk about repulsiveness in this comment chain so far.


I wasn't really arguing about benefit to society though. I just said the gears of the economy turn on the back of such "distractions".

Benefit to society becomes a philosophical argument. Personally I don't value most forms of entertainment, gambling, etc. Humans only need food and whatever basic needs there are. I enjoy classical music but I would even argue that music is just noise at the end of the day. On a scale of heroin to Chopin, I'd put the NFL closer to Chopin.

Nevertheless, these seemingly "worthless" forms of sense-stimuli are supporting a huge portion of our livelihoods at the moment.

By the way Saquon Barkley can squat 600lbs. Surely that's of value, no?


> Nevertheless, these seemingly "worthless" forms of sense-stimuli are supporting a huge portion of our livelihoods at the moment.

'supporting' in what way?

> By the way Saquon Barkley can squat 600lbs. Surely that's of value, no?

It could be of value to him, not really of value to others or society at large.


How dare you steal these hn comments by copying them over to your PC using your browser? Thief!


It's more like telling a German name from an Austrian name.


Russia really isn't that similar to the rest of the Eastern Bloc. Even in language, the other Slavic languages are pretty close to one another, whereas Russian is pretty different. Same for culture etc. Russia is trying very hard to create the impression, that it's all the same, in the hopes of eventually occupying neighbors again, but it really isn't.


Russian is a Slavic language and very similar to other Slavic languages especially Ukrainian and Belarusian. Far closer than German and French. I have family across 5 different Slavic countries and hear various Slavic languages regularly. You're just spewing complete bs for political reasons apparently.


I speak most of the languages you listed on a conversational level, but sure, it's easier to cry about astroturfing than consider that you might be wrong.


Cool story bud.


The Gemini 2.5 Pro free limit is 100 requests per day.

https://ai.google.dev/gemini-api/docs/rate-limits


I'm getting consistently good results with Gemini CLI and the free 100 requests per day and 6 million tokens per day.

Note that you'll need to authorize with either a Google Account or an API key from AI Studio; just be sure the API key is from an account where billing is disabled.

Also note that there are other rate limits for tokens per request and tokens per minute on the free plan that effectively prevent you from using the whole million token context window.

It's good to exit or /clear frequently so every request doesn't resubmit your entire history as context or you'll use up the token limits long before you hit 100 requests in a day.
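To make the point about resubmitted history concrete, here is a small Python sketch of the arithmetic. It is a toy model, not a description of how Gemini CLI actually counts tokens: it assumes every request resends the whole conversation so far, and the 2,000 tokens added per turn is a made-up figure. Only the 100 requests per day and 6 million tokens per day come from the numbers above.

```python
def requests_before_budget(daily_token_budget: int,
                           tokens_added_per_turn: int,
                           max_requests: int) -> int:
    """Count how many requests fit in the daily token budget when each
    request resends the full history (toy model, see assumptions above)."""
    used = 0      # tokens consumed so far today
    context = 0   # size of the history sent with the next request
    for n in range(1, max_requests + 1):
        context += tokens_added_per_turn  # history grows every turn
        if used + context > daily_token_budget:
            return n - 1                  # this request would exceed the budget
        used += context
    return max_requests


# Hypothetical: each turn adds 2,000 tokens to the history and is never cleared.
print(requests_before_budget(6_000_000, 2_000, 100))  # prints 76
```

Under these assumptions the token budget runs out at request 76, before the 100-request limit is reached. Clearing the history resets `context` to zero, which is why frequent /clear stretches the budget.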


Doesn't it swap to a lower power model after that?


Not automatically but you can switch to a lower power model and access more free requests. I think Gemini 2.5 Flash is 250 requests per day.


You planned and wrote a feature yesterday that would have taken you 2 whole days? And you already got it reviewed and deployed, and know that 'it works flawlessly'?

....

That reminds me of when my manager (a very smart, very AI-bullish ex-IC) told us about how he used AI to implement a feature over the weekend and all it took him was 20 mins. It sounds absolutely magical to me and I make a note to use AI more. I then go to review the PR, and of course there are multiple bugs and unintended side-effects in the code. Oh and there are like 8 commits spread over a 60 hour window... I manually spin up a PR which accomplishes the same thing properly... takes me 30mins.


This sounds like a positive outcome? A manager built a proof-of-concept of a feature that clearly laid out and fulfilled the basic requirements, and an engineer took 30 mins to rewrite it once it's been specified.

How long does it typically take to spec something out? I'd say more than 20 mins, and typical artifacts to define requirements are much lossier than actual code - even if that code is buggy and sloppy.


Not at all.

What was claimed was that a complete feature was built in record time with AI. What was actually built was a useless and buggy piece of junk that wasted reviewer time and was ultimately thrown out, and it took far longer than claimed.

There were no useful insights or speed up coming out of this code. I implemented the feature from scratch in 30 mins - because it was actually quite easy to do manually (<100 loc).


This seems more of a process problem than a tooling problem. Without specs on what the feature was, I would be inclined to say your manager had a lapse in his "smartness", there was a lot of miscommunication on what was happening, or you are being overly critical over something that "wasted 30 minutes of your time". Additionally, this seems like a crapshoot work environment...there seems to be resentment for the manager using AI to build a feature that had bugs/didn't work...whereas ideally you two sit down and talk it out and see how it could be managed better next time?


Not at all, there is no resentment - that's your imagination. There is nothing about what I described that indicates that it's a bad work environment - I quite like it.

You're bringing up various completely unrelated factors seemingly as a way of avoiding the obvious point of the anecdotal story - that AI for coding just isn't that great (yet).


How do you know whether your TLA+ model is accurate?


With one client I have, we know the TLA+ model is accurate because we're extracting tests directly from the spec. It's kind of a riff on what MongoDB does in this paper: https://arxiv.org/abs/2006.00915


The difference with Russia is that they are much worse at hiding their corruption and censorship.


Russia doesn't bang the drum of "free speech" ad nauseam the way US social media magnates do.


Strictly speaking, Russia has quite explicit free speech protections in its constitution. So much so that it separately covers freedom of speech and freedom of press, and in regard to the latter straight up says "censorship is prohibited".

Whenever this topic comes up, the government just nods at the document, as if it had any relation to the real world.


True. I was born in Russia and to be honest I wish Russia would at least "bang the drum of free speech" as well. If you pretend to have some values you actually make people start to believe in them a bit


I'm pretty sure Russia still preaches a lot of admirable things that it doesn't actually practice. Talk is cheap yet people will put stock in it anyway.


Sure, the 'free speech' propaganda is a conscious part of the (better/more effective) public opinion manipulation playbook.


In Canada you can just steal public funds with complete immunity if you're in the government. Set up a government program to support X (e.g. greenness, gender education in Congo, studying a random worm somewhere in Asia, etc.), then just transfer the funds directly to your own companies and the companies of your friends and family [1]. We had like 5 major corruption scandals in the last 2 years, with basically zero repercussions for those involved.

If you pick the right reason/name for X, anyone who criticises you can also automatically be labelled racist, dumb, fascist or whatever as well.

[1] https://en.m.wikipedia.org/wiki/Sustainable_Development_Tech...

