This is a bad-faith argument, but even if I were to indulge it: human artists can/do get sued for mimicing the works of others for profit, which AI precisely does. Secondly, many of the works in question have explicit copyright terms that prohibit derivative works. They have built a multi-billion dollar industry on scaled theft. I don't see a more charitable interpretation.
You can't call something a bad-faith argument just because you disagree with it. I mean, you can, but it's not at all convincing.
As I said, if AI companies reproduce copyrighted works, they should be sued, just like a human artist would be. I haven't experienced that in my interactions with LLMs, but I've never really tried to achieve that result either. I don't really pirate anymore, but torrents are a much easier and cheaper way to do copyright infringement than using an AI tool.
LLMs don’t have to be able to mimic things. And go ahead and sue OpenAI and Anthropic! It won’t bother me at all. Fleece those guys. Take their money. It won’t stop LLMs, even if we bankrupted OpenAI and Anthropic.
This argument warrants introspection for "crusty devs", but also has holes. A compiler is tightly engineered and dependable. I have never had to write assembly because I know that my compiled code 100% represents my abstract code and any functional problems are in my abstract code. That is not true in AI coding. Additionally, AI coding is not just an abstraction over code, but an abstraction over understanding. When my code compiles, I don't need to worry that the compiler misunderstood my intention.
I'm not saying AI is not a useful abstraction, but I am saying that it is not a trustworthy one.
Exactly. All of the people in comments here thinking this has any impact on right to repair or open source are thoroughly kidding themselves. Lawmakers don't get out of bed in the morning to fight for nerds or the working class.
Fully agree. Also inherent to the design is distillation and interpolation...meaning that even with perfect data and governing so that outputs are deterministic, the outputs will still be an imperfect distillation of the data, interpolated into a response to the prompt. That is a "bug" by design
Ignoring how ridiculous and impractical this idea is, it fails to capture some of the most important skills in being a developer. Framing real-world problems as code problems. Anticipating design issues. Knowing the right trade-off between solution correctness, complexity,and effort. Mentoring and accelerating others. This is barely different than leetcode interviews.
Yes, but it does limit the impact of the attack. It means that this type of poisoning relies on situations where the attacker can get that rare token in front of the production LLM. Admittedly, there are still a lot of scenarios where that is possible.
If you know the domain the LLM operates in it’s probably fairly easy.
For example let’s say the IRS has an LLM that reads over tax filings, with a couple hundred poisoned SSNs you can nearly guarantee one of them will be read. And it’s not going to be that hard to poison a few hundred specific SSNs.
Same thing goes for rare but known to exist names, addresses etc…
But to what end? The fact that humans don't use the poisoned token means no human is likely to trigger the injected response. If you choose a token people actually use, it's going to show up in the training data, preventing you from poisoning it.
It's more feasible to think of the risks in one narrow context/use case.
It's far less feasible to identify all the risks across all contexts and use cases.
If we rely on the LLMs interpretation of the context to determine whether or not the user can access certain data or certain functions, and we don't have adequate fail-safes in place, then one general risk of poisoned training data is that users can leverage the trigger phrase to elevate permissions.
If we wanted to subsidize internet for rural and low-income communities responsibly, we could invest in fiber and other solutions, and control the externalities (this is exactly the ReConnect program is). Starlink is not that, it is a classic case of privatizing profits by socializing hidden externalities, in this case to the entire world. Externalities in the form of pollution that will cost us all more than fiber in the long run. Funny story though, Starlink was awarded a $900M subsidy to provide rural USA internet access. In the end, that money was not given because the FCC found that Starlink "failed to demonstrate that the providers could deliver the promised service.". So no, it is not about screwing rural people, it's about not getting taken advantage of by fat cats and grifters like Elon.
> If we wanted to subsidize internet for rural and low-income communities responsibly, we could invest in fiber and other solutions, and control the externalities
Running cables across out land is less impactful than lofting satellites?
Per the article, Starlink runs 8k satellites with an average life of 5 years. They launch in payloads of 20-40 satellites. That's 50+ launches per year if everything goes perfectly. About a million pounds of kerosene per launch. Plus everything else that goes into the rockets and satellites. Then the pollution impact from the launches and reentries. Then the eventual need to clean up LOE to avoid Kessler Syndrome. So yeah, well understood ground tech may be cheaper over the lifecycle. At a minimum, it should be a reasoned choice, not environmental debt pawned off by the richest man in the world.
...but sure, for the sake of argument, maybe it's only a quarter million lbs of kerosene 50 times a year, upper atmospheric pollution, and LEO crowding that gets solved by HN comments. ...instead of a dumb cable that doesn't come with a side of funding a billionaire neo-nazi. My bad.
The last mile problem is difficult and expensive. I think satellites are a good solution to it. As for SpaceX fucking up that contract, that sucks and is no good.
Ok, not peeking at the comments yet, but I am going to predict the "put more AI on it" people will recommend solving it by putting more AI on it. Please don't disappoint!
reply