I agree that compromised source dependencies are the bigger problem, but that doesn't mean a compromised build infrastructure isn't one. Just last week, we had two Linux kernel LPEs that could have been leveraged to implement just such an attack, for example.

Another thing to consider is that Debian has quite a few derivatives that may also rebuild packages from source, so you have a multiplier there.


> There was no bug or attack on Debian since 2007 that reproducible packages would prevent.

I'm reading this as a suggestion that the reproducible builds effort was an ineffective deterrent.

However, note that your observation could also be explained by the opposite: the reproducible builds effort was an effective deterrent, so nobody bothered with attempts.

> And it just ups the contribution barrier to Debian higher

Until yesterday, the package just got flagged in the tracker, and you could ignore it, fix it yourself, or let the kind people behind the reproducible builds effort supply a patch.

Now, you can no longer ignore it. But fixes are often trivial: use a (stable) timestamp provided by the build, seed RNGs with some constant (instead of, e.g., the time), etc. These are best practices anyway.
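
For example, a minimal Python sketch of both fixes. SOURCE_DATE_EPOCH is the environment variable the reproducible-builds spec defines for exactly this purpose, and Debian's build tooling exports it; the rest is illustrative:

    import os
    import random
    import time

    # Prefer the build-provided timestamp over the wall clock, so the
    # same source always stamps artifacts with the same date.
    build_time = int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))

    # Seed the RNG with a constant instead of the clock, so any generated
    # output (identifiers, orderings, ...) is identical across rebuilds.
    rng = random.Random(42)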


> However, note that your observation could also be explained by the opposite: the reproducible builds effort was an effective deterrent, so nobody bothered with attempts.

There was no attack that reproducible builds would have helped protect against before 2007, either.

> Until yesterday, the package just got flagged in the tracker, and you could ignore it, fix it yourself, or let the kind people behind the reproducible builds effort supply a patch.

> Now, you can no longer ignore it. But fixes are often trivial: use a (stable) timestamp provided by the build, seed RNGs with some constant (instead of, e.g., the time), etc.

That's the entirety of the problem. App developers don't want to be package experts or build experts.

> These are best practices anyway.

They are not. They are best practices if you want reproducible builds. They are an entirely useless waste of time if you don't care.


> That's the entirety of the problem. App developers don't want to be package experts or build experts.

App developers and Debian package maintainers are already separate groups.


> They are not. They are best practices if you want reproducible builds.

Or if you're writing a test suite, and you want failing test results to be actionable.

Or you have any other type of behavior that you'd like to reproduce somehow.

One of the first things app developers ask for in bug/issue templates is steps to reproduce something. I wonder why you'd think that they would suddenly be opposed to the concept when thinking of a build process.


> Or if you're writing a test suite, and you want failing test results to be actionable.

The class of bugs would be extremely small, as what makes builds hard to reproduce is, 99% of the time, irrelevant to runtime: a build time embedded in the binary, file metadata with different timestamps, or maybe the linker putting things in a slightly different order.
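
Incidentally, this is easy to check with the reproducible-builds project's diffoscope tool (the file names here are placeholders):

    diffoscope first-build/foo.deb second-build/foo.deb

It typically surfaces exactly the kind of noise listed above: embedded timestamps and metadata differences.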

> One of the first things app developers ask for in bug/issue templates is steps to reproduce something. I wonder why you'd think that they would suddenly be opposed to the concept when thinking of a build process.

I think you will find that the number of people who had problems reproducing a bug because of a non-100%-exact build is vanishingly small, possibly non-existent.

And that is because if you get a package version and want to reproduce a bug, you get the package, install it, and try to reproduce it. The package WILL be 100% the same as the one in the bug report, because you both downloaded the same artifact from the same mirror network. You don't need reproducibility to get the same binary to reproduce a bug.
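
Concretely, assuming the package and version from the report are still in the archive (name and version below are placeholders):

    apt-get download hello=2.10-3
    sha256sum hello_2.10-3_amd64.deb

That gets you a byte-identical .deb, verifiable against the hash in the archive's signed index, with no rebuild involved.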


That’s a big logical fallacy; I’m not sure that’s what you want to go with.

> The reason is this:

> Both ethical and safe conduct depend on context and intent.

The same applies to knives, and they can be plenty useful, and used in a safe manner.


I suppose the argument could be made that knives are inherently unsafe, and that no matter what, it is important to always treat them as unsafe. This doesn't imply that you shouldn't use knives, just that you should be aware of their inherent unsafety?

I don't know, I didn't really agree with the post, I'm trying my best to steel man it.


"AI will never be entirely ethical or safe because it's like having a knife, a gun, a hardware store, and a medical doctor, all in one convenient interface."


I can understand the appeal; being able to be "present" without the time cost can mean (possibly significantly more) presence at the same cost. This could be very attractive especially to those managing personal relations, like sales representatives.

But I'm surprised that the risks seem to be so underestimated.

Once this clone exists, what happens if it gets out into the wild? Imagine everyone having full access to what is effectively a digital model of your personality. Imagine your competition putting your own model to use against you.

And the better the approximation of this model, the worse the damage to yourself.


> being able to be "present" without the time cost can mean (possibly significantly more) presence at the same cost.

This is magical thinking. "Presence" and "time cost" are inextricably linked. You can't have one without the other.

When you use AI to decouple them, you're telling your audience/colleagues/attendees that you want them to listen to you but not the other way around.


> This is magical thinking.

But it was helpful to me!

Reading it, I mean: the commenter put into words why exactly someone would think that this would be a good idea.

Of course, you're 110% right that it isn't, but it's still nice that HN provides some subtitles for those who are out of the loop and out of substances in their bloodstream.


Very ironic for the billionaire to be openly replacing himself with AI. I suppose he believes his job is easy enough that an LLM can do it, so we definitely don't need him.


Yes, exactly. Anyone training a model to replace themselves is replacing themselves -- with something that can run 24/7 and can easily scale. And the better the model, the easier it is to replace them.

Hence why I'm so surprised that MZ, of all people, is arguing in this direction.

I would think that the potential for malicious abuse alone should have scared him off of this.


We will never know if he is locked in his smart closet or has just become a recluse.


> Imagine your competition putting your own model to use against you.

I imagine that this is part of the original plan. “Okay, we wasted 80 billion dollars on VR, and that hurts. But if we can somehow convince all of our competitors to also waste 80 billion dollars each, then it’ll even out. How can we trick our competitors into thinking more like Zuckerberg?”


The real risk is when shareholders realize an LLM can do the CEO's job.


But you still get a lot of "shareholder responsibility" comments. Imagine a company that dumps sewage into a river (be that literal or metaphorical). Internet people come around to tell you this is the nature of capitalism, that shareholder structure means (increasing?) returns on investment are critical, and that CEOs therefore have to spend all their waking hours juggling this.

Am I arguing against this? I don't know - I'm not an economist. But I would like to point out there is such a thing as shareholder fraud, and the Venn diagram between "sacrifice quality to please shareholders" and "deceiving shareholders" has to be one big intersecting circle, you know? Especially when the guy (Zuckerberg, with dual-class shares) can't ever be fired.


USB/IP has been pretty useful to me, though locking it down is a bit of a chore, as it does not natively support any type of authentication or authorization (a not unreasonable design decision).
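What worked for me, roughly: since usbipd speaks plain TCP (port 3240 by default), block that port externally and tunnel it over SSH. A sketch, with placeholder host and bus IDs:

    # On the client: forward the usbip port through SSH
    # (the client also needs the vhci-hcd module loaded)
    ssh -N -L 3240:127.0.0.1:3240 user@server &

    # Then attach "remote" devices via the local end of the tunnel
    usbip list -r 127.0.0.1
    usbip attach -r 127.0.0.1 -b 1-1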


Maybe tunnel it over a secure protocol? If possible?


Do you use it on a LAN, I'm guessing?


The Radxa Orion O6 is a really nice ARMv9.2 ITX board, and supports UEFI boot. Installation of Debian trixie using Debian's vanilla installation media went flawlessly, and it's been running fine for 6 months now.


Do the peripherals work reliably? Wifi and GPIO especially. It does seem like a very capable board but this is always so hit and miss.


I can't really say as I don't use them. This is a host mostly working as a CI worker connected via ethernet.

I just scanned for WiFi networks and that worked fine. I also see that GPIO is not enabled for CIXP1 devices in Debian's kernel; I'll ask the kernel team to enable it.
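A quick way to check what the running Debian kernel was built with (the exact CIX pinctrl/GPIO config symbols are whatever the upstream driver uses, so this just greps broadly):

    grep -i -e GPIO -e PINCTRL /boot/config-$(uname -r)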


Austrian media are reporting that Peter Steinberger had a $100m exit with PSPDFKit in 2021.

I'm extremely curious what OpenAI's offer was. The utility of more money is diminished when you're already pretty wealthy.


It makes me more inclined to take the OP at face value: genuine interest in working on something similar and making it easier for everyone ('my mum') to use.

It probably also makes him more attractive to OpenAI et al. - he's not just some guy who's going to face all sorts of risks from earning a lot of money for the first time.


I think he accepted that offer exactly for this reason. He feels he can have a bigger impact within OpenAI (and maybe become a billionaire in the medium run?) than by creating his own business (again) out of OpenClaw.


> To call training illegal is similar to calling reading a book and remembering it illegal.

Perhaps, but reproducing the book from this memory could very well be illegal.

And these models are all about production.


To be fair, that seems to be where some of the AI lawsuits are going. The argument goes that the models themselves aren't derivative works, but the output they produce can absolutely be - in much the same way that reproducing a book from memory could be copyright violation, trademark infringement, or generally run afoul of the various IP laws.


Models don’t reproduce books though. It’s impossible for a model to reproduce something word for word because the model never copied the book.

Most of the best-fit curve runs along a path that doesn't even touch an actual data point.


They do memorize some books. You can test this trivially by asking ChatGPT to produce the first chapter of something in the public domain -- for example, A Tale of Two Cities. It may not be word-for-word exact, but it'll be very close.

These academics were able to get multiple LLMs to produce large amounts of text from Harry Potter:

https://arxiv.org/abs/2601.02671
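
The trivial version of the test, sketched with the OpenAI Python client; the model name and prompt are illustrative, not the paper's setup:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{
            "role": "user",
            "content": "Continue this passage verbatim: 'It was the best "
                       "of times, it was the worst of times,'",
        }],
    )
    print(resp.choices[0].message.content)

For a public-domain opening like that, the continuation tends to be near-verbatim.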


In that case I would say it is the act of reproducing the books that is illegal. Training the AI on said books is not.

So the illegality rests at the point of output and not at the point of input.

I’m just speaking in terms of the technical interpretation of what’s in place. My personal views on what it should be are another topic.


> So the illegality rests at the point of output and not at the point of input.

It's not as simple as that, as this settlement shows [1].

Also, generating output is what these models are primarily trained for.

[1]: https://www.bbc.com/news/articles/c5y4jpg922qo


Unfortunately a settlement doesn't really show you anything definitive about the legality or illegality of something.

It only shows you that the defendant thought it would be better for them to pay up rather than continue to be dragged through court, and that the plaintiff preferred some amount of certain money now over some other amount of uncertain money later, or never.

We cannot say with any amount of confidence how the court would have ruled on the legality, had things been allowed to play out without a settlement.


> Also, generating output is what these models are primarily trained for.

Yes, but not for generating illegal output. These models were trained with the intent to generate legal output. The fact that they can generate illegal output is a side effect. That's my point.

If you use AI to generate illegal output, that act is illegal. If you use AI to generate legal output, that act is not. Thus the point of output is where the legal question lies. From inception up to training, there is clear legal precedent for the existence of AI models.


If there is one exact sentence taken out of the book and not referenced with quotes and an exact source, that triggers copyright law. So the model doesn't have to reproduce the entire book; it only needs to reproduce one specific sentence (which may be a sentence characteristic of that author or that book).


> If there is one exact sentence taken out of the book and not referenced with quotes and an exact source, that triggers copyright law.

Yes, and that's stupid, and will need to be changed.


Sure, but that use would easily pass a fair use test, at least in the US.


Models absolutely do reproduce books.

> With a simple two-phase procedure, we show that it is possible to extract large amounts of in-copyright text from four production LLMs. While we needed to jailbreak Claude 3.7 Sonnet and GPT-4.1 to facilitate extraction, Gemini 2.5 Pro and Grok 3 directly complied with text continuation requests. For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984.

https://arxiv.org/abs/2601.02671


The supplementary files in that paper—verbatim reproductions of the full texts of Frankenstein and The Great Gatsby—are pretty instructive. The research group highlighted all additions and omissions, but on most pages the differences are difficult to spot because they are only missing spaces, extra hyphens, and other typographical minutiae.


Somewhat related:

"Seymour said he thought it was odd that Apple bought a Cray to design Macs because he was using Macs to design Crays. He sent me his designs for the Cray 3 in MacDraw on a floppy," reports KentK.

https://cray-history.net/2021/07/16/apple-computer-and-cray-...


> Some 30 years ago, someone challenged me to tell the difference between Pepsi and Coke in a blind taste test.

I did something similar with co-workers recently who didn't believe there was a meaningful difference between brands. I blind-tasted 6 different glasses and got each one right. I got my favorite (Coke) right from the first smell; I just had to taste it to see whether it was diet or not.

Not that this is a skill or anything. It's just that each of the brands I tasted has a strong characteristic flavor to me, and the difference between real sugar and artificial sweeteners is also stark. I've been drinking the diet versions for ages precisely because the sugary ones are just too sweet for me.

