More

ProofHouse · 2026-01-26T18:02:16 1769450536

Is anyone a researcher here that has studied the proven ability to sneak malicious behavior into an LLM's weights (somewhat poisoning weights but I think the malicious behavior can go beyond that).

As I recall reading in 2025, it has been proven that an actor can inject a small number of carefully crafted, malicious examples into a training dataset. The model learns to associate a specific 'trigger' (e.g. a rare phrase, specific string of characters, or even a subtle semantic instruction) with a malicious response. When the trigger is encountered during inference, the model behaves as the attacker intended.You can also directly modify a small number of model parameters to efficiently implement backdoors while preserving overall performance and still make the backdoor more difficult to detect through standard analysis. Further, can do tokenizer manipulation and modify the tokenizer files to cause unexpected behavior, such as inflating API costs, degrading service, or weakening safety filters, without altering the model weights themselves. Not saying any of that is being done here, but seems like a good place to have that discussion.

mrandish · 2026-01-26T18:20:38 1769451638

> The model learns to associate a specific 'trigger' (e.g. a rare phrase, specific string of characters, or even a subtle semantic instruction) with a malicious response. When the trigger is encountered during inference, the model behaves as the attacker intended.

Reminiscent of the plot of 'The Manchurian Candidate' ("A political thriller about soldiers brainwashed through hypnosis to become assassins triggered by a specific key phrase"). Apropos given the context.

fragmede · 2026-01-26T18:31:50 1769452310

In that area, https://arxiv.org/html/2507.06850v3 was pretty interesting imo.

ProofHouse · 2026-01-23T02:02:12 1769133732

Scamthropic at it again

ProofHouse · 2026-01-14T17:07:38 1768410458

Yup. Avoid Google services especially map at all costs

ProofHouse · 2026-01-14T17:07:00 1768410420

That map pricing change stole literally months of my life having to rip out google from an app I spend months building it into. F Google

ProofHouse · 2026-01-12T18:35:10 1768242910

It tells you how bad their product management and engineering team is that they haven’t just decided to kill Siri and start from scratch. Siri is utterly awful and that’s an understatement, for at least half a decade.

ProofHouse · 2026-01-12T18:33:54 1768242834

Huge mistake. That’s what they specialize in though

ProofHouse · 2026-01-02T16:17:37 1767370657

AI but does it matter, no

ramon156 · 2026-01-02T17:51:01 1767376261

Thanks for sharing your opinion

ProofHouse · 2025-12-27T17:37:50 1766857070

And it was a no brainer

ProofHouse · 2025-12-27T15:46:06 1766850366

Thank you! Shame all these big corps that do this forever. Meta #1, Apple # 2, psuedo fake journalists # 3

ProofHouse · 2025-12-25T01:27:55 1766626075

God bless you. Wishing you joy.