Hacker News | sva_'s comments

A mix of AI and hybrid warfare.

Hard disagree. A government releasing files with some probabilistic (unreliable) labeling would be pretty terrible.

As a European, I mostly liked Costco when I visited. But what I'll always remember is the pizza slice you can get on your way out. The amount of fat and especially salt made me feel like I was about to have a stroke. I can totally understand how some Americans end up unhealthy/obese. It was overall a great experience - 10/10 would do again.

I can't imagine you could buy a pie of that shit to take home.


So the way this works seems to be that you first have an "activation verbalizer" model that generates some tokens describing the activation, and then an "activation reconstructor" that tries to recreate the activation vector. If that reconstruction is close to the original activation vector, they claim, the verbalization probably carries some meaningful information.
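To make the reconstruction check concrete, here's a toy sketch of the idea (the `verbalizer` and `reconstructor` callables are hypothetical stand-ins, not the paper's actual models or API):

```python
import numpy as np

# Hypothetical sketch of the verbalize-then-reconstruct check described above.
# `verbalizer` and `reconstructor` are illustrative stand-ins for the two models.

def faithfulness_score(activation, verbalizer, reconstructor):
    """Cosine similarity between an activation and its reconstruction
    from the verbalizer's natural-language description of it."""
    description = verbalizer(activation)          # tokens describing the activation
    reconstruction = reconstructor(description)   # attempt to recreate the vector
    a, b = np.asarray(activation), np.asarray(reconstruction)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins: a "verbalizer" that serializes the vector into text and a
# "reconstructor" that parses it back, so the score here lands near 1.0.
toy_verbalizer = lambda v: " ".join(f"{x:.2f}" for x in v)
toy_reconstructor = lambda s: [float(t) for t in s.split()]

score = faithfulness_score([0.5, -1.25, 3.0], toy_verbalizer, toy_reconstructor)
print(round(score, 3))
```

A high score is then taken as evidence that the verbalization carries meaningful information about the activation.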

I find the fact that this only looks at the activations of some specific layer l a bit interesting. Some layer l might 'think' a certain way about some input, while another later layer might have different 'thoughts' about it. How does the model decide which 'thoughts' to ultimately pay attention to, and prioritize some output token over another?


> I find the fact that this only looks at the activations of some specific layer l a bit interesting. Some layer l might 'think' a certain way about some input, while another later layer might have different 'thoughts' about it.

Yeah, I thought this section in the appendix was particularly interesting:

> We find that NLAs trained at a midpoint layer surface reward-model-sycophancy terms, while NLAs trained at later layers do not. This is consistent with Lindsey et al. [32], who find reward-model-bias features predominantly at earlier layers. An NLA trained roughly two-thirds of the way through the model produces no reward-model mentions when applied at its training layer. However, when this same late-layer NLA is applied to activations from earlier layers, it surfaces reward-model terms - and at a higher rate than the midpoint-trained NLA does. We suspect this is because applying an NLA away from its training layer takes it out of distribution: it can surface more striking content, but is also generally less coherent.

They also mention training NLAs to accept multiple layers of activations as a possible future research direction.


So at the heart of this architecture is what they call 'Markovian RSA', a combination of two papers: RSA [0], which generates a certain number of reasoning traces for a prompt; and the 'Markovian Thinker' [1], which basically cuts each trace down to its tail to keep context at a reasonable length.

I feel like there's potential to improve on just cutting each trace down to a tunable tail of τ tokens, because you may lose valuable insight from earlier in the trace. They did train the model (in SFT) to put the relevant information into the tail (τ tokens) of the trace, but I'm not sure this is the best possible approach.

0. https://arxiv.org/pdf/2509.26626

1. https://arxiv.org/pdf/2510.06557
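As I understand it, the truncation step itself is trivial - roughly this (a sketch under my reading of the paper; `tau` and the token list are illustrative, not their code):

```python
# Hypothetical sketch of the tail-truncation step described above: keep only
# the last `tau` tokens of a reasoning trace so context stays bounded.

def truncate_trace(tokens, tau):
    """Drop everything but the final `tau` tokens of a reasoning trace."""
    if tau <= 0:
        return []
    return tokens[-tau:]

trace = [f"tok{i}" for i in range(10)]
# The SFT objective trains the model to pack its relevant state into this tail.
print(truncate_trace(trace, 3))
```

The whole question is whether training the model to compress its state into that fixed-size tail is the best way to preserve earlier insights.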


I wonder if '2000 LOC' was chosen to refer to this old anecdote from the 80s:

https://www.folklore.org/Negative_2000_Lines_Of_Code.html


Because it hit HN frontpage ...

This tweet was from 3 days ago.

Mismanaged comms? Yes

HN front page effect? Prob not

(could be Reddit frontpage effect or related tho)


I saw the tweet about the Reddit post about 2 days ago. It probably was X.

There are a lot of comments on that issue demanding Anthropic give the guy the money back, I assume they saw the writing on the wall.

> prevent Claude Code source code from leaking

That's silly. It's a JavaScript app, they are more or less open source by design. There was no secret sauce in Claude Code.


Odd how they still DMCA'd the rehosts of the leak. Clearly they don't consider it "open source".

Ask Jürgen Schmidhuber

> We will extend invitations to a vetted list of trusted bio red-teamers

Had to chuckle. This sounds like a rather exclusive group?


It sounds like asking CS PhDs to do a world record speed run. I wouldn't be surprised if the people best suited to the task aren't the type to get onto "a vetted list".
