
Why was Anthropic's interpretability work not discussed? Inconvenient for the conclusion?

https://www.anthropic.com/news/tracing-thoughts-language-mod...



The same work in which they show that the LLM doesn't know what it "thinks" or how it arrives at its conclusions? The one where they demonstrate that it outputs what is statistically most probable, even though the logits indicate it was something else?


