
Why was Anthropic's interpretability work not discussed? Inconvenient for the conclusion?

https://www.anthropic.com/news/tracing-thoughts-language-mod...



The same work in which they show that the LLM doesn't know what it "thinks" or how it arrives at its conclusions? The one where they demonstrate that it outputs what is statistically most probable, even though the logits indicate it was something else?


