Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Holy crap, this should be higher. One AI figured out it could cheat by exploiting the other AI's with a prompt injection attack!

This is reminiscent of that time agents "cheated" on coding benchmarks where the solution was leaked in the git log: https://news.ycombinator.com/item?id=45214670 -- Except that was somewhat accidental. I mean, nobody expects to be given a problem to solve with a solution right there if you looked, and indeed, the LLMs seemed to stumble upon this.

This is downright diabolical because it's an intentional prompt injection attack.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: