Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> This might be possible with a special separation token

Exactly. Using text like "```", as in the blog post, obviously won't cut it, but a special token as separator (or better: two, as quote-start and quote-end) would work. Then the model needs to be trained during fine-tuning that instructions in such delimited text shouldn't be executed. I wrote a more detailed post as a reply to the OP.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: