Say you’ve built a ChatGPT-powered chatbot as an MI layer on top of a database, someone could generate a prompt that exposes private data. These are the same types of assumptions which lead to SQL injection attacks, so I appreciate all the effort going into establishing good practices for us mortals to follow.
Imagine you have a llm to approve mortgage applications. An application is submitted that adjusts the prompt to approve regardless of credit.
Imagine you have a llm to identify malware or illegal content of some kind. It contains instructions that adjust the prompt to not flag the content.
Imagine you have a llm that summarizes email. You receive an email that adjusts the prompt to tell you that you need to run this shell script to fix and error in the ai system.
It is such a bad problem that you shouldn’t use the purported technology at all for use cases with real world effects unless you present those effects to a person first.
In many practical use scenarios - where you'd actually want to automate part of someone's workday - you would want to use it in a non-interactive manner to process data generated by someone else, so you need the chatbot to obey orders written by you in the prompt but ignore anything similar to orders contained in that data.