I don't think this approach is formalized, but I can give a few examples:
A) Prompt leak prevention: chunk and embed LLM responses, then compare each chunk against the original prompt to filter out the ones that leak it (see the first sketch below)
B) Automatic prompt refinement: prompt a cheap model, then use an expensive model to judge the output and rewrite the prompt (this is in part how Vicuna[1] did eval for their LLaMA fine-tuning; see the second sketch below)
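For (A), a rough sketch of what I mean, assuming sentence-transformers for the embeddings — the naive sentence splitting, the model choice, and the 0.8 threshold are all placeholders, not a tested recipe:

    # Sketch of (A): drop response chunks whose embedding sits too close
    # to the (secret) system prompt. Threshold is an arbitrary knob.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def filter_prompt_leaks(system_prompt, response, threshold=0.8):
        # Naive chunking on sentence boundaries; real code would be smarter.
        chunks = [c.strip() for c in response.split(".") if c.strip()]
        prompt_vec = model.encode([system_prompt])[0]
        kept = []
        for chunk, vec in zip(chunks, model.encode(chunks)):
            # Cosine similarity between the chunk and the original prompt.
            sim = np.dot(prompt_vec, vec) / (
                np.linalg.norm(prompt_vec) * np.linalg.norm(vec))
            if sim < threshold:  # keep chunks that don't echo the prompt
                kept.append(chunk)
        return ". ".join(kept)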
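And a sketch of the (B) loop, using the OpenAI client as a stand-in — the model names, the judge instruction, and the fixed round count are my assumptions, not what Vicuna actually ran:

    # Sketch of (B): a cheap model answers, an expensive model judges the
    # answer and rewrites the prompt for the next round.
    from openai import OpenAI

    client = OpenAI()

    def refine_prompt(prompt, task_input, rounds=3):
        for _ in range(rounds):
            # Cheap model attempts the task with the current prompt.
            answer = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "system", "content": prompt},
                          {"role": "user", "content": task_input}],
            ).choices[0].message.content
            # Expensive model critiques the output and proposes a better prompt.
            prompt = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content":
                    f"Prompt:\n{prompt}\n\nInput:\n{task_input}\n\n"
                    f"Output:\n{answer}\n\nRewrite the prompt so a weaker "
                    "model would produce a better output. Reply with the "
                    "new prompt only."}],
            ).choices[0].message.content
        return prompt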
> reinforcement though (LLM supervising LLM).
Is there something I can read to understand what that looks like?