I don't think this approach is formalized, but I can give a few examples:
A) Prompt leak prevention: chunk and embed LLM responses, then compare each chunk against the original prompt to filter out the ones that leak it (see the first sketch below)
B) Automatic prompt refinement: prompt a cheap model, then use an expensive model to judge the output and rewrite the prompt (this is in part how Vicuna[1] did eval for their LLaMA fine-tuning; see the second sketch below)
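For (A), a rough sketch of what I mean, assuming sentence-transformers for the embeddings — the naive sentence splitting, the model choice, and the 0.8 threshold are all placeholders, not a tested recipe:

    # Sketch of (A): drop response chunks whose embedding sits too close
    # to the (secret) system prompt. Threshold is an arbitrary knob.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def filter_prompt_leaks(system_prompt, response, threshold=0.8):
        # Naive chunking on sentence boundaries; real code would be smarter.
        chunks = [c.strip() for c in response.split(".") if c.strip()]
        prompt_vec = model.encode([system_prompt])[0]
        kept = []
        for chunk, vec in zip(chunks, model.encode(chunks)):
            # Cosine similarity between the chunk and the original prompt.
            sim = np.dot(prompt_vec, vec) / (
                np.linalg.norm(prompt_vec) * np.linalg.norm(vec))
            if sim < threshold:  # keep chunks that don't echo the prompt
                kept.append(chunk)
        return ". ".join(kept)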
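And a sketch of the (B) loop, using the OpenAI client as a stand-in — the model names, the judge instruction, and the fixed round count are my assumptions, not what Vicuna actually ran:

    # Sketch of (B): a cheap model answers, an expensive model judges the
    # answer and rewrites the prompt for the next round.
    from openai import OpenAI

    client = OpenAI()

    def refine_prompt(prompt, task_input, rounds=3):
        for _ in range(rounds):
            # Cheap model attempts the task with the current prompt.
            answer = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "system", "content": prompt},
                          {"role": "user", "content": task_input}],
            ).choices[0].message.content
            # Expensive model critiques the output and proposes a better prompt.
            prompt = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content":
                    f"Prompt:\n{prompt}\n\nInput:\n{task_input}\n\n"
                    f"Output:\n{answer}\n\nRewrite the prompt so a weaker "
                    "model would produce a better output. Reply with the "
                    "new prompt only."}],
            ).choices[0].message.content
        return prompt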
> reinforcement though (LLM supervising LLM).
Is there something I can read to understand what that looks like?