
mjburgess on Sept 14, 2024 | on: OpenAI threatens to revoke o1 access for asking it...

People repairing ChatGPT replies with additional prompts is reinforcement learning training data. "Reinforcement learning", just like any term used by AI researchers, is an extremely flexible, pseudo-psychological reskin of some pretty trivial stuff.