Reframe this as scaling test time compute using a human in the loop as the rewar...

		deepsquirrelnet on Jan 3, 2025 \| parent \| context \| favorite \| on: Can LLMs write better code if you keep asking them... Reframe this as scaling test time compute using a human in the loop as the reward model. o1 is effectively trying to take a pass at automating that manual effort.