The point is that current methods can't get more than the current state-of-the-art models' level of intelligence out of training on the totality of human knowledge. Previously, the compute needed to process that much data was the limiting factor, but not anymore.
So now, in order to progress further, we either have to improve the methods, or synthetically generate more training data, or both.
What does synthetic training data actually mean? Just saying the same things in different ways? It seems like we're training in a way that's just not sustainable.
One example: when we want to increase performance on a task that can be automatically verified, we can often generate synthetic training data by having the current, imperfect models attempt the task lots of times, then picking out the first attempt that works. For instance, given a programming problem, we might write a program skeleton and unit tests for the expected behavior. GPT-5 might need 100 attempts to produce a working program; the hope is that GPT-6 would train on the working attempt and therefore need far fewer attempts to solve similar problems.
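To make that concrete, here's a minimal sketch of that "sample until the verifier accepts" loop in Python. The model call, function names, and attempt budget are all hypothetical placeholders, not anyone's actual pipeline; the point is just the shape of the loop: generate candidates, run the unit tests, keep the first pass as a synthetic training example.

```python
# Sketch of verifier-filtered synthetic data generation (rejection sampling
# against unit tests). `generate_candidate` is a hypothetical stand-in for a
# call to the current model; the attempt budget of 100 is illustrative.

from typing import Callable, Optional


def generate_candidate(prompt: str, attempt: int) -> str:
    """Hypothetical model call: returns candidate source code for the prompt."""
    raise NotImplementedError("replace with a real model API call")


def passes_tests(source: str, tests: Callable[[dict], bool]) -> bool:
    """Run the candidate and check it against the verifier (the unit tests)."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # caution: untrusted code, sandbox in practice
        return tests(namespace)
    except Exception:
        return False


def first_working_attempt(prompt: str,
                          tests: Callable[[dict], bool],
                          budget: int = 100) -> Optional[str]:
    """Sample up to `budget` attempts; keep the first one the verifier accepts.
    The (prompt, accepted solution) pair becomes a synthetic training example."""
    for attempt in range(budget):
        candidate = generate_candidate(prompt, attempt)
        if passes_tests(candidate, tests):
            return candidate
    return None  # no verified solution found for this task
```

The verifier here is just a function that inspects the executed candidate, e.g. `lambda ns: ns["add"](2, 3) == 5` for a toy "write an add function" task. Anything you can check automatically (tests, type checks, proof checkers) can play the same role.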
As you suggest, this costs lots of time and compute. But it's produced breakthroughs in the past (see AlphaGo Zero self-play) and is now supposedly a standard part of model post-training at the big labs.