I assume everyone knows this, but the idea of generating answers and testing them dates back decades, and has been widely used for problems where generating _the_ correct answer(s) is difficult, but where generating a bunch of potential answers--(at least) one of which is likely correct--is easier. Generate-and-test of course relies on having a test algorithm that is reliable, (relatively) fast, and memory efficient, and is most useful when an exact generate algorithm (one that generates only the correct answer(s)) is slow, memory-hungry, or both.
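For anyone who hasn't seen it spelled out, here is a minimal sketch of the pattern in Python. The problem (6-queens) and the names `generate_and_test` and `queens_ok` are my own choices for illustration, not anything from the article. The generator is deliberately dumb (every permutation of column positions), and the test is cheap (a diagonal check).

```python
from itertools import permutations
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def generate_and_test(candidates: Iterable[T], test: Callable[[T], bool]) -> Optional[T]:
    """Return the first candidate that passes the test, or None if none does."""
    for candidate in candidates:
        if test(candidate):
            return candidate
    return None


def queens_ok(cols: Tuple[int, ...]) -> bool:
    """Test: cols[r] is the queen's column in row r; reject any shared diagonal.

    Rows are distinct by construction, and columns are distinct because the
    generator below only yields permutations, so only diagonals need checking.
    """
    n = len(cols)
    return all(
        abs(cols[i] - cols[j]) != j - i
        for i in range(n)
        for j in range(i + 1, n)
    )


# Generator: all 720 permutations of 0..5, most of which are wrong.
solution = generate_and_test(permutations(range(6)), queens_ok)
print(solution)
```

The generator knows nothing about diagonals; all of the "intelligence" lives in the test. Swap the permutation generator for an LLM and the diagonal check for a compiler or SAT solver and the loop keeps the same shape.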

In the case described, the generator is an LLM, and the tester (called a "verifier") is "the compiler, linter, SAT solver, ground truth dataset, etc."

And of course generate-and-test is related to trial-and-error, which has probably existed since the Paleolithic.