We don't, but the point is that it's only one part of the entire system. If you have a (human-supplied) scoring function, then even completely random mutations can serve as a mechanism to optimize: you generate a bunch, keep the better ones according to the scoring function and repeat. That would be a very basic genetic algorithm.
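That loop can be sketched in a few lines of Python. The scoring function here (steer a list of numbers toward a target sum) is just a made-up stand-in for whatever human-supplied metric you care about:

```python
import random

# Made-up scoring function: higher is better. Here we "optimize"
# a list of numbers so that its sum lands close to 100.
def score(candidate):
    return -abs(sum(candidate) - 100)

def mutate(candidate):
    # Completely random mutation: tweak one position by a random amount.
    i = random.randrange(len(candidate))
    new = list(candidate)
    new[i] += random.uniform(-5, 5)
    return new

random.seed(0)
population = [[random.uniform(0, 10) for _ in range(10)] for _ in range(20)]

for generation in range(200):
    # Generate a bunch of mutated offspring...
    offspring = [mutate(random.choice(population)) for _ in range(40)]
    # ...keep the better ones according to the scoring function, and repeat.
    population = sorted(population + offspring, key=score, reverse=True)[:20]

best = population[0]
print(score(best))  # close to 0, i.e. sum(best) is close to 100
```

Even with fully random mutations, selection pressure alone drives the score up; the point above is that an LLM can replace `mutate` with something smarter.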
The LLM serves to guide the search more "intelligently" so that mutations aren't actually random but can instead draw from what the LLM "knows".
In this case AlphaEvolve doesn't write proofs; it uses the LLM to write Python code (or any language, really) that produces some numerical inputs to a problem.
They just try out the inputs on the problem they care about. If the code gives better results, they keep it around. They actually keep a few of the previous versions that worked well as inspiration for the LLM.
If the LLM is hallucinating nonsense, it will just produce broken code that gives horrible results, and that idea will be thrown away.
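As a rough sketch of that filter step (the `solve()` convention and the scoring function are made up for illustration, not AlphaEvolve's actual interface): each candidate is just a piece of source code, and broken code scores as badly as possible.

```python
# Each candidate is Python source that should define a function `solve()`.
def evaluate(source):
    """Run candidate code; hallucinated nonsense just scores -inf."""
    namespace = {}
    try:
        exec(source, namespace)        # may raise on broken code
        answer = namespace["solve"]()  # may raise, or return garbage
        return score(answer)           # deterministic, domain-specific check
    except Exception:
        return float("-inf")           # broken idea: thrown away

# Made-up scoring for illustration: how close is the answer to 42?
def score(answer):
    return -abs(answer - 42)

candidates = [
    "def solve(): return 40 + 2",         # works, scores well
    "def solve(): return undefined_var",  # hallucination: NameError
    "this is not even Python (",          # hallucination: SyntaxError
]

# Keep the best-scoring version(s) around as inspiration for the LLM.
survivors = sorted(candidates, key=evaluate, reverse=True)[:1]
print(survivors)
```

Nothing in the evaluator needs to "detect" hallucination; garbage code simply fails to run or produces a bad score, and that branch dies on its own.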
The final evaluation is performed with a deterministic tool that's specialized for the current domain. It doesn't care that it's getting its input from an LLM that may be hallucinating.
The catch, however, is that this approach can only be applied to areas where you can have such an automated verification tool.
Google's system is like any other optimizer, where you have a scoring function, and you keep altering the function's inputs to make the scoring function return a big number.
The difference here is the function's inputs are code instead of numbers, which makes LLMs useful because LLMs are good at altering code. So the LLM will try different candidate solutions, then Google's system will keep working on the good ones and throw away the bad ones (colloquially, those branches are "cut").
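A toy version of that loop, where `llm_mutate` is a stand-in that blindly tweaks a constant in the candidate program (a real system would prompt the model to rewrite the code, and the target of 7 is made up):

```python
import random

# Stand-in for "LLM edits the code": nudge the program's constant up or down.
def llm_mutate(source):
    const = int(source.split()[-1])
    return f"def solve(): return {const + random.choice([-1, 1])}"

def scoring_function(source):
    # Deterministic evaluator: run the candidate and reward answers
    # near the (made-up) target of 7. Bigger is better.
    ns = {}
    exec(source, ns)
    return -abs(ns["solve"]() - 7)

random.seed(1)
best = "def solve(): return 0"
for _ in range(100):
    candidate = llm_mutate(best)
    if scoring_function(candidate) > scoring_function(best):
        best = candidate  # keep working on the good ones
    # otherwise the bad branch is cut: the candidate is simply discarded

print(best)
```

Structurally this is just hill climbing; the only novelty is that the thing being perturbed is a program rather than a vector of numbers.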
Exactly, he even mentioned that it's a variant of a traditional optimization tool, so it's not surprising to see cutting-plane methods and, when the structure allows, Benders decomposition.
The LLM basically just produces some code that either runs and produces good results or it doesn't. If it produces garbage, that is the end of the line for that branch.
Can you explain this more? How on earth are we supposed to know the LLM is hallucinating?