> How parallelism changed the agent’s research strategy
> With a single GPU, the agent is stuck doing greedy hill-climbing: try one thing, check the result, pick a direction, try the next thing. With 16 GPUs, the strategy shifts. ...skip... 12 experiments in a single 5-minute wave. This makes it much harder to get stuck in local optima and much easier to find interaction effects between parameters.
The agent can theoretically come up with a protocol to run those same 12 experiments one-by-one and only then decide which branch to explore next - which I think would lead to the same outcome?
But in this case, it stumbled on this particular outcome only because it never got the chance to execute a greedy strategy after the first one or two results.
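For concreteness, the local-optima point from the quote can be sketched with a toy 1-D objective (everything here is made up for illustration — the objective shape, step size, and the 12-experiment budget are not from the post):

```python
import random

def objective(x):
    """Toy 'training result': a local optimum near x = 1 and a
    better region near x = 4 that a greedy walk can't reach."""
    return -0.5 * (x - 1) ** 2 + 3 * max(0.0, 2 - abs(x - 4))

def greedy_serial(start=0.0, step=0.25, budget=12):
    """1 GPU: try one neighbour at a time, keep it only if it improves."""
    x = start
    for _ in range(budget):
        candidate = x + random.choice([-step, step])
        if objective(candidate) > objective(x):
            x = candidate
    return x

def parallel_wave(budget=12):
    """16 GPUs: evaluate a whole wave of configs at once, keep the best."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    wave = [random.uniform(-2, 6) for _ in range(budget)]
    return max(wave, key=objective)

random.seed(0)
best_serial = greedy_serial()
best_wave = parallel_wave()
# The wave samples the range broadly, so it can land near the better
# region around x ≈ 4, while the greedy walk started at x = 0 only ever
# accepts improvements and stalls at the local optimum x ≈ 1.
```

The greedy walk can never cross the valley between the two optima (every intermediate step looks worse), which is exactly the "stuck in local optima" failure mode the quote describes.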
> The agent can theoretically come up with a protocol to run those same 12 experiments one-by-one and only then decide which branch to explore next - which I think would lead to the same outcome?
At least in theory, adaptiveness should save samples and, in this case, compute. (As noted, you can always serialize the parallel schedule, and the serial approach, which gets each result 'from the future' before choosing the next experiment, should be able to meet or beat any parallel approach on sample-efficiency.)
So if the batch only matches the adaptive search, that suggests that the LLM is not reasoning well in the adaptive setting and is poorly exploiting the additional information. Maybe some sort of more explicit counterfactual reasoning/planning over a tree of possible outcomes?
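The "adaptiveness should save samples" intuition has a familiar extreme case: when each result can prune whole branches, a serial searcher needs far fewer evaluations than a fixed batch. A toy sketch (the unimodal objective and grid are made up, not the post's setup):

```python
def unimodal(x):
    """Toy unimodal 'result' peaking at x = 37 on an integer grid."""
    return -(x - 37) ** 2

def batch_search(lo=0, hi=100):
    """Non-adaptive: evaluate every grid point in one big wave."""
    points = list(range(lo, hi + 1))
    return max(points, key=unimodal), len(points)

def adaptive_search(lo=0, hi=100):
    """Adaptive: ternary search; each pair of results prunes a third
    of the remaining range before the next experiment is chosen."""
    n_evals = 0
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        n_evals += 2
        if unimodal(m1) < unimodal(m2):
            lo = m1 + 1  # peak must lie right of m1
        else:
            hi = m2 - 1  # peak must lie left of m2
    candidates = list(range(lo, hi + 1))
    n_evals += len(candidates)
    return max(candidates, key=unimodal), n_evals
```

Both find the same optimum, but the adaptive version uses roughly logarithmically many evaluations instead of the full grid — which is why a batch strategy merely *matching* an adaptive one suggests the adaptive reasoning is being wasted.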
Yeah, assuming there's no active monitoring during the training runs, you can trivially give the agent an abstraction which turns "1 GPU" into "16 GPUs" that just so happens to take 16x the wall-clock time to run.
In fact, looking at the blog post, the agent orchestrating 16 GPUs is half as efficient in GPU-time as the agent using 1 GPU: it uses 16 GPUs to reach the same result in 1/8 of the wall-clock time, i.e. 16 × 1/8 = 2× the GPU-hours.
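The bookkeeping behind that efficiency claim (the 1/8 wall-clock figure is from the post; the normalisation to a baseline of T = 1 is just for arithmetic):

```python
# GPU-hours = number of GPUs × wall-clock time.
serial_gpus, serial_time = 1, 1.0          # baseline: 1 GPU for T hours
parallel_gpus, parallel_time = 16, 1 / 8   # 16 GPUs for T/8 hours (per the post)

serial_gpu_hours = serial_gpus * serial_time        # 1.0 × T
parallel_gpu_hours = parallel_gpus * parallel_time  # 2.0 × T

efficiency = serial_gpu_hours / parallel_gpu_hours  # 0.5: half as GPU-efficient
```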
> The agent can theoretically come up with a protocol to run those same 12 experiments one-by-one and only then decide which branch to explore next - which I think would lead to the same outcome?
> But in this case, it just happened to have stumbled on this particular outcome only because it didn't get a chance to execute a greedy strategy after the first 1 or 2 results.
Worse experiment design + parallelism = better experiment design + serialized execution?