It's difficult for me to express this view, which I hold genuinely, without reading as lacking in humanity. However, I think it would be disastrous for humanity as a whole if we eliminated disease completely. To fight against it and to make progress in that fight is of course deeply human, and we are all affected emotionally and personally by disease in all its forms. But if we win the fight against disease, I am almost sure the human race will simply end as a (long-term) consequence.
Could you elaborate? How do you see this playing out? Is this unique to disease or do you believe it's also true of other forms of suffering, e.g. poverty?
Well, I think anything that gives humans unbounded lifespans is probably going to end human civilization in the long term. So no, I don't think eliminating poverty is dangerous in a similar way.
No, it's not a February 2020 moment for sure. In February 2020, most people had heard of COVID and a few scattered outbreaks had happened, but people generally viewed the topic as more of a curiosity (major world news, but not necessarily something that would deeply impact them). This is more like the start of March 2020 for general awareness.
My understanding is that around 10 Erdős problems have been solved by GPT by now. Most of those solutions turned out to either already exist in the literature, or to closely track a very similar problem that was solved in the literature. But one or two solutions are quite novel.
I am not aware of any unsolved Erdős problem that was solved via an LLM. I am aware of LLMs contributing variations on known proofs of previously solved Erdős problems. But the issue with having an LLM combine or modify existing published solutions is that those solutions are in the LLM's training data, and in general there are many ways to produce variations on known proofs. Most proofs go through many iterations and simplifications over time, most of which are not sufficiently novel to even warrant publication. The proof you read in a textbook is likely a highly revised and simplified version of the one first published.
If I'm wrong, please let me know which previously unsolved problem was solved; I would be genuinely curious to see an example of that.
"We tentatively believe Aletheia’s solution to Erdős-1051 represents an early example of an AI system autonomously resolving a slightly non-trivial open Erdős problem of somewhat broader (mild) mathematical interest, for which there exists past literature on closely-related problems [KN16], but none fully resolves Erdős-1051. Moreover, it does not appear to us that Aletheia’s solution is directly inspired by any previous human argument (unlike in
many previously discussed cases), but it does appear to involve a classical idea of moving to the series tail and applying Mahler’s criterion. The solution to Erdős-1051 was generalized further, in a collaborative effort by Aletheia together with human mathematicians and Gemini Deep Think, to produce the research paper [BKK+26]."
"The erdosproblems website shows 851 was proved in 1934." I disagree with this characterization of the Erdos problem. The statement proven in 1934 was weaker. As evidence for this, you can see that Erdos posed this problem after 1934.
You recommended I look at the erdosproblems website.
But evidence that it was posed after 1934 is not really evidence that it was not solved, because one of the things we learned from LLMs is that many of these problems were already solved in the literature, or are relatively straightforward applications of known but obscure results. This is particularly true in the world of Erdős problems, the majority of which can be described as "off the beaten path": basically musings Erdős dropped in papers. Many of these were in fact solved in more obscure articles, and no one made the connection until LLMs made systematic literature searches practical. That was the primary source of "solutions" to these problems by LLMs in the cited paper.
The Erdős Problems site also does not say it was solved in 1934. If you read the full sentence there, it refers to a different, related statement that was proven.
Yeah, that was also my takeaway when I was following the developments on it. But then again, I don't follow it very closely, so _maybe_ some novel solutions have been discovered. Given how LLMs work, though, I'm skeptical of that.
I honestly don't see the point of the red data points. By now, all the Erdős problems have been attempted by AIs, so every unsolved one could be a red data point.
"An internal scaffolded version of GPT‑5.2 then spent roughly 12 hours reasoning through the problem, coming up with the same formula and producing a formal proof of its validity."
When I use GPT-5.2 Thinking Extended, it gives me the impression that it's consistent enough (i.e., has a low enough error rate, or enough error-correcting ability) to autonomously do math/physics for many hours if it were allowed to [though I think Extended cuts off around the 30-minute mark, and Pro at maybe 1-2 hours]. It's good to see some confirmation of that impression here. I hope scientists and mathematicians at large will soon be able to play with tools that think on this time scale and see how much capability these machines really have.
Yes, and 5.3 with the latest Codex CLI client is incredibly good across compactions. Does anyone know the methodology they're using to maintain state and manage context for a 12-hour run? It could be as simple as a single dense document and its own internal compaction algorithm, I guess.
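To make the "single dense document plus compaction" idea concrete, here's a rough Python sketch. Everything in it (the `llm()` call, the prompts, the token budget) is hypothetical; this is not a claim about how Codex/GPT-5.x actually manages context:

```python
# Hypothetical sketch of "single dense document + compaction" for a long
# agentic run. llm(), the prompts, and the budgets are all made up here.

MAX_CONTEXT_TOKENS = 100_000
COMPACT_AT = 0.8  # compact once ~80% of the budget is in use


def count_tokens(text: str) -> int:
    # Crude stand-in; real code would use the model's tokenizer.
    return len(text) // 4


def llm(prompt: str) -> str:
    # Placeholder for a real model call.
    raise NotImplementedError


def run_long_task(task: str, max_steps: int = 100_000) -> str:
    # The entire agent state lives in one dense notes document.
    notes = f"TASK:\n{task}\n\nNOTES:\n"
    for _ in range(max_steps):
        step = llm(notes + "\nNext step (or 'FINAL: <answer>'):")
        if step.startswith("FINAL:"):
            return step[len("FINAL:"):].strip()
        notes += step + "\n"
        if count_tokens(notes) > COMPACT_AT * MAX_CONTEXT_TOKENS:
            # Compaction: rewrite the notes as a shorter state document,
            # keeping verified results and open threads, dropping dead ends.
            notes = llm(
                "Rewrite these working notes as a compact state document. "
                "Keep the task, verified results, and unresolved questions; "
                "drop dead ends and redundant reasoning:\n\n" + notes
            )
    return notes
```

If it's anything like this, the interesting engineering is presumably in what the compaction step preserves (partial proofs, failed approaches worth remembering) versus what it drops.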
It's a bit unclear to me what happens if I do that after it thinks for 30 minutes and ends with no response. Does it pick up where it left off? Does it start from scratch? I don't know how the compaction of its prior thinking traces works.
I'm not sure the benchmark is high enough quality that >80% of the problems are well-specified and have correct labels, tbh. (But I guess this question has been studied for these benchmarks.)