It's difficult for me to express this view, which I hold genuinely, without reading as lacking in humanity. However, I think it would be disastrous for humanity as a whole if we eliminated disease completely. To fight against it and to make progress in that fight is of course deeply human, and we are all affected emotionally and personally by disease in all its forms. But if we win the fight against disease, I am almost sure the human race will simply end as a (long-term) consequence.
Could you elaborate? How do you see this playing out? Is this unique to disease or do you believe it's also true of other forms of suffering, e.g. poverty?
Well, I think anything that gives humans unbounded lifespans is probably going to end human civilization in the long term. So no, I don't think eliminating poverty is dangerous in a similar way.
No, it's not a February 2020 moment for sure. In February 2020, most people had heard of COVID and a few scattered outbreaks had happened, but people generally viewed the topic as more of a curiosity (major world news, but not necessarily something that would deeply impact them). This is more like the start of March 2020 for general awareness.
My understanding is that around 10 Erdős problems have been solved by GPT by now. Most of those solutions turned out to either already exist in the literature, or to closely track a very similar problem that was solved in the literature. But one or two solutions are quite novel.
I am not aware of any unsolved Erdős problem that was solved via an LLM. I am aware of LLMs contributing variations on known proofs of previously solved Erdős problems. But the issue with having an LLM combine or modify existing published solutions is that those solutions are in the LLM's training data, and in general there are many ways to produce variations on known proofs. Most proofs go through many iterations and simplifications over time, most of which are not sufficiently novel to even warrant publication. The proof you read in a textbook is likely a highly revised and simplified version of the one first published.
If I'm wrong, please let me know which previously unsolved problem was solved; I would be genuinely curious to see an example of that.
"We tentatively believe Aletheia’s solution to Erdős-1051 represents an early example of an AI system autonomously resolving a slightly non-trivial open Erdős problem of somewhat broader (mild) mathematical interest, for which there exists past literature on closely-related problems [KN16], but none fully resolves Erdős-1051. Moreover, it does not appear to us that Aletheia’s solution is directly inspired by any previous human argument (unlike in
many previously discussed cases), but it does appear to involve a classical idea of moving to the series tail and applying Mahler’s criterion. The solution to Erdős-1051 was generalized further, in a collaborative effort by Aletheia together with human mathematicians and Gemini Deep Think, to produce the research paper [BKK+26]."
"The erdosproblems website shows 851 was proved in 1934." I disagree with this characterization of the Erdos problem. The statement proven in 1934 was weaker. As evidence for this, you can see that Erdos posed this problem after 1934.
You recommended I look at the erdosproblems website.
But evidence that it was posed after 1934 is not really evidence that it was not solved, because one of the things we learned from LLMs is that many of these problems were already solved in the literature, or are relatively straightforward applications of known but obscure results. This is particularly true in the world of Erdős problems, the majority of which can be described as "off the beaten path": basically musings Erdős dropped in papers. Many of these were in fact solved in more obscure articles, and no one made the connection until LLMs made systematic literature searches practical. That was the primary source of "solutions" to these problems by LLMs in the cited paper.
The Erdős Problems site also does not say it was solved in 1934. If you read the full sentence there, it refers to a different, related statement that was proven.
Yeah, that was also my takeaway when I was following the developments on it. But then again, I don't follow it very closely, so _maybe_ some novel solutions have been discovered. Given how LLMs work, though, I'm skeptical of that.
I honestly don't see the point of the red data points. By now, all the Erdős problems have been attempted by AIs, so every unsolved one could be a red data point.
"An internal scaffolded version of GPT‑5.2 then spent roughly 12 hours reasoning through the problem, coming up with the same formula and producing a formal proof of its validity."
When I use GPT-5.2 Thinking Extended, it gives me the impression that it's consistent enough (i.e., has a low enough error rate, or enough error-correcting ability) to autonomously do math/physics for many hours if it were allowed to [though I think Extended cuts off around the 30-minute mark, and Pro at maybe 1-2 hours]. It's good to see some confirmation of that impression here. I hope scientists and mathematicians at large will soon be able to play with tools that think on this time scale and see how much capability these machines really have.
Yes, and 5.3 with the latest Codex CLI client is incredibly good across compactions. Does anyone know the methodology they're using to maintain state and manage context for a 12-hour run? It could be as simple as a single dense document and its own internal compaction algorithm, I guess.
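To make the "single dense document plus compaction" idea concrete, here's a rough Python sketch. Everything in it (the `llm()` call, the prompts, the token budget) is hypothetical; this is not a claim about how Codex/GPT-5.x actually manages context:

```python
# Hypothetical sketch of "single dense document + compaction" for a long
# agentic run. llm(), the prompts, and the budgets are all made up here.

MAX_CONTEXT_TOKENS = 100_000
COMPACT_AT = 0.8  # compact once ~80% of the budget is in use


def count_tokens(text: str) -> int:
    # Crude stand-in; real code would use the model's tokenizer.
    return len(text) // 4


def llm(prompt: str) -> str:
    # Placeholder for a real model call.
    raise NotImplementedError


def run_long_task(task: str, max_steps: int = 100_000) -> str:
    # The entire agent state lives in one dense notes document.
    notes = f"TASK:\n{task}\n\nNOTES:\n"
    for _ in range(max_steps):
        step = llm(notes + "\nNext step (or 'FINAL: <answer>'):")
        if step.startswith("FINAL:"):
            return step[len("FINAL:"):].strip()
        notes += step + "\n"
        if count_tokens(notes) > COMPACT_AT * MAX_CONTEXT_TOKENS:
            # Compaction: rewrite the notes as a shorter state document,
            # keeping verified results and open threads, dropping dead ends.
            notes = llm(
                "Rewrite these working notes as a compact state document. "
                "Keep the task, verified results, and unresolved questions; "
                "drop dead ends and redundant reasoning:\n\n" + notes
            )
    return notes
```

If it's anything like this, the interesting engineering is presumably in what the compaction step preserves (partial proofs, failed approaches worth remembering) versus what it drops.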
It's a bit unclear to me what happens if I do that after it thinks for 30 minutes and ends with no response. Does it pick up where it left off? Does it start from scratch? I don't know how the compaction of its prior thinking traces works.
I'm not sure the benchmark is high enough quality that >80% of the problems are well-specified and have correct labels, tbh. (But I guess this question has been studied for these benchmarks.)