Decompilation does not preserve semantics. You generally do not know whether the decompiler's output, once recompiled, will be semantically equivalent to the binary you started from.
My test harness loads the original DLL and executes it in parallel with the converted code (differential testing). That closes the feedback loop the LLM needs to find and fix discrepancies.
I'm also doing this on an old Win32 DLL so the task is probably much easier than a lot of code bases.
I am applying differential/property-based testing to all the side effects of functions (mutations) and to their return values. Rust code coverage is also used to steer the LLM as it finds discrepancies in side effects.
It is written up in my link. Please bear in mind it is really hard to find the right level of detail to communicate this at, so I'm happy to answer questions.
Many of the decompiled console games of the '90s were originally written in C89 using an ad-hoc compiler from Metrowerks or some off-branch release of gcc-2.95 plus console specific assemblers.
I'm willing to bet that the decompiled output is gonna be more readable than the original source code.
Not related to what I was saying. Compilation is a many-to-one transformation & although you can try to guess an inverse, there is no way to guarantee you will recover the original source b/c at the assembly level you don't have any types & structs.
You're confusing yourself w/ fancy words like "proof space". The LLM is not doing any kind of traversal in any meaningful sense of the word b/c the "proof" is often just grammatically coherent gibberish whereas an actual traversal in an actual space of proofs would never land on incorrect proofs.
My reading of their comment is that a proof space is a concept where a human guesses that a proof of some form q exists, and the AI searches a space S(q) where most points may not be valid proofs, but if a valid proof exists, it will hopefully be found.
So it is not a space of proofs in the sense that everything in a vector space is a vector. More like a space of sequences of statements, which have some particular pattern, and one of which might be a proof.
So it's not a proof space then. It's some computable graph where the edges are defined by standard autoregressive LLM single step execution & some of the vertices can be interpreted by theorem provers like Lean, Agda, Isabelle/HOL, Rocq, etc. That's still not any kind of space of proofs. Actually specifying the real logic of what is going on is much less confusing & does not lead readers astray w/ vague terms like proof spaces.
I still don't get how achieving 96% on some benchmark means it's a super genius but that last 4% is somehow still out of reach. The people who constantly compare robots to people should really ponder how a person who manages to achieve 90% on some advanced math benchmark still misses that last 10% somehow.
This feels like a maybe interesting position, but I don’t really follow what you mean. Is it possible to just state it directly? Asking us to ponder is sort of vague.
These math LLMs seem very different from humans. A person has a specialty. An LLM that was as skilled as, say, a middling PhD recipient (not superhuman), but was that skilled in literally every field, maybe somebody could argue that's superhuman ("smarter" than any one human). By this standard a room full of people or an academic journal could also be seen as superhuman. Which is not unreasonable; communication is our superpower.
Yeah - it's interesting where the edge is. In theory, an LLM trained on everything should be better positioned to make cross-field connections. But doing that well requires a certain kind of translation and problem-selection work which is hard even for humans. (I would even say beyond PhD level - knowing which problem is worth throwing PhD students at is the domain of professors... and many of them are bad at it, as well.)
On the human side, mathematical silos reduce our ability to notice opportunities for cross-silo applications. There should be lots of opportunity available.
LLM are good at search, but plagiarism is not "AI".
Leonhard Euler discovered many things by simply trying proofs everyone knew were impossible at the time. Additionally, folks like Isaac Newton and Gottfried Leibniz simply invented new approaches to solve general problems.
The folks that assume LLMs are "AI"... are also biased to turn a blind eye to clear isomorphic plagiarism in the models. Note too, LLM activation capping only reduces aberrant offshoots from the reasoning model's expected behavioral vector (it can never be trusted). Thus, it will spew nonsense when faced with some unknown domain search space.
Most exams do not have ambiguous or unknown contexts in the answer key, and a machine should score 100% matching documented solutions without fail. However, an LLM would also require >75% of our galaxy's energy output to reach human-level error rates in general.
YC has too many true believers with "AI" hype, and it is really disturbing. =3
In general, "any conceivable LLM" was the metric, based on current energy-usage trends within known data-center peak loads (likely much higher due to municipal NDAs). A straw-man argument over whether it is asymptotic or not is irrelevant with numbers that large. For example, going from 75% of our galaxy's energy output... to only needing 40% of total output... does not correct a core model design problem.
LLM are not "AI", and unlikely ever will be due to that cost... but Neuromorphic computing is a more interesting area of study. =3