
The problem is that without a formal definition of the program semantics, you run the risk of overfitting, or of leaving behaviors uncovered that, for a human developer who understands the intent of the program, would be implicit.

And given how hard formal verification is, I don't know that you'll ever get away with not having to manually check these programs, at which point I question just how much productivity you've gained.

It's kinda like self-driving cars: when they work, they work great. But when they fail, they fail in ways a human never would, and therefore a human struggles to anticipate or trust their behaviour.

That said, I'm waiting to see the rise of programming languages designed with LLMs in mind, where a human could use contract-oriented programming or similar (think: Ada) combined with TDD methods to more formally specify the problem that an LLM is being asked to solve.
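
Roughly what I have in mind, sketched in Python rather than Ada (the function, its contract, and the tests are all made up for illustration): the human writes the contract and the tests, and the LLM is only asked to fill in the body.

    # Hypothetical spec: human-written contract + tests, LLM fills in the body.
    def clamp_to_range(value: float, low: float, high: float) -> float:
        # Precondition: what the caller must guarantee
        assert low <= high, "low must not exceed high"

        result = min(max(value, low), high)  # <- the part the LLM would generate

        # Postconditions: what any correct implementation must guarantee
        assert low <= result <= high
        assert result == value or result in (low, high)
        return result

    # TDD-style examples pinning down the intent
    def test_clamp_to_range():
        assert clamp_to_range(5, 0, 10) == 5
        assert clamp_to_range(-3, 0, 10) == 0
        assert clamp_to_range(42, 0, 10) == 10

The contracts and tests would give a reviewer something mechanical to check the generated body against, instead of having to reverse-engineer intent from the code.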



> I question just how much productivity you've gained.

Me too. It's an empirical question to be answered by those who will dare to try.

> It's kinda like self-driving cars

Strong disagree. Yes, neural nets are black boxes, but the generated code can be idiomatic, modular, easy to inspect with a debugger, etc.

> more formally specify the problem that an LLM is being asked to solve.

That would be a great direction to explore.


> Strong disagree. Yes, neural nets are black boxes, but the generated code can be idiomatic, modular, easy to inspect with a debugger, etc.

I think you missed my point.

If I'm inspecting code from another human, I'm going to make assumptions about the kinds of errors they're gonna make. There's probably obvious dumb stuff I won't look for because a human would never typically make certain classes of mistake. Those mistakes are the self-driving car equivalent of driving into the back of a stopped semi truck because it was mistaken for a billboard, an error no human of sound mind and body would make.

So if I'm inspecting code written by a computer, I'll either 1) make those same assumptions and run the risk of missing unexpected problems in the code, or 2) be overly cautious (because I don't trust the machine) and examine the code with a fine-tooth comb, which will take a great deal more time.


Based on my experience with Autopilot and Copilot, I think this is way less of a problem in code.

You can put code mistakes on a gradient from subtle to obvious. Obvious bugs are like when the LLM finds a pattern and repeats it for 100 lines. Subtle mistakes are like using a variable left over from earlier instead of the correct one.
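
The subtle kind looks something like this (a made-up snippet, not something I actually saw Copilot produce):

    def total_price(items, tax_rate):
        subtotal = 0
        for item in items:
            price = item["price"]
            subtotal += price
        # Subtle: 'price' (just the last item) is used instead of 'subtotal'
        return price * (1 + tax_rate)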

Obvious mistakes are easy to catch because they’re obvious, and the LLM makes more of those. Maybe it’s because of the way LLMs work, but I have never seen Copilot make a subtle mistake that I wouldn’t expect of a person. People are so good at making surprising bugs that it’s really hard for Copilot to beat them.



