Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Situation A) Model writes a new endpoint and that's it

Situation B) Model writes a new endpoint, runs lint and build, adds e2e tests with sample data and runs them.

Did situation B mathematically prove the code is correct? No. But the odds the code is correct increases enormously. You see all the time how the Agent finds errors at any of those steps and fixes them, that otherwise would have slipped by.





LLM generated tests in my experience are really poor

Doesn't change the fact that what I mentioned greatly improves agent accuracy.

AI-generated implementation with AI-generated tests left me with some of the worst code I've witnessed in my life. Many of the passing tests it generated were tautologies (i.e. they would never fail even if behavior was incorrect).

When the tests failed the agent tended to change the (previously correct) test making it pass but functionally incorrect, or it "wisely" concluded that both the implementation and the test are correct but that there are external factors making the test fail (there weren't).

It behaved much like a really naive junior.


Which coding agent and which model?



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: