Situation A) Model writes a new endpoint and that's it
Situation B) Model writes a new endpoint, runs lint and build, adds e2e tests with sample data and runs them.
Did situation B mathematically prove the code is correct? No. But the odds the code is correct increases enormously. You see all the time how the Agent finds errors at any of those steps and fixes them, that otherwise would have slipped by.
AI-generated implementation with AI-generated tests left me with some of the worst code I've witnessed in my life. Many of the passing tests it generated were tautologies (i.e. they would never fail even if behavior was incorrect).
When the tests failed the agent tended to change the (previously correct) test making it pass but functionally incorrect, or it "wisely" concluded that both the implementation and the test are correct but that there are external factors making the test fail (there weren't).
Because most people don't work on public projects and can't share the code publicly?
What's more interesting is the lack of examples of non-trivial projects that are provably vibe-coded and that claim to be of high-quality.
I think many of us are looking for: "I vibe-coded [this] with minimal corrections/manual coding on a livestream [here] and I believe it to be high-quality code"
If the code is in fact good quality then the livestream would serve as educational material for using LLMs/agents productively and I guarantee that it would change many minds. Stop telling people how great it all is, show them. I don't want to be a naysayer, I want to be impressed.
I'm considering attempting to vibe code translate one of my XNA games to javascript and recording the process and using all of the latest tools and strategies like agents and .md files and multiple LLMs etc
It was already shown repeatedly in GitHub repositories in the last year that authors are really unhappy with AI generated pull-requests and test cases.
reply