More

Madmallard · 2026-01-16T07:45:28 1768549528

Idk I was using chat gpt 3.5 to do stuff and it was pretty helpful then

Madmallard · 2026-01-15T06:08:17 1768457297

Doubt it.

People want to interact with other humans.

Hotel doorman problem etc.

damian2000 · 2026-01-15T06:39:52 1768459192

Devs really don't want to work with tech writers to document their code though

Madmallard · 2026-01-15T00:38:24 1768437504

I don't know if it's as nuanced as this.

Just seems like it's dependent on what you're working on and what training data is available for it.

AI definitely just spews out python and JavaScript for me to do all sorts of things quickly.

But it can't translate my XNA game to JavaScript worth a damn. It's terrible with visual work as well.

Madmallard · 2026-01-12T08:16:19 1768205779

"The LLM should be able to determine if a task was completed successfully or not."

Writing logic that verifies something complex requires basically solving the problem entirely already.

redox99 · 2026-01-12T08:32:02 1768206722

Situation A) Model writes a new endpoint and that's it

Situation B) Model writes a new endpoint, runs lint and build, adds e2e tests with sample data and runs them.

Did situation B mathematically prove the code is correct? No. But the odds the code is correct increases enormously. You see all the time how the Agent finds errors at any of those steps and fixes them, that otherwise would have slipped by.

Madmallard · 2026-01-12T08:43:53 1768207433

LLM generated tests in my experience are really poor

redox99 · 2026-01-12T08:50:49 1768207849

Doesn't change the fact that what I mentioned greatly improves agent accuracy.

dns_snek · 2026-01-12T13:37:30 1768225050

AI-generated implementation with AI-generated tests left me with some of the worst code I've witnessed in my life. Many of the passing tests it generated were tautologies (i.e. they would never fail even if behavior was incorrect).

When the tests failed the agent tended to change the (previously correct) test making it pass but functionally incorrect, or it "wisely" concluded that both the implementation and the test are correct but that there are external factors making the test fail (there weren't).

It behaved much like a really naive junior.

simonw · 2026-01-12T14:33:57 1768228437

Which coding agent and which model?

Madmallard · 2026-01-12T07:41:41 1768203701

Those just don't appear at all on HackerNews

Gee I wonder why

dns_snek · 2026-01-12T13:14:14 1768223654

Because most people don't work on public projects and can't share the code publicly?

What's more interesting is the lack of examples of non-trivial projects that are provably vibe-coded and that claim to be of high-quality.

I think many of us are looking for: "I vibe-coded [this] with minimal corrections/manual coding on a livestream [here] and I believe it to be high-quality code"

If the code is in fact good quality then the livestream would serve as educational material for using LLMs/agents productively and I guarantee that it would change many minds. Stop telling people how great it all is, show them. I don't want to be a naysayer, I want to be impressed.

Madmallard · 2026-01-12T14:04:53 1768226693

I'm considering attempting to vibe code translate one of my XNA games to javascript and recording the process and using all of the latest tools and strategies like agents and .md files and multiple LLMs etc

Madmallard · 2026-01-12T01:57:08 1768183028

"sufficient detail, and if done correctly" -> the machine will once again be able to interpret your intention ...

This does not actually follow from the way LLMs work.

Madmallard · 2026-01-12T01:46:43 1768182403

Yeah grinding the domain expertise is definitely the play if you have the resources to do so.

Madmallard · 2026-01-12T01:46:06 1768182366

They are stealing trillions in assets lol

And destroying the gaming industry and altering the energy grid and pooping on the environment

Madmallard · 2026-01-12T01:45:35 1768182335

It was already shown repeatedly in GitHub repositories in the last year that authors are really unhappy with AI generated pull-requests and test cases.

epolanski · 2026-01-12T02:07:31 1768183651

I am not invested in anything, I am merely sharing my personal experience.

Madmallard · 2026-01-12T01:43:36 1768182216

Lol there's definitely a war on hacker news

There's vested interests posting 20 replies in a single thread that benefits them and flagging replies that don't

There's literally 20-25% of dissenters comments in each of these posts being repeatedly flagged.

epolanski · 2026-01-12T02:10:30 1768183830

You're witch hunting.

I haven't flagged or downvoted anybody and I have no vested interest in anything. Not sure what my cause should be and what would be my benefit.

My profile contains my full name, you can search me, I'm a random freelancer, not somebody with any stakes in pushing AI.