Hacker News | intellectronica's comments

I hope Claude adds this too. The "all-you-can-eat" model was never going to work for serious users. You can't really use a tool that might bail out on you in the middle of a session because you've hit some limits.


Codex works much better for long-running tasks that require a lot of planning and deep understanding.

Claude, especially Sonnet 4.5, is a lot nicer to interact with, so it may be a better choice when you're co-working with the agent. Its output is nicer, and it "improvises" really well even if you give it only vague prompts. That's valuable for interactive use.

But for delegating complete tasks, Codex is far better. The benchmarks indicate that, as do most practitioners I talk to (and it is indeed my own experience).

In my own work, I use Codex for complete end-to-end tasks, and Claude Sonnet for interactive sessions. They're actually quite different.


I disagree. Codex always gets stuck and wants to double-check and clarify things. It's like, "dammit, just execute the plan and don't tell me until it's completely finished."

The output of Codex is also not as good. Codex is great at the planning and investigation portion but sucks at execution and code quality.


I've been dealing with this on Codex a lot lately. It confidently wraps up a task, I go to check its work... and it's not even close.

Then I do a double take, re-read the summary message, and realize that it pulled an "and then draw the rest of the owl", seemingly arbitrarily picking and choosing what it felt like doing in that session and what it punted to "next steps to actually get it running".

Claude is more prone to occasional "cheating" with mocked data or "TBD: make this an actual conditional instead of hardcoded if True" stuff when it gets overwhelmed, which is annoying and bad. But it at least has strong task adherence to the user's prompt and doesn't make me write a lawyer-esque contract to close every loophole Codex will use to avoid doing work.


Are you using something like spec-kit?


Can/does Codex actually check Docker logs and other feedback while iterating on something that isn't working? That is where the true magic of Claude comes in for me. Often things can't be one-shot, but being able to iteratively check logs, make an adjustment, rebuild the Docker containers, send a curl request, and confirm the fix is a huge improvement.


Yes, in this regard it's very similar. It works as an agent and does whatever you need it to do to complete the task. In comparison to Claude it tends to plan more and improvise less.
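As a rough illustration of the kind of loop being described (not how either tool actually implements it), here's a minimal Python sketch; the "api" service name, port, and health URL are made-up placeholders:

    # Hypothetical sketch of a rebuild / probe / read-logs feedback loop.
    import subprocess
    import urllib.request

    def rebuild_and_check(service="api", url="http://localhost:8080/health", attempts=3):
        for attempt in range(1, attempts + 1):
            # Rebuild and restart the containers after each code adjustment.
            subprocess.run(["docker", "compose", "up", "-d", "--build", service], check=True)
            try:
                # Probe the endpoint (the agent might run curl instead).
                with urllib.request.urlopen(url, timeout=10) as resp:
                    if resp.status == 200:
                        print(f"attempt {attempt}: fix confirmed")
                        return True
            except Exception as exc:
                print(f"attempt {attempt}: still failing ({exc})")
            # Pull recent logs as the feedback signal for the next change.
            logs = subprocess.run(["docker", "compose", "logs", "--tail", "50", service],
                                  capture_output=True, text=True)
            print(logs.stdout)
            # ...the agent (or a human) reads the logs and edits the code here...
        return False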


How is this ironic? Carelessly AI-generated output (what we call "slop") is precisely that mediocre average you get before investing more in refining it through iteration. The problem isn't that additional work is needed, but that in many cases it is assumed that no additional work is needed and the first generation from a vague prompt is good enough.


The irony stems from the fact that workers are fired after being 'replaced' by AI, only to then be re-hired afterwards to clean up the slop, thus maximizing costs to the business!


The relative cost of labour will differ. One was priced at subject-matter-expert rates; the other will aim for Mechanical Turk rates.

When the big lawsuits hit, they'll roll back.


It'll be a large cost reduction over time. Prior to the job plunge, the median software developer in the US was at around $112,000 in salary, plus benefits on top of that (healthcare, stock compensation). Call it a minimum of $130,000 total, just at the median.

They'll hire those people back at half their total compensation, with no stock and far fewer benefits, to clean up AI slop. And/or just contract it overseas at roughly a third of the former total cost.

Another ten years from now the AI systems will have improved drastically, reducing the slop factor. There's no scenario where it goes back to how it was, that era is over. And the cost will decline substantially versus the peak for US developers.


Cleaning up code requires more skill than creating it (see Kernighan's line about debugging being twice as hard as writing the code in the first place).

Why does that fact stop being true when the code is created by AI?


Based on... what? The more you try to "reduce costs" by letting LLMs take the reins, the more slop will eventually have to be cleaned up by senior developers. The mess will get exponentially bigger and harder to resolve.

That's because I don't think it will be just a linear relationship: if you let one vibe coder replace a team of 10, you'll need a lot more than 10 people to clean it up and maintain it going forward once they hit the wall.

Personally I'm looking forward to the news stories about major companies collapsing under the weight of their LLM-induced tech debt.


Maybe it will finally start recommending better cheese. My feed is so boring.


The AI Toolbox Survey maps the real-world dev stack: which tools developers actually use across IDEs, extensions, terminal/CLI agents, hosted “vibe coding” services, background agents, models, chatbots, and more.

No vendor hype - just a clear picture of current practice.

Please take ~2 minutes to help us understand the landscape, benchmark your own setup against what’s popular, spot gaps and new options to try, and receive the aggregated results to explore later.

Thank you!


:D


Opus costs 10X more. Maybe it's better, but I can't afford to use it, so who cares.


OP here. I actually agree with you that the "findings" here are meaningless. This is pure vibe.

Also regarding "Sonnet is faster" I did explicitly mention that I believe this is because GPT-5 is in preview and hours from the release. The speed I experienced doesn't say anything about the model performance you can expect.


If they're meaningless, why'd you post it, besides getting views on your blog?

> Also regarding "Sonnet is faster" I did explicitly mention that I believe this is because GPT-5 is in preview and hours from the release.

I genuinely don't see this mentioned, where is it?



I've been at it for over 30 years. Still learning.

You can learn fast today, and then continue tomorrow, and next month, and next year, and if you remain curious, half a lifetime later you are still learning.

