
A new benchmark comes out, designed so that nothing does well at it; the models max it out, and the cycle repeats. This could describe either massive growth in LLM coding ability or a disconnect between what the new benchmarks are measuring and why new models eventually score well on them. Under the former assumption there is no limit to the growth of scores... but there is also not very much actual growth in capability (if any at all). Under the latter, the pattern of score growth fits, but the reality of using the tools does not suggest they've actually gotten >10x better at writing code for me in the last year.

Whether an individual human could do well across all tasks in a benchmark is probably not the right question to ask a benchmark to measure. It's quite easy to construct benchmark tasks that humans do poorly on, where you don't even need AI to do better.



Your mileage may vary, but for me, working today with the latest version of Claude Code on a non-trivial Python web dev project, I absolutely feel that I can hand the AI coding tasks that are ten times more complex or time-consuming than what I could hand to Copilot or Windsurf a year ago. It's still nowhere close to replacing me, but I feel that I can work at a significantly higher level.

What field are you in where you feel that there might not have been any growth in capabilities at all?

EDIT: Typo


Claude 3.5 came out in June of last year, and it is, imo, only marginally worse than the AI models currently available for coding. I do not think models are 10x better than a year ago; that seems extremely hyperbolic, or you are working in a super-niche area where it happens to be true.


Are you using it for agentic tasks of any length? 3.5 and 4.5 are about the same for single file/single snippet tasks, but my observation has been that 4.5 can do longer, more complex tasks that were a waste of time to even try with 3.5 because it would always fail.


Yes, this is important. GPT-5 and o3 were roughly equivalent for a one-shot, one-file task. But GPT-5 and codex-5 can just work for an hour in a way no model was able to before (the newer Claudes can too).


I use the newer Claudes, and letting them work for an hour leads, over 50% of the time, to horrible code that does not work. Maybe I am not the target person for agentic tasks; all I use agents for is product searches on the internet when I have specific constraints and don't want to waste an hour looking for something.


Your knowledge on the topic is at least six months out of date; April 2025 was a huge leap forward in usability, and recent releases in the last 30 days are at least what I would call a full generation newer than the technology of June 2024. Summer 2025 was arguably the dawn of true AI-assisted coding. Heck, reasoning models were still bleeding edge in late December 2024. They might not be 10x better, but their ability to competently use (and build their own) tools makes them almost incomparable to last year's technology.


Maybe I am just using them wrong, but I don't know how my knowledge can be out of date considering I use the tools every day and pay for Claude and Gemini. For reference, I genuinely think GPT-5 was worse than previous models. They are for sure marginally better, but I don't think they're even 2x better, let alone 10x.


I'm in product management focused on networking. I can use the tools to create great mockups in a fraction of the time, but the actual turnaround of those into production-ready code has not changed much. The main gain in getting code written is probably that the team has been able to build test cases and pipelines a bit more quickly.



