Hacker News

I'm usually the first to be tired of everyone who says, for every model, "uuuh, it became dumber", because I didn't believe them

... until this week! Opus is struggling worse than Sonnet these last two weeks.



Forget the agent itself being dumber: right now I'm getting an "API error: usage limit exceeded" message whenever I try anything despite my usage showing as 26% for the session limit and 8% for the week (with 0/5 routines, which I guess is what this thread is about). This is with the default model and effort, and Claude Code is saying I need to turn on extra usage for it to work. Forget that, I just canceled my subscription instead.

There's utility in LLMs for coding, but having literally the entire platform vibe-coded is too much for me. At this point, I might genuinely believe they're not intentionally watering anything down, because it's incredibly believable that they just have no clue how any of it works anymore.


Likewise, I foolishly assumed everybody else was just doing it wrong.

But this week I've lost count of the times I've had to say something along the lines of: "Can you check our plan/instructions, I'm pretty sure I said we need to do [this thing] but you've done [that thing]..."

And getting hit with a "You're absolutely right...", which virtually never used to happen for me. I think maybe once since Opus 4.6.


Honestly, I thought it was a skill issue too, but it just turns out I wasn't using it enough.

I started a new job recently, so I'm asking it a lot of questions about the codebase, sometimes just to confirm my understanding, and it often came up with wrong conclusions that sent me down rabbit holes, only to find out at the end that it was wrong.

On a side project I literally gave it a formula and told it to run it with some other parameters. It did its usual "let me get to know the codebase" routine, then the "I have a good understanding of the codebase" speech, only to follow it up with "what you're asking is not possible". I'm like... no, I know it's possible, I already implemented it, just use it in more places. Only to get the same "oh yeah, you're right, I missed that... blabla"

Yeah, it's gotten pretty bad...


Maybe a consequence of saving GPUs for newer models? Also, tuning the effort level is supposed to help; I haven't got enough data points on this though.


They track our frustration, which is probably really good coding data. The reason it's painful is that this is data annotation: it's literally a job people get paid to do, yet here we're paying to do it. If they need good data, they can just turn the models to shit and gaslight everyone.


My favourite was Opus 4.6 last night (to be fair, peak IST time, late afternoon my time), on the first prompt with a small context: it jams a copy-pasted function in between a bunch of import statements, doesn't even wire up its own function, and calls it done. Wild, I've not seen failure states like that since old Sonnet 4.


Yesterday I had my biggest Opus WTF.

I asked Opus 4.6 to help me get GPU stats in btop on NixOS. Opus's first approach was to use patchelf to monkey-patch the btop binary. I had to redirect it to just look at the Nix wiki and add `nixpkgs.config.rocmSupport = true;`.

But the approach of modifying a compiled binary for a configuration issue is bizarre.
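For anyone hitting the same thing, the fix is a one-line option in the NixOS configuration rather than any binary patching. A sketch, assuming a standard `configuration.nix` module layout (exact placement varies per setup):

```nix
# configuration.nix (sketch; adjust to your module layout)
{ config, pkgs, ... }:
{
  # Build ROCm-aware packages such as btop with AMD GPU support,
  # instead of patching the compiled binary after the fact.
  nixpkgs.config.rocmSupport = true;

  environment.systemPackages = [ pkgs.btop ];
}
```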


It does stuff like this all the time. It loves doing this to scripts with sed, so I'm not surprised to hear about it trying the same thing with binaries. It's definitely wilder, though.


It frequently gets indentation wrong on projects, then tries to write sed/awk scripts. When it can't get those right, it writes a Python script that reformats the whole file on stdout, makes sure the indentation is correct, then requests an edit snippet.

And you might be thinking: well, you should use a code formatter! But I do!

And then you might say, well, surely you forgot to mention it in your AGENTS/CLAUDE file. Nope, it's there, multiple times even, in different sections, because once was apparently not enough.

And lastly, surely if I'm watching this cursed loop unfold and approving edits manually, like some bogan pleb, I can steer it easily... Well, let me tell ya... I tried stopping it and injecting hints about the formatter, and it sticks for a minute before going crazy again. Or sometimes it rereads the file and just immediately fucks up the formatting.
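One way to take the formatter out of the model's hands entirely is a Claude Code hook that reformats after every edit, regardless of what the model does. A sketch, assuming the hooks schema from the current Claude Code settings docs (`cargo fmt` here is just a stand-in for whatever formatter your project uses; check the hook docs for your version):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "cargo fmt" }
        ]
      }
    ]
  }
}
```

This goes in `.claude/settings.json`, so the formatter runs after each Edit/Write tool call instead of relying on instructions in the CLAUDE file being followed.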

I think when this shit happens, it probably uses like 3x more tokens.

For a Rust project, it recently started analysing binaries in the target directory as a first instinct, instead of looking at the code...

Good grief.


In my experience Opus and Claude have declined significantly over the past few weeks. It actually feels like dealing with an employee that has become bored and intentionally cuts corners.


And the worst part is the company is gaslighting people when they report it.


Pretty reassuring to hear that. I was skeptical too; there are a lot of variables, like some crap added to memory, a specific skill, or custom instructions interfering with the workflow and whatnot. But now it's like a toddler that consumes money when talking.


It's quite an interesting business model, actually: the worse it performs (to a degree), the more money it makes, because of the token churn.


Is it? Or is it the task you're trying to do? Opus 4.6 has been staggeringly good for me this last week, both inside Claude Code and through Antigravity until I used up my quota.


I think some of this comes down to undeclared A/B testing. I've had the worst week of interactions I have ever had using Claude Code. All week, whenever I have a session that isn't failing miserably, I seem to get tapped for a session survey, but on any that are out-and-out shitting the bed, it never asks. It has felt a little surreal. I'd love to see a product-wide stats graph for swearing; I would 100% believe it's hitting an all-time high, but maybe I'm just a victim of a bad A/B round.


Oh I’ve been getting a lot more of those too lately even though I dismiss it every time. Wonder if I should report not satisfied every time so that I get routed to something better…


Here's some good-looking anecdata:

https://x.com/fkysly/status/2044283560170004777


Usually, Claude Code with Opus picks the right tools on its own to check the docs, for Svelte for example. So what it gives me is usually flawless.

And right now, I have to remind it every time that the MCP exists, and even then it cannot manage to find a routing bug I have with Sveltekit.

I did a lot of SvelteKit with Opus in the past, and I didn't have to think about it; Opus always got it right easily. Until now.



