From my observation, people who use the api either end up learning to be much more token efficient, or use a cheaper model.
I have been using the API for the last 2 years, OpenRouter for personal projects and Claude API for work, most of it in zed, always on high thinking.
For work I usually spend $25 if I use opus/sonnet all day, and for personal stuff I usually spend $2-$5 if I use sonnet for a full evening.
But, I don’t think someone who’s used to not thinking about token cost and efficient use would get anywhere close to that low spending if they switched from a plan to the API.
Analytics-driven development easily leads to bad outcomes.
1. An important but less frequently used feature gets moved to a hidden spot, leading to even less usage and eventual removal.
2. Poorly functioning features don't get the improvements they need because few people use them, precisely because they function so poorly.
I have seen these patterns a lot in software where decisions are based on analytics, and I usually stop using that software once I find a replacement.
I don't know what they have done to Claude, but when using through copilot it's truly awful compared to using it straight from the API.
I have always just used the API, but I decided to give copilot a go on the weekend because of the cheap price. And I am seeing weird behavior like I have never seen before... It will somehow fail to use the file editing tool and then spend an absolutely huge amount of time/tokens building a python script to apply the edit in a subprocess... And it will spin its wheels on stuff the API routinely gets right in one shot.
This might have been bad timing. The Copilot API broke things last weekend, which caused a lot of tool calls in various agent harnesses to start failing, including the edit tool.
I have never seen a model be “lazy” before (I have seen them go for minimal change). I have been using the models through the API with various agents and no custom system prompt.
So I am curious, how do people get these lazy outputs?
Is it by having one of those custom system prompts that basically tells the model to be disrespectful?
I have seen some people complain about a new tendency where it can suggest wrapping up the current task even though it isn't done yet. I haven't seen it myself though.
Usually this gets worse if you have a phrase like "wrap it up" earlier in the output, or if you're at a few hundred thousand tokens without compacting.
In both cases the fix is really simple: just compact.
Pretty sure it’s a harness or system prompt issue.
I have never seen those “minimal change” issues when using zed, but have seen them in claude code and aider. I've been using sonnet/opus high thinking with the API in all the agents I have tested/used.