TIL! I'll finally give Claude Code a try. I've been using Cursor since it launched and never tried anything else. The terminal UI didn't appeal to me, but knowing it has better performance, I'll check it out.
Cursor has been a terrible experience lately, regardless of the model. Sometimes for the same task, I need to try with Sonnet 4.5, ChatGPT 5.1 Codex, Gemini Pro 3... and most times, none managed to do the work, and I end up doing it myself.
Glad you mentioned "Cursor has been a terrible experience lately", as I was planning to finally give it a try. I'd heard it has the best auto-complete, which I don't get using VSCode with Claude Code in the terminal.
+1, it had a bad period when they were hyperscaling up, but IME they've found their pace (very) recently - I almost ditched cursor in the summer, but am a quite happy user now.
I haven’t used Cursor since I use Neovim and it’s hard to move away from it.
The auto-complete suggestions from FIM models (either open source or even something Gemini Flash) punch far above their weight. That combined with CC/Codex has been a good setup for me.
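For readers unfamiliar with how FIM (fill-in-the-middle) completion works: the editor sends the code before and after the cursor, and the model generates the missing middle. A minimal sketch of the prompt format, assuming StarCoder-style sentinel tokens (token names vary by model):

```python
# Sketch of a fill-in-the-middle (FIM) prompt. The <fim_*> sentinel
# tokens here follow StarCoder-style models; other FIM models use
# different names, so treat these as an illustrative assumption.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a FIM prompt: the model generates the missing middle."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# The editor sends the text before and after the cursor; the model's
# completion is inserted at the cursor position.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))",
)
print(prompt)
```

Because the model sees both sides of the cursor, it can complete mid-line and mid-function, which is why even small FIM models feel strong at tab completion.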
I was evaluating codex vs claude code the past month and GPT 5.1 codex being slow is just the default experience I had with it.
The answers were mostly on par (though different in style which took some getting used to) but the speed was a big downer for me. I really wanted to give it an honest try but went back to Claude Code within two weeks.
I've actually been working on porting the tab completion from Cursor to Zed, and eventually IntelliJ, for fun.
It shows exactly why their tab completion is so much better than everyone else's though: it's practically a state machine that's getting updated with diffs on every change and every file you're working with.
(also a bit of a privacy nightmare if you care about that though)
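One way to picture that state-machine design, as a minimal sketch (an illustration of the idea, not Cursor's actual protocol): every edit event patches an in-memory mirror of each open file, so the completion model always sees the current buffer contents rather than stale snapshots.

```python
# Minimal sketch of a completion context kept as a state machine:
# each edit event applies a diff to an in-memory mirror of the file,
# so the completion backend always has up-to-date buffers.
# This illustrates the general technique, not Cursor's real wire format.
class CompletionContext:
    def __init__(self):
        self.files: dict[str, str] = {}  # path -> current buffer text

    def apply_edit(self, path: str, start: int, end: int, text: str):
        """Apply one diff: replace the character range [start, end) with text."""
        buf = self.files.get(path, "")
        self.files[path] = buf[:start] + text + buf[end:]

    def snapshot(self, path: str) -> str:
        """What the completion model would see for this file right now."""
        return self.files.get(path, "")

ctx = CompletionContext()
ctx.apply_edit("main.py", 0, 0, "print('hello')")   # initial insert
ctx.apply_edit("main.py", 6, 13, "'world'")         # user edits the string
```

The privacy concern above follows directly from this design: keeping the mirror accurate means streaming every keystroke-level diff, for every open file, to the server.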
it's not about the terminal, but about decoupling yourself from looking at the code. The Claude app lets you interact with a github repo from your phone.
these agents are not up to the task of writing production level code at any meaningful scale
looking forward to high paying gigs to go in and clean up after people take them too far and the hype cycle fades
---
I recommend the opposite, work on custom agents so you have a better understanding of how these things work and fail. Get deep in the code to understand how context and values flow and get presented within the system.
> these agents are not up to the task of writing production level code at any meaningful scale
This is obviously not true, starting with the AI companies themselves.
It's like the old saying: "half of all advertising doesn't work; we just don't know which half." Some organizations are having great results, while some are not. On multiple dev podcasts I've listened to, AI skeptics have had a lightbulb moment where they get that AI is where everything is headed.
Not a skeptic, I use AI for coding daily and am working on a custom agent setup because, through my experience for more than a year, they are not up to hard tasks.
This is well known I thought, as even the people who build the AIs we use talk about this and acknowledge their limitations.
I'm pretty sure at this point more than half of Anthropic's new production code is LLM-written. That seems incompatible with "these agents are not up to the task of writing production level code at any meaningful scale".
how are you pretty sure? What are you basing that on?
If true, could this explain why Anthropic's APIs are less reliable than Gemini's? (I've never gotten a service overloaded response from Google like I did from Anthropic)
My current understanding (based on this text and other sources) is:
- There exist some teams at Anthropic where around 90% of lines of code that get merged are written by AI, but this is a minority of teams.
- The average over all of Anthropic for lines of merged code written by AI is much less than 90%, more like 50%.
> I've never gotten a service overloaded response from Google like I did from Anthropic
They're Google, they out-scale everyone. They run more than 1.3 quadrillion tokens per month through LLMs!
You cannot clean up the code, it is too verbose. That said, you can produce production ready code with AI, you just need to put up very strong boundaries and not let it get too creative.
Also, the quality of production ready code is often highly exaggerated.
It has a section for Code. You link it to your GitHub, and it will generate code for you when you get on the bus so there's stuff for you to review after you get to the office.
The app version is iPhone only, you don’t get Code in the Android app, you have to use a web browser.
I use it every day. I’ll write the spec in conversation with the chatbot, refining ideas, saying “is it possible to …?” Get it to create detailed planning and spec documents (and a summary document about the documents). Upload them to Github and then tell Code to make the project.
I have never written any Rust, am not an evangelist, but Code says it finds the error messages super helpful so I get it to one shot projects in that.
I do all this in the evenings while watching TV with my gf.
It amuses me we have people even in this thread claiming what it already does is something it can’t do - write working code that does what it’s supposed to.
I get to spend my time thinking of what to create instead of the minutiae of “ok, I just need 100 more methods, keep going”. And I’ve been coding since the 1980s, so don’t think I’m just here for the vibes.
very impressive. I wonder if this sends a different signal to the market regarding using TPUs for training SOTA models versus Nvidia GPUs. From what we've seen, OpenAI is already renting them to diversify... Curious to see what happens next
Can you elaborate on that? In which part of the RAG pipeline did GPT-4.1 perform better? I would expect GPT-5 to perform better on longer context tasks, especially when it comes to understanding the pre-filtered results and reasoning about them
For large context (up to 100K tokens in some cases). We found that GPT-5:
a) has worse instruction following; doesn't follow the system prompt
b) produces very long answers, which resulted in a bad UX
c) has a 125K context window, so extreme cases resulted in an error
Interesting. https://www.robert-glaser.de/prompts-as-programs-in-gpt-5/ claims GPT-5 has amazing!1!! instruction following. Is your use-case very different, or is this yet another case of "developer A got lucky, developer B tested more things"?
ChatGPT when using 5 or 5-Thinking doesn’t even follow my “custom instructions” on the web version. It’s a serious downgrade compared to the prior generation of models.
That link explains how OpenAI uses it, but doesn't really walk through how it's any faster. I thought the whole point of transformers was that inference speed no longer depended on prompt length. So how does caching the prompt help reduce latency if the outputs aren't being cached?
> Regardless of whether caching is used, the output generated will be identical. This is because only the prompt itself is cached, while the actual response is computed anew each time based on the cached prompt
> I thought the whole point of transformers was that inference speed no longer depended on prompt length
That's not true at all and is exactly what prompt caching is for. For one, you can at least populate the attention KV Cache, which will scale with the prompt size. It's true that if your prompt is larger than the context size, then the prompt size no longer affects inference speed since it essentially discards the excess.
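A toy model of why caching the prefix's K/V tensors reduces latency (illustrative units, not real timings, and a deliberate simplification of how serving stacks actually store K/V state): prefill cost grows with the number of tokens whose keys and values must still be computed, and a cached shared prefix skips that work.

```python
# Toy illustration of prompt (prefix) caching. Prefill work scales with
# the number of tokens whose attention K/V entries are not yet computed;
# a cached prefix lets the server skip straight to the new tokens.
kv_cache: set[tuple[str, ...]] = set()  # prefixes whose K/V state is cached

def prefill_cost(tokens: list[str]) -> int:
    """Return how many tokens must actually be prefilled for this request."""
    # Find the longest cached prefix of this prompt.
    cached = 0
    for i in range(len(tokens), 0, -1):
        if tuple(tokens[:i]) in kv_cache:
            cached = i
            break
    # Cache every prefix of this prompt for future requests
    # (wasteful as written; real servers store K/V tensors per block).
    for i in range(1, len(tokens) + 1):
        kv_cache.add(tuple(tokens[:i]))
    return len(tokens) - cached

system = ["you", "are", "a", "helpful", "assistant"]
cold = prefill_cost(system + ["question", "one"])  # nothing cached yet
warm = prefill_cost(system + ["question", "two"])  # shared prefix cached
```

The second request only pays for its one novel token, which is exactly the latency win: the output is still generated fresh, but the expensive prefill over the shared system prompt is skipped.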
Later they mention some kind of rate limiting: if over ~15 requests are being processed per minute, the request will be sent to a different server. I guess you could deny cache usage, but I'm not sure what isolation they have between different callers, so maybe even that won't work.
So the doc mentions you can influence the cache key by passing an optional user parameter. It’s unclear from the doc whether the user parameter is validated or if you can just provide an arbitrary string.
15 requests/min is pretty low. Depending on how large the fleet is you might end up getting load balanced to the same one and if it’s round robin then it would be deterministic
I saw this post on the first page a few minutes ago (published 5 hours ago), but it quickly dropped to the 5th page. Given its comments and points, that seems odd. I had to search to find it again. Any idea why?
Yes, but stocks are ripping higher on much worse jobless claims, housing starts, and Philly Fed numbers than expected, 113 unexplained cases of reinfection in South Korea, and the PPP fund being completely tapped out with three weeks to go in Congress' recess.
The market is fully convinced that everything is fixed.
That's not what the "market thinks" at all. The stock market is much more complicated than you assume it is. But if you want a really simple explanation: $4T more in circulation chasing the same amount of assets, interest rates make it non-viable to hold bonds, and you don't want to buy bonds anyway if you expect a more inflationary economy. If there are expectations of inflation, you go for stocks, and if there's inflation AND zero interest rates, you RUN for stocks.
You can also think of it this way: retail investors can afford to stay in cash indefinitely but make up a tiny portion of the capital. Big money NEEDS to either buy bonds or park its capital in some yielding asset. Right now treasuries offer safety but negative real yields, probably for a long time. But a pension fund still needs to generate returns to pay its beneficiaries. So what's the best and cheapest option right now, by far? Stocks. You have to keep in mind that prices are relative, so if stocks are relatively cheaper than bonds, you buy stocks. Now add $4T to that and you get a stock market that stays strong even if everyone in the market knows the huge economic risks we are facing right now.
Never assume that an entire sector with so many incentives to price in all the available data would just ignore something because it feels like it. I don't get how people really believe that.
A lot of people are saying it's just a dead cat bounce. Real Vision presented that in most past drops of this magnitude, you typically see this type of rebound before the market goes lower to retest the recent lows.
At least I’m coding more again, lol