I think we've moved away from secure-perimeter thinking and towards defense in depth - if that list of passwords helps you get somewhere other than the vault, removing the post-it improves security. Vaults get infiltrated all the time - and often in partial ways, like being able to see into the vault but not reach in.
Defence in depth matters, but an analysis here shows that the same mechanism used to breach the outer layers (getting administrative access) can be used to breach the next layer (more thoroughly prodding Edge or Chrome to give up passwords).
Why are two concurrent sessions updating the same memory key with different values? IMO it probably points to a fundamental flaw in how memory is being thought about and built.
Author here. Because of parallelism and non-determinism.
This problem is quite common and not limited to memories. For instance, Claude Code will block write attempts and steer the agent to perform a read first (because the file might have been modified in the meantime by the user or another agent).
Same principle here: rather than trying to deterministically “merge” concurrent writes, you fail the last write and let the agent read again and retry the write.
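A minimal sketch of that fail-the-last-write pattern, assuming a hypothetical versioned key-value store (the MemoryStore class, ConflictError, and the version scheme are illustrative, not the actual implementation):

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when the key changed since it was last read."""


@dataclass
class Entry:
    value: str
    version: int


class MemoryStore:
    def __init__(self):
        self._entries: dict[str, Entry] = {}

    def read(self, key: str) -> Entry:
        return self._entries.get(key, Entry(value="", version=0))

    def write(self, key: str, value: str, expected_version: int) -> Entry:
        current = self.read(key)
        if current.version != expected_version:
            # Another session wrote in the meantime: reject instead of merging.
            raise ConflictError(
                f"{key} is at v{current.version}, expected v{expected_version}"
            )
        updated = Entry(value=value, version=expected_version + 1)
        self._entries[key] = updated
        return updated


store = MemoryStore()
entry = store.read("user_prefs")                          # agent A reads v0
store.write("user_prefs", "dark mode", entry.version)     # agent B writes first, key is now v1
try:
    store.write("user_prefs", "light mode", entry.version)  # agent A's stale write fails
except ConflictError:
    fresh = store.read("user_prefs")                      # agent A re-reads...
    store.write("user_prefs", "light mode", fresh.version)  # ...and retries against the new version
```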
If you look at the actual cost of your Claude Code conversations, you'll see that it is overwhelmingly dominated by (cached) input tokens. Because of how we construct persistent conversations, every cached input token incurs cost on each API request, meaning that component of cost scales with O(request count). If you graph the cost curve of a Claude Code session, it's very obvious that this scaling factor overwhelms the cache discount.
Here is a blog post that shows some data - https://blog.exe.dev/expensively-quadratic. And I can confirm this is true for Claude Code - I set up a MITM capture for all Claude Code requests and graphed it.
So increasing the request count over the same reused prefix (which is what higher compaction thresholds do) really does lead to substantially higher API costs.
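To make the scaling concrete, here's a back-of-the-envelope model; every price and per-turn token count below is a made-up placeholder, not Anthropic's actual rates. The point is just the shape: the prefix grows each turn and is re-billed in full on every request, so cumulative input cost grows roughly quadratically in request count while output cost grows linearly.

```python
# Illustrative constants only - swap in real per-token prices and turn sizes.
CACHED_INPUT_PRICE = 1.50 / 1_000_000   # $/token for cache-hit input (placeholder)
OUTPUT_PRICE = 15.00 / 1_000_000        # $/token for output (placeholder)
TOKENS_ADDED_PER_TURN = 3_000           # new context appended each request
OUTPUT_TOKENS_PER_TURN = 800


def session_cost(num_requests: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) for a conversation of num_requests turns."""
    input_cost = output_cost = 0.0
    context = 0
    for _ in range(num_requests):
        context += TOKENS_ADDED_PER_TURN              # the prefix keeps growing...
        input_cost += context * CACHED_INPUT_PRICE    # ...and is re-billed in full each request
        output_cost += OUTPUT_TOKENS_PER_TURN * OUTPUT_PRICE
    return input_cost, output_cost


for n in (10, 50, 100):
    inp, out = session_cost(n)
    print(f"{n:>3} requests: input ${inp:.2f} (grows ~n^2), output ${out:.2f} (grows ~n)")
```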
I don't agree. I avoided Grok because of Musk for a long time, but having used it more, I think it is one of the best models around and grok.com is an extremely good chat app. My evaluation was based on trying it before gpt-5.5 and obviously before grok 4.3, but it was, for me, the 2nd best model/chat app after Claude. It's much less edgelordy than you might think based on the news.
All my usage of Grok for technical topics shows it regularly deeply misunderstanding things and just parroting back my question in fancy language. It’s the only frontier model I get this impression of. That makes it super annoying when it tries to market itself as good at engineering tasks when it seems (to me) to be much worse at them.
Interesting. I have not had this experience. I would like to learn more. Can you point me to any examples or domains where I might be able to replicate this?
I was asking questions about compiler techniques. Then when I got annoyed I started asking about experimental design. Both were very frustrating experiences once I started realizing how limited its responses were.
Though yeah the edgelord-y style faded after I criticized it a couple times.
What do you mean? Costs spiked with the introduction of the 1M context window I believe due to larger average cached input tokens, which dominate cost.
Nah, there are apparently a few caching bugs, one involving --resume, plus some noisy tool use. I have a little app that monitors and resets the context window at 70% usage (based on 200k tokens), and I'm about to run out of weekly allowance after just a couple of days. Never happened before.
I used them for repeated problems or workflows I encounter when running with the default. If I find myself needing to repeat myself about a certain thing a lot, I put it into claude.md. When that gets too big or I want to have detailed token-heavy instructions that are only occasionally needed, I create a skill.
I also import skills or groups of skills like Superpowers (https://github.com/obra/superpowers) when I want to try out someone else's approach to claude code for a while.
I don't really care if other people want to be on or off the AI train (no hate to the gp poster), but if you are on the train and you read the above comment, it's hard not to think that this person might be holding it wrong.
Using sonnet 4 or even just not knowing which model they are using is a sign of someone not really taking this tech all that seriously. More or less anyone who is seriously trying to adopt this technology knows they are using Opus 4.6 and probably even knows when they stopped using Opus 4. Also, the idea that you wouldn't review the code it generated is, perhaps not uncommon, but I think a minority opinion among people who are using the tools effectively. Also a rename falls squarely in the realm of operations that will reliably work in my experience.
This is why these conversations are so fruitless online - someone describes their experience with an anecdote that is (IMO) a fairly inaccurate representation of what the technology can do today. If this is their experience, I think it's very possible they are holding it wrong.
Again, I don't mean any hate towards the original poster, everyone can have their own approach to AI.
Yeah, I'm definitely guilty of not being motivated to use these tools. I find them annoying and boring. But my company's screaming that we should be using them, so I have been trying to find ways to integrate it into my work. As I mentioned, it's mostly not been going very well. I'm just using the tool the company put in front of me and told me to use, I don't know or really care what it is.
How is that the point of AI? The point is that it can chug through things that would take humans hours in a matter of seconds. You still have to work with it. But it reduces huge tasks into very small ones.
> He discontinued the blood exchange after data showed “no benefits.” A suspicious person might note that a vampire would say exactly this after the media got too interested.
I don't think it's the media (clearly the younger generations are media friendly), it's probably pressure from the older vamps.
I felt the same way and came to the comments to see if anyone else smelled it. It's either AI-assisted writing or people are genuinely starting to write the way ChatGPT sounds.
First, the structure of this satirical post is headings and bullet points. Fine, whatever, a lot of people write this way.
Then there's the exhausting litany of super short sentence fragments.
> He published this. Openly. In a book. As a priest.
This is how airport novels and LinkedIn "thought leadership" clickbait are written, so ok, fine, I'll let it pass.
Then I started to notice a lot of: "It's not X. It's Y" or "this isn't just A. It's B."
> Feeding isn’t nutrition. It’s dialysis.
Before LLMs, people weren't writing this way. At the risk of sounding like a curmudgeon: it's insulting to read, like the reader is a 5-year-old.
When several of these smells pile up, I close the tab immediately and try to forget about it. This one was so egregious that I had to read the whole thing and then come to the comments to rant a bit.
The cost of replacement-level software drops a lot with agentic coding. And maintenance tasks are similarly much smaller time sinks. When you combine that with the long-standing benefits of in-house software (customizable to your exact problem, tweakable, often cleaner code because the feature set can be a lot smaller), I think a lot of previously obvious dependencies become viable to write in-house.
It's going to vary a lot by the dependency and scope - obviously, owning your own React is a lot different than owning your own leftpad, but to me it feels like there's no way that agentic coding doesn't shift the calculus somewhat. Particularly when agentic coding makes a lot of nice-to-have mini-features trivial to add, so the developer experience gap between a maintained library and a homegrown solution is smaller than it used to be.
The API price is 6x that of normal Opus, so look forward to a new $1200/mo subscription that gives you the same amount of usage if you need the extra speed.
The writing has been on the wall since day 1. They wouldn't be marketing a subscription being sold at a loss as hard as they are if the intention wasn't to lock you in and then increase the price later.
What I expect to happen is that they'll slowly decrease the usage limits on the existing subscriptions over time, and introduce new, more expensive subscription tiers with more usage. There's a reason AI subscriptions generally don't tell you exactly what the limits are: they're intended to be "flexible" to allow for this.
It's explicitly called out as excluded in the blue info bubble they have there.
> Fast mode usage is billed directly to extra usage, even if you have remaining usage on your plan. This means fast mode tokens do not count against your plan’s included usage and are charged at the fast mode rate from the first token.