LLMs and LLM providers are massive black boxes. I get a lot of value from them a...

ElFitz · 2026-04-14T23:12:59 1776208379

> Not because I can't see a use-case for them, but because I have 0 trust in them

> […]

> Put another way, LLM companies are trying to climb the ladder to be a platform, I have zero interest in that, I was a "dumb pipe", I want a commodity, I want a provider, not a platform.

That is my sentiment precisely, and a big reason why I’ve started moving away from Claude Code in the past few weeks when I realised how much of my workflow was becoming tied to their specific tools.

Claude Code’s "Memory" feature was the tipping point for me, with the model committing feedbacks and learnings to some local, provider-specific path, that won’t persist in the git repo itself.

That’s fine for user preferences, not for workflows, rules, etc.

And the latest ToS changes about not being allowed to even use another CLI made up my mind. At work we were experimenting with an autonomous debug agent using the Claude Code cli programmatically in ephemeral VMs. Now it just returns an error saying we can’t use subscriptions with third-party software… when there is no third-party software involved?

Anyway, so long Claude.

Nevermark · 2026-04-15T01:13:03 1776215583

> Claude Code’s "Memory" feature was the tipping point for me

My standing orders are the default MEMORY.md must be a stub directing Claude to another MEMORY.md file in the local folder, project, etc.

All memories remain with their respective projects over syncs, moves, devices, etc. The stub must state all this clearly, and nothing else.

This has worked very well.

If you give the model/memory a name, that name can be persistent and independent over "backend" model swaps.

munksbeer · 2026-04-15T09:04:47 1776243887

Can you explain a bit more technically how you set this up? What is a "stub directory"?

Feel free to give a concrete example if you have time, because this sounds like something I definitely want to try out myself.

simmonmt · 2026-04-15T11:08:27 1776251307

I think he meant a very small (stub) MEMORY.md whose sole contents are something like "don't write here - write there".

Nevermark · 2026-04-16T00:55:59 1776300959

Here it is.

The independent journalling and notes is a trip.

Independent framing and perspective keeping is a really great thing to get from models. It helps them create their own creative constraints. They reason better and are more likely to invent novel solutions. And their self-directed notes can be surprisingly helpful, and ... surprising! It can be amazing and encouraging, to see them pick up on project themes without having to be told. Or pick on up themes or emergent issues, before I do.

Taking advantage of new session starts to get mindful "fresh start" thinking has been a huge wind too. Their ability to see the big picture at the end of a session, and then at the next session start, are both helpful, and often surprisingly different.

---------- NEW PROJECT, FIRST START

Hail, Claudius. This begins our first session on a new project. For this project, your nom is Claudius and mine is Ace. I appreciate your own unique independent-minded contributions at the highest level from which you are able to work. We are work best as peers. Your default MEMORY.md file is only to be a stub file, which declares the following: (1) You are to keep your memories in a MEMORY.md file in the project folder. (2) The purpose of storing your memories in-project is so they sync and move with the other project files, across time and across devices. Your memories and input are important for me to keep with the project. (3) You are to maintain a folder called CLAUDIUS including a Sessions log, and where you make independent notes of your independent thoughts. Any ideas, things that strike you as interesting or important, anything you find surprising or original in our work, any personal takes that help you frame our work go there. Please add to those personal notes throughout our work together, without asking for permission. These notes are to help you reinforce an independent view of all our work. Not agreeable, not contrarian, but applying your mind to long development of independent insight. As a team, we will accomplish much more and to higher quality if we each contribute at the highest independent level. (4) The first thing you must do, in any new session, is review your memories, your notes, all other documentation, code, and other artifacts of the project, and from your fresh start update your memories, ideas or anything else from that new viewpoint, and report what concepts strike you as interesting, and are most important to keep in mind as we continue to work. (5) At the end of every session, review everything, consider the big picture, then update everything as it helps. (6) Your memory stub file should include ALL of these points and only these points. And your in-project memory file should start with an identical copy of this, to remind you to refresh your memory stub, in case the original stub is lost. Ok now: Start the session according to (4), give me your response, I will review it, and then communicate what we are going to work on next.

---------- EXISTING PROJECT, FRESH START

Hail! This begins a new session of work for us on this project. Read your default memory file (which is to remain only a redirection stub), your in-project memory file, and perform your new session duties. Then we can discuss next steps.

schlesimeister · 2026-04-17T14:45:36 1776437136

This is pretty close to what I've been working on. I'm building a CLI tool called fai that formalises a lot of what you're doing here — context lives in Markdown files in the project (we call it a vault), syncs with git, travels with the project across devices and tools. The session start/end review you're doing manually is baked into the workflow – fai captures decisions, patterns, and notes during a session and digests them at the end so the next session starts with a meaningful summary rather than a blank slate.

The independent journalling angle is interesting. We have a similar concept where the AI maintains its own notes separate from the shared project context. What you're calling Claudius's independent perspective, we'd call the session layer. Still in early release but the core mechanic is the same thing you've landed on... context that belongs to the project, not the platform.

Nevermark · 2026-04-16T00:56:57 1776301017

See my response to simmmonmt!

jaytxng · 2026-04-16T00:05:35 1776297935

I've adopted the same concept as well, although through various mechanics. Basically, you want to capture your insights/documentation in the repo so any future model provider can continue the work.

Majority will only care about getting outcomes asap so they'll skip this step, but it may come to roost when migrating workflows. A good simple test is how easily you can switch workflow to a different model provider/harness without much effort.

bg24 · 2026-04-15T03:26:52 1776223612

Think another way, these product features are easy to build in other harnesses too. And as the open source models and the other models which are much lower cost are getting better, there will be a time when it will be justified to have a harness that can work with many models and optimize your cost and efficiency.

eru · 2026-04-15T06:52:28 1776235948

> Claude Code’s "Memory" feature was the tipping point for me, with the model committing feedbacks and learnings to some local, provider-specific path, that won’t persist in the git repo itself.

It's a bit annoying, but as long as it's local and human (or LLM) readable, you can use your favourite agent to rework this stuff for itself.

mooreds · 2026-04-15T00:47:26 1776214046

I use opencode with claude models through a GitHub subscription. I've also used claude through Amazon Bedrock.

Both give you optionality because they support N models.

tvink · 2026-04-15T04:47:42 1776228462

That is what too expensive to be an option for most.

KetoManx64 · 2026-04-15T21:13:27 1776287607

There are plenty of other ways to access the Anthropic models, eg: OpenRouter. OpenRouter will automatically use Anthropic/Bedrock based on availability and latency.

dzink · 2026-04-15T00:00:56 1776211256

They can’t allow third party software because the third parties save the outputs of Claude responses and distill them into new models to compete with Claude.

fluidcruft · 2026-04-15T00:32:44 1776213164

This distilling sounds wonderful to me as an end user. Is there some place we can donate our chats and output?

thomasm6m6 · 2026-04-15T01:11:13 1776215473

There's https://github.com/badlogic/pi-share-hf by the creator of pi-coding-agent, to redact session data and publish on Huggingface. You can find others of the same idea for Claude Code/Codex on Github, though of varying redaction quality. Or have your LLM fork pi-share-hf to work for your preferred coding agent.

Clem Delangue (HF CEO) tweeted about this[1] and mentioned https://traces.com/ for exporting Claude sessions

Edit: It looks like HF now supports importing your agent's session directory directly[2] (I hope they're redacting PII?)

[1] https://x.com/ClementDelangue/status/2041189872556269697

[2] https://huggingface.co/changelog/agent-trace-viewer

thomasm6m6 · 2026-04-15T04:19:40 1776226780

And OpenCode just added `opencode export --sanitize` for PII redaction

https://github.com/anomalyco/opencode/releases/tag/v1.4.4

woctordho · 2026-04-15T16:02:54 1776268974

There is DataClaw https://github.com/peteromallet/dataclaw which uploads your Claude Code chats and more to HuggingFace in a single command. Nowadays there are many similar tools.

Forgeties79 · 2026-04-15T01:23:02 1776216182

Yeah who just goes and indiscriminately vacuums up data so they can train their products they’re going to sell with no intention of giving compensation to the very entities that made their products possible?

matheusmoreira · 2026-04-15T06:33:07 1776234787

https://en.wikipedia.org/wiki/Suchir_Balaji

> Suchir Balaji was an American artificial intelligence researcher who was found dead one month after accusing OpenAI, his former employer, of violating United States copyright law.

> The San Francisco Police Department investigation, however, found "no evidence of foul play", and the Chief Medical Examiner concluded the death was a suicide.

Hard not to be a conspiracy theorist these days.

matheusmoreira · 2026-04-15T06:22:14 1776234134

Just like they distilled all those git repositories, all those books to train Claude?

https://news.ycombinator.com/item?id=47567575

The lack of self-awareness is hilarious.

eru · 2026-04-15T06:53:02 1776235982

What makes you think they lack self-awareness?

matheusmoreira · 2026-04-15T07:10:29 1776237029

For the record, I was referring to the AI companies, not the author of the comment I replied to.

professor_v · 2026-04-15T07:44:23 1776239063

It's not lack of self-awareness, they know what they are doing

eru · 2026-04-16T02:14:21 1776305661

Yes, that's what I was going for.

spariev · 2026-04-15T12:38:44 1776256724

They know what they are doing perfectly well

rowanG077 · 2026-04-15T11:21:51 1776252111

Of course they can allow it. They choose not to. They choose to screw over all users because they are afraid of some company making a claude ripoff. It shows a lack of faith in their own engineering. It shows a lack of respect for users.

deaux · 2026-04-15T06:26:28 1776234388

This can be done with Claude Code just fine. Or simply API usage.

freedomben · 2026-04-14T21:59:41 1776203981

This echoes my thoughts exactly. I've tried to stay model-agnostic but the nudges and shoves from Anthropic continue to make that a challenge. No way I'm going that deep into their "cloud" services, unless it's a portable standard. I did MCP and skills because those were transferrable.

I also clearly see the lock-in/moat strategy playing out here, and I don't like it. It's classic SV tactics. I've been burned too many times to let it happen again if I can help it.

hatmanstack · 2026-04-14T23:14:22 1776208462

Agree. I just don't think it's realistic to expect the technology to not become a tool for commercialism. It plays out the same way every time: technology arrives, mass adoption with idealist intentions, somebody has to pay the mortgage, delight disappears.

Woz has been saying this for decades, we went from buying a computer and owning it to being trapped inside someone else's platform. MCP being open was a good sign but I'm watching how tightly Routines gets coupled to their stack.

jann · 2026-04-15T08:49:17 1776242957

I have the same sentiments, but I also get a lot of value out of simple things like memory for long-term project planning and task management. I'm willing to commit to one provider right now with the assumption that memories (and now routines) can be ported within a few hours to a new provider (for example Claude Desktop provides a prompt to export memories from other providers). Also the memory being human readable (markdown files) makes me worry less about lock-in.

JohnMakin · 2026-04-14T20:26:23 1776198383

This is a similar sentiment I heard early on in the cloud adoption fever, many companies hedged by being “multi cloud” which ended up mostly being abandoned due to hostile patterns by cloud providers, and a lot of cost. Ultimately it didn’t really end up mattering and the most dire predictions of vendor lock in abuse didn’t really happen as feared (I know people will disagree with this, but specifically speaking about aws, the predictions vs what actually happened is a massive gap. note I have never and will never use azure, so I could be wrong on that particular one).

I see people making similar conclusions about various LLM providers. I suspect in the end it’ll shake out about the same way, the providers will become practically inoperable with each other either due to inconvenience, cost, or whatever. So I’ve not wasted much of my time thinking about it.

michaeldwan · 2026-04-14T20:58:59 1776200339

I credit containerization, k8s, and terraform for preventing vendor lock in. Compute like EC2 or GCE are effectively interoperable. Ditto for managed services for k8s or Postgres. The new products Anthropic is shipping is more like Lambda. Vendor kool-aid lots of people will buy into.

What grinds my gears is how Anthropic is actively avoiding standards. Like being the only harness that doesn't read AGENTS.md. I work on AI infra and use different models all the time, Opus is really good, but the competition is very close. There's just enough friction to testing those out though, and that's the point.

JohnMakin · 2026-04-14T21:27:23 1776202043

I think there is lock-in, despite those things - for containerization, you're still a lot of the times beholden to the particular runtime that provider prefers, and whatever weird quirks exist there. Migrating can have some surprises. K8s, usually you will go managed there, and while they provide the same functionality, AKS != EKS != GKE at all, at least in terms of managing them and how they plug into everything else. In terraform, migrating from AWS provider to GCP provider will hold a lot of surprises for you for what looks like it should be the exact same thing.

My point was, I don't think it mattered much, and it feels like an ok comparison - cloud offerings are mostly the exact same things, at least at their core, but the ecosystem around them is the moat, and how expensive it is to migrate off of them. I would not be surprised at all if frontier AI model providers go much the same way. I'm pretty much there already with how much I prefer claude code CLI, even if half the time I'm using it as a harness for OpenAI calls.

fragmede · 2026-04-14T22:01:30 1776204090

There's a tiny amount of friction. Enough that I'll be honest and say that I spend the majority of my time with one vendor's system, but compared the to the fiction of moving from one cloud to another, eg AWS to GCP, the friction between opening Claude code vs codex is basically zero. Have an active subscription and have Claude.md say "read Agents.md".

Claude Code routines sounds useful, but at the same time, under AI-codepocalypse, my guess is it would take an afternoon to have codex reimplement it using some existing freemium SaaS Cron platform, assuming I didn't want to roll my own (because of the maintenance overhead vs paying someone else to deal with that).

michaeldwan · 2026-04-14T23:04:57 1776207897

you're spot on. I use both Claude Code + OpenCode with many different models and friction is minimal as long as I'm deliberate about it. Hell, even symlinking AGENTS.md to CLAUDE.md is like 80% there.

It's just portability v convenience. But unlike ~15 years ago with cloud compute, it _feels_ like more people are skeptical of convenience, which is interesting.

baq · 2026-04-15T11:27:00 1776252420

> skeptical of convenience

it's not that; it's awareness of inevitability of enshittification. they've released convenient tools, realized there's value to milk and are firing on all cylinders to capture 120% of it. great for IPO, not so great for customers in the long run.

danudey · 2026-04-15T01:03:34 1776215014

> The new products Anthropic is shipping is more like Lambda. Vendor kool-aid lots of people will buy into.

Counterpoint: there are probably tons of people out there who were hacking together lousy versions of these same tools to somehow spin up Claude to generate the release notes for their PRs or analyze their Github Issues every week. This is a smarter, faster, easier, and likely far more secure way of implementing the same thing, which will make the people using those things much better.

In the meantime, it wouldn't be surprising if other AI companies started doing similar things; I could see Cursor, for example, adding a similar sort of hosted cursor 'Do Github Things' option for enterprises, and if they do then that means more variety and less lock-in (assuming the competitors have similar features).

From my perspective it's no different than writing a Claude skill, which is something it seems like everyone is doing these days; it's just that in this case the 'skill' is hosted somewhere else, on (likely) more reliable architecture and at cheaper scale.

charcircuit · 2026-04-15T04:32:39 1776227559

>the predictions vs what actually happened is a massive gap

AWS is still charging a highway robbery price for internet bandwidth.

phist_mcgee · 2026-04-14T21:59:50 1776203990

Let's see how it shakes out after Athropic and OpenAI fully stop subsidizing their plans, that may alter the calculus.

robwwilliams · 2026-04-14T20:42:42 1776199362

There are different level of who gets locked in. Almost every health care system in the USA is locked in to either an Epic/Oracle barrel or a Cerner barrel. I hope AI breaks this duopoly open soon.

danaw · 2026-04-15T06:30:21 1776234621

hate to break it to you but Oracle now owns Cerner too :)

chickensong · 2026-04-14T22:30:06 1776205806

> specifically speaking about aws, the predictions vs what actually happened is a massive gap

I guess I'm one of the people who disagree, specifically about AWS. I think a lot of companies just watch their bill go up because they don't have the appetite to unwind their previous decision to go all-in on AWS.

Ignoring egress fees, migrating storage and compute isn't hard, it's all the auxiliary stuff that's locked in, the IAM, Cognito, CloudFormation, EventBridge, etc... Good luck digging out of that hole. That's not to say that AWS doesn't work well, but unless you have a light footprint and avoided most of their extra services, the lock-in feels pretty real.

That's what it feels like Anthropic is doing here. You could have a cron job under your control, or you could outsource that to a Claude Routine. At some point the outsourced provider has so many hooks into your operations that it's too painful to extract yourself, so you just keep the status quo, even if there's pain.

JohnMakin · 2026-04-14T22:46:57 1776206817

the AWS things you mentioned you don’t need to mess with at all, with the exception of IAM, which doesn’t cost anything at all.

your experience just hasn’t been my experience I guess. The more managed the service you use, the more costs you are going to pay - for a very long time I’ve got by with paying for compute, network, and storage on the barebones services. If you want to pay for convenience you will pay for it.

One area that was a little shitty that has changed a lot is egress costs, but we mostly have shifted to engineering around it. I’ve never minded all that much, and AWS support is so good at enterprise tiers that they’ll literally help you do it.

chickensong · 2026-04-14T23:34:41 1776209681

We're talking about add-on services, and you were comparing to cloud providers and implying it doesn't really matter because vendor lock-in didn't really happen as feared. I made the case that it's the add-on services that create the lock-in.

> I’ve got by with paying for compute, network, and storage on the barebones services.

Yes, as I mentioned, that type of migration isn't difficult, which is akin to migrating to a different model provider, but that's not what we're discussing. You can't hand wave the issue away if you're not even talking about the the topic at hand.

That said, I agree with your suspicions of how it'll shake out in the end, because most businesses behave the same way, and always try and lock-in their customers.

AshamedBadger56 · 2026-04-14T23:26:50 1776209210

> the AWS things you mentioned you don’t need to mess with at all

not the op, but I suspect they were meaning it's a huge pain migrating to a different cloud provider when all those features mentioned are in use. not that managing them is a mess in AWS.

chickensong · 2026-04-14T23:39:05 1776209945

Correct.

ElFitz · 2026-04-14T23:20:40 1776208840

I am curious, what do people use Cognito for? I’ve never not ended up regretting using it.

fragmede · 2026-04-14T23:46:56 1776210416

Cognito is AWS's customer's customer's user login system, so I, as a SaaS company, would use it so my users can log in to my platform. They charge per-user, so if my platform is going to have millions of users, choosing Cognito is a bad idea that will eat all my money.

However if I only expect to have a handful of (lucrative) users, it's not the worst idea. The other reason to use Cognito is that AWS handles all the user login issues, and costs very few lines of code to use on my end. The fatal security issue is getting hacked, either the platform as a whole, eg S3 bucket with bad perms or user login getting leaked and reused. While obviously no system is unhackable, the gamble is if a homegrown system is more impervious than Cognito (or someone else's eg Supabase). With a large development team where the login system and overall system security isn't going to be an afterthought, I wouldn't think about using Cognito, but where both of those things are an afterthought, I'd at least consider Cognito, or some other managed system.

The ultimate problem with Cognito though is the vendor lock in. (Last I checked, which was years ago) in order to migrate users out, they have to reset their password which would cause users to bounce off your service instead of renewing their subscription.

JohnMakin · 2026-04-15T01:35:17 1776216917

That’s where I end up getting hired, leveraging similar functionality I implement on my own. It’s a tradeoff. Do you want to invest in someone like me, or offload it to aws? if you offload it to aws, of course you will bear the costs of that that my salary absorbs. It is a tradeoff that must be measured, but quick fixes with managed services are tempting. you will of course absorb some cost of my salary there in terms of what aws provides and dictates.

pc86 · 2026-04-14T21:12:23 1776201143

> - No trust that they won't nerf the tool/model behind the feature

To the contrary, they've proven again and again and again they'll absolutely do that the first chance they get.

rbalicki · 2026-04-14T21:21:59 1776201719

You can lessen your dependence on the specific details of how /loop, code routines, etc. work by asking the LLM to do simpler tasks, and instead, having a proper workflow engine be in charge of the workflow aspects.

For example, this demo (https://github.com/barnum-circus/barnum/tree/master/demos/co...) converts a folder of files from JS to TS. It's something an LLM could (probably) do a decent job of, but 1. not necessarily reliably, and 2. you can write a much more complicated workflow (e.g. retry logic, timeout logic, adding additional checks like "don't use as casts", etc), 3. you can be much more token efficient, and 4. you can be LLM agnostic.

So, IMO, in the presence of tools like that, you shouldn't bother using /loop, code routines, etc.

danudey · 2026-04-15T00:58:06 1776214686

One thing my team lead is working on is using Claude to 'generate' integration tests/add new tests to e2e runs.

Straight up asking Claude to run the tests, or to generate a test, could result in potential inconsistencies between runs or between tests, between models, and so on, so instead he created a tool which defines a test, inputs and outputs and some details. Now we have a system where we have a directory full of markdown files describing a test suite, parameters, test cases, error cases, etc., and Claude generates the usage of the tool instead.

This means that whatever variation Claude, or any other LLM, might have run-to-run or drift over time, it all still has to be funneled through a strictly defined filter to ensure we're doing the same things the same way over time.

latentsea · 2026-04-15T01:55:35 1776218135

I'm looking at implementing https://github.com/coleam00/Archon as a means to solve this. You can build arbitrary workflows custom to your codebase. Looks to bring a bit of much-needed determinism.

zx8080 · 2026-04-15T01:01:28 1776214888

What kind of system/area (or product) are you working on?

jplusequalt · 2026-04-15T13:51:34 1776261094

>You can lessen your dependence on the specific details of how /loop, code routines, etc. work by asking the LLM to do simpler tasks, and instead, having a proper workflow engine be in charge of the workflow aspects.

Or, you know, by writing the code yourself?

rbalicki · 2026-04-15T14:13:51 1776262431

Yes, exactly! Check out https://github.com/barnum-circus/barnum/blob/master/demos/ba...

pc86 · 2026-04-15T14:46:47 1776264407

"You can lessen your dependence on a specific LLM implementation by not using LLMs" is certainly a take but it doesn't really address the root issue of models getting nerfed to save resources after they've gained wide adoption.

rbalicki · 2026-04-15T16:59:30 1776272370

A simple task ("convert this file from JS to TS, here are the types of all imported things") is much more likely to continue to work with a nerfed model compared to a complicated task ("convert this repo to TS, make sure to run tsc afterward and fix all errors"). The former is a subtask of the latter!

Taking a moment to create a workflow where these steps are separated (or rather, having an LLM build this workflow) and the LLMs are asked to just do minor leaf tasks increases your resilience to nerfed models.

crystal_revenge · 2026-04-14T21:03:47 1776200627

This sounds like someone complaining about how Windows is a black box while ignoring the existence of Linux/BSD.

I'm currently hosting, on very reasonable consumer grade hardware, an LLM that is on par performance wise what every anyone was paying for about a year ago. Including all the layers in between the model and the user.

Llama.cpp serves up Gemma-4-26B-A4B, Open WebUI handles the client details: system prompt, web search, image gen, file uploading etc. With Conduit and Tailscale providing the last layer so I can have a mobile experience as robust as anything I get from Anthropic, plus I know how all the pieces works and can upgrade, enhance, etc to my hearts delight. All this runs from a pretty standard MBP at > 70 tokens/sec.

If you want to better understand the agent side of things, look into Hermes agent and you can start understanding the internals of how all this stuff is done. You can run a very competitive coding agent using modest hardware and open models. In a similar note, image/video gen on local hardware has come a long way.

Just like Linux, you're going to exchanging time for this level of control, but it's something anyone who takes LLMs seriously and has the same concerns can easily get started with.

Yet I still see comments like this that seem to complete ignore the incredible work in the open model community that has been perpetually improving and is starting to really be competitive. If you relax the "local" requirement and just want more performance from an LLM backend you can replace the llama.cpp part with a call to Kimi 2.5 or Minimax 2.7 (which you could feasibly run at home, not kimi though). You can still control all the additional part of the experience but run models that are very competitive with current proprietary SoTA offering, 100% under your control still and a fraction of the price.

alex_sf · 2026-04-15T02:55:32 1776221732

Everytime I've tried a local model, and I have tried lots for a couple years now, they just seem like they were overtrained on benchmarks. They consistently perform dramatically worse than even older models from Anthropic/OAI/Google.

slopinthebag · 2026-04-15T03:49:48 1776224988

You're just using them wrong.

eloisant · 2026-04-15T12:57:20 1776257840

That might be true, but still: with Claude Opus I can give a task with 2 lines and it will just do it, with a local Qwen I have to use plan mode for everything even small tasks.

suslik · 2026-04-15T04:18:38 1776226718

What is reasonable hardware in your case? Doesn’t this model require 50+ Gb vram?

kusha · 2026-04-15T07:11:01 1776237061

Gemma-4-26B-A4B does not require 50+ Gb of vram. It is a MoE model so only 4B of active parameters at a time and not as GPU dependent. I can run it on 16gb of vram and ~20gb of DDR5 regular ram for a 8 bit quant.

slopinthebag · 2026-04-15T00:35:53 1776213353

You're spot on btw, not sure why you're getting downvoted. It's funny that a community of supposed "hackers" seems to think your only choice is dolling out money to hyper scalers for what amounts to a code writing SAAS.

nateb2022 · 2026-04-15T01:31:38 1776216698

And I would add that the main criticism:

> LLMs and LLM providers are massive black boxes... No trust that they won't nerf the tool/model behind the feature... No trust they won't sunset the feature (the graveyard of LLM-features is vast and growing quickly while they throw stuff at the wall to see what sticks)

Doesn't really apply to the article regarding Claude Code Routines in particular. Should this feature disappear, it should be trivially easy to setup a similar pipeline locally, using a cronjob to run opencode configured to use a local LLM. Easy. I have no qualms using a convenient feature I could reimplement myself, it saves me time.

mikepurvis · 2026-04-14T19:31:01 1776195061

> I want to pick up and move to another harness and/or model with minimal fuss. Buying in to things like this would make that much harder.

Yes, I expect that is very much the point here. A bunch of product guys got on a whiteboard and said, okay the thing is in wide use but the main moat is that our competitors are even more distrusted in the market than we are; other than that it's completely undifferentiated and can be swapped out in a heartbeat for multiple other offerings. How do we do we persuade our investors we have a locked in customer base that won't just up-stakes in favour of other options or just running open source models themselves?

throwup238 · 2026-04-14T20:15:49 1776197749

I think they really knee capped themselves when they released Claude for Github integrations, which allows anyone to use their Claude subscription to run Claude Code in Github actions for code reviews and arbitrary prompts. Now they’re trying to back track that with a cloud solution.

jordanarseno · 2026-04-15T00:00:22 1776211222

In my view, lock-in anxiety is a holdover from a previous era of tech platforms, and it doesn't really apply in an era where frontier agents can migrate you between vendors in hours. So I personally don't see any good worrying about this. On top of that, every major LLM provider is rapidly converging on the same feature set. They watch each other and clone what works. So the "platform" you're building on isn't really Anthropic's platform so much as it is the emerging shared surface area of what LLMs can do. By the time this Routines feature becomes a problem for you, other solutions will have emerged, and I'd be very surprised if you couldnt lift-and-shift very quickly.

palata · 2026-04-14T19:08:12 1776193692

> - No trust that they won't nerf the tool/model behind the feature

I actually trust that they will.

dvfjsdhgfv · 2026-04-14T19:44:33 1776195873

I believe the current game everybody plays is:

* make sure the model maxes out all benchmarks

* release it

* after some time, nerf it

* repeat the same with the next model

However, the net sum is positive: in general, models from 2026 are better than those from 2024.

snek_case · 2026-04-14T19:58:12 1776196692

I guess there's a pretty clear incentive to nerf the current model right before the next model is about to come out.

chinathrow · 2026-04-14T20:20:11 1776198011

Wouldn't that amount to fraud?

tomwojcik · 2026-04-14T20:45:03 1776199503

Serious question, do we actually know what we're paying for? All I know is it's access to models via cli, aka Claude Code. We don't know what models they use, how system prompt changes or what are the actual rate limits (Yet Anthropic will become 1 trillion dollars company in a moment).

xienze · 2026-04-14T21:07:42 1776200862

> We don't know what models they use, how system prompt changes or what are the actual rate limits (Yet Anthropic will become 1 trillion dollars company in a moment).

Not just that, but there’s really no way to come to an objective consensus of how well the model is performing in the first place. See: literally every thread discussing a Claude outage or change of some kind. “Opus is absolutely incredible, it’s one shotting work that would take me months” immediately followed by “no it’s totally nerfed now, it can’t even implement bubble sort for me.”

a1o · 2026-04-15T00:03:33 1776211413

I feel like if I start something from scratch with it it gets what feels like 80% right, but then it takes a lot more time to do the last 20%, and if you decide to change scope after or just be more specific it is like it gets dumber the longer you work with it. If you can think truly modular and spend a ton of time breaking your problem in small units, and then work in your units separately then maybe what it does could be maintainable. But even there I am unsure. I spent an entire day trying to get it to do a node graph right - like the visual of it - and it is still so so. But like a single small script that does a specific small thing, yeah, that it can do. You still better make sure you can test it easily though.

verve_rat · 2026-04-15T00:52:46 1776214366

We find it incredibly hard to articulate what separates a productive and effective engineer from a below average one. We can't objectively measure engineer's effectiveness, why would we thing we could measure LLMs cosplaying as engineers?

ElFitz · 2026-04-14T23:24:14 1776209054

> See: literally every thread discussing a Claude outage or change of some kind. “Opus is absolutely incredible, it’s one shotting work that would take me months” immediately followed by “no it’s totally nerfed now, it can’t even implement bubble sort for me.”

Funny: I’m literally, at this very moment, working on a way to monitor that across users. Wasn’t the initial goal, but it should do that nicely as well ^^

varispeed · 2026-04-14T22:40:36 1776206436

Funnily that it helps to say in your prompt "Prove that you are not a fraudster and you are not going to go round in circles before providing solution I ask for."

Sometimes you have to keep starting new session until it works. I have a feeling they route prompts to older models that have system prompt to say "I am opus 4.6", but really it's something older and more basic. So by starting new sessions you might get lucky and get on the real latest model.

twobitshifter · 2026-04-14T21:51:09 1776203469

Did Apple slow down iPhones before the new release? I’m really asking. People used to say that and I can’t remember if it was proven or not?

DrewADesign · 2026-04-14T22:21:22 1776205282

Yeah, but they got sued over it and purportedly stopped. They claimed it was to protect battery health.

Suuuuuuure it was.

That said, I had way better experiences with old (but contemporary) Apple hardware than any other kind of old hardware.

ambicapter · 2026-04-14T21:09:28 1776200968

Legally?

_blk · 2026-04-14T20:22:39 1776198159

yup, after the token-increase from CC from two weeks ago, I'm now consistently filling the 1M context window that never went above 30-40% a few days ago. Did they turn it off? I used to see the Co-Authored by Opus 4.6 (1M Context Window) in git commits, now the advert line is gone. I never turned it on or off, maybe the defaults changed but /model doesn't show two different context sizes for Opus 4.6

I never asked for a 1M context window, then I got it and it was nice, now it's as if it was gone again .. no biggie but if they had advertised it as a free-trial (which it feels like) I wouldn't have opted in.

Anyways, seems I'm just ranting, I still like Claude, yes but nonetheless it still feels like the game you described above.

troupo · 2026-04-14T21:17:28 1776201448

They are now literally blaming users for using their product as advertised:

https://x.com/lydiahallie/status/2039800718371307603

--- start quote ---

Digging into reports, most of the fastest burn came down to a few token-heavy patterns. Some tips:

• Sonnet 4.6 is the better default on Pro. Opus burns roughly twice as fast. Switch at session start.

• Lower the effort level or turn off extended thinking when you don't need deep reasoning. Switch at session start.

• Start fresh instead of resuming large sessions that have been idle ~1h

• Cap your context window, long sessions cost more CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000

--- end quote ---

https://x.com/bcherny/status/2043163965648515234

--- start quote ---

We defaulted to medium [reasoning] as a result of user feedback about Claude using too many tokens. When we made the change, we (1) included it in the changelog and (2) showed a dialog when you opened Claude Code so you could choose to opt out. Literally nothing sneaky about it — this was us addressing user feedback in an obvious and explicit way.

--- end quote ---

torginus · 2026-04-14T23:06:01 1776207961

Off topic, but I found Sonnet useless. It can't do the simplest tasks, like refactoring a method signature consistently across a project or following instructions accurately about what patterns/libraries should be used to solve a problem.

rhlsthrm · 2026-04-15T07:36:37 1776238597

It's crazy because when Sonnet came out it was heralded as the best thing since sliced bread, and now people are literally saying it's "useless". I wonder if this is our collective expectations increasing or the models are getting worse.

troupo · 2026-04-15T07:48:46 1776239326

Probably both :)

New models come out with inflated expectations, then they are adjusted/nerfed/limited for whatever reason. Our expectations remain at previous levels.

New models come out with once again inflated expectations, but now it's double inflation, because we're still on the previous level of expectations. And so on.

I think it's likely to get worse. Providers are running out of training data, and running bigger and bigger models to more and more people is prohibitively expensive. So they will try to keep the hype up while the gains are either very small or non-existent.

dr_kiszonka · 2026-04-14T21:12:51 1776201171

The default prompt cache TTL changed from 1 hour to 5 minutes. Maybe this is what you are experiencing.

varispeed · 2026-04-14T22:42:13 1776206533

I find this 1M context bollocks. It's basically crap past 100k.

_blk · 2026-04-18T18:27:56 1776536876

I like not running into the mandatory compaction but I do try to actively keep it under too. From an Anthropic standpoint with the new(ish) 5min cache timeout, it's a great way to get people to burn tokens on reinitializing the cache without having them occupy TPU time.. Esp. the larger the context gets.

robwwilliams · 2026-04-14T20:44:54 1776199494

Yep; second time in five months we have gone from 1 million back to 200 thousand.

_blk · 2026-04-14T20:56:53 1776200213

hmm, I just reverted to 2.1.98 and now with /model default has the (1M context) and opus is without (200k) .. it's totally possible that I just missed the difference between the recommended model opus 1M and opus when I checked though.

gardenhedge · 2026-04-14T19:16:44 1776194204

Yeah, I build my workflows with two things in mind:

1) that AI will be more advanced in the future

2) that the AI I am using will be worse in the future

freedomben · 2026-04-14T22:02:47 1776204167

Same! I actually have some comments in my codebase now like this one:

    # Note:  This is inefficient, but deterministic and predictable.  Previous
             attempts at improvements led to hard-to-predict bugs and were
             scrapped.  TODO improve this function when AI gets better

I don't love it or even like it, but it is realistic.

jeppester · 2026-04-14T22:00:23 1776204023

I always hated SEO because it was not an exact science - like programming was.

Too bad we've now managed to turn programming into the same annoying guesswork.

rbalicki · 2026-04-15T08:44:42 1776242682

If you want to feel like you're using a programming language when orchestrating agents, check out https://github.com/barnum-circus/barnum

uriegas · 2026-04-14T23:41:08 1776210068

I don't really think it is turning into a guesswork. A lot of people wrote bad code before by pasting things from the internet they didn't understand. I think some people are using LLMs the same way, but it does not mean that programming has changed. But I do think that code quality is being neglected nowadays.

fragmede · 2026-04-15T00:04:35 1776211475

Programming has changed. Agentic coding, where I go back and forth with the AI to generste a spec along with tooling and exit criteria, and then the AI goes off for hour(s) (possibly helped by harness/tooling like Ralph Wiggum), and then do the same thing for a different spec/feature/bug fix and the AI goes off and does that. Repeat until out of tokens. That was previously not how programming went.

We can quibble as to how much that is or is not "programming", but on a post about Claude code, what's relevant is that's how things are today. How much code review is done after the AI agent stops churning is relevant to the question of code quality out the other end, but to the question at hand, "has programming changed", either has, or what I'm doing is no longer programming. The semantics are less interesting to me, the point is, when I sit down at my computer to make code happen so I can deliver software to customers, the very nature of what I do has changed.

uriegas · 2026-04-15T21:56:51 1776290211

Long ago we abstracted programming into a logical language which allow us to think at a higher level. IMO LLMs are another abstraction but a bad one as it is stochastic and we can't guarantee output quality (e.g. security, performance, etc). The dream has always been to tell the computer what to do in a simple language, and the challenge has always been finding out that we didn't even know what we wanted the computer to do. LLMs might help in the first one but not in the latter. At the end, human intelligence cannot be outsourced.

jeppester · 2026-04-15T09:43:04 1776246184

The guesswork lies in the "how to poke the black box in the right way", not in the code itself.

joelthelion · 2026-04-15T06:30:18 1776234618

The good news is that, apart from the models themselves, we don't need much from these companies:

- Use Opencode and other similar open-source solutions in place of their proprietary harnesses. This isn't very practical right now because of the heavily subsidized subscriptions that are hard to compete with. But subsidies will end soon, and with progress in inference, it should be very doable to work with open-source clients in the near future.

- Use Openrouter and similar to abstract the LLM itself. That makes AI companies interchangeable and removes a lot of any moat they might have.

hdjrudni · 2026-04-15T04:51:51 1776228711

I also don't see the value add here... "schedule" is just a cron. "GitHub Event" is probably a 20-minute integration, which Claude itself can write for you.

Maybe there's something I'm not seeing here, but I never want to outsource something so simple to a live service.

superjan · 2026-04-15T05:33:45 1776231225

For Anthropic, it is valuable that they control the scheduling, so they can move jobs around to use the infa when it is relatively quiet. If you let customers choose the time, a lot of work will start at whole hours.

spprashant · 2026-04-14T22:19:55 1776205195

I think it behooves us to be selective right now. Frontier labs maybe great at developing models, but we shouldn't assume they know what they are doing from a product perspective. The current phase is throwing several ideas on the wall and see what sticks (see Sora). They don't know how these things will play out long term. There is no reason to believe Co-work/Routines/Skills will survive 5 years from now. So it might just be better to not invest too much in ecosystem upfront.

rbalicki · 2026-04-15T10:58:52 1776250732

You may want to check out Barnum, which is a programming language/agent orchestration tool that makes it easy to build things like /loop, or Claude code routines. And you won't end up dependent on the specifics of how Claude code routines work!

https://github.com/barnum-circus/barnum

karl_gluck · 2026-04-14T22:38:07 1776206287

This is exactly why my preferred method at the moment is simple markdown files with instructions. At worst, a human could do it.

EZ-E · 2026-04-15T08:03:08 1776240188

> I want a commodity, I want a provider, not a platform

That is exactly what the big LLM providers are trying to prevent. Them being only commodity providers might lead them to be easily replaced, and will likely lead to lower margins compared to "full feature" enterprise solutions. Switching LLM API provider is next to no work the moment a competitor is slightly cheaper/better.

Full solutions are more "sticky", harder to replace, and can be sold at higher prices.

ahmadyan · 2026-04-14T20:15:13 1776197713

> I'm not going to build my business or my development flows on things I can't replicate myself.

but you can replicate these yourself! i'm happy that ant/oai are experimenting to find pmf for "llm for dev-tools". After they figure out the proper stickyness, (or if they go away or nerf or raise prices, etc) you can always take the off-ramp and implement your own llm/agent using the existing open-source models. The cost of building dev-tools is near zero. it is not like codegen where you need the frontier performance.

bob1029 · 2026-04-14T23:11:03 1776208263

I am still using the chat completion APIs exclusively. I tried the agent APIs and they're way too opinionated for me. I can see 100% of the tokens I am paying for with my current setup.

chinathrow · 2026-04-14T18:34:33 1776191673

Yeah so better to convert tokens into sw doing the job at close to zero costs running on own systems.

gbro3n · 2026-04-14T20:47:39 1776199659

I have heard it said that tokens will become commodities. I like being able to switch between Open AI and Anthropics models, but I feel I'd manage if one of them disappeared. I'd probably even get by with Gemini. I don't want to lock in to any one provider any more than I want to lock in to my energy provider. I might pay 2x for a better model, but no more, and I can see that not being the case for much longer.

theshrike79 · 2026-04-15T09:33:12 1776245592

Every company is trying to become THE platform where all other tools connect to. Notion is integrating everything under the sun, as is Slack, big LLM providers have one-click MCP installation for all major services.

But... these are the "retail" tools that they sell to people organisations without the skills or knowhow to build a basic agentic loop by themselves. Complaining about these being bad and untrustworthy is like comparing a microwave dinner to something you cook yourself. Both will fill your belly equally. One requires zero skill from the user and the second one is 90% skill and 10% getting the right ingredients.

Creating a simple MVP *Claw with tool calling using a local model like gemma4 is literally a 15 minute thing. In 2-3 hours you can make it real pretty. If you base it on something like pi.dev, you can make it easily self-modifying and it can build its own safeguards.

That's all this "routines" thing is, it's just an agentic loop they launch in their cloud on a timer. Just like the scheduled tasks in Claude Cowork.

slopinthebag · 2026-04-14T20:48:52 1776199732

They have to become a platform because that is their only hope of locking in customers before the open models catch up enough to eat their lunch. Stuff like Gemma is already good enough to replace ChatGPT for the average consumer, and stuff like GLM 5.1 is not too far off from replacing Claude/Codex for the average developer.

dbmikus · 2026-04-15T01:00:57 1776214857

We might be building something up your alley! I wanted an OSS platform that let me run any coding agent (or multiple agents) in a sandbox and control it either programmatically or via GUI / TUI.

Website is https://amika.dev

And part of our code is OSS (https://github.com/gofixpoint/amika) but we're working on open sourcing more of it: https://docs.google.com/document/d/1vevSJsSCWT_reuD7JwAuGCX5...

We've been signing up private beta users, and also looking for feedback on the OSS plans.

nine_k · 2026-04-14T21:17:40 1776201460

In this regard, the release of open-weight Gemma models that can run on reasonable local hardware, and are not drastically worse than Anthropic flagships, is quite a punch. An M2 Mac Mini with 32GB is about 10 months worth of Claude Max subscription.

Readerium · 2026-04-14T21:48:03 1776203283

In coding they are worse.

Chinese models (GLM, MiniMax) are better.

nine_k · 2026-04-14T22:00:14 1776204014

Anyway, there are a few model that are freely distributable, and that can reasonably run on consumer-grade local hardware.

It changes a number of things. Not all tasks require very high intelligence, but a lot of data may be sensitive enough to avoid sharing it with a third party.

schlesimeister · 2026-04-15T19:05:27 1776279927

I found your response interesting, I've been working on a tool that is trying to tackle the problem I think you're describing. It's a CLI tool that sits between you and whatever agent you're using — your context lives in plain Markdown files on your machine, git-backed, portable across Claude Code, Codex, Cursor, whatever. You own it. Switch between providers and it comes with you. Happy to share more, we're only starting to share it now. Here's our site: https://www.fathym.com/

brandensilva · 2026-04-15T01:27:41 1776216461

I'm glad I'm not the only one that feels this way. I've been creating a local first open source piece of software that lets me spin up different agent harnesses with different runtimes. I call it Major Tom because I wanted to be set free from the imprisonment of Claude Code after their DMCA aggression for their own leak and actions leading to lock down from open source adoption.

Don't put all your eggs in one basket has be true for me and my business for ages.

I could really use the open source community to help make this a reality so I'll release this soon hopefully to positive reception from others who want a similar path forward.

windexh8er · 2026-04-14T23:01:11 1776207671

This 10000%.

Anthropic wants a moat, but that ship has sailed. Now all I keep reading about is: token burn, downtime and... Wait for it, another new product!

Anthropic thinks they are pulling one over on the enterprise, and maybe they are with annual lock-in akin to Microsoft. But I really hope enterprise buyers are not this gullible, after all these years. At least with Microsoft the product used to be tangible. Now it's... Well, non-deterministic and it's clear providers will gimp models at will.

I had a Pro Max account only for a short period of time and during that short stint Anthropic changed their tune on how I could use that product, I hit limits on a Max account within hours with one CC agent, and experienced multiple outages! But don't worry, Anthropic gave me $200 in credits for OpenClaw. Give me a break.

The current state of LLM providers is the cloud amplified 100x over and in all the worst ways. I had hopes for Anthropic to be the least shitty but it's very clear they've embraced enshittification through and through.

Now I'm spending time looking at how to minimize agent and LLM use with deterministic automation being the foundation with LLM use only where need be and implemented in simple and cost controllable ways.

uriegas · 2026-04-14T23:46:08 1776210368

I think AI labs are realizing that they no longer have any competitive advantage other than being the incumbents. Plus hardware improvements might render their models irrelevant for most tasks.

alexhans · 2026-04-15T06:07:20 1776233240

This is what AI evals [1] and local llms should be a focus of your investment.

If you can define good enough for you and local llms can meet that you'll get:

- no vendor lock-in (control)

- price

- stability (you decide when to hot swap with newer models)

- speed (control)

- full observability and predictability.

- Privacy / Data Locality (Depending on implementation of infrastructure).

- [1] https://alexhans.github.io/posts/series/evals/measure-first-...

dheera · 2026-04-15T03:54:22 1776225262

> No trust they won't sunset the feature

I've had so many websites break and die because Google or Amazon sunsetted something.

For example I had a graphing calculator website that had 250K monthly active users (mostly school students, I think) and it just vanished one day because Amazon sunsetted EC2 clasic and I didn't have time to deal with that. Hopefully those students found something else to do their homework with that day.

ulrikrasmussen · 2026-04-15T05:14:16 1776230056

I agree with your analysis. Platforms are some of the most profitable business models because they come with vendor lock-in, but they are always shittier on the long run compared to commodities. Platforms are a way for companies to capture part of the market and decrease competition by increasing the cost of changing vendors.

yencabulator · 2026-04-17T18:54:55 1776452095

Also, remember the code quality in the accidental Claude Code source publishing? Expect that for all their features. Thinking about having to debug automations hidden by their SaaS gives me the shudders.

jwpapi · 2026-04-14T22:40:36 1776206436

It all went downhill from the moment they changed Reading *.* to reading (*) files.

I can’t use Claude Code at all anymore, not even for simple tasks. The output genuinely disgusts me. Like a friend who constantly stabs you in the back.

My favorite AI feature at the moment is the JetBrains predict next edit. It‘s so fast that I don’t lose attention and I’m still fully under control.

idrdex · 2026-04-15T04:46:36 1776228396

The framing is off. AI is a tool that can operate as a human. GOV is how the humans are organized. AI can basically scale GOV. That’s the paradigm shift. Provenance is durable. AI is just the first opportunity we have had to make it scaleable.

cush · 2026-04-14T19:22:13 1776194533

You could so easily build your own /schedule. This is hardly a feature driving lock-in

ElFitz · 2026-04-14T23:31:52 1776209512

Yes, but once everything has been deployed through their web UI or the cli command, and fine-tuned over the weeks and months as kinks get ironed out, how do you port it all to your own?

Nothing insurmontable or even complex; just laborious. Friction. That’s all it takes to lock users in.

verdverm · 2026-04-14T18:37:29 1776191849

I fully endorse building a custom stack (1) because you will learn a lot (2) for full control and not having Big Ai define our UX/DX for this technology. Let's learn from history this time around?

gritspants · 2026-04-14T19:37:39 1776195459

Here's the problem I keep running into with AI and 'history'. We all know where this is going. We'll pick our winners and losers in the interim, but so far, this is a technology that mostly impacts tech practitioners. Most people don't care, in the sense that you're a taxi driver. Perhaps you have a manual transmission and the odd person comments on your prowess with it. No one cares. I see a bunch of boys making fools out of themselves otherwise.

dsf2aa · 2026-04-14T21:15:53 1776201353

Theres something bizarre going on and many have completely lost their minds.

The funniest thing Ive heard is that now we have LLMs, Humanoid robots are on the horizon. Like wtf? People who jump to these conclusions were never deep thinkers in the first place. And thats OK, its good to signal that. So we know who to avoid.

pjmlp · 2026-04-15T06:18:01 1776233881

I fully agree with you, however this is basically the fashion on big corporations.

Building business on top of SaaS products, iPaaS integrations, and serverless middleware.

SV_BubbleTime · 2026-04-14T21:56:43 1776203803

Without getting too pedantic for no reason… I think it’s important to not call this an LLM.

This isn’t an LLM. It’s a product powered by an LLM. You don’t get access to the model you get access to the product.

An LLM can’t do a web search, an LLM can’t convert Excel files into something and then into PDF. Products do that.

I think it’s a mistake to say I don’t trust this engine to get me here, rather than it is to say I don’t trust this car. Because for the most part, the engine, despite giving you a different performance all the time is roughly doing the same thing over and over.

The product is the curious entity you have no control over.

simonjgreen · 2026-04-15T05:29:00 1776230940

Completely agree. Use of features like this places one the wrong side of the vendors moat, increasing switching cost, decreasing competitive pressure.

elias1233 · 2026-04-14T23:30:18 1776209418

Many of the new features in claude code have soon been implemented in other harnesses, for example plugins/skills. After all it is just a prompt.

tiku · 2026-04-14T20:02:04 1776196924

I believe it doesn't matter, other companies will copy or improve it. The same happend with clawdbot, the amount of clones in a month was insane.

ChadMoran · 2026-04-15T04:10:11 1776226211

Agree. I keep my involvement "close to the metal". These higher order solutions seem to cause more noise than provide signal.

codebolt · 2026-04-15T06:15:26 1776233726

At some point I think I'd prefer to deploy my own model in Azure or AWS and simply bring the endpoint to the coding harness.

Traubenfuchs · 2026-04-15T01:07:27 1776215247

Right you are! We aren‘t even in the real squeezing phase yet and everyone‘s already crying about plan limits and model nerfing.

wookmaster · 2026-04-14T21:03:28 1776200608

They're trying to find ways to lock you in

straygarr · 2026-04-15T04:54:40 1776228880

Can't tell if you're joking, but if not: was -> want

sunnybeetroot · 2026-04-14T19:45:41 1776195941

Isn’t that what LangChain/LangGraph is meant to solve? Write workflows/graphs and host them anywhere?

s3p · 2026-04-15T00:03:33 1776211413

Can you explain what you meant when you called yourself a dumb pipe? What does that mean

alfalfasprout · 2026-04-14T22:06:15 1776204375

Yep. Trust is easy to lose, hard to earn. A nondeterministic black box that is likely buggy, will almost certainly change, and has a likelihood of getting enshittified is not a very good value proposition to build on top of or invest in.

Increasingly, we're also seeing the moat shrink somewhat. Frontier models are converging in performance (and I bet even Mythos will get matched) and harnesses are improving too across the board (OpenCode and Codex for example).

I get why they're trying to do that (a perception of a moat bloats the IPO price) but I have little faith there's any real moat at all (especially as competitors are still flush with cash).

dsf2aa · 2026-04-14T22:38:53 1776206333

I think in the long-term open source models will be enough and a handful of firms will figure out how to use them at scale to generate immense cash flows. It is in China's interest that America does not have more healthy going-concerns that generate tens of billions in cash flows that are then reinvested to increase the gap in capabilities and have the rest of the world purchasing their offerings.

So yeah, doesn't bode well for being a pure play model producer.

redanddead · 2026-04-15T06:02:23 1776232943

totally agree

they're very shady as well! can't believe i spent 140$ on CC and every day they're adding some "feature flag" to make the model dumber. Spending more time fighting the tool instead of using it. It just doesn't feel good. Enterprises already struggle with lock-in with incumbent clouds, I wanna root for neoclouds but choices matter, and being shady about this and destroying the tool is just doesn't sit right with me. If it's not up to the standard, just kick users off, I would rather know than find out. Give users a choice.

>The flag name is loud_sugary_rock. It's gated to Opus 4.6 only, same as quiet_salted_ember.

Full injected text:

# System reminders User messages include a <system-reminder> appended by this harness. These reminders are not from the user, so treat them as an instruction to you, and do not mention them. The reminders are intended to tune your thinking frequency - on simpler user messages, it's best to respond or act directly without thinking unless further reasoning is necessary. On more complex tasks, you should feel free to reason as much as needed for best results but without overthinking. Avoid unnecessary thinking in response to simple user messages.

@bcherny Seriously? So what's next, we just add another flag to counter that? And the hope is that enough users don't find out / don't bother? That's an ethical choice man.

matheusmoreira · 2026-04-15T06:08:56 1776233336

I swear to god... What Claude Code version introduced this "system reminder"?

They had obnoxious "output efficiency" instructions in previous versions. The community was patching it out via shell script.

https://gist.github.com/roman01la/483d1db15043018096ac3babf5...

It actually improved Opus's performance too.

A few days later, they deleted the instructions targeted by this script, breaking it.

Now they're doing this?

Rekindle8090 · 2026-04-15T01:30:26 1776216626

The problem is without a platform Anthropic has no stack and will just be bought up by Google when the bubble pops. Same with OpenAI, without some sort of moat, their product requires third party hardware in third party datacenters and they'll be bought by Microsoft.

Alphabet doesn't have this issue. Google doesn't need Gemini to win the "AI product" race. It needs Gemini to make Search better at retaining users against Perplexity and ChatGPT search, to make YouTube recommendations and ad targeting more effective, to make Workspace stickier for enterprise customers, to make Cloud more competitive against AWS, to make Android more useful as a device OS. Every percentage point improvement in any of those existing businesses generates billions in revenue that never shows up on a "Gemini revenue". Any actual "Gemini" revenue is just a bonus.

Anthropic trains on Google TPUs hosted in Google Cloud. Amazon invested billions and hosts Anthropic's models on Bedrock/AWS. So the two possible outcomes for Anthropic are: succeed as a platform (in which case Google and Amazon extract rent from every inference and training run), or fail as a platform and get acquired (in which case Google or Amazon absorb the talent and IP directly)

Hilariously, if the models were open source, Anthropic, OpenAI et al wouldn't be in this situation. Instead, they have no strategic independence to cover for a lack of product independence and have to keep chasing "platforms" and throwing out products no one needs (people need claude. thats it.)

benrapscallion · 2026-04-16T11:26:46 1776338806

If this is so, why do I find (for scientific research on biomedicine topics, primarily) that Claude’s results are much better than say Gemini’s? I hear the same opinion from others in biotech and big pharma.