I see highly trained engineers spend hundreds of thousands of tokens doing what can reliably be accomplished with 150 lines of Python.
I think the push from management for us to use AI has made it so we don’t have to be efficient with our consumption, so now we write md files which we feed to Claude in a loop instead of python and bash scripts to do routine tasks.
> I think the push from management for us to use AI has made it so we don’t have to be efficient with our consumption, so now we write md files which we feed to Claude in a loop instead of python and bash scripts to do routine tasks.
We're all being measured on AI usage, so...
Instead of doing a grep | uniq | awk that would give me an answer in 100 milliseconds for free, I launch a prompt that spends 30 seconds on it and costs actual money.
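For concreteness, a pipeline of that shape looks like the following (the log contents and field position are made up for illustration; run here against inline sample data):

```shell
# Count distinct values in a field and rank them -- the kind of one-liner
# a prompt ends up replacing. Sample data is piped in via printf.
printf 'ERROR db\nERROR db\nERROR net\nINFO ok\n' \
  | grep 'ERROR' \
  | awk '{print $2}' \
  | sort | uniq -c | sort -rn
```

Against a real log you'd replace the `printf` with the file name; everything else stays the same.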
I hope we get over this phase of the hype soon. I want and will use AI as a tool, but it's just another (good) tool in the toolbox.
When I need to do a one-off investigation, it's great to use AI and spend 5-10 minutes querying and get my answer for $5 or so, instead of having to spend 2-3 hours writing a script which I'll discard. That's a great use case.
But using AI for routine processing done daily, where a script would be amortized over thousands of runs, is insane. I'd rather use AI to write the script and then not need the AI anymore; the script will be faster and free. Oh, but then my AI usage in the executives' report drops. Can't have that. Waste away.
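To put illustrative numbers on the amortization argument (every figure below is hypothetical, not taken from the comments above):

```python
# Illustrative cost comparison between prompting an LLM on every run
# versus writing a script once. All numbers are made up for the sketch.
runs = 1000                  # a daily routine task over ~3 years
llm_cost_per_run = 0.50      # dollars of tokens per invocation
script_dev_cost = 300.0      # ~3 engineer-hours at $100/hr, paid once
script_run_cost = 0.0        # a local grep/awk/python script is ~free to run

llm_total = runs * llm_cost_per_run
script_total = script_dev_cost + runs * script_run_cost
print(f"LLM every run: ${llm_total:.2f}, script once: ${script_total:.2f}")
```

The crossover point moves with the assumptions, but for anything run daily the one-time script wins quickly.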
>>> Instead of doing a grep | uniq | awk that would give me an answer in 100 milliseconds for free, I launch a prompt that spends 30 seconds on it and costs actual money.
I'd argue it will get even worse: you still remember how to do it; the next generation of SDEs won't even try.
It's not just push from management; AI firms themselves really aggressively market this idea of AI replacing everything. It's not "allowed" to be a mere tool, useful for some tasks but not others; it's gotta (be able to) do everything.
Part of that is the ridiculous belief that they can create "AGI" by just gluing together enough LLMs.
Presumably it's also financial viability. You can't charge thousands a month without replacing those "highly trained engineers" with a bunch of kids in the developing world.
> AI firms themselves really aggressively market this idea of AI replacing everything. It's not "allowed" to be a mere tool, useful for some but not other tasks, it's gotta (be able to) do everything.
That's marketing. It's up to management to decide to fall for it blindly or look at it with healthy skepticism. It seems a lot of them chose the former.
> I think the push from management for us to use AI has made it so we don’t have to be efficient with our consumption, so now we write md files which we feed to Claude in a loop instead of python and bash scripts to do routine tasks.
It's worse than that, in many cases management actively rewards inefficiency. It's like Friedman's "why not spoons?"
I think 150 lines of Python is rather a lot. That would take at least an hour to write, and could take up to a few days (depending on complexity and familiarity with the task).
They optimize because their work requires them to. 100k tokens is a few bucks and a couple of minutes, then 15 more minutes to verify that the output does what it's intended to do reasonably well, so it's more like $50 in total cost.
For an engineer paid $100/hr, writing a 150-line Python script and testing it to the same extent could take a few hours, so the total cost rises meaningfully.
I'd just suggest you use Claude to write the script for you, and then run the script with cron. Really, it's not any more time; it just takes a different view of what the goal is.
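As a sketch, the "script plus cron" setup is a one-time generation followed by a single crontab entry (the script path, schedule, and log location below are all hypothetical):

```shell
# Hypothetical crontab line (install with `crontab -e`):
# run the generated script every weekday at 07:00, appending output to a log.
0 7 * * 1-5 /usr/local/bin/daily_report.sh >> /var/log/daily_report.log 2>&1
```

After that, the AI isn't in the loop at all; the cron daemon runs the script for free.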
Give me a python script that takes a string representing the output of a sha256 algorithm and a plain string, and checks whether the sha256 of the plain string matches the sha256 provided.
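The script that prompt asks for fits in a few lines of the standard library. A sketch (the function name and interface are mine; the prompt above doesn't specify one):

```python
import hashlib
import hmac

def sha256_matches(expected_hex: str, plain: str) -> bool:
    """Check whether sha256(plain) equals the provided hex digest."""
    digest = hashlib.sha256(plain.encode("utf-8")).hexdigest()
    # hmac.compare_digest performs a constant-time comparison, which is
    # good hygiene when comparing digests that may guard secrets.
    return hmac.compare_digest(digest, expected_hex.lower())
```

Usage: `sha256_matches(hashlib.sha256(b"hello").hexdigest(), "hello")` returns True, while any non-matching plain string returns False.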
LLM memory (in general, any implementation) is good in theory.
In practice, as it grows it gets just as messy as not having it.
In the example you have on the front page you say "continue working on my project", but you're rarely working on just one project; you might have 5 or 10 in memory, each of which made sense to store at the time.
So now you still have to say "continue working on the SaaS project". Sure, there's some context around the details, but you pay for it by filling up your LLM context and making extra MCP calls.
The tokenizer is an important part of overall model training and performance. It’s only one piece of the overall cost per request. If a tokenizer that produces more tokens also leads to a model that gets to the correct answer more quickly and requires fewer re-prompts because it didn’t give the right answer, the overall cost can still be lower.
Comparisons are still ongoing but I have already seen some that suggest that Opus 4.7 might on average arrive at the answer with fewer tokens spent, even with the additional tokenizer overhead.
How would it be a money grab? If the new tokenizer requires more tokens to encode the same information, it costs them more money for inference. The point of charging per token is that the cost is proportional to the number of tokens. That's my understanding anyway
If a model provider believes they have a better model, it can be a viable bet. But many (me included) started experimenting with other providers because of enshittification from Anthropic (price + uptime), only to find that Codex is not that much worse in quality, with significantly more output per dollar.
There are no conspiracies needed where a corporation has a profit incentive. There is perhaps a question of planning and initial intentionality, but the metrics and motivation to continue are clear enough.
Not necessarily, with speculative decoding. Whitespace would be trivial to predict, and they would pretty much keep using the same amount of compute as before.
I don't think that's their primary motive for doing this but it is a side effect.
If they wanted to, they could always just double the $/token. They don't seem to be able to keep up with their current demand, and that's what companies normally do in that circumstance if they're looking for a money grab; no need for the bank-shot approach.
Yep integration is my use case as well. It’s nice to not have to worry about setting up real infrastructure for testing client code, parallel execution if you’re on a team, clearing out state each run, etc…
But to actually answer the question:
I’ve been putting research-paper PDFs into NotebookLM and turning them into ~40 minute podcasts, which I listen to on my walks.
Yes, it’s shallow learning, and it might have some hallucinations in there, but I wouldn’t have read some of those papers otherwise.
yes, we're releasing an official mobile sdk and inference engine very soon. if you want to use something until then, some folks from the oss community have built ways to run kitten on ios. if you search kittentts ios on github you should find a few.
if you can't find it, feel free to ping me and i can help you set it up. thanks a lot for your support and feedback!
There’s a scenario where LLMs get more efficient for their size, and you’ll be able to get 2026-SOTA performance from a consumer-grade laptop.
Sure, with a 1000B-parameter model you’ll get better performance, but the average person will have it write some Python script, not derive new physics equations.
So in a sense the demand for LLM intelligence will reach a plateau (arguably we are there today for the average person), so no subsidy will be required, because the average person will not need the latest and greatest.
There’s not the same demand pattern as for something like Uber.
> There’s a scenario where LLMs get more efficient for their size, and you’ll be able to get 2026-SOTA performance from a consumer-grade laptop.
But isn't that bad for the AI companies too? Because then people will just run a ~2026-SOTA open-source model on their laptop for free and not pay for any subscription.
Regular folks will not pay Anthropic, but the NSA, NASA, or research labs might.
I’m not implying this will be a good time for AI companies. I am saying AI as a technology can provide value without it being controlled by only 3 companies.
In a hypothetical future with 2026-level LLMs on a (high-end) consumer laptop, I still think that the majority of buyers would prefer to pay 20 USD/month for a service, just for the convenience and flexibility.
> In a hypothetical future with 2026 level LLMs on a (high end) consumer laptop, I still think that majority of buyers would prefer to pay 20 USD/month for a service. Just for the convenience and flexibility.
$20 a month is a lot of money; I don't think the "convenience and flexibility" would actually be worth it unless 1) you've got money to burn, 2) you lack the skills to install software, or 3) the open source community totally fails to develop a reasonable installer. The LLM service would probably be akin to a scam preying on ignorance, like those companies that will rent you a water softener for $100/month.
A lot compared to what? I believe that an LLM-capable laptop will cost considerably more than something that is good enough for non-LLM productivity tasks, at least within the next 5 years. Say it costs 600 USD more; that would buy 30 months of subscription. In that kind of scenario, I think many people will favor the subscription.
I feel like you’re on to something. Management will pick this up, and make it part of the sprint planning.
Engineers will pull out their hair wondering how you can do that.
That’s like estimating how many CPU cycles a task will take, or how many instructions your laptop will execute while you work on something.