I see highly trained engineers spend hundreds of thousands of tokens doing what can reliably be accomplished with 150 lines of Python.
I think the push from management for us to use AI has made it so we don’t have to be efficient with our consumption, so now we write md files which we feed to Claude in a loop instead of python and bash scripts to do routine tasks.
> I think the push from management for us to use AI has made it so we don’t have to be efficient with our consumption, so now we write md files which we feed to Claude in a loop instead of python and bash scripts to do routine tasks.
We're all being measured on AI usage, so...
Instead of doing a grep | uniq | awk that would give me an answer in 100 milliseconds for free, I launch a prompt that spends 30 seconds on it and costs actual money.
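For concreteness, a pipeline of that shape looks like the following (the log contents and field position are made up for illustration; run here against inline sample data):

```shell
# Count distinct values in a field and rank them -- the kind of one-liner
# a prompt ends up replacing. Sample data is piped in via printf.
printf 'ERROR db\nERROR db\nERROR net\nINFO ok\n' \
  | grep 'ERROR' \
  | awk '{print $2}' \
  | sort | uniq -c | sort -rn
```

Against a real log you'd replace the `printf` with the file name; everything else stays the same.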
I hope we get over this phase of the hype soon. I want and will use AI as a tool, but it's just another (good) tool in the toolbox.
When I need to do a one-off investigation, it's great to use AI and spend 5-10 minutes querying and get my answer for $5 or so, instead of having to spend 2-3 hours writing a script which I'll discard. That's a great use case.
But using AI for routine processing done daily, where a script would be amortized over thousands of runs, is insane. I'd rather use AI to write the script and then not need the AI anymore; the script will be faster and free. Oh, but then my AI usage in the executives' report drops. Can't have that. Waste away.
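To put illustrative numbers on the amortization argument (every figure below is hypothetical, not taken from the comments above):

```python
# Illustrative cost comparison between prompting an LLM on every run
# versus writing a script once. All numbers are made up for the sketch.
runs = 1000                  # a daily routine task over ~3 years
llm_cost_per_run = 0.50      # dollars of tokens per invocation
script_dev_cost = 300.0      # ~3 engineer-hours at $100/hr, paid once
script_run_cost = 0.0        # a local grep/awk/python script is ~free to run

llm_total = runs * llm_cost_per_run
script_total = script_dev_cost + runs * script_run_cost
print(f"LLM every run: ${llm_total:.2f}, script once: ${script_total:.2f}")
```

The crossover point moves with the assumptions, but for anything run daily the one-time script wins quickly.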
>>> Instead of doing a grep | uniq | awk that would give me an answer in 100 milliseconds for free, I launch a prompt that spends 30 seconds on it and costs actual money.
I'd argue it will get even worse: you still remember how to do it; the next generation of SDEs won't even try.
It's not just push from management; AI firms themselves really aggressively market this idea of AI replacing everything. It's not "allowed" to be a mere tool, useful for some tasks but not others; it's gotta (be able to) do everything.
Part of that is the ridiculous belief that they can create "AGI" by just gluing together enough LLMs.
Presumably it's also financial viability. You can't charge thousands a month without replacing those "highly trained engineers" with a bunch of kids in the developing world.
> AI firms themselves really aggressively market this idea of AI replacing everything. It's not "allowed" to be a mere tool, useful for some but not other tasks, it's gotta (be able to) do everything.
That's marketing. It's up to management to decide to fall for it blindly or look at it with healthy skepticism. It seems a lot of them chose the former.
> I think the push from management for us to use AI has made it so we don’t have to be efficient with our consumption, so now we write md files which we feed to Claude in a loop instead of python and bash scripts to do routine tasks.
It's worse than that, in many cases management actively rewards inefficiency. It's like Friedman's "why not spoons?"
I think 150 lines of Python is rather a lot. That would take at least an hour to write, and could take up to a few days (depending on complexity and familiarity with the task).
They optimize because their work requires them to. 100k tokens is a few bucks and a couple of minutes, then 15 more minutes to verify that the output does what it's intended to do reasonably well, so it's more like $50 in total cost.
For an engineer paid $100/hr, writing a 150-line Python script and testing it to the same extent could take a few hours, so the total cost rises meaningfully.
I'd just suggest you use Claude to write the script for you, and then run the script with cron. Really, it's not any more time; it just takes a different view of what the goal is.
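As a sketch, the "script plus cron" setup is a one-time generation followed by a single crontab entry (the script path, schedule, and log location below are all hypothetical):

```shell
# Hypothetical crontab line (install with `crontab -e`):
# run the generated script every weekday at 07:00, appending output to a log.
0 7 * * 1-5 /usr/local/bin/daily_report.sh >> /var/log/daily_report.log 2>&1
```

After that, the AI isn't in the loop at all; the cron daemon runs the script for free.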
Give me a python script that takes a string representing the output of a sha256 algorithm and a plain string, and checks whether the sha256 of the plain string matches the sha256 provided.
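The script that prompt asks for fits in a few lines of the standard library. A sketch (the function name and interface are mine; the prompt above doesn't specify one):

```python
import hashlib
import hmac

def sha256_matches(expected_hex: str, plain: str) -> bool:
    """Check whether sha256(plain) equals the provided hex digest."""
    digest = hashlib.sha256(plain.encode("utf-8")).hexdigest()
    # hmac.compare_digest performs a constant-time comparison, which is
    # good hygiene when comparing digests that may guard secrets.
    return hmac.compare_digest(digest, expected_hex.lower())
```

Usage: `sha256_matches(hashlib.sha256(b"hello").hexdigest(), "hello")` returns True, while any non-matching plain string returns False.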
LLM memory (in general, any implementation) is good in theory.
In practice, as it grows it gets just as messy as not having it.
In the example you have on the front page you say "continue working on my project", but you're rarely working on just one project; you might have 5 or 10 in memory, each of which made sense to store at the time.
So now you still have to say "continue working on the SaaS project". Sure, there's some context around the details, but you pay for it by filling up your LLM context and making extra MCP calls.
The tokenizer is an important part of overall model training and performance. It’s only one piece of the overall cost per request. If a tokenizer that produces more tokens also leads to a model that gets to the correct answer more quickly and requires fewer re-prompts because it didn’t give the right answer, the overall cost can still be lower.
Comparisons are still ongoing but I have already seen some that suggest that Opus 4.7 might on average arrive at the answer with fewer tokens spent, even with the additional tokenizer overhead.
How would it be a money grab? If the new tokenizer requires more tokens to encode the same information, it costs them more money for inference. The point of charging per token is that the cost is proportional to the number of tokens. That's my understanding anyway
If a model provider believes they have a better model, it can be a viable bet. But many (me included) started experimenting with other providers because of enshittification from Anthropic (price + uptime), only to find that Codex is not that much worse in quality, with significantly more output per dollar.
There are no conspiracies needed where a corporation has a profit incentive. There is perhaps a question of planning and initial intentionality, but the metrics and motivation to continue are clear enough.
Not necessarily, with speculative decoding. Whitespace would be trivial to predict, and they would pretty much keep using the same amount of compute as before.
I don't think that's their primary motive for doing this but it is a side effect.
If they wanted to, they could always just double the $/token. They don't seem to be able to keep up with their current demand, and that's what companies normally do in that circumstance if they're looking for a money grab; no need for the bank-shot approach.
Yep integration is my use case as well. It’s nice to not have to worry about setting up real infrastructure for testing client code, parallel execution if you’re on a team, clearing out state each run, etc…
But to actually answer the question:
I’ve been putting research-paper PDFs into NotebookLM and turning them into ~40 minute podcasts, which I listen to on my walks.
Yes, it’s shallow learning, and it might have some hallucinations in there, but I wouldn’t have read some of those papers otherwise.
yes, we're releasing an official mobile sdk and inference engine very soon. if you want to use something until then, some folks from the oss community have built ways to run kitten on ios. if you search kittentts ios on github you should find a few.
if you can't find it, feel free to ping me and i can help you set it up. thanks a lot for your support and feedback!
There’s a scenario where LLMs get more efficient for their size, and you’ll be able to get 2026-SOTA performance from a consumer-grade laptop.
Sure, with a 1000B-parameter model you’ll get better performance, but the average person will have it write some Python script, not derive new physics equations.
So in a sense the demand for LLM intelligence will reach a plateau (arguably we are there today for the average person), so no subsidy will be required, because the average person will not need the latest and greatest.
There’s not the same demand pattern as for something like Uber.
> There’s a scenario where LLMs get more efficient for their size, and you’ll be able to get 2026-SOTA performance from a consumer-grade laptop.
But isn't that bad for the AI companies too? Because then people will just run a ~2026-SOTA open-source model on their laptop for free and not pay for any subscription.
Regular folks will not pay Anthropic, but the NSA, NASA, or research labs might.
I’m not implying this will be a good time for AI companies. I am saying AI as a technology can provide value without it being controlled by only 3 companies.
In a hypothetical future with 2026-level LLMs on a (high-end) consumer laptop, I still think that the majority of buyers would prefer to pay 20 USD/month for a service, just for the convenience and flexibility.
> In a hypothetical future with 2026 level LLMs on a (high end) consumer laptop, I still think that majority of buyers would prefer to pay 20 USD/month for a service. Just for the convenience and flexibility.
$20 a month is a lot of money; I don't think the "convenience and flexibility" would actually be worth it unless 1) you've got money to burn, 2) you lack the skills to install software, or 3) the open source community totally fails to develop a reasonable installer. The LLM service would probably be akin to a scam preying on ignorance, like those companies that will rent you a water softener for $100/month.
A lot compared to what? I believe that an LLM-capable laptop will cost considerably more than something that is good enough for non-LLM productivity tasks, at least within the next 5 years. Say it costs 600 USD more; that would buy 30 months of subscription. In that kind of scenario, I think many people will favor the subscription.
I feel like you’re on to something. Management will pick this up, and make it part of the sprint planning.
Engineers will pull out their hair wondering how you can do that.
That’s like estimating how many CPU cycles a task will take, or how many instructions your laptop will execute while you work on something.