If you are paying API rates (not using a Max subscription), there's no reason to use Anthropic's API directly: the same models are hosted by both AWS and Google, with better uptime than Anthropic.
How do things like prompt caching etc play into that? Would I theoretically have a more stable harness backing my usage?
I'm seriously over the current Claude experience. After seemingly fixing my 4.6 usage by disabling adaptive thinking and moving to max effort, it seems the release of 4.7 has broken that workflow, and I'm 99% certain that disabling adaptive thinking does nothing even on 4.6 now. Just egregious errors in the two days this week since coming back from vacation.
I'm looking at moving to Pi, and I like the minimal nature, but I disagree with a handful of decisions they make. So I'd likely need to maintain a fork, which is less than ideal.
What decisions is Mario making that you disagree with? My impression is that Pi is minimal enough that any changes can live on top of it without needing to maintain a fork?
I started developing my own coding agent after using Pi for a couple of months, so I'm curious what you don't like about Pi.
When I hear Mario talk about Pi and his approach, I find myself agreeing with a lot of it. But I also find myself agreeing with a lot of the points from this: https://www.thevinter.com/blog/bad-vibes-from-pi
The opinions in question are that bash should be enabled by default with no restrictions, that the agent should have access to every file on your machine from the start, and that npm is the only package manager worth supporting. Bold choices.
That's the gist, to save others a click, though the article is worth reading.
He also mentions no subagents by default in pi as well.
That (and oh-my-pi) seems like an excessive swing in the other direction. I'm all for the simplicity and minimalism of Pi. There are just a few fundamental things that need updating (mainly subagent context and the open-by-default security model).
Pi for the win. I have my own AI extend it when I want more specific features. I vibe-coded a Shift+Tab permission control, like Claude Code's, in 20 minutes.
I find it so funny that many of these harnesses sound like black magic and are completely mystical to me. I use Claude Code every day and yet I can't imagine the workflow of Pi. I also don't care to pay API rates just to experiment with them.
Largely, though, I'm happy with Claude Code with IDE integration, so I don't feel the need to migrate. Nonetheless I'm curious.
You may want to optimize the content serving a bit, since it's currently hotlinking multiple large (30MB) videos at 2K resolution from https://svs.gsfc.nasa.gov.
Yes, you're right about optimization; this is what I'll do:
- Switch default to 1024p instead of 2048p so file drops from ~30 MB → ~8 MB (4× smaller)
- Proxy through existing Cloudflare Worker with edge cache
- Add a /video/* route that fetches NASA URL once, caches at the edge
- After 1st request per region, every subsequent visitor gets it from Cloudflare PoP
= NASA bandwidth: ~50 hits/day instead of ~10,000! NASA load drops by ~99%
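The /video/* route above can be sketched with the Cloudflare Workers Cache API. This is a minimal illustration under my own assumptions: the route shape, the path mapping, and the one-day TTL are made up, not the site's actual code.

```javascript
// Hypothetical sketch of a /video/* caching route in a Cloudflare Worker.
// The NASA origin is real; everything else here is an illustrative assumption.
const NASA_ORIGIN = "https://svs.gsfc.nasa.gov";

// Map an incoming /video/<path> request to the upstream NASA URL.
function upstreamUrl(pathname) {
  return NASA_ORIGIN + pathname.replace(/^\/video/, "");
}

const worker = {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);
    if (!url.pathname.startsWith("/video/")) {
      return new Response("not found", { status: 404 });
    }
    const cache = caches.default;
    // Serve from this PoP's edge cache if the file was fetched before.
    let response = await cache.match(request);
    if (!response) {
      // First request in this region: hit NASA once, then cache at the edge.
      const origin = await fetch(upstreamUrl(url.pathname));
      response = new Response(origin.body, origin);
      response.headers.set("Cache-Control", "public, max-age=86400");
      ctx.waitUntil(cache.put(request, response.clone()));
    }
    return response;
  },
};
// In a real Worker, `worker` would be the module's default export.
```

After the first request per region warms the cache, subsequent visitors are served from the Cloudflare PoP, which is what drives the NASA hit count down.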
Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast. Even if it takes more LLM calls to complete a task, those calls are all happening in a fraction of the time.
We must have very different workflows, I am curious about yours. What tools are you using and how are you guiding Qwen3-Coder? When I am using Claude Code, it often works for 10+ minutes at a time, so I am not aware of inference speed.
You must write very elaborate prompts for 10 minutes to be worth the wait. What permissions are you giving it and how much do you care about the generated code? How much time did you spend on initial setup?
I've found that the best way for me to do LLM-assisted coding at this point in time is in a somewhat tight feedback loop. I find myself wanting to refine the code and architectural approaches a fair amount as I see them coming in, and latency matters a lot to me here.
2 minutes is the worst delay. With 10 minutes, I can and do context switch to something else and use the time productively. With 2 min, I wait and get frustrated and bored.
Context switching makes you less productive compared to completely finishing one task before moving to the other, though. In the limit, an LLM that responds instantly is still better.
> Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast.
Saying "technically" is really underselling the difference in intelligence in my opinion. Claude and Gemini are much, much smarter and I trust them to produce better code, but you honestly can't deny the excellent value that Qwen-3, the inference speed and $50/month for 25M tokens/per day brings to the table.
Since I paid for the Cerebras Pro plan, I've decided to force myself to use it as much as possible for the duration of the month for developing my chat app (https://github.com/gitsense/chat), and here are some of my thoughts so far:
- Qwen3 Coder is a lot dumber when it comes to prompting, as Gemini and Claude are much better at reading between the lines. However, since the speed is so good, I often don't care, as I can go back to the message, make some simple clarifications, and try again.
- The max context window size of 128k for Qwen 3 Coder 480B on their platform can be a serious issue if you need a lot of documentation or code in context.
- I've never come close to the 25M tokens per day limit for their Pro plan. The most I've used is 5M/day.
- The inference speed + a capable model like Qwen 3 will open up use cases most people might not have thought of before.
I will probably continue to pay for the $50 plan for these use cases.
1. Applying LLM generated patches
Qwen 3 Coder is very much capable of applying patches generated by Sonnet and Gemini. It is slower than what https://www.morphllm.com/ provides, but it is definitely fast enough for most people not to care. The cost savings can be quite significant depending on the work.
2. Building context
Since it is so fast and because the 25M token limit per day is such a high limit for me, I am finding myself loading more files into context and just asking Qwen to identify files that I will need and/or summarize things so I can feed it into Sonnet or Gemini to save me significant money.
3. AI Assistant
Due to its blazing speed, you can analyze a lot of data quickly for deterministic searches, and because it can review results at such great speed, you can do multiple search-and-review loops without feeling like you are waiting forever.
Given what I've experienced so far, I don't think Cerebras can be a serious platform for coding if Qwen 3 Coder is the only available model. Having said that, given the inference speed and Qwen being more than capable, I can see Cerebras becoming a massive cost savings option for many companies and developers, which is where I think they might win a lot of enterprise contracts.
SlateDB offers different durability levels for writes. By default writes are buffered locally and flushed to S3 when the buffer is full or the client invokes flush().
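The buffer-until-flush write path described above can be sketched generically. Note this is not SlateDB's real API (SlateDB is a Rust library); the class and method names below are hypothetical, and the "upload" stands in for a PutObject call against S3.

```javascript
// Hypothetical sketch of buffered writes with an explicit flush, mirroring
// the SlateDB behavior described above. Not SlateDB's actual API.
class BufferedStore {
  constructor(uploadFn, maxBufferBytes) {
    this.uploadFn = uploadFn;          // stand-in for an S3 PutObject call
    this.maxBufferBytes = maxBufferBytes;
    this.buffer = [];
    this.bufferedBytes = 0;
  }

  // Writes are acknowledged locally; durability is deferred until flush.
  put(key, value) {
    this.buffer.push([key, value]);
    this.bufferedBytes += key.length + value.length;
    if (this.bufferedBytes >= this.maxBufferBytes) {
      this.flush(); // buffer full: persist the batch to object storage
    }
  }

  // Client-invoked flush: upload the buffered batch and reset.
  flush() {
    if (this.buffer.length === 0) return;
    this.uploadFn(this.buffer);
    this.buffer = [];
    this.bufferedBytes = 0;
  }
}
```

The tradeoff is the usual one: buffered writes are fast but can be lost before a flush, while flushing on every write pays an object-store round trip for durability.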
While your technical analysis is excellent, making judgements about workload suitability based on a Preview release is premature. Preview services have historically had significantly lower performance quotas than GA releases. Lambda for example was limited to 50 concurrent executions during Preview, raised to 100 at GA, and now the default limit is 1,000.
Grok are the first models I am boycotting on purely environmental grounds. They built their datacenter without sufficient local power supply and have been illegally powering it with unpermitted gas turbine generators until that capacity gets built, to the significant detriment of the local population.
Imagine needing electricity and government contracts so much that you spend $250M to get somebody elected president, and the second thing your guy does in office is cancel all of the projects that could provide you with more electricity.
And then he posts about it literally every day moaning about the lack of power, lack of solar, etc. All the things he bitches and moans about are things he caused by helping elect the orange fella.
By his own words, Elon is not an environmentalist and doesn’t seem to believe much in humanity’s impact on the climate. His concern is with the futility of relying on a non-renewable resource. He believes there is significantly more lithium than there is oil, I guess.
In the end, incentives are all that matter. Do hotels care deeply about the environment, or are they interested in saving in energy and labor costs as your towel is cleaned? Does it matter? Does moralizing really get us anywhere if our ends are the same?
I know it’s in vogue to dump on Elon these days, and with good reason, but do I not recall him on a number of occasions quite emotionally describing our continued CO2 emissions as the dumbest experiment in human history?
Yeah, but he flip-flops on the daily. He used to post about how LGBT positive Tesla was and post pride flags on his feed and now he's trying to burn the planet to the ground every time he hears about anyone that isn't a straight white man.
You do, and then at some point, likely during a late night ketamine binge, he went full redpill on twitter and decided the only thing that matters is “owning the libs”.
If that means embracing fossil fuels, so be it. Destroy the “woke mind virus at any cost”. That being said, I think he is delusional enough that he thought allowing nazi propaganda on twitter would convince conservatives to start buying teslas and is completely lost at this point.
That's just one facet of EVs that is severely overplayed in my book. They have plenty of other benefits, but for some of us the environmental aspect is a "nice-to-have".
I'm inclined to say the exact opposite about EVs. They take up as much space as internal combustion engine vehicles (in terms of streets, highways and parking lots), are just as fatal to pedestrians, make cities and neighborhoods less livable, cost in the tens of thousands of dollars, create traffic jams... the primary benefit is reducing our dependence on fossil fuels and generating less CO2. That's the number one differentiator. Faster acceleration, etc. is a nice-to-have.
> the primary benefit is reducing our dependence on fossil fuels and generating less CO2
for many, it's not even that. I like EVs primarily because I'm a tech-savvy person and like computers on wheels. but I'm also aware of their numerous downsides.
I care enormously about protecting the environment and stopping climate change, but I'm not an environmentalist.
Environmentalists usually care about the environment for its own sake, but my concern is our own survival. Similarly, I don't intrinsically care about plastic in the ocean, but our history of harming ourselves with waste we think is harmless would justify applying the precautionary principle there too.
As far as Musk goes, it's hard to track what he actually believes versus what he has said to troll, kowtow to Trump or "own the libs", but he definitely believes in anthropogenic climate change and he has been consistent on that. He seems to sometimes doubt the predictions of how quickly it will occur and, most of all, how quickly it will impact us.
I think there probably is a popular tendency to overstate the predictive value of certain forecasts by simply grouping all climate science together. In reality, the forecasts have tended to be extremely accurate for the first order high level effects (i.e. X added carbon leads to Y temperature increase), but downstream of that the picture becomes more mixed. Particularly poor have been predictions of tipping points, or anything that depends on how humans will be affected by, or react to, changes in the environment.
Yes, Elon is probably playing fast and loose with the rules, but his 150MW of turbines are right next to the TVA's 1100MW of turbines and a steel mill. Not surprising given that it's a heavy industrial area, it's about 4 miles from any significant number of houses. There are plenty of good reasons to hate on Elon, but IMO this ain't it.
If that's the yardstick, you should boycott everything coming out of China, which is pretty much everything, since it is one of the largest polluters globally.
Wouldn't you expect the country with the most manufacturing and one of the biggest populations to also have the most pollution?
I feel you'd need to adjust the sum total by something (per capita, or square footage) or be more specific, e.g., does manufacturing X in China pollute more than an equivalent plant in the US?
Not all goods and services involve the same process, some come with more pollution.
For example, Nvidia contributes a big chunk of US GDP, but it only designs the chips, so the pollution impact lands in the country where they're manufactured.
Doesn't really make sense in my opinion. Why boycott a specific group of people for their collective emissions when their individual emissions are lower than many others? The latter is the important metric, else you're simply punishing them for having a large population.
Well Elon really worked hard to get that done. Campaigning for the guy who is cancelling in-progress solar and wind projects and claiming the feds will never approve another green energy plant.
Most people aren't programming or operating heavy machinery at 4AM, either. Most power is consumed in the day, and most AI will be leveraged in the day.
(1) the utilization factor over the obsolescence-limited "useful" life of the hardware;
(2) the short-term (sub-month) training job scheduling onto a physical cluster.
For (1) it's acceptable to, on average, not operate one month per year as long as that makes the electricity opex low enough.
For (2): yes, large-scale pre-training jobs that spend millions of dollars of compute on what is overall "one single" job can often afford to wait a few days to a very few weeks, as would result from dropping the HPC cluster to standby power/deep sleep on the p10 worst days each year for renewable yield in the grid-capacity-limited surroundings of the datacenter.
And if you can further run systems a little power-tuned rather than performance-tuned when power is less plentiful, to where you may average only 90% theoretical compute throughput during cluster operating hours (this is in addition to turning it off for about a month worth of time), you could reduce power production and storage capacity a good chunk further.
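The tradeoff in (1) can be checked with back-of-envelope arithmetic: idling roughly one month per year is worth it if the cheaper power contract saves more per useful compute-hour than the extra capex amortization costs. All the numbers below are illustrative assumptions, not real prices.

```javascript
// Illustrative cost-per-useful-compute-hour for a cluster, comparing
// year-round operation at firm-power prices against idling the worst
// month (utilization 11/12) for cheaper renewable power.
function costPerComputeHour(capexPerYear, powerMW, pricePerMWh, utilization) {
  const hoursRun = 8760 * utilization;             // hours actually operating
  const energyCost = powerMW * pricePerMWh * hoursRun;
  return (capexPerYear + energyCost) / hoursRun;   // $ per useful compute-hour
}

// Made-up figures: $50M/yr amortized hardware, a 20 MW cluster.
const always = costPerComputeHour(50e6, 20, 90, 1.0);       // firm power, $90/MWh
const curtailed = costPerComputeHour(50e6, 20, 60, 11 / 12); // cheap power, 1 month idle
```

Under these assumed numbers the curtailed schedule is cheaper per useful compute-hour, which is the point being made: the electricity opex savings can outweigh spreading the capex over fewer operating hours.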
China controls 80% of the supply chain for solar and has most of the rare earth magnets needed for wind. Since China is America's bugbear and containing China's influence is a bipartisan issue, this was a likely outcome whoever is in office.
We don't have to guess what the most likely outcome might have been; someone else was in office 7 months ago, so we can just look at what they were doing.
Were they "cancelling in-progress solar and wind projects and claiming the feds will never approve another green energy plant"? That's the "likely outcome" we're discussing.
Yes, the US has been scaling back on China-sourced renewable energy supply chains since at least 2023, with tariffs and by removing incentives.
Not exactly your wording at that time, but my point still stands that the outcome was going to be the same because the imports were heavily skewed towards China. This has all been in motion since before the current admin.
Technology does not exist separately from society and culture, and in the last few decades has arguably made a lot of the world and society worse. I’m all for using the biggest lever I have to address harmful behaviors from corporations. Withhold your wallet, stay off their platforms and make your reasons known.
I mean… this is part of GPs point. Here we are, playing on the lawn of private equitists, probably directly or indirectly working for the people that GGP was railing against.
Reading about mainframes feels very much like reading science fiction. Truly awesome technology that exists on a completely different plane of computing than anything else.
This thinly veiled advertisement claims it's a waste of time to understand the tradeoffs in the models you're using, and you should instead pay them to make those decisions for you. No thank you.