
If you are paying API rates (not using Max subscriptions) there's no reason to use Anthropic's API directly, the same models are hosted by both AWS and Google with better uptime than Anthropic.

How do things like prompt caching etc play into that? Would I theoretically have a more stable harness backing my usage?

I'm seriously over the current Claude experience. After seemingly fixing my 4.6 usage by disabling adaptive thinking and moving to max effort, it seems the release of 4.7 has broken that workflow, and I'm 99% certain that disabling adaptive thinking now does nothing even on 4.6. Just egregious errors in the two days this week since I came back from vacation.


AWS Bedrock supports prompt caching, just note that if you use the Converse API you need to set the cache points manually.
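To make the manual cache points concrete, here is a minimal boto3 sketch of a Converse call with an explicit `cachePoint` block after the static system prefix. The model ID and prompt text are illustrative, not prescriptive; the `cachePoint` placement marks everything before it as cacheable.

```python
# Sketch: explicit prompt caching with the Bedrock Converse API via boto3.
# A cachePoint block marks the preceding content as a cacheable prefix.

def build_system_with_cache(system_text: str) -> list:
    """Return a system block list with a cache point after the static prefix."""
    return [
        {"text": system_text},
        {"cachePoint": {"type": "default"}},  # everything above this is cached
    ]

def converse_cached(client, model_id: str, system_text: str, user_text: str):
    """Invoke the Converse API with the cached system prefix."""
    return client.converse(
        modelId=model_id,
        system=build_system_with_cache(system_text),
        messages=[{"role": "user", "content": [{"text": user_text}]}],
    )

if __name__ == "__main__":
    import boto3  # deferred so the helpers above stay importable without boto3

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    resp = converse_cached(
        client,
        "anthropic.claude-sonnet-4-20250514-v1:0",  # illustrative model ID
        "You are a code-review assistant. <large static context here>",
        "Review this diff.",
    )
    # usage reports cacheReadInputTokens / cacheWriteInputTokens when caching applies
    print(resp["usage"])
```

With the InvokeModel API the Anthropic-native `cache_control` fields pass through instead; the Converse path above is the one where you have to place the cache points yourself.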

> Would I theoretically have a more stable harness backing my usage?

If you don’t mind an opinionated harness that asks for a pretty specific workflow, but one that works well, use OpenCode.

If you want to spread your wings and feel the sweet kiss of freedom, use Pi.


I'm looking at moving to Pi and I like the minimal nature, but I disagree with a handful of decisions they've made. So I'd likely need to maintain a fork, which is less than ideal.

What decisions is Mario making that you disagree with? My impression is Pi is minimal so any changes can live on top of Pi without needing to maintain a fork?

I started developing my own coding agent after using Pi for a couple months, so I’m curious what you don’t like about pi.


When I hear Mario talk about pi and his approach I find myself agreeing with a lot of it. But I also find myself agreeing with a lot of the points from this https://www.thevinter.com/blog/bad-vibes-from-pi

The opinions in question are that Bash should be enabled by default with no restrictions, that the agent should have access to every file on your machine from the start, and that npm is the only package manager worth supporting. Bold choices.

To save others a click, though the article is worth reading.

He also mentions no subagents by default in pi as well.


oh-my-pi harness fixes many of these, like subagents

It seems to, but then also throws in the kitchen sink and a custom bath.

check out my pi forks.

Ummmmmm, how?

I searched his Hacker News username on Google.

[0] - https://github.com/cartazio/oh-punkin-pi


That (and oh-my-pi) seem like an excessive swing in the other direction. I'm all for the simplicity and minimalism of Pi. There are just a few fundamental things that need updating (mainly subagent context and the open-by-default security model).

Pi for the win. I have my own AI extend it when I want more specific features; I vibe-coded Shift+Tab permission control, like Claude Code's, in 20 minutes.

I find it so funny that many of these harnesses sound like black magic and are completely mystical to me. I use Claude Code every day and yet I can't imagine the workflow of Pi. I also don't care to pay API rates just to experiment with them.

Largely though I'm happy with Claude Code w/ IDE integration, so I don't feel the need to migrate. Nonetheless I'm curious.


I have an enterprise plan, so it's always usage-based, which makes it possible for me. And then there are the other subs I can toggle between, which is awesome.

I live in the terminal. Before AI I always preferred it, so it suits me.


You can use Claude Code with these other providers.


Enterprise adds IAM, logging, and analytics, all of which AWS provides for free or for metered usage without needing an enterprise plan.

They'll cut you a private offer for Bedrock tokens, but Bedrock has a 32k output limit.

I use Bedrock with 1M context every day. Not sure this is right.

4.7 is the first Opus model that's had the 1M context window available on Bedrock.

Not true. Opus and Sonnet 4.6 support 1m context on Bedrock.

I've had Opus 4.6 1M and Sonnet 4.6 1M for months now on Bedrock.

Their docs may be lying, but they say 200k for Opus 4.6. And yes, 1M was on Sonnet for Claude Enterprise.

Isn't that an input limit from API Gateway?

Looks very neat.

You may want to optimize the content serving a bit, since it's currently hotlinking multiple large (30MB) videos at 2K resolution from https://svs.gsfc.nasa.gov.


Yes, you're right about optimization. This is what I'll do:

- Switch the default to 1024p instead of 2048p, so the file drops from ~30 MB to ~8 MB (4× smaller)
- Proxy through the existing Cloudflare Worker with edge cache
- Add a /video/* route that fetches the NASA URL once and caches at the edge
- After the first request per region, every subsequent visitor gets it from a Cloudflare PoP

Result: ~50 NASA hits/day instead of ~10,000, so NASA load drops by ~99%.
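Cloudflare Workers themselves are written in JavaScript, but purely to illustrate the fetch-once, serve-from-edge flow, here is a toy Python model of the caching logic. The class name, the per-region keying, and the origin base URL handling are my own illustrative choices, not actual Worker code.

```python
# Toy model of the edge-cache flow: the first request per region fetches from
# the origin (NASA), and every subsequent request is served from that
# region's cache, so origin traffic stays roughly constant per region/path.

ORIGIN_BASE = "https://svs.gsfc.nasa.gov"  # upstream host being proxied

class EdgeCache:
    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin  # callable: url -> bytes
        self.store = {}                   # per-region cache: (region, path) -> bytes
        self.origin_hits = 0              # how many times we went to origin

    def get(self, region: str, path: str) -> bytes:
        key = (region, path)
        if key not in self.store:         # cache miss: hit the origin once
            self.origin_hits += 1
            self.store[key] = self.fetch_origin(ORIGIN_BASE + path)
        return self.store[key]            # cache hit: served from the edge

if __name__ == "__main__":
    cache = EdgeCache(lambda url: url.encode())  # stand-in for a real fetch
    for _ in range(10_000):                      # 10k visitors in one region
        cache.get("eu-west", "/video/earth_1024.mp4")
    print(cache.origin_hits)  # 1: origin fetched once for that region/path
```

In a real Worker the same shape is achieved with the Cache API or `fetch` with `cf` caching options; the point is that origin load becomes a function of regions × paths, not visitor count.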


This looks suspiciously like an LLM answer.

Half of it was a to-do list from an LLM response (Claude Opus 4.7), which is also why it was updated so fast :D

Update: Updated! Optimizations implemented, thanks for the feedback!

AWS and GCP both have their own custom inference chips, so a better example for hosting Opus on commodity hardware would be Digital Ocean.


https://react-aria.adobe.com is the new Radix, it provides unstyled components with a heavy focus on accessibility and quality. https://github.com/heroui-inc/heroui is the new Shadcn.


Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast. Even if it takes more LLM calls to complete a task, those calls are all happening in a fraction of the time.


We must have very different workflows, I am curious about yours. What tools are you using and how are you guiding Qwen3-Coder? When I am using Claude Code, it often works for 10+ minutes at a time, so I am not aware of inference speed.


You must write very elaborate prompts for 10 minutes to be worth the wait. What permissions are you giving it and how much do you care about the generated code? How much time did you spend on initial setup?

I've found that the best way for me to do LLM-assisted coding at this point in time is in a somewhat tight feedback loop. I find myself wanting to refine the code and architectural approaches a fair amount as I see them coming in, and latency matters a lot to me here.


> When I am using Claude Code, it often works for 10+ minutes at a time, so I am not aware of inference speed.

Indirectly, it sounds like you're aware of the inference speed? Imagine if it took 2 minutes instead of 10 minutes; that's what the parent means.


2 minutes is the worst delay. With 10 minutes, I can and do context switch to something else and use the time productively. With 2 min, I wait and get frustrated and bored.


Context switching makes you less productive compared to completely finishing one task before moving to the other, though. In the limit, an LLM that responds instantly is still better.


Do you use cursor or what? Interested in how you set this up


I use it via the Kilo Code extension for VSCode, which is invoking Qwen3-Coder via a Cerebras Code subscription.

https://github.com/Kilo-Org/kilocode https://www.cerebras.ai/blog/introducing-cerebras-code


> Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast.

Saying "technically" is really underselling the difference in intelligence, in my opinion. Claude and Gemini are much, much smarter and I trust them to produce better code, but you honestly can't deny the excellent value that Qwen-3, the inference speed, and $50/month for 25M tokens/day bring to the table.

Since I paid for the Cerebras Pro plan, I've decided to force myself to use it as much as possible for the duration of the month for developing my chat app (https://github.com/gitsense/chat), and here are some of my thoughts so far:

- Qwen3 Coder is a lot dumber when it comes to prompting; Gemini and Claude are much better at reading between the lines. However, since the speed is so good, I often don't care, as I can go back to the message, make some simple clarifications, and try again.

- The max context window size of 128k for Qwen 3 Coder 480B on their platform can be a serious issue if you need a lot of documentation or code in context.

- I've never come close to the 25M tokens per day limit for their Pro Plan. The max I am using is 5M/day.

- The inference speed + a capable model like Qwen 3 will open up use cases most people might not have thought of before.

I will probably continue to pay for the $50 plan for these use cases:

1. Applying LLM-generated patches

Qwen 3 Coder is very much capable of applying patches generated by Sonnet and Gemini. It is slower than what https://www.morphllm.com/ provides, but it is definitely fast enough for most people not to care. The cost savings can be quite significant depending on the work.

2. Building context

Since it is so fast and because the 25M token limit per day is such a high limit for me, I am finding myself loading more files into context and just asking Qwen to identify files that I will need and/or summarize things so I can feed it into Sonnet or Gemini to save me significant money.

3. AI Assistant

Due to its blazing speed, you can analyze a lot of data quickly for deterministic searches, and because it can review results at such great speed, you can do multiple search-and-review loops without feeling like you are waiting forever.

Given what I've experienced so far, I don't think Cerebras can be a serious platform for coding if Qwen 3 Coder is the only available model. Having said that, given the inference speed and Qwen being more than capable, I can see Cerebras becoming a massive cost savings option for many companies and developers, which is where I think they might win a lot of enterprise contracts.


SlateDB offers different durability levels for writes. By default writes are buffered locally and flushed to S3 when the buffer is full or the client invokes flush().

https://slatedb.io/docs/design/writes/
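The write path described in those docs (buffer writes locally, flush the batch to S3 when the buffer fills or the client calls flush()) can be modeled with a tiny toy class. The names below are mine for illustration, not SlateDB's actual API, and a dict-backed list stands in for S3.

```python
# Toy model of the described SlateDB write path: puts are buffered locally
# and only become durable when the buffer fills or flush() is called.

class BufferedWriter:
    def __init__(self, upload, buffer_limit: int = 4):
        self.upload = upload            # callable(batch) -> durable storage (S3 stand-in)
        self.buffer_limit = buffer_limit
        self.buffer = []                # locally buffered, not yet durable

    def put(self, key: str, value: bytes):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:  # full buffer triggers a flush
            self.flush()

    def flush(self):
        if self.buffer:
            self.upload(list(self.buffer))  # one object write per batch
            self.buffer.clear()

if __name__ == "__main__":
    batches = []
    db = BufferedWriter(batches.append, buffer_limit=2)
    db.put("a", b"1")   # buffered only; lost if the process dies here
    db.put("b", b"2")   # buffer full -> flushed to "S3" as one batch
    db.put("c", b"3")
    db.flush()          # explicit flush makes the tail durable
    print(len(batches))  # 2 batches uploaded
```

This is the durability trade-off in miniature: larger buffers mean fewer object writes (cheaper on S3) but a bigger window of unflushed data.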


While your technical analysis is excellent, making judgements about workload suitability based on a Preview release is premature. Preview services have historically had significantly lower performance quotas than GA releases. Lambda for example was limited to 50 concurrent executions during Preview, raised to 100 at GA, and now the default limit is 1,000.


Grok is the first model family I am boycotting on purely environmental grounds. They built their datacenter without sufficient local power supply and have been illegally powering it with unpermitted gas turbine generators until that capacity gets built, to the significant detriment of the local population.

https://www.datacenterdynamics.com/en/news/elon-musk-xai-gas...


Imagine needing electricity and government contracts so much that you spend $250M to get somebody elected president, and the second thing your guy does in office is cancel all of the projects that could provide you with more electricity.


And then he posts about it literally every day moaning about the lack of power, lack of solar, etc. All the things he bitches and moans about are things he caused by helping elect the orange fella.


If he didn't help get the orange fella elected, he would be in jail. So still a win for him.


For what? He obviously lied.


By his own words, Elon is not an environmentalist and doesn’t seem to believe much in humanity’s impact on the climate. His concern is with the futility of relying on a non-renewable resource. He believes there is significantly more lithium than there is oil, I guess.

In the end, incentives are all that matter. Do hotels care deeply about the environment, or are they interested in saving in energy and labor costs as your towel is cleaned? Does it matter? Does moralizing really get us anywhere if our ends are the same?


I know it’s in vogue to dump on Elon these days, and with good reason, but do I not recall him on a number of occasions quite emotionally describing our continued CO2 emissions as the dumbest experiment in human history?


Yeah, but he flip-flops on the daily. He used to post about how LGBT positive Tesla was and post pride flags on his feed and now he's trying to burn the planet to the ground every time he hears about anyone that isn't a straight white man.


>doesn’t seem to believe much in humanity’s impact on the climate

Is it that or a belief that we can outrun the problem? i.e. mix of accelerationism and making humanity multi planetary


You do, and then at some point, likely during a late night ketamine binge, he went full redpill on twitter and decided the only thing that matters is “owning the libs”.

If that means embracing fossil fuels, so be it. Destroy the “woke mind virus at any cost”. That being said, I think he is delusional enough that he thought allowing nazi propaganda on twitter would convince conservatives to start buying teslas and is completely lost at this point.


oh the irony... EV company CEO doesn't care about env...


That's just one facet of EVs that is severely overplayed in my book. They have plenty of other benefits, but for some of us the environmental aspect is a "nice-to-have".


> They have plenty of other benefits

I'm inclined to say the exact opposite about EVs. They take up as much space as internal combustion engine vehicles (in terms of streets, highways and parking lots), are just as fatal to pedestrians, make cities and neighborhoods less livable, cost in the tens of thousands of dollars, create traffic jams... the primary benefit is reducing our dependence on fossil fuels and generating less CO2. That's the number one differentiator. Faster acceleration, etc. is a nice-to-have.


No oil changes and no gas stations. Those are the key features to me.

Agree that the rocket-ship acceleration is just nice to have also.


> the primary benefit is reducing our dependence on fossil fuels and generating less CO2

for many, it's not even that. I like EVs primarily because I'm a tech-savvy person and like computers on wheels. but I'm also aware of their numerous downsides.


Agreed. Convenience, maintenance, and operating costs were all top of mind when we bought an EV. Environmentalism was hardly a consideration.


I care enormously about protecting the environment and stopping climate change, but I'm not an environmentalist.

Environmentalists usually care about the environment for its own sake, but my concern is our own survival. Similarly, I don't intrinsically care about plastic in the ocean, but our history of harming ourselves with waste we think is harmless would justify applying the precautionary principle there too.

As far as Musk goes, it's hard to track what he actually believes versus what he has said to troll, kowtow to Trump or "own the libs", but he definitely believes in anthropogenic climate change and he has been consistent on that. He seems to sometimes doubt the predictions of how quick it will occur and, most of all, how quickly it will impact us.

I think there probably is a popular tendency to overstate the predictive value of certain forecasts by simply grouping all climate science together. In reality, the forecasts have tended to be extremely accurate for the first order high level effects (i.e. X added carbon leads to Y temperature increase), but downstream of that the picture becomes more mixed. Particularly poor have been predictions of tipping points, or anything that depends on how humans will be affected by, or react to, changes in the environment.


Not everyone is a nihilist.


Yes, Elon is probably playing fast and loose with the rules, but his 150MW of turbines are right next to the TVA's 1100MW of turbines and a steel mill. Not surprising given that it's a heavy industrial area, it's about 4 miles from any significant number of houses. There are plenty of good reasons to hate on Elon, but IMO this ain't it.


“We dumped this nuclear waste next to the existing storage site! Totally cool, right?”


If that's the yardstick, you should boycott everything coming out of China, which is pretty much everything, since they are one of the largest polluters globally.



That's per-capita. China is by far the biggest polluter overall, and it is still increasing.

https://ourworldindata.org/explorers/co2?country=CHN~USA~IND...


Wouldn't you expect the country with the most manufacturing and one of the biggest populations to also have the biggest pollution?

I feel you'd need to adjust the sum total by something (capita, or square footage), or be more specific: does a given factory in China pollute more than an equivalent one in the US, etc.


How about adjusted for GDP, which would measure efficiency of carbon use in output: https://en.wikipedia.org/wiki/List_of_countries_by_carbon_in...

China is still about double the US, and the US is lower than Canada.


Hum... GDP doesn't seem right to me.

Not all goods and services involve the same process, some come with more pollution.

For example, Nvidia will contribute to a big chunk of US GDP, but it only designs the chips, which won't have the same pollution impact as the country in which they'll have it manufactured.


Hum... I bet anything that criticizes China over the US won't seem right to you


Not at all, but it just feels like the emperor's new clothes if we just pretend we're doing better and aren't objective.


Doesn't really make sense in my opinion. Why boycott a specific group of people for their collective emissions when their individual emissions are lower than many others? The latter is the important metric, else you're simply punishing them for having a large population.


If there's a reasonable alternative, why not? There are plenty of reasonable alternative coding models.


Better late than never I suppose.


It would be nice if they could get more power online faster.


Well Elon really worked hard to get that done. Campaigning for the guy who is cancelling in-progress solar and wind projects and claiming the feds will never approve another green energy plant.


Solar and wind are not adequate power supply for a data center. You think data centers only run for 8 hours a day?


That's what these are for: https://www.tesla.com/megapack


Most people aren't programming or operating heavy machinery at 4AM, either. Most power is consumed in the day, and most AI will be leveraged in the day.


They're VERY suitable for such trivially deferrable workloads as (much of) AI (specifically, LLM pretraining and other similar training).


What makes you think it’s a deferrable workload? Companies don’t buy all that expensive hardware to just have it sit there inactive.


You're conflating two things that shouldn't be:

(1) the utilization factor over the obsolescence-limited "useful" life of the hardware;

(2) the short-term (sub-month) scheduling of training jobs onto a physical cluster.

For (1) it's acceptable to, on average, not operate one month per year as long as that makes the electricity opex low enough.

For (2), yeah, large-scale pre-training jobs that spend millions of compute on what is overall "one single" job are often fine waiting a few days to a very few weeks. That's roughly what you'd get from dropping the HPC cluster to standby power/deep sleep on the p10 worst days each year for renewable yield in the grid-capacity-limited surroundings of the datacenter. And if you can additionally run systems a little power-tuned rather than performance-tuned when power is less plentiful, averaging, say, only 90% of theoretical compute throughput during cluster operating hours (on top of turning it off for about a month's worth of time), you could reduce the required power production and storage capacity a good chunk further.


That's a bad faith argument and you know it. More power is more power. These projects weren't taking away from development of other power plants.


China controls 80% of the supply chain for solar and has most of the rare earth magnets needed for wind. Since China is America’s bugbear and containing China’s influence is a bipartisan issue, this was a likely outcome whoever is in office

https://www.iea.org/reports/solar-pv-global-supply-chains/ex...

Of course, renewables aren’t the only source of energy


We don't have to guess what the most likely outcome might have been, someone else was in office 7 months ago so we can just look at what they were doing.


Yes, moving away from China


But not at the expense of building renewable energy, as is the current administration's policy.



Were they "cancelling in-progress solar and wind projects and claiming the feds will never approve another green energy plant"? That's the "likely outcome" we're discussing.


Yes, the US has been scaling back on China-sourced renewable energy supply chains since 2023 at least, with tariffs and by removing incentives

Not exactly your wording at that time, but my point still stands that the outcome was going to be the same because the imports were heavily skewed towards China. This has all been in motion before this current admin


Not, one assumes, from renewables.


Elon needs to do the right thing and play by the rules of other data centers: green wash their energy with a fake green energy company /s

The only player doing the right thing here is probably Microsoft which is retrofitting an entire nuclear energy plant.

Everybody else is faking it to make you feel better. Elon just is skipping the faking it part.


The only players doing the right thing are the ones that are staying out of LLMs entirely.




How would you know that?




Technology does not exist separately from society and culture, and in the last few decades has arguably made a lot of the world and society worse. I’m all for using the biggest lever I have to address harmful behaviors from corporations. Withhold your wallet, stay off their platforms and make your reasons known.

I’m not sure what about that you’re upset with.


>stay off their platforms

I mean… this is part of GPs point. Here we are, playing on the lawn of private equitists, probably directly or indirectly working for the people that GGP was railing against.


There are nicer ways to say this.


Hacker News Guidelines are worth a read:

## In Comments

Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

When disagreeing, please reply to the argument instead of calling names. "That is idiotic"


Go touch some grass


Reading about mainframes feels very much like reading science fiction. Truly awesome technology that exists on a completely different plane of computing than anything else.


They elevate hardware design to a fine art - everything is carefully balanced.


This thinly veiled advertisement claims it's a waste of time to understand the tradeoffs in the models you're using, and you should instead pay them to make those decisions for you. No thank you.

