Hacker News | jameslk's comments

I have seen this as well, except the UI ends up all looking similar, because the harness prompt and training data don’t change much

The average becomes the same shade of gray. Familiarity breeds contempt. New types of design will emerge that are expensive to copy, because differentiation drives competition


Which is perfectly adequate for 95% of applications. Hell, it’d probably improve most applications if they adopted some proven shade of gray.

Why do people feel that each and every tool they use needs its own unique look and feel? And why are people willing to pay more for that? In some cases, sure. For my smart sprinkler app.. I don’t give a damn if it looks like 1000 other apps.


I’d actually prefer if all of them looked and worked the same, which is especially useful if you have elderly family members you need to teach how to use an app for XYZ. All government websites (especially functional ones that citizens use to do something on), for instance, should be exactly the same

Y’all would have much more productive conversations about AI if you were able, even for a second, to differentiate the aspects of $x that you care about as a craft from the ones the majority of people care about. HN has truly become the embodiment of the “this is fine” meme.

> I queued the work and let it run. First task came back good. Second came back good. Somewhere around hour four the quality started sliding. By hour six the agent was cutting corners I’d specifically told it not to cut, skipping steps I’d explicitly listed, behaving like I’d never written any of the rules down.

> …

> When I write a prompt, the agent doesn’t just read the words. It reads the shape. A short casual question gets read as casual. A long precise document with numbered rules gets read as… not just the rules, but also as a signal. “The user felt the need to write this much.” “Why?” “What’s going on here?” “What do they really want?”

This is an interesting premise but based on the information supplied, I don’t think it’s the only conclusion. Yet the whole essay seems to assume it is true and then builds its arguments on top of it.

I’ve run into this dilemma before. It happens when there’s a TON of information in the context. LLMs start to lose their attention to all the details when there’s a lot of it (e.g. context rot[0]). LLMs also keep making the same mistakes once the information is in the prompt, regardless of attempts to convey it is undesired[1]

I think these issues are just as viable as explanations for what the author was facing. Unless this is happening with much less information

0. https://www.trychroma.com/research/context-rot

1. https://arxiv.org/html/2602.07338v1


It's more than context-rot.

If you ask a vague ignorant question, you get back authoritative summaries. If you make a specific request, each statement is taken literally. The quality of the answer depends on the quality of the question.

And I'm not using "quality" to mean good/bad. I mean literally qualitative, not quantifiable. Tone. Affect. Personality. Whatever you call it. Your input tokens shape the pattern of the output tokens. It's a model of human language, is that really so surprising?


SpaceX and Amazon seem to be headed for competing with traditional telecoms and ISPs. I'm betting the next acquisition target will be AST SpaceMobile. I also wouldn’t be surprised to see big telecom/ISP mergers pass regulatory approval now that they have competition from the heavens

They'll try. But they are between two forces squeezing the TAM:

The anvil: satellites can't serve most people in a densely populated area, whereas terrestrial wireless can be engineered and deployed to serve any population density, even tens of thousands of people in a stadium.

The hammer: electronics get cheaper faster when they don't have to be space grade, and electronics get cheaper faster than rockets. As they get cheaper, terrestrial wireless will be deployed in more areas that are uneconomical right now.

And that is how the satellite TAM gets slammed.


> The anvil: satellites can't serve most people in a densely populated area, whereas terrestrial wireless can be engineered and deployed to serve any population density, even tens of thousands of people in a stadium.

That's if everyone is trying to connect to the satellite. Would it be possible to have regional hubs that connect and distribute the connection via a local wireless link like 5G?


You don't need to be close to having everyone connect to cause congestion on a satellite network. That congestion is caused by the amount of data used, not by the number of connections.

Every kind of network has the potential for congestion, it's just easier and much cheaper to engineer a terrestrial network to avoid congestion. There are congestion scenarios for satellite networks that are not solvable.


You can use satellite backhaul for a 5G tower. And I'm sure there are many towers with satellite backhaul.

But, once you start having multiple towers near by, you are going to link those up terrestrially (wireless or not) and pretty soon you'll end up with terrestrial backhaul.


Satellite backhaul really only happens in mobile disaster recovery truck-mounted cell sites, and the fairly rare occasions where a rural site can't use terrestrial wireless backhaul.

> SpaceX and Amazon seem to be headed for competing with traditional telecoms and ISPs.

Traditional ISPs already have a nice network of copper and fiber optic cables. I don't think satellites offer any advantage to most people here, except for those living in an area with slow wired connections.


Intercontinental latency in air/vacuum is lower than in fiber (even in total, i.e. after accounting for the extra distance from the ground and the legs up to and down from space), so there’s also a market for high frequency trading.
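A rough back-of-the-envelope check of that claim, as a minimal sketch. The route length, orbit altitude, and fiber refractive index below are illustrative assumptions, and the model ignores routing, switching, and inter-satellite hop overhead, so treat it as an idealized lower bound for both paths rather than a measurement.

```python
C_VACUUM_KM_S = 299_792.458   # speed of light in vacuum
FIBER_INDEX = 1.47            # assumed refractive index of silica fiber
EARTH_RADIUS_KM = 6_371.0

def one_way_ms(path_km: float, refractive_index: float = 1.0) -> float:
    """One-way propagation delay in milliseconds over a path of this length."""
    return path_km / (C_VACUUM_KM_S / refractive_index) * 1000.0

# Assumed figures, for illustration only.
ground_km = 5_570.0    # roughly a New York to London great-circle distance
altitude_km = 550.0    # an assumed low-Earth-orbit altitude

# Idealized fiber: the cable follows the great circle exactly.
fiber_ms = one_way_ms(ground_km, FIBER_INDEX)

# Idealized satellite path: straight up, along the orbital shell, straight down.
arc_km = ground_km * (EARTH_RADIUS_KM + altitude_km) / EARTH_RADIUS_KM
sat_ms = one_way_ms(2 * altitude_km + arc_km)

print(f"fiber: {fiber_ms:.1f} ms, satellite: {sat_ms:.1f} ms")
```

Under these assumptions the satellite path comes out at roughly 24 ms one way against roughly 27 ms for fiber, even though it is about 1,600 km longer. Real cables rarely follow the great circle, which would widen the gap.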

It's all about bypassing regulations, just like Uber and AirBNB. Most US ISPs have old copper cables that only support DSL. Upgrading them means digging up the streets and that's expensive and a legal minefield. And those ISPs are local monopolies so why would they spend money just to keep the same number of customers who are locked in anyway?

I don't think that is very true in this day and age. Here in Cincinnati, the vast majority of houses now have fiber run to them. There are still some stragglers, but that's mainly because slumlord apartment owners don't feel like dealing with upgrades.

> I'm betting the next acquisition target will be AST SpaceMobile.

Or possibly Viasat.


We're in the early days of agentic frameworks, like the pre-PHP web. CGI scripts and webmasters. Eventually the state-of-the-art will slow down and we'll have something elegant like Rails come out.

Until then, every agent framework is completely reinvented every week due to new patterns and new models. evals, ReACT, DSPy, RLM, memory patterns, claws, dynamic context, sandbox strategies. It seems like locking in to a framework is a losing proposition for anyone trying to stay competitive. See also: LangChain trying to be the Next.js/Vercel of agents but everyone recommending building your own.

That said, Anthropic pulls a lot of weight owning the models themselves and probably an easier-to-use solution will get some adoption from those who are better served by going from nothing to something agentic, despite lock-in and the constant churn of model tech


Completely agree re: AI chatbot/RAG being just like the pre-PHP web world. There's a hundred half baked solutions floating on blogs and github but not a coherent dominant framework that puts it all together properly. Langchain is close but still feels a bit abstract and DIY.

That plus everyone is using 5 different vector DBs and reranking models from different vendors than the answer models etc.


I believe framework is simply never, ever going to work for LLM-based agentic workflows.

Framework is simply way too rigid for a non-deterministic technology.

We may see libraries that provide tools for managing agents, but then again, there's nothing that tmux can't do already.


I'm a bit at odds with this.

I agree a framework is something that sounds outdated.

I also believe an orchestrator is needed. Something that abstracts you from a specific provider. Like hardware, drivers and operating systems.

Right now, my thoughts are on that line: Who will build that operating system? Who will have it in the cloud?

It needs to be robust to operate for large organizations, open source, and sit on top of any provider.

Right now we are seeing BSD vs GNU/Linux vs DOS kind of battles.


necro-posting here, but that's kinda what we're working on! We're focused on creating cloud workspaces for sandboxed coding agents and it's built to support any agent harness. https://www.amika.dev/

Under the hood, we're open sourcing a lot of the parts for provisioning these agents, their VMs/sandboxes, and managing agent messaging + sessions. Put our open source plans here: https://docs.google.com/document/d/1vevSJsSCWT_reuD7JwAuGCX5...


I've been using the OpenAI Agents SDK for a while now and am largely happy with the abstractions - handoffs/sub-agents, tools, guardrails, structured output, etc. Building the infra and observability, and having it scale reliably, was a bigger pain for me. So I do get Anthropic's move into managed agents.

> $4 billion per launch lol

The US spends almost that much on net debt interest each day (~$3 billion/day[0]). Not that adding to the debt helps at all, but the old proverb about being penny wise and pound foolish seems relevant

0. https://www.cbo.gov/publication/61951


The absolute cost isn't the problem, it's the value that we're getting from it. SLS and Artemis are both incredibly expensive and ramshackle programs, and regardless of how bad the rest of the USG might be in terms of their cost, or value, if you are a true space fan and a true American space fan, you should want this little corner of humanity to hold itself to a higher standard.

Acceptance of over costing and under delivering is exactly why the US is stuck with SpaceX as its prime space launch provider. It's only through the miracle of the vanity of billionaires that there's even a realistic second choice (Blue Origin) that might develop.

It's also this type of attitude that lets us be in a situation where we honestly don't know how well the heat shield will work on reentry (SLS launches are so expensive, and so slow to build and prep to launch, that we cannot fit in an uncrewed mission between 1 and 2 to test or validate fixes or models).

If Artemis as a program succeeds, it will be despite the incredible graft, pork, and ass covering, not because of it. I want Artemis to succeed because the achievement will be beautiful and amazing, and I want everyone to be safe and sound. I want Artemis to fail, to force a reckoning. I still believe that America has great things to offer to the world, but it's not going to be able to do that by muddling its way through and cobbling together random pork filled programs into a vaguely inspiring shape.


This is about to change.

New NASA administrator Isaacman has redone the Artemis program. The changes were announced at the Ignition event a few weeks ago:

https://www.nasa.gov/ignition/

If you read one thing, read the slides on building the moon base:

https://www.nasa.gov/wp-content/uploads/2026/03/2-building-t...

The goal is to fly often - adding an SLS launch to 2027 and a second launch to 2028. This drops the cost-per-launch, which is mostly fixed. It redoes SLS to make it less expensive and more capable. It moves the lunar space station down to the surface of the moon.

And it's budgeted at $10B/3 years, which fits into NASA's budget.

Isaacman took the Artemis program and fixed it. The reckoning came, and it's looking good.


There's a lot of potential in the announced changes and what SLS/Artemis might be able to become. This shouldn't prevent us from being critical of what SLS/Artemis most definitely has been for the previous 10-15 years.

And don't be fooled about the SLS launch cadence. As recently as summer 2025, Artemis III was still nominally a 2027 manned lunar landing (https://www.nasa.gov/blogs/missions/2025/08/18/nasa-begins-p...). It got moved to a 2028 manned lunar landing in early 2026, before being converted back to a 2027 manned test flight.

The plan for SLS also does nothing to make it more capable (though hopefully less expensive). The cancelled exploration upper stage is being replaced by Centaur V, which is a less powerful stage. Isaacman refuses (I think rightfully) to really pin down whether there is a future for SLS past Artemis V. If Isaacman chooses to cancel SLS after Artemis V (which I think is a defensible course of action), then SLS would represent a ~17 year long program that cost at least 41 billion dollars that netted 5 mission launches.

And characterizing it as "moving the lunar space station down to the surface of the moon" is... kinda falling into the trap. Lunar Gateway was supposed to launch ~2028 (along with Artemis IV - from the era where Artemis III was the first lunar landing). Gateway was a gongshow, and was delayed, and now cancelled. And now the new plan says the habs (the part that people think of as an actual base...) happen in Phase 3 starting in 2033. The short-term elements they are trying to reuse from Gateway in near-term (think ~4 years) base projects are very "ancillary".

It remains unclear if NASA will in fact be able to up the launch cadence of SLS to meet the double 2028 launch requirement. While it was clear that Gateway made... very limited sense for great expense, and the new plan is certainly ambitious with what I think is a stronger value proposition, it's also basically exactly as pie in the sky as Gateway back in 2019.

The fact that I am doubting NASA's ability to execute now, is the very cost of SLS (and friends).


> then SLS would represent a ~17 year long program that cost at least 41 billion dollars that netted 5 mission launches

SLS will never be worth it. But I'd discount from that price tag the continuity benefits of keeping the Shuttle folks around, and aerospace engineers employed, across the chasm years of the 2010s.


Yeah, it’d be really nice if we could somehow express the strategic capabilities maintained in these discussions. Because on the face of it, SLS looks terrible, but paying that much to maintain the national capability to make something like the shuttle and SRBs feels reasonable.

Kind of similar to farm subsidies and the strategic implications there.


> paying that much to maintain the national capability to make something like the shuttle and SRBs feels reasonable

It’s reasonable to pay something. I’m unconvinced $41bn is the correct amount.

> Kind of similar to farm subsidies and the strategic implications there

There aren’t many. Countries in which farmers aren’t swing voters don’t have farm subsidies. I’ve been looking into buying some farmland and just collecting CRP on it, for example.


Yeah, there should've been a "more" in front of "reasonable". There are probably other ways to maintain knowledge of how to make SRBs.

> SLS launches are so expensive, and so slow to build and prep to launch, that we cannot fit in an uncrewed mission between 1 and 2 to test or validate fixes or models

If they’d wanted to they could have launched an empty Orion crew module into LEO on another, cheaper, rocket and tested re-entry. The crew module by itself is less than ten tonnes.


How would they get it up to the required reentry speed for it to be a valid test? They already know the heat shield works for reentry at LEO speeds. That's not where the problems occur.

>> is exactly why the US is stuck with SpaceX

For the last 20 years NASA has intentionally run their Commercial Crew Program, which has the stated goal of developing/fostering/funding the development of commercial providers for launch vehicles.

They, by plan they explicitly laid out and implemented, decided to rely on American commercial providers. And that's what they got. And in doing so, the program ended up producing the most prolific/successful launch vehicle in history.

>> It's only through the miracle of the vanity of billionaires that there's even a realistic second choice (Blue Origin) that might develop

Yes, this is another company which the NASA commercial program explicitly funded in order to get them to develop another launch vehicle.


SpaceX is an amazing success story, both as a commercial story, and as a story of government-industry cooperation. NASA should be proud and commended for fostering SpaceX.

The question is why does SpaceX stand alone? Why did ULA stagnate? Why can't NG make SRBs that don't have nozzles that fall off? Why can't Bechtel build a launch tower on time? What is it about government contracts in these other areas that led to all of this underperformance?

The US benefits by having SpaceX around. It would benefit even better by having many SpaceXs around.

Oh, and also I believe it's generally understood that NASA provided very little funding for New Glenn. They gave BO a lot of money for HLS, but that's relatively recent (2023). New Glenn has been in the works since 2013 and was mostly bankrolled by Bezos, with some USAF/DoD money kicked in.


>>> SpaceX is an amazing success story

100%, and something that is underappreciated and often taken for granted nowadays, especially on our little forum here.

>>> It would benefit even better by having many SpaceXs around.

That made me chuckle, sounded to me a bit like "our house would benefit from having a few cats around". Perhaps the reason why there aren't too many SpaceX-like companies around is that it's truly among the hardest companies to ever create.


If we're going to do public/private cooperation, we still need the whole competition thing.

If we don't have it, either we're subject to monopoly, or just a State owned company, at which point, why not just cut out the middlemen and go full Nationalized?


Boeing and others do compete in that area.

> Why did ULA stagnate?

ULA is the old guard made from Lockheed and Boeing. SpaceX is the snappy upstart moving fast and breaking things. Having the freedom to fail with experiments is a totally different methodology from any failure seen as very bad. SpaceX has never been involved in loss of life. If they ever have that happen, I'd imagine they'd be forced to stop moving as fast and quit breaking things.


Big space stagnated because they could. Their friends in Congress directed them lots of money and lots of political cover, and they both profited handsomely. Why would they change? They never had to, and I might argue that they still don't. Cost-plus contracts, years spent in expensive consulting and planning, all these mean they make money whether they go to space or not. Every five or six years, they trot out a "new" plan that purports to solve all the problems of the old plan, with exciting presentations and hired speakers, and the then-current administration sees a way to drum up political support, and the lobbyists and Congress see a way to make even more money and political favors.

And now it's over 50 years since we last landed on the Moon.


> Why can't NG make SRBs that don't have nozzles that fall off?

To be fair, we just saw two of them work fine, with no nozzle fall-off-ages


Compared to the absolute baffling amount of money spent for military purposes, knowing more about the moon is well worth it.

No no no no, I can't let that go. Sending astronauts around the moon has nothing to do with "knowing more about the moon". We don't need people up there to observe the moon. In fact, it's a lot easier and better to have sensors go there and automatically make measurements (e.g. pictures).

Now thinking about Mars, sending astronauts there is actually a net negative for science because it risks contaminating Mars.

We send astronauts there because it's cool, period. Science has nothing to do with it.


This FAQ from the NASA website seemed particularly intellectually dishonest:

>Why do we need astronauts to view the Moon when we have robotic observers? Human eyes and brains are highly sensitive to subtle changes in color, texture, and other surface characteristics. Having astronaut eyes observe the lunar surface directly, in combination with the context of all the advances that scientists have made about the Moon over the last several decades, may uncover new discoveries and a more nuanced appreciation for the features on the surface of the Moon.

https://www.nasa.gov/missions/nasa-answers-your-most-pressin...


The word "may" does a lot of heavy lifting in that sentence.

Also we spend that much every 4 days we're in Iran, and that's only ONE of our neo-colonialist irons in the fire, as it were.

If you want to make the US financially solvent, cut defense. Defense LAPS every other budget category. Whether you want to take the conservative position on why that is (our allies freeload on our defense spending) or the Progressive one (the U.S. is an empire in decline and every major empire through history has spent vast sums to maintain itself, so why would the U.S. be different) doesn't change the fact that our military budgets exceed those of over a dozen other nations combined, the vast majority of whom are allies.


>Defense LAPS every other budget category.

I suppose it matters how you lump things, but for federal spending:

  - $678 B, Social Security
  - $478 B, Medicare
  - $425 B, Net Interest
  - $419 B, Health
  - $412 B, National Defense
  - $320 B, Income Security
  - $184 B, Veterans Benefits and Services
  - $75 B, Education, Training, Employment, and Social Services
  - $53 B, Transportation
  - $43 B, Administration of Justice
  - $15 B, Other
https://fiscaldata.treasury.gov/americas-finance-guide/feder...

Note there would be no veterans benefits and services without a military, so effectively the total for defense is 412 PLUS 184 = $596B, more than anything except SS.

Also note that most people consider social security to be an entirely different kind of government spending than anything else in that list.
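The regrouping above can be checked with a short sketch. The figures are the ones listed in this comment; the merged "Defense + Veterans" bucket is the commenter's own grouping for the sake of argument, not a Treasury category.

```python
# Outlays in billions of dollars, copied from the list in the comment above.
outlays = {
    "Social Security": 678,
    "Medicare": 478,
    "Net Interest": 425,
    "Health": 419,
    "National Defense": 412,
    "Income Security": 320,
    "Veterans Benefits and Services": 184,
    "Education, Training, Employment, and Social Services": 75,
    "Transportation": 53,
    "Administration of Justice": 43,
    "Other": 15,
}

# Merge the two military-related lines into one bucket (an assumed grouping).
merged = dict(outlays)
merged["Defense + Veterans"] = (
    merged.pop("National Defense") + merged.pop("Veterans Benefits and Services")
)

# Rank every category from largest to smallest outlay.
ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
for name, amount in ranked:
    print(f"${amount:>4} B  {name}")
```

With the two lines merged, the combined bucket is $596 B and ranks second, behind Social Security only. Without the merge, National Defense alone ranks fifth in this list.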


No, if the US had no military the majority of veterans benefits and services money would still need to be spent (it's mostly healthcare); it would just be bucketed under SS and Medicare/Medicaid then.

Also, without a military the US would not be even 1/3rd as wealthy as it is today, given its military created the global order that secured the last 80 years of the global economic system, shipping lanes and USD dominance. You can argue over specific wars/missions being dumb, but to pretend the overall ROI on that dominance enabling 80 years of relatively peaceful global trade hasn’t been positive is to be intellectually dishonest.

The world is currently teetering on a global economic crisis over just ONE shipping lane not being fully open for a few weeks. Read more history and you’ll see this used to be the norm.


I avoided commenting on the ROI associated with defense spending, deliberately.

Veterans get SS too, so no, costs associated with veterans wouldn't shift to SS. It is fair to suggest that the health care costs of uninjured, untraumatized veterans would just show up under Medicaid/Medicare. I don't know what percentage of veterans health care costs (not health care visits) fit in that category, versus "stuff that wouldn't be an issue if they hadn't been in the military".


Not all of those are discretionary spending? Maybe not equivalent to include, for example, Social Security.

It is relevant if you want to attack one of the greatest achievements of the new deal and hate working Americans.

People can have motivations for wanting to cut back Social Security other than "they hate working Americans". I would prefer commenters make more of an effort to understand their opponents' perspective rather than painting them in the worst light possible.

I think the common miscommunication here is that defense is the largest part of the US discretionary budget (about half overall), but that doesn't include those non-negotiable things like Social Security, Medicare, etc.

Trump doesn't want to do Medicare etc anymore. The states can do that now.

"Please note: Values displayed are outlays, which is money that is actually paid out by the government. Other sources, such as USAspending, may display spending as obligations, which is money that is promised to be paid, but may not yet be delivered."

The Biden administration's FY2025 defense budget request was $850 billion for the DoD, with the total national security budget reaching over $895 billion. The FY2026 proposal submitted by the Trump admin is 1.5 trillion for DoD.


> LAPS every other budget category.

Except for social security, health, medicare, debt interest


We spend more on debt interest than we spend on the military or anything really other than social security. That isn't a useful comparison anymore.

> The US spends almost that much on net debt interest each day

Spends, or accrues?


Same thing (for now, at least). The U.S. has only defaulted a handful of times, none that I'm aware of since 1971.


I don't think MCP is going anywhere, as much as I prefer CLIs or skills generally. Where MCP really shines is reducing friction and footguns for using a service, but at the expense of less versatility and expressiveness. You get a cookie-cutter way of using tools to interact with that service, which is easy to set up, doesn't require the user to download a CLI or have their agent interact with an API

For power users or technical users that want agents to compose data or use tools programmatically, that's less valuable, but for most people, a one-size-fits-all MCP service that is easy to use is probably best

There are also the issues of dumping a bunch of tool definitions into context and eating a ton of tokens, but that seems more solvable

If anything, MCP needs to evolve or MCP client tooling needs to improve, and I could see the debate going away


There’s nothing stopping agents from composing MCP requests and responses, or from them writing programs to process the responses. MCP tools and resources are just as composable and programmable as any CLI - and more so than most because they are structured data.
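A minimal sketch of what that composition could look like. MCP tool calls are JSON-RPC 2.0 messages using the `tools/call` method; the tool name `list_invoices`, its arguments, and the canned response payload here are all hypothetical, and no transport or real server is involved.

```python
import json

def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def text_payloads(response: dict) -> list:
    """Parse the JSON carried in each text content item of a tool result."""
    items = response["result"]["content"]
    return [json.loads(item["text"]) for item in items if item["type"] == "text"]

# Hypothetical tool and arguments; a real server defines its own.
request = tool_call(1, "list_invoices", {"status": "open"})

# Canned response standing in for what such a server might return.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [
            {"type": "text",
             "text": json.dumps([{"id": "a", "total": 120}, {"id": "b", "total": 80}])}
        ],
        "isError": False,
    },
}

# Because the result is structured, it can feed ordinary program logic.
invoices = text_payloads(response)[0]
open_total = sum(invoice["total"] for invoice in invoices)
print(json.dumps(request), open_total)
```

The point of the sketch is only that the request and the result are plain data, so an agent can generate a script like this instead of reading each response into its context.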


It's nearly perfect. My only complaint is I wish it would keep playing on repeat, and rotate through more smooth jazz. Then I could have this on a screen in my living room, fall asleep on my couch in a snuggie, and wake up to its garish light and jazz at 3am just like old times


If you like the smooth jazz, I can't recommend Watercolors on Sirius XM highly enough! Basically this sound, but 24/7.


I have an old TV that I want to have this running on, and I'm thinking of the cheapest way to get this playing (with new hardware prices increasing).

Probably still a rpi.


Yeah I'm not sure how the author came to the conclusion that using the meta description and JSON-LD is so important. It reminds me of modern-day keyword stuffing. The author doesn't provide any citations or even claim to be an expert on SEO or "AEO". It's fine to have some theories on things on the internet. But why is this being upvoted?


> You are an expert analyst evaluating how exposed different occupations are to AI. You will be given a detailed description of an occupation from the Bureau of Labor Statistics.

> Rate the occupation's overall AI Exposure on a scale from 0 to 10.

Are LLMs good at scoring? In my experience, using an LLM for scoring things usually produces arbitrary results. I'm surprised to see Karpathy employ it
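One way to test whether scores are arbitrary is to re-run the same prompt several times per item and look at the spread. A minimal sketch, assuming you already have the repeated scores; the numbers and the noise cutoff below are made up for illustration and do not come from the project being discussed.

```python
from statistics import mean, pstdev

# Hypothetical repeated 0-10 scores for the same occupation description,
# e.g. from re-running one prompt five times. These numbers are made up.
runs = {
    "occupation_a": [7, 8, 7, 7, 8],
    "occupation_b": [3, 6, 4, 7, 2],
}

def spread(scores: list) -> tuple:
    """Return (mean, population standard deviation) for one item's scores."""
    return mean(scores), pstdev(scores)

NOISE_THRESHOLD = 1.0  # assumed cutoff; choose one that suits the scale

# Items whose score moves more than the cutoff between identical runs.
unstable = [name for name, scores in runs.items()
            if spread(scores)[1] > NOISE_THRESHOLD]

for name, scores in runs.items():
    avg, sd = spread(scores)
    print(f"{name}: mean={avg:.1f} sd={sd:.2f}")
print("unstable:", unstable)
```

If most items land in the unstable list, the single score shown per item is closer to a sample than a measurement.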


The fact that the LLM appears to never assign an actual 0 or 10 makes me suspicious. Especially when the prompt includes explicit examples of what counts as a 10.


In my experience LLMs often have really solid insights in the thinking chains then vomit a nonsense score that doesn't make sense.

Now I'm not sure if this is actually an LLM only thing. Because I think people probably do similar when you ask them to give a number to things without providing a concrete grading rubric...


No. LLMs aren't experts in subjects. They can answer things in a confident manner, but nobody optimized LLMs to perform good analysis yet.


Let's ask the LLM to score how good it would be at scoring jobs for LLM exposure... /s

