>> One important reason people like to write code is that it has well-defined semantics, allowing one to reason about it and predict its outcome with high precision. Likewise for changes that one makes to code. LLM prompting is the diametrical opposite of that.
> You’re still allowed to reason about the generated output. If it’s not what you want you can even reject it and write it yourself!
You missed the key point. You can't predict an LLM's "outcome with high precision."
Looking at the output and evaluating it after the fact (like you describe) is an entirely different thing.
For many things you can though. If I ask an LLM to create an alert in Terraform that triggers when 10% of requests fail over a 5-minute period and sends an email to some address, with the HTML in the email looking a certain way, it will do exactly the same thing as if I looked at the documentation and figured out all of the fields one by one. That’s just how it works when there’s one obvious way to do things. I know software devs love to romanticize our jobs, but I don’t know a single dev who writes 90% meaningful code. There’s always boilerplate. There’s always fussing with syntax you’re not quite familiar with. And I’m happy to have an AI do it.
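To make that concrete, here is roughly the boilerplate in question: a minimal sketch in Python with boto3 against CloudWatch rather than Terraform. The namespace, metric names, and SNS topic ARN are all hypothetical placeholders.

    # A rough sketch of the boilerplate described above, using Python/boto3
    # against CloudWatch instead of Terraform. Namespace, metric names, and
    # the SNS topic ARN are hypothetical placeholders.
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="high-error-rate",  # hypothetical name
        AlarmDescription="More than 10% of requests failed over 5 minutes",
        # Metric math: error_rate = 100 * errors / requests
        Metrics=[
            {"Id": "errors",
             "MetricStat": {"Metric": {"Namespace": "MyApp",
                                       "MetricName": "RequestErrors"},
                            "Period": 300,  # the 5-minute window
                            "Stat": "Sum"},
             "ReturnData": False},
            {"Id": "requests",
             "MetricStat": {"Metric": {"Namespace": "MyApp",
                                       "MetricName": "RequestCount"},
                            "Period": 300,
                            "Stat": "Sum"},
             "ReturnData": False},
            {"Id": "error_rate",
             "Expression": "100 * errors / requests",
             "Label": "Error rate (%)",
             "ReturnData": True},  # the value the alarm actually evaluates
        ],
        ComparisonOperator="GreaterThanThreshold",
        Threshold=10.0,
        EvaluationPeriods=1,
        # An SNS topic with an email subscription covers the "send an email"
        # part; the HTML formatting lives in whatever consumes the topic.
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],
    )

Every field above is the kind of thing you’d otherwise dig out of the docs one by one, which is exactly the boilerplate an LLM reliably fills in.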
I don’t think I am. To me, it doesn’t have to be precise. The code is precise and I am precise. If it gets me what I want most of the time, I’m ok with having to catch it.
"Boring but right" generally means that this prediction is already priced in to our current understanding of the world though. Anyone can reliably predict "the sun will rise tomorrow", but I'm not giving them high marks for that.
I'm giving them higher marks than the people who say it won't.
LLMs have seen huge improvements over the last 3 years. Are you going to make the bet that they will continue to make similarly huge improvements, taking them well past human ability, or do you think they'll plateau?
Right, because if there's one thing that history shows us again and again, it's that things that have a period of huge improvements never plateau but instead continue improving to infinity.
Improvement to infinity, that is the sober and wise bet!
The prediction that a new technology that is being heavily researched plateaus after just 5 years of development is certainly a daring one. I can’t think of an example from history where that happened.
Claiming that AI in anything resembling its current form is older than 5 years is like claiming the history of the combustion engine started when an ape picked up a burning stick.
Your analogy fails because picking up a burning stick isn’t a combustion engine, whereas decades of neural-net and sequence-model work directly enabled modern LLMs. LLMs aren’t “five years old”; the scaling-transformer regime is. The components are old, the emergent-capability configuration is new.
Treating the age of the lineage as evidence of future growth is equivocation across paradigms. Technologies plateau when their governing paradigm saturates, not when the calendar says they should continue. Supersonic flight stalled immediately, fusion has stalled for seventy years, and neither cared about “time invested.”
Early exponential curves routinely flatten: solar cells, battery density, CPU clocks, hard-disk areal density. The only question that matters is whether this paradigm shows signs of saturation, not how long it has existed.
I don't think it would make much sense to hunt large predators prior to the invention of agriculture, even though early humans were probably plenty smart enough to build traps capable of holding animals like tigers. But after that (less than 40k years ago, more than 10k years ago), I'd bet it was a common-ish thing for humans to try to hunt predators that preyed upon their livestock.
Tigers are terrifying, though. I think it takes extreme or perverse circumstances to make hunting a tiger make any sense at all. And even then, traps and poisons make more sense than stalking a tiger to kill it!
LaunchHN: Announcing Twoday, our new YC backed startup coming out of stealth mode.
We’re launching a breakthrough platform that leverages frontier-scale artificial intelligence to model, predict, and dynamically orchestrate solar luminance cycles, unlocking the world’s first synthetic second sunrise by Q2 2026. By combining physics-informed multimodal models with real-time atmospheric optimisation, we’re redefining what’s possible in climate-scale AI and opening a new era of programmable daylight.
You joke, but, alas, there is a _real_ company kinda trying to do this. Reflect Orbital[1] wants to set up space mirrors, so you can have daytime at night for your solar panels! (Various issues, like around light pollution and the fact that looking up at the proposed satellites with binoculars could cause eye damage... don't seem to be on their roadmap.) This is one idea that's going to age badly whether or not they actually launch anything, I suspect.
Battery tech is too boring, but seems more likely to manage long-term effectiveness.
Reflecting sunlight from orbit is an idea that had been talked about for a couple of decades even before Znamya-2[1] launched in 1992. The materials science needed to unfurl large surfaces in space seems to be very difficult, whether mirrors or sails.
A lot of the press likes to paint “AI” as a uniform field that continues to improve together. But really it’s a bunch of related subfields. Once in a blue moon a technique from one subfield crosses over into another.
“AI” can play chess at superhuman skill. “AI” can also drive a car. That doesn’t mean Waymo gets safer when we increase Stockfish’s elo by 10 points.
They're already better than you at reciting historical facts. I'd guess they're probably better at composing poems (they're not great but far better than the average person).
Or do you agree with me? I'm not looking for prescience marks; I'm just less convinced that people really make the more boring and obvious predictions.
What is an intellectual task? Once again, there's tons of stuff LLMs won't be trained on in the next 3 years. So it would be trivial to just find one of those things and say voila! LLMs aren't better than me at that.
I'll make one prediction that I think will hold up. No LLM-based system will be able to take a generic ask like "hack the nytimes website and retrieve emails and password hashes of all user accounts" and do better than the best hackers and penetration testers in the world, despite having plenty of training data to go off of. It requires out-of-band thinking that they just don't possess.
I'll take a stab at this: LLMs currently seem to be rather good at details, but they seem to struggle greatly with the overall picture, in every subject.
- If I want Claude Code to write some specific code, it often handles the task admirably, but if I'm not sure what should be written, consulting Claude takes a lot of time and doesn't yield much insight, whereas 2 minutes with a human is 100x more valuable.
- I asked ChatGPT about some political event. It mirrored the mainstream press. After I reminded it of some obvious facts that revealed a mainstream bias, it agreed with me that its initial answer was wrong.
These experiences and others serve to remind me that current LLMs are mostly just advanced search engines. They work especially well on code because there is a lot of reasonably good code (and tutorials) out there to train on. LLMs are a lot less effective on intellectual tasks that humans haven't already written and published about.
To be clear, you are suggesting “huge improvements” in “every intellectual task”?
This is unlikely for the trivial reason that some tasks are roughly saturated. Modest improvements in chess playing ability are likely. Huge improvements probably not. Even more so for arithmetic. We pretty much have that handled.
But the more substantive issue is that intellectual tasks are not all interconnected. Getting significantly better at drawing hands doesn’t usually translate to executive planning or information retrieval.
Sorry, I now realize this thread is about whether LLMs can improve on tasks and not whether AI can. Agreed there’s a lot of headroom for LLMs, less so for AI as a whole.
> They're already better than you at reciting historical facts.
They're better at regurgitating historical facts than me because they were trained on historical facts written by many humans other than me who knew a lot more historical facts. None of those facts came from an LLM. Every historical fact that isn't entirely LLM generated nonsense came from a human. It's the humans that were intelligent, not the fancy autocomplete.
Now that LLMs have consumed the bulk of humanity's written knowledge on history, what's left for them to suck up will be mainly their own slop. Exactly because LLMs are not even a little bit intelligent, they will regurgitate that slop with exactly as much ignorance of what any of it means as when it was human-generated facts, and they'll still spew it back out with all the confidence they've been programmed to emulate. I predict that the resulting output will increasingly shatter the illusion of intelligence you've so thoroughly fallen for so far.
> At what? They're already better than me at reciting historical facts.
I wonder what happens if you ask DeepSeek about Tiananmen Square…
Edit: my “subtle” point was, we already know LLMs censor history. Trusting them to honestly recite historical facts is how history dies. “The victor writes history” has never been more true. Terrifying.
Surely you meant the latter? The boring option follows previous experience. No technology has ever failed to plateau, except for evolution itself I suppose, till we nuke the planet.
LLMs aren't getting better that fast. I think a linear extrapolation says they'd need quite a while to maybe get "well past human ability," and if you factor in the increasing difficulty of training, the timescale stretches even further.
Prediction markets have pretty much obviated the need for these things. Rather than rely on "was that really a hot take?" you have a market system that rewards those with accurate hot takes. The massive fees and lock-up period discourage low-return bets.
FWIW Polymarket (which is one of the big markets) has no lock-up period and, for now while they're burning VC coins, no fees. Otherwise agree with your point though.
As opposed to the current world of brigading social media threads to make consensus look like it goes your way, and then getting journalists who scrape by on clickbait coverage to report your brigading as fact?
I don't understand how the location of a 377 foot tall tree could be kept secret. Wouldn't that type of thing be visible in satellite imagery at the very least?
"The exact location of Hyperion is nominally secret but is available via internet search.[12] However, in July 2022, the Redwood Park superintendent closed the entire area around the tree, citing "devastation of the habitat surrounding Hyperion" caused by visitors. Its base was trampled by the overuse and as a result ferns no longer grow around the tree.[13]
Measures to protect the Hyperion tree were officially implemented in 2022 when the National Park Service (NPS) closed public access to its location in Redwood National Park.[14][15] Anyone who gets too close could face up to six months in jail and a $5,000 maximum fine.[13][16][17]"
AirPods are by far the best mass-market headphones in existence for Apple device owners. The noise cancellation is unparalleled (which is huge if you use public transit or wear them in the gym). The audio quality is also among the best you can get from wireless headphones. This is true of both the AirPods Pro and the Max.
AirPods are a joke. Apple killed the headphone jack for no reason, then sold the "solution", and people ate it up. Great business strategy for them to screw their customers for cash, but an abjectly terrible product. They are worse than wired headphones in every way except "they are wireless", which isn't actually a benefit.
> I don't knock it out of my head by having the wire catch on something
> Dealing with the cable and having to pack it back up when I'm done
> It auto connects to both my phone and laptop 99% of the time
> It easily swaps between the two as I change focus
Now, they aren't perfect; charging can get a bit fiddly over time, but they're certainly nicer than normal headphones. Maybe you just aren't the target audience, but clearly they're popular enough for most people.
Whenever a no-ads tier is offered, a few ads always get shoved into the premium subscription eventually (see: Spotify), because companies want to be able to reach the premium customers, who have more disposable income on average.
This is just the cloud provider taking the dependency on their logging service for you. It doesn’t change the shape of the graph.