Hacker News | bubblelicious's comments

Why do people think this is any different from other major economic revolutions like electricity or the Industrial Revolution? Society is not going to collapse; things will just get weirder, in both unbelievably positive and unbelievably negative ways, like the internet.


The question is why humankind must strive for unbelievably positive things at the expense of being forever plagued by the unbelievably negative.

I'd much rather live in a world of tolerable good and bad opposing each other in moderate ways.


Right, let's not have done the Industrial Revolution or the internet or electricity.


If that undoes the suffering of the tens of millions of human beings killed and maimed in WWI and WWII, both enabled by the Industrial Revolution, then let us not have done it!


I think the value of the internet has proven to be pretty dubious. It seems to have only made things worse.


Electricity doesn't remove the need for human labor; it just increases productivity. If we produced AGI that could match top humans across all fields, it would mean no more jobs (knowledge jobs at least; physical labor elimination depends on robotics). That would make the university model obsolete: training researchers would be a waste of money, and the well-paid positions that require a degree and thus justify tuition would vanish. The economy would have to change fundamentally or else people would have to starve en masse.

If we produced ASI, things would become truly unpredictable. There are some obvious things on the table: fusion, synthetic meat, actual VR, immortality, ending hunger, global warming, or war, etc. We probably get these if they can be gotten. And then it's into unknown unknowns.

Perfectly reasonable to believe ASI is impossible or that LLMs don't lead to AGI, but there is not much room to question how impactful these would be.


I disagree; you have to take yourself back to when electricity was not widely available. How much labor did electricity eliminate? A LOT, I imagine.

AI will make a lot of things obsolete but I think that is just the inherent nature of such a disruptive technology.

It makes labor cost way lower for many things. But how the economy reorganizes itself around it seems unclear. Still, I don't really share this fear of the world imploding. How could cheap labor be bad?

Robotics for physical labor lags way behind, e.g., coding, but only because we haven't figured out the data flywheel and/or how to transfer knowledge sufficiently and efficiently (though people are trying).


> How much labor did electricity eliminate? A LOT, I imagine.

90% or even 99.9% are in an entirely separate category from 100%. If a person can do 1000x the labor per unit of time and you have a use for the extra 999x, they and you can both benefit from the massive productivity gains. If that person can be replaced by as many robots and AIs as you like, you no longer have any use for them.

Our economy runs on the fact that we all have value to contribute and needs to fill; we exchange that value for money and then exchange that money for survival necessities plus extra comforts. If we no longer have any value versus a machine, we no longer have a method to attain food and shelter other than already having capital. Capitalism cannot exist under these conditions. And you can't get the AGI manager or AGI repairman job to account for it- the AGI is a better fit for these jobs too.

The only jobs that can exist under those conditions are government-mandated. So we either run a jobs program for everybody or we provide a UBI and nobody works. Electricity didn't change anything so fundamental.


The promise of AGI is that no human would have a job anymore. That is societal collapse.


Famously expressed as 'socialism or barbarism' by Rosa Luxemburg, who traced it back to Engels.


Because if you replace all humans with machines, what jobs will be left?


Where does this view come from? I'm not aware of any real evidence for this. Also consider that our data center buildouts in '26 and '27 will be absolutely extraordinary, and scaling is only at the beginning. You have a growing flywheel and plenty of synthetic data to break through the data wall.


Let me put it this way: when ChatGPT tells me I've hit the "Free plan limit for GPT-5", I don't even notice a difference when it goes away or when it comes back. There's no incentive for me to pay them for access to 5 if the downgraded models are just as good. That's a huge problem for them.


Ditto for Gemini Pro and Flash, which I have on my phone.

I've been traveling in a country where I don't speak the language or know the customs, and I found LLMs useful.

But I see almost zero difference between paid and unpaid plans, and I doubt I'd pay much or often for this privilege.


Is this based on any non-anecdotal evidence, by chance?


Of course not, but explain how I am ever going to pay OpenAI, a for-profit company, any dollars? Sam Altman gets explosively angry when he's asked about how he's going to collect revenue, and that is why. He knows that when push comes to shove, his product isn't worth to people what it costs him to operate it. It's Homejoy at trillion-dollar scale; the man has learned nothing. He can't make money off this thing, which is why he's trying to get the government to back it. First through some crazy "Universal Basic Compute" scheme, now I guess through cosigning loans? I dunno, I just don't buy that this thing has any legs as a viable business.


I think you're welcome to that opinion and are far from alone, but (1) I am very happy to pay for Claude; even $200/mo is worth it, and (2) idk if people just sort of lose track of how far things have come in the span of literally a single year, with the knowledge that training infra is growing insanely and people are solving one fundamental problem after another.


We live in a time when you can't even work for an hour and afford to eat a hamburger. You having the liquid cash to spend $200 a month on a digital assistant is the height of privilege, and that's the whole problem the AI industry has.

The pool of people willing to pay for these premium services for their own sake is not big. You've got your power users and your institutional users like universities, but that's it. No one else is willing to shell out that kind of cash for what it is. You keep pointing to how far it's come, but that's not really the problem, and in fact it makes everything worse for OpenAI et al.: since they don't have a moat, they don't have customer lock-in, and soon they won't have technological barriers either. The models are not getting good enough to be what they promise, but they are getting good enough to put themselves out of business. Once this version of ChatGPT gets small enough to fit on commodity hardware, OpenAI et al will have a very hard time offering a value proposition.

Basically, if OpenAI can't achieve AGI before a ChatGPT-4-class LLM can fit on desktop hardware, they are toast. I don't like those odds for them.


Sell at a loss and make it up in volume.

It's been tried before; it generally ends in a crater.


It is a problem easily solved with advertising.


No, because as the history of hardware scaling shows us, things that run on supercomputers today will run on smartphones tomorrow. Current models already run fairly well on beefy desktop systems. Eventually, models of ChatGPT-4 quality will be open-sourced and running on commodity systems. Then what? There's no moat.


10-20 years of your data in the form of chat history

Billions of users allowing them to continually refund their models

Hell, by then your phone might be the OpenAI 1. The world's first AI-powered phone (tm)


> The world's first AI-powered phone

Do you remember the Facebook phone? Not many people do, because it was a failed project, and that was back when Android was way more open. Every couple of years, a tech company with billions has the brilliant idea: "Why don't we have a mobile platform that we control?", followed by failure. Amazon is the only qualified success in this area.


I agree that a slight twist on Android doesn't make sense. A phone with an integrated LLM, with apps that are essentially prompts to the LLM, might be different enough to gain market share.


HP, Microsoft, and Samsung all had a go with non-Android OSes.


We need a fundamental paradigm shift beyond transformers. Throwing more compute or data at it isn't moving the needle.


Just to point out: there's no more data.

LLMs were always going to bottleneck on one of those two, as compute demand grows crazy quickly with the amount of data, and data is necessarily limited. It turns out people threw crazy amounts of compute at it, so we hit the other limit.


Yeah, I'm constantly reminded of a quote about this: you can't make another internet. LLMs have already digested the one we have.


Epoch has a pretty good analysis of bottlenecks here:

https://epoch.ai/blog/can-ai-scaling-continue-through-2030

There is plenty of data left; we don't just train with crawled text data. Power constraints may turn out to be the real bottleneck, but we're like 4 orders of magnitude away.


Synthetic data works.


There's a limit to that, according to https://www.nature.com/articles/s41586-024-07566-y. Basically, if you use an LLM to augment a training dataset, the model becomes "dumber" with every subsequent generation, and I am not sure how you can generate synthetic data for a language model without using a language model.


Synthetic data doesn't have to come from an LLM. And that paper only showed that if you train on a random sample from an LLM, the resulting second LLM is a worse model of the distribution that the first LLM was trained on. When people construct synthetic data with LLMs, they typically do not just sample at random, but carefully shape the generation process to match the target task better than the original training distribution.
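To make "carefully shape the generation process" concrete, here's a toy sketch of the generate-then-filter pattern (purely illustrative; `generate_fn` and `verify_fn` are hypothetical stand-ins for whatever model and task-specific checker you'd actually use):

    import random
    from typing import Callable, Iterable

    def make_synthetic_dataset(
        prompts: Iterable[str],
        generate_fn: Callable[[str], str],      # stand-in for a call into some LLM
        verify_fn: Callable[[str, str], bool],  # task-specific check: unit test, solver, rubric...
        samples_per_prompt: int = 4,
    ) -> list[tuple[str, str]]:
        """Keep only generations that pass an external check, instead of
        training on whatever the model happens to sample."""
        kept = []
        for prompt in prompts:
            for _ in range(samples_per_prompt):
                candidate = generate_fn(prompt)
                if verify_fn(prompt, candidate):
                    kept.append((prompt, candidate))
                    break  # one verified example per prompt is enough for this toy
        return kept

    # Toy demo with a fake "model": generate sums, keep only the correct ones.
    if __name__ == "__main__":
        prompts = [f"What is {a} + {b}?" for a, b in [(2, 3), (10, 7), (5, 5)]]

        def parse(prompt: str) -> tuple[int, int]:
            a, b = prompt.replace("What is ", "").rstrip("?").split(" + ")
            return int(a), int(b)

        def fake_generate(prompt: str) -> str:
            a, b = parse(prompt)
            return str(a + b + random.choice([0, 0, 1]))  # sometimes wrong on purpose

        def verify(prompt: str, answer: str) -> bool:
            a, b = parse(prompt)
            return answer.strip() == str(a + b)

        print(make_synthetic_dataset(prompts, fake_generate, verify))

The point is just that a verifier or curation signal sits between the generator and the training set, so the second model isn't being asked to imitate the first model's raw output distribution.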


And you don’t think that’s already happening? Also where is your evidence for this?


> Also where is your evidence for this?

The fact that "scaling laws" didn't scale? Go open your favorite LLM in a hex editor; oftentimes half the larger tensors are just null bytes.


Show me a paper; this makes no sense. Of course scaling laws are scaling.


There is zero evidence that synthetic data will provide any real benefit. All common sense says it can only reinforce and amplify the existing problems with LLMs and other generative “AI”.


Sounds like someone has no knowledge of the literature; synthetic data isn't like asking ChatGPT to give you a bunch of fake internet data.


I work on LLM benchmarks and human evals for a living in a research lab (as opposed to product). I can say: it's pretty much the Wild West and a total disaster. No one really has a good solution, and researchers are also in a huge rush and don't want to end up making their whole job benchmarking. And even if you had the right background and did benchmarks full time, they would still be a mess.

Product testing (with traditional A/B tests) is kind of the best bet, since you can measure what you care about _directly_ and at scale.

I would say there is of course "benchmarketing", but generally people do sincerely want to make good benchmarks; it's just hard or impossible. For many of these problems we're hitting capabilities where we don't even have a decent paradigm to use.


For what it's worth, I work on platforms infra at a hyperscaler and benchmarks are a complete fucking joke in my field too lol.

Ultimately we are measuring extremely measurable things that have an objective ground truth. And yet:

- we completely fail at statistics (the MAJORITY of analysis is literally just "here's the delta in the mean of these two samples". If I ever do see people gesturing at actual proper analysis, if prompted they'll always admit "yeah, well, we do come up with a p-value or a confidence interval, but we're pretty sure the way we calculate it is bullshit")

- the benchmarks are almost never predictive of the performance of real-world workloads anyway

- we can obviously always just experiment in prod but then the noise levels are so high that you can entirely miss million-dollar losses. And by the time you get prod data you've already invested at best several engineer-weeks of effort.

AND this is a field where the economic incentives for accurate predictions are enormous.

In AI, you are measuring weird and fuzzy stuff, and you kinda have an incentive to just measure some noise that looks good for your stock price anyway. AND then there's contamination.

Looking at it this way, it would be very surprising if the world of LLM benchmarks was anything but a complete and utter shitshow!
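(For what it's worth, even a crude bootstrap interval on the difference of means beats eyeballing two averages. A minimal sketch in Python, with made-up latency samples and no claim that this is the right analysis for any particular workload:)

    import random
    import statistics

    def bootstrap_mean_diff_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
        """Percentile-bootstrap CI for mean(b) - mean(a).
        Crude, but already more honest than reporting a bare delta of means."""
        rng = random.Random(seed)
        diffs = []
        for _ in range(n_boot):
            resampled_a = [rng.choice(a) for _ in a]
            resampled_b = [rng.choice(b) for _ in b]
            diffs.append(statistics.fmean(resampled_b) - statistics.fmean(resampled_a))
        diffs.sort()
        lo = diffs[int(alpha / 2 * n_boot)]
        hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
        return statistics.fmean(b) - statistics.fmean(a), (lo, hi)

    # Fake latency samples (ms) from an "old" and a "new" build.
    old = [102, 98, 110, 95, 105, 99, 101, 97, 108, 103]
    new = [100, 96, 104, 94, 101, 97, 99, 95, 103, 100]
    delta, (lo, hi) = bootstrap_mean_diff_ci(old, new)
    print(f"delta = {delta:.2f} ms, 95% CI = ({lo:.2f}, {hi:.2f})")
    # If the interval comfortably straddles zero, "the mean went down" means very little.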


> we completely fail at statistics (the MAJORITY of analysis is literally just "here's the delta in the mean of these two samples". If I ever do see people gesturing at actual proper analysis, if prompted they'll always admit "yeah, well, we do come up with a p-value or a confidence interval, but we're pretty sure the way we calculate it is bullshit")

Sort of tangential, but as someone currently taking an intro statistics course and wondering why it's all not really clicking given how easy the material is, this for some reason makes me feel a lot better.


FWIW, I don't think intro stats is easy the way I normally see it taught. It focuses on formulae, tests, and step-by-step recipes without spending the time to properly develop intuition as to why those work, how they work, which ones you should use in unfamiliar scenarios, how you might find the right thing to do in unfamiliar scenarios, etc.

Pair that with skipping all the important problems (what is randomness, how do you formulate the right questions, how do you set up an experiment capable of collecting data which can actually answer those questions, etc), and it's a recipe for disaster.

It's just an exercise in box-ticking, and some students get lucky with an exceptional teacher, and others are independently able to develop the right instincts when they enter the class with the right background, but it's a disservice to almost everyone else.


I found the same when I was taking intro to stats. I did get a much better intuition for what stuff meant after reading 'Superforecasting' by Tetlock and Gardner; I find I'm recommending that book a lot, come to think of it.


“Here's the throughput at sustained 100% load with the same ten sample queries repeated over and over.”

“The customers want lower latency at 30% load for unique queries.”

“Err… we can scale up for more throughput!”

ಠ_ಠ


And then when you ask if they disabled the query result cache before running their benchmarking, they blink and look confused.


Then you see a 25% cache hit rate in production and realise that disabling it for the benchmark is not a good option either.


In AI, though, you also have the world trying to compete with you. So even if you totally cheat, put the benchmark answers in your training set, and overfit, it doesn't matter how much your marketing department tells everyone you scored 110% on SWE-bench: if your model doesn't work that well in production, your announcement is going to flop as users discover it doesn't work that well on their personal/internal secret benchmarks and tell /r/localLLAMA it isn't worth the download.

Whatever happened with Llama 4?


Even a p-value is insufficient. Maybe we can use some of this stuff: https://web.stanford.edu/~swager/causal_inf_book.pdf


I have actually been thinking of hiring some training contractors to come in and teach people the basics of applied statistical inference. I think with a bit of internal selling, engineers would generally be interested enough to show up and pay attention. And I don't think we need very deep expertise, just a moderate bump in the ambient level of statistical awareness would probably go a long way.

It's not like there's a shortage of skills in this area; it seems like our one specific industry just has a weird blind spot.


Don’t most computer science programs require this? Mine had a statistics requirement


I don't know how it is in the US and other countries, but in my country I would say statistics is typically not taught well, at least in CS degrees. I was a very good student and always had a good understanding of the subjects at university, but in the case of statistics they just taught us formulae and techniques as dogmas, without much explanation of where they came from, why, and when to use them. It didn't help either that the exercises always applied them to things outside CS (clinical testing, people's heights and things like that), with no application we could directly relate to. As a result, when I finished the degree I had forgotten most of it, and when I started working I was surprised that it was actually useful.

When I talk about this with other CS people in my own country (Spain), they tend to relate similar experiences.


I had the same experience in the US


I'd say your experience is being monetized more for growth for growth's sake.


Actually I disagree that that's what's going on in the world of hyperscaler platforms. There is genuinely a staggering amount of money on the line with the efficiency of this platform. Plus, we have extremely sophisticated and performance-sensitive customers who are directly and continuously comparing us with our competitors.

This isn't just that nobody cares about the truth. People 100% care! If you actually degrade a performance metric as measured post-hoc in full prod, someone will 100% notice, and if you want to keep your feature un-rolled-back, you are probably gonna have to have a meeting with someone that has thousands of reports, and persuade them it's worth it to the business.

But you're always gonna have more luck if you can have that meeting _before_ you degrade it. But... it's usually pretty hard to figure out what the exact degradation is gonna be, because of the things in my previous comment...


A/B testing is radioactive too. It's indirectly optimizing for user feedback - less stupid than directly optimizing for user feedback, but still quite dangerous.

Human raters are exploitable, and you never know whether the B has a genuine performance advantage over A or just found a meat exploit by accident.

It's what fucked OpenAI over with 4o, and fucked over many other labs in more subtle ways.


Are you talking about just preferences, or A/B tests on, like, retention and engagement? The latter, I think, is pretty reliable and powerful, though I have never personally done them. Preferences are just as big a mess: WHO the annotators are matters, and if you are using preferences as a proxy for, say, correctness, you're not really measuring correctness, you're measuring e.g. persuasion. A lot of construct validity challenges (which themselves are hard to even measure in domain).


Yes. All of them are poisoned metrics, just in different ways.

GPT-4o's endless sycophancy was great for retention; GPT-5's style of ending every response with a question is great for engagement.

Are those desirable traits though? Doubt it. They look like simple tricks and reek of reward hacking - and A/B testing rewards them indeed. Direct optimization is even worse. Combining the two is ruinous.

Mind, I'm not saying that those metrics are useless. Radioactive materials aren't useless. You just got to keep their unpleasant properties in mind at all times - or suffer the consequences.


The big problem is that tech companies and journalists aren't transparent about this. They tout benchmark numbers constantly, like they're an objective measure of capabilities.


HN members do too. Look at my comment history.

The general populace doesn't care to question how benchmarks are formulated and what their known (and unknown) limitations are.

That being said, they are likely decent proxies. For example, I think the average user isn't going to observe a noticeable difference between Claude Sonnet and OpenAI Codex.


That's because they are as close to an "objective measure of capabilities" as anything we're ever going to get.

Without benchmarks, you're down to evaluating model performance based on vibes and vibes only, which plain sucks. With benchmarks, you have numbers that correlate to capabilities somewhat.


That's assuming these benchmarks are the best we're ever going to get, which they clearly aren't. There's a lot to improve even without radical changes to how things are done.


The assumption I make is that "better benchmarks" are going to be 5% better, not 5000% better. LLMs are gaining capabilities faster than the benchmarks are getting better at measuring them accurately.

So, yes, we just aren't going to get anything that's radically better. Just more of the same, and some benchmarks that are less bad. Which is still good. But don't expect a Benchmark Revolution when everyone suddenly realizes just how Abjectly Terrible the current benchmarks are, and gets New Much Better Benchmarks to replace them with. The advances are going to be incremental, unimpressive, and meaningful only in aggregate.


So because there isn't a better measure, it's okay that tech companies effectively lie and treat these benchmarks like they mean more than they actually do?


Sorry, pal, but if benchmarks were to disagree with the opinions of a bunch of users saying "tech companies bad"? I'd side with the benchmarks at least 9 times out of 10.


How does that have anything to do with what we're talking about?


What it has to do with it is this: your "tech companies are bad for using literally the best tool we have for measuring AI capabilities when talking about AI capabilities" take is a very bad take.

It's like you wanted to say "tech companies are bad", and the rest is just window dressing.


In my experience everyone openly talks about how benchmarks are bullshit. On Twitter or on their podcast interviews or whatever everyone knows benchmarks are a problem. It's never praise.

Of course they tout benchmark numbers, because let's be real: if they didn't tout benchmarks, you're not going to bother using it. For example, if someone posts some random model on huggingface with no benchmarks, you just won't proceed.

Humans have a really strong prior to not waste time. We always, always evaluate things hierarchically. We always start with some prior, and then whatever is easiest goes next, even if it's a shitty, unreliable measure.

For example, for Gemini 3 everyone will start with a prior that it is going to be good. Then they will look at benchmarks, and only then will they move to harder evaluations on their own use cases.


I don't use them regardless of the benchmarks, but I take your point.

Regardless though, I think the marketing could be more transparent


> Brittle performance – A model might do well on short, primary school-style maths questions, but if you change the numbers or wording slightly, it suddenly fails. This shows it may be memorising patterns rather than truly understanding the problem

This finding really shocked me


Has your lab tried using any of the newer causal inference–style evaluation methods? Things like interventional or counterfactual benchmarking, or causal graphs to tease apart real reasoning gains from data or scale effects. Wondering if that’s something you’ve looked into yet, or if it’s still too experimental for practical benchmarking work.
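To make the question concrete, a perturbation/counterfactual-style check might look something like this (toy sketch; `ask_model` is a hypothetical stand-in for whatever API or harness you actually use):

    import random
    from typing import Callable

    def counterfactual_consistency(
        ask_model: Callable[[str], str],  # hypothetical: returns the model's answer as text
        n_items: int = 50,
        seed: int = 0,
    ) -> float:
        """Fraction of items answered correctly both before AND after a
        surface-level perturbation (here: just swapping in new numbers).
        A large gap versus plain accuracy hints at pattern-matching."""
        rng = random.Random(seed)
        consistent = 0
        for _ in range(n_items):
            a, b = rng.randint(2, 99), rng.randint(2, 99)
            a2, b2 = rng.randint(2, 99), rng.randint(2, 99)
            original = f"Tom has {a} apples and buys {b} more. How many does he have?"
            perturbed = f"Tom has {a2} apples and buys {b2} more. How many does he have?"
            ok_orig = str(a + b) in ask_model(original)
            ok_pert = str(a2 + b2) in ask_model(perturbed)
            consistent += ok_orig and ok_pert
        return consistent / n_items

    # Toy demo with a fake "model" that just parses the question and adds the numbers.
    if __name__ == "__main__":
        def fake_model(q: str) -> str:
            nums = [int(w) for w in q.replace("?", " ").replace(".", " ").split() if w.isdigit()]
            return str(sum(nums))
        print(counterfactual_consistency(fake_model))  # 1.0 for this perfect toy model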


I also work in LLM evaluation. My cynical take is that nobody is really using LLMs for stuff, and so benchmarks are mostly just made-up tasks (coding is probably the exception). If we had real, specific use cases, it should be easier to benchmark and know if one is better, but it's mostly all hypothetical.

The more generous take is that you can't benchmark advanced intelligence very well, whether LLM or person. We don't have good procedures for assessing a person's fitness for purpose, e.g. for a job, and certainly not standardized question sets. Why would we expect to be able to do this with AI?

I think both of these takes are present to some extent in reality.


Do you not have massive volumes of customer queries to extract patterns for what people are actually doing?

We struggle a bit with processing and extracting this kind of insight in a privacy-friendly way, but there’s certainly a lot of data.


We have 20+ services in prod that use LLMs. So I have 50k (or more) data points per service per day to evaluate. The question is: do people actually evaluate properly?

And how do you do an apples-to-apples evaluation of such squishy services?


You could have the world expert debate the thing. Someone who can be accused of knowing things. We have many such humans, at least as many as topics.

Publish the debate as-is so that others vaguely familiar with the topic can also be in awe or disgusted.

We have many gradients of emotion. No need to try quantify them. Just repeat the exercise.


Terminal Bench 2.0 just dropped, and a big success factor they stress is the hand-crafted, PhD-level rollout tests. They picked approximately 80 out of 120, with the incentive that anyone who contributed 3 would get listed as a paper author. This resulted in high-quality participation, comparable to the foundation labs' proprietary agentic RL data, but it's FOSS.


What gets measured gets managed and improved, though.


Can you be more specific? What are the trivial cases you’re talking about? AI just doesn’t work? Coding agents are not saving anyone any time?


Don't bother with these questions; the same people will say Excel can't get anything done and it sucks :) People that know, and (more importantly) take the time to learn, are doing amazing sh*t with it.


It's not that it doesn't have some use cases that "work"; it's that a lot of the output is at "AI slop" quality. It's more work to turn it into something good than to start from scratch. Look at all those lawyers and judges submitting stuff that has laughable citations to non-existent cases.


Sure, but OP said that it doesn't even work in trivial cases.

Most of the anti-AI people have conceded it sometimes works but they still say it is unreliable or has other problems (copyright etc). However there are still a few that say it doesn't work at all.


If something isn't reliable, I don't think it works at all. I'm trying to work, not play a slot machine.


Are all the tools you use 100% reliable?

Because I use things like computers, applications, search engines, and websites that regularly return the wrong result or fail.


I'm not really sure how you envision AI use at your job, but AI can be the extremely imperfect tool it is now and also be extremely useful. What part of AI use feels like a slot machine to you?


Damn! With this attitude I'd be left using an abacus…


It's just totally different from my own personal experience, which leads me to believe people are just lamenting poor usage of AI tools, which is very understandable.

But nuanced and effective AI use, even today with current models, is incredible for productivity in my experience


I've been hacking since the '90s; it is the most remarkable productivity boost we've ever had. I feel awful for people that don't take the time to learn…


I expect it makes a big difference what kind of work one does. For me, working with a legacy codebase for firmware, with 1000s of lines of C in each module, AI is very slow (~5-10s response time) and almost none of the code is acceptable.

I do however find it useful for getting an overview of dense chunks of confusing code.


IntelliJ guesses the functions I want to write plenty of the time. I don't think it's useful to try to use AI for complex or nuanced needs (although it gets close in middling cases). I think it's useful enough.


What’s the controversy, unless people are straw manning or pulling from some bad personal experience?

If you are not leveraging the best existing tools for your job (and understanding their limitations) then your output will be lower than it should be and company leadership should care about that.

Claude reduces my delivery time at my job by like 50%, not to mention the things that get done that would never have been attempted before. LLMs do an excellent job seeding literature reviews and summarizing papers. It would be a pretty bad move for someone in my position not to use AI, and it would be pretty unreasonable of leadership not to recognize this.


Crazy idea: Evaluate me based on my output and not which tools I use. If AI is the killer productivity boost you claim, then I'll have no choice in order to keep up.


I think that’s perfectly fair.

However, if you were leadership in this scenario, and you saw that people using various AI tools are systematically more productive than the people who aren't, what would you do?


Ask questions instead of making demands. Presumably you hired your engineers because they're smart. If you hired dumb engineers then you have a much bigger problem than a lack of AI utilization.


at what point do you actually not know anything?


What do you mean?


Great take! Certainly resonates with me a lot

- this is war path funding

- this is geopolitics; and it’s arguably a rational and responsible play

- we should expect to see more nationalization

- whatever is at the other end of this seems like it will be extreme

And the only way out is through.


> asking where today’s AI bubble – because that’s what it clearly is – fits in a 1990s timeline

Yet the evidence is:

- CAPE is high
- NVIDIA has a very large market cap
- enormous capital investment in AI and relatively few companies

I assume there's also a chorus of “AI actually doesn't make you more productive!” and “AI capex and opex vastly outweigh realized profits!”

Seems a little less than “clearly”.

All of these are very much RISK factors, yet to call it a bubble you need to assume the market is being irrational and that AI is NOT going to have the impact the market thinks it will. Personally, I don't understand that: there's a pretty clear trend in capabilities without a clear and insurmountable roadblock. So I totally get the "I think it's a bubble" argument; it's just that I think people underestimate what's to come.


"This time is different"


It's more that people pretend it's "obvious" that the market is going to do something, or that AI developments are not going to support the spend; it's just not obvious to me. I think people overweight the probability of a bubble. Saying it's "clearly a bubble" is hubris.


Really hard to believe articles like this, and even harder to believe this is the hive mind of Hacker News today.

I work for a major research lab. So much headroom, so much left on the table with every project, so many obvious directions to go to tackle major problems. These last 3 years have been chaotic sprints. Transfusion, better compressed latent representations, better curation signals, better synthetic data, more flywheel data: insane progress in these last 3 years that somehow just gets continually denigrated by this community.

There is hype and bullshit and stupid money and annoying influencers and hyperbolic executives, but “it’s a bubble” is absurd to me.

It would be colossally stupid for these companies not to pour the money they are pouring into infrastructure buildouts and R&D. They know a lot of it is going to be waste; nobody writing these articles is surprising anyone. These articles are just not very insightful. The only silver lining to reading the comments and these articles is the hope that all of you are investing optimally for your beliefs.


I agree completely.

I work as an ML researcher in a small startup, researching, developing, and training large models on a daily basis. I see the improvements made in my field every day, in academia and in industry, and newer models come out constantly that continue to improve the product's performance. It feels as if people who talk about AI being a bubble are not familiar with the AI that is not LLMs, and the amazing advances it has already made in drug discovery, ASR, media generation, etc.

If foundation model development stopped right now and ChatGPT never got any better, there would be at least five if not ten years of new technological development just building off the models we have trained so far.


Yes, HN discussions of LLMs are quite tiresome. I make indie apps, but it has been getting worse and worse over the years, as the API surfaces and UI variety of iOS and Android have grown.

Claude Code and ChatGPT brought me back to the early-2010s golden age when indies could be a one-man army. Not only for code, but also for localizations and marketing. I'm even finally building some infrastructure for QA automation! And tests, lots of tests. Unimaginable for me before, because I never had that bandwidth.

Not to mention that they unblock me and have basically fixed a large part of my ADHD issues because I can easily kickstart whatever task or delegate the most numbing routine work to an agent.

Just released a huge update of my language-learning app that I would never have dreamed of without LLM assistance (lots of meticulous grammar-related work over many months) and have been getting a stream of great reviews. And all of that for only $100+20 a month; I was paying almost twice as much for a Unity3d subscription a decade ago.


All that is fine. The bubble only happens if, in your ecstasy, you come to think too much of your indie apps, in which case Wall Street has no qualms about taking any rando AI app public. When this is done at scale, you create the toxic asset that 401ks pile into.

In short, you and others like you will enjoy your time, but will care very little about the systemic risk you are introducing.

But hey, whatever, gotta nut, right?

---

I don't mean you specifically. Companies like Windsurf, Cursor, and many others are all currently building the package for Wall Street with literally no care that it will pull in retail investment en masse. This is going to be a fucked-up rug pull for regular investors in a few years.

We've been in a much wilder financial environment since 2008. It's very normal for crypto to be seen as a viable investment. AI is going to appear even more viable. Things are primed.


Upvoted for a different perspective.

The thing to remember about the HN crowd is it can be a bit cynical. At the same time, realize that everyone's judging AI progress not on headroom and synthetic data usage, but on how well it feels like it's doing, external benchmarks, hallucinations, and how much value it's really delivering. The concern is that for all the enthusiasm, generative AI's hard problems still seem unsolved, the output quality is seeing diminishing returns, and actually applying it outside language settings has been challenging.


Yeah, a lot of this I understand and appreciate!

- offline and even online benchmarks are terrible unless they're actually a standard product experiment (A/B test etc.). Evaluation science is extremely flawed.

- skepticism is healthy!

- measure on delivered value vs promised value!

- there are hard problems! Possibly ones that require paradigm shifts that need time to develop!

But

- delivered value and developments alone are extraordinary. Problems originally thought unsolvable are now completely tractable or solved even if you rightfully don’t trust eval numbers like LLMArena, market copy, and offline evals.

- output quality is seeing diminishing returns? I cannot understand this argument at all. We have scaled the first good idea with great success. People really believe this is the end of the line? We’re out of great ideas? We’ve just scratched the surface.

- even with a “feels” approach, people are unimpressed?? It's subjective; you are welcome to be unimpressed. But I just cannot understand or fathom how.


The way I've been thinking about this is that there is The Tech and The Business. The Tech is amazing and improving all the time at the core, and then there are the apps being built to take advantage of the Tech, a lot of which are also amazing.

But The Business is the bubble part. Like all the companies during the first internet boom/bubble who did stuff like lay tons of fiber and raise tons of money for rickety business plans. Those companies went out of business but the fiber was still there and still useful. So I think you're right in that the Tech part is being shafted a little in the conversation because the Business part is so bubbly.


The community is divided about this. There's no one hivemind.

There's a general negativity bias on the internet (and probably in humans at large) which skews the discourse on this topic as any other - but there are plenty of active, creative LLM enthusiasts here.


I agree — probably my own selective memory and straw-manning. It just feels in my mind like the “vibe” on HN (in terms of articles that reach the front page and top rated comments) is very anti-AI. But of course even if true it is a biased picture of HN readers.

Would be interesting to see some analysis of HN data to understand just how accurate my perception is; of course that doesn't clear up the bias issue.


I'll take a shot at the rationale for this perspective, which is similar to a peer comment's:

The tech is undoubtedly impressive, and I'm sure it has a ton of headroom to grow (I have no direct knowledge of this, but I'll take you at your word, because I'm sure it's true).

But at least my perception of the idea that this is presently a "bubble" is rooted in the businesses being created using the technology. Tons of money is being spent to power AI agents to conduct tasks that would be 99% less expensive to conduct via a simple API call, or where the actual unstructured work is 2 or 3 levels higher in the value chain, such that, given enough time, there will be new vertically integrated companies that use AI to solve the problem at the root and eliminate the need for entire categories of companies at the level below.

In other words: the root of the bubble (to me) is not that the value will never be realized, but that many (if not most) of this crop of companies, given the amount of time the workflows and technology have had to take hold in organizations, will almost certainly not be able to survive long enough to be the ones to realize it.

This also seems to be why folks draw comparisons to the dot-com bubble, because it was quite similar. The tech was undoubtedly world-changing. But the world needed time to adapt, and most of those companies no longer exist, even though many of the problems were solved a decade later by new startups that achieved incredible scale.

