
Blameless postmortem culture recognizes human error as an inevitability and asks those with influence to design systems that maintain safety in the face of human error. In the software engineering world, this typically means automation, because while automation can and usually does have faults, it doesn't suffer from human error.

Now we've invented automation that commits human-like error at scale.

I wouldn't call myself anti-AI, but it does seem fairly obvious to me that directly automating things with AI will probably always carry substantial risk, and that if you involve AI in the process, you get much more assurance by using it to develop a traditional automation. As a low-stakes personal example, instead of using AI to generate boilerplate code directly, I'll often use AI to write a traditional code generator that converts a DSL specification into source code in the chosen development language, rather than asking the AI to generate that source code straight from the DSL.
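
To make that concrete, here's roughly the shape of the thing (the DSL and names are invented for the example): the AI helps you write a generator like this once, you review it once, and from then on the DSL-to-code step is boring, deterministic code.

    # Toy sketch: a reviewed-once, deterministic DSL -> code generator.
    DSL_SPEC = """
    record User
      name: str
      age: int
    """

    def generate_dataclass(spec: str) -> str:
        lines = [l for l in spec.splitlines() if l.strip()]
        name = lines[0].split()[1]               # "record User" -> "User"
        fields = [l.strip() for l in lines[1:]]  # "name: str", "age: int"
        body = "\n".join(f"    {fld}" for fld in fields)
        return ("from dataclasses import dataclass\n\n"
                f"@dataclass\nclass {name}:\n{body}\n")

    print(generate_dataclass(DSL_SPEC))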





Yeah, I see things like "AI firewalls" as, firstly, ridiculously named, but also: the idea that you can slap an appliance (that's sometimes its own LLM) onto another LLM and pray that this will prevent errors is lunacy.

For tasks that aren't customer facing, LLMs rock. Human in the loop. Perfectly fine. But whenever I see AI interacting with someone's customer directly, I just get sort of anxious.

Big one I saw was a tool that ingested a human's report on a safety incident, adjusted it with an LLM, and then posted the result to an OHS incident log. 99% of the time it's going to be fine; then someone's going to die, the log will have a recipe for spicy noodles in it, and someone's going to jail.


The Air Canada chatbot that mistakenly told someone they could cancel and be refunded for a flight due to a bereavement is a good example of this. It went to court and they had to honour the chatbot's response.

It’s quite funny that a chatbot has more humanity than its corporate human masters.


Not AI, but similar sounding incident in Norway. Some traders found a way to exploit another company's trading bot at the Oslo Stock Exchange. The case went to court. And the court's ruling? "Make a better trading bot."

I am so glad to read this. Last I had read on the case was that the traders were (outrageously) convicted of market manipulation: https://www.cnbc.com/2010/10/14/norwegians-convicted-for-out...

But you are right, they appealed and had their appeal upheld by the Supreme Court: https://www.finextra.com/newsarticle/23677/norwegian-court-a...

I am so glad at the result.


Chatbots have no fear of being fired; most humans would do the same in a similar position.

More to the point, most humans loudly declare they would do the right thing, so all the chatbot's training data is on people doing the right thing. There are comparatively few loud public pronouncements of personal cowardice, so if the bot's going to write a realistic completion, it's more likely to conjure an author acting heroically.

Do they not? If a chatbot isn't doing what its owners want, won't they just shut it down? Or switch to a competitor's chatbot?

"... adding fear into system prompt"

What a nice side effect. Unfortunately they'll lock chatbots down with more barriers in the future, but that's ironic.

...And under pressure, those barriers will fail, too.

It is not possible, at least with any of the current generations of LLMs, to construct a chatbot that will always follow your corporate policies.


That's what people aren't understanding, it seems.

You are providing people with an endlessly patient, endlessly novel, endlessly naive employee to attempt your social engineering attacks on. Over and over and over. Hell, it will even provide you with reasons for its inability to answer your question, allowing you to fine-tune your attacks faster and more easily than with a person.

Until true AI exists, there are no actual hard-stops, just guardrails that you can step over if you try hard enough.

We recently cancelled a contract with a company because they implemented student facing AI features that could call data from our student information and learning management systems. I was able to get it to give me answers to a test for a class I wasn't enrolled in and PII for other students, even though the company assured us that, due to their built-in guardrails, it could only provide general information for courses that the students are actively enrolled in (due dates, time limits, those sorts of things). Had we allowed that to go live (as many institutions have), it was just a matter of time before a savvy student figured that out.

We killed the connection with that company the week before finals, because the shit-show of fixing broken features was less of a headache than unleashing hell on our campus in the form of a very friendly chatbot.


With chat AI + guardrail AI, it will probably get to the point of being reliable enough that the number of mistakes won't hit the bottom line.

...and we will find a way to turn it into malicious compliance, where no rules are broken but the stuff the corporation wanted to happen doesn't.


Efficiency, not money, seems to be the currency of chatbots

That policy would be fraudulently exploited immediately. So is it more humane or more gullible?

I suppose it would hallucinate a different policy if it included in the context window the interests of shareholders, employees and other stakeholders, as well as the customer. But it would likely be a more accurate hallucination.


> 99% of the time it's going to be fine; then someone's going to die, the log will have a recipe for spicy noodles in it, and someone's going to jail.

I agree, and also I am now remembering Terry Pratchett's (much lower stakes) reason for getting angry with his German publisher: https://gmkeros.wordpress.com/2011/09/02/terry-pratchett-and...

Which is also the kind of product placement that comes up at least once in every thread about how LLMs might do advertising.


> … LLMs might do advertising.

It’s no longer “might”. There was very recently a leak that OpenAI is actively working on this.


It's "how LLMs might do" it right up until we see what they actually do.

There's lots of other ways they might do it besides this way.


Even if they don't offer it, people will learn how to poison AI corpora just like they did with search results.

We ain't safe from aggressive AI ads either way.


You seem to be indulging in wishful thinking.

"I see you're annoyed with that problem, did you ate recently ? There is that restaurant that gets great reviews near you, and they have a promotion!"

> the idea that you can slap an appliance (that's sometimes its own LLM) onto another LLM and pray that this will prevent errors is lunacy

It usually works, though. There are no guarantees of course, but sanity checking an LLM's output with another instance of itself usually does work, because LLMs usually aren't reliably wrong in the same way. For instance, if you ask it something it doesn't know and it hallucinates a plausible answer, another instance of the same LLM is unlikely to hallucinate the exact same answer; it'll probably give you a different one, which is your heads up that probably both are wrong.
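
As a rough sketch of that cross-check (call_llm here is a stand-in for whatever client you actually use, not a real API; in practice you'd compare meaning rather than exact strings, e.g. by asking a third instance whether the two answers agree):

    def call_llm(prompt: str) -> str:
        # Stand-in: wire this up to your provider of choice.
        raise NotImplementedError

    def cross_checked(question: str) -> str | None:
        first = call_llm(question)
        second = call_llm(question)   # a fresh instance, same question
        if first.strip().lower() == second.strip().lower():
            return first              # agreement: a signal, not a proof
        return None                   # disagreement: treat both answers as suspect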


Yeah, but real firewalls are deterministic. Hoping that a second non-deterministic thing will make something more deterministic is weird.

It will probably usually work, just like the LLM can probably usually be left unsupervised. But that 1% error rate in production is going to add up fast.
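
To put a rough number on "add up fast" (assuming independent errors, which is generous):

    p_error = 0.01                   # assumed 1% chance of a bad output per call
    calls = 1_000                    # say, calls handled per day
    p_any = 1 - (1 - p_error) ** calls
    print(f"{p_any:.5f}")            # ~0.99996: at least one failure a day is near certain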


Sure, and then you can throw another LLM in and make them come to a consensus, of course that could be wrong too so have another three do the same and then compare, and then…

Or maybe it will be a circle of LLMs all coming up with different responses and all telling each other "You're absolutely right!"

I have an ongoing and endless debate with a PhD who insists that consensus among multiple LLMs is a valid proof check. The guy is a neuroscientist, not at all a developer tech-head, and is just stubborn, continually projecting a sentient-being perspective onto his LLM usage.

This, but unironically. It's not much different from the way human unreliability is accounted for. Add more until you're satisfied a suitable ratio of mistakes will be caught.

It's "wonderfully" human way.

Just like sometimes you need senior/person at power to tell the junior "no, you can't just promise the project manager shorter deadline with no change in scope, and if PM have problem with that they can talk with me", now we need Judge Dredd AI to keep the law when other AIs are bullied into misbehaving


> For tasks that arent customer facing, LLMs rock. Human in the loop. Perfectly fine. But whenever I see AI interacting with someones customer directly I just get sort of anxious

Especially since every mainstream model has been human preference-tuned to obey the requests of the user…

I think you may be able to have an LLM customer facing, but it would have to be a purpose-trained one from a base model, not a repurposed sycophantic chat assistant.


Exactly what I've been worrying about for a few months now [0]. Arguments like "well at least this is as good as what humans do, and much faster" are fundamentally missing the point. Humans output things slowly enough that other humans can act as a check.

[0] https://news.ycombinator.com/item?id=44743651


I've heard people working in the construction industry mention that the quality of design fell off a cliff when the industry began to use computers more widely: less time and fewer people involved. The same is true of printing: there was much more time, and there were more people in the loop, before computers. My grandmother worked on a Linotype machine printing newspapers. They were really good at catching and fixing grammar errors, sometimes even catching factual errors, etc.

looks at the current state of the US government

Do they? Because as near as I can tell, speedrunning around the legal system - when one doesn't have to worry about consequences - works just fine.


That's a good point. I'm talking specifically in the context of deploying code. The potential for senior devs to be totally overwhelmed with the work of reviewing junior devs' code is limited by the speed at which junior devs create PRs.

So today? With ML tools?

Could you explain what you mean, please?

Junior devs can currently create CLs/PRs faster than the senior can review them.

Indeed. In the language of the post I linked [0]: it's currently an occasional problem, and it risks becoming a widespread rot.

[0] https://news.ycombinator.com/item?id=44743651


> Now we've invented automation that commits human-like error at scale.

Then we can apply the same (or similar) guardrails that we'd use for humans to also control the AI's behavior.

First, don't give them unsafe tools. Sandbox them within a particular directory (honestly this should be how things work for most of your projects, especially since we pull code from the Internet), even if a lot of tools give you nothing in this regard. Use version control for changes, with the ability to roll back. Also have ample tests and code checks with actionable information on failures. Maybe even adversarial AIs that critique one another if problematic things are done, like one sub-task for implementation and another for code-review.
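
As a minimal sketch of the directory-sandboxing part (the paths here are invented, and a real sandbox also has to cover network access, subprocesses, symlink races, and so on):

    from pathlib import Path

    SANDBOX_ROOT = Path("/home/me/projects/my-app").resolve()

    def is_allowed(path: str) -> bool:
        # Refuse any path that resolves outside the project root.
        resolved = Path(path).resolve()
        return resolved == SANDBOX_ROOT or SANDBOX_ROOT in resolved.parents

    assert is_allowed("/home/me/projects/my-app/src/main.py")
    assert not is_allowed("/home/me/projects/my-app/../other/secrets.env")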

Using AI tools has pushed me in that direction, with some linter rules and prebuild scripts to enforce more consistent code. Previously you'd have to tell coworkers not to do something (because of course nobody would write or read some obtuse style guide), but AI can generate code 10x faster than people do, so it helps to have immediate feedback along the lines of "Vue component names must not differ from the file that you're importing from", "There is a translation string X in the app code that doesn't show up in the translations file", "Nesting depth inside of components shouldn't exceed X levels and length shouldn't exceed Y lines", or "Don't use Tailwind class names for colors, here's a branded list that you can use: X, Y, Z", in addition to a TypeScript linter setup with recommended rules and a bunch of stuff for back-end code.
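
For concreteness, one of those prebuild checks can be a few lines of script; this sketch assumes a flat JSON translations file and a $t('some.key') call pattern, which may not match your setup:

    import json, pathlib, re, sys

    # Every translation key used in a .vue file must exist in the translations file.
    translations = json.loads(pathlib.Path("src/i18n/en.json").read_text())
    used = set()
    for src in pathlib.Path("src").rglob("*.vue"):
        used |= set(re.findall(r"""\$t\(['"]([\w.]+)['"]\)""", src.read_text()))

    missing = sorted(k for k in used if k not in translations)
    if missing:
        print("Missing translation keys:", ", ".join(missing))
        sys.exit(1)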

Ofc none of those fully eliminates all risks, but they still seem like a sane thing to have, regardless of whether you use AI or not.


Generally speaking, with humans there are more guardrails & more responsibility around letting someone run free within an organization.

Even if you have a very smart new hire, it would be irresponsible/reckless as a manager to just give them all the production keys after a once-over and say "here's some tasks I want done, I'll check back at the end of the day when I come back".

If something bad happened, no doubt upper management would blame the human(s) and lecture about risk.

AI is a wonderful tool, but that's why giving an AI coding tool the keys and terminal powers and telling it to go do stuff while I grab lunch is kind of scary. Seems like living a few steps away from the edge of a fuck-up. So yeah... there need to be enforceable guardrails and fail-safes outside of the context / agent.


The bright side is that it should eventually be technically feasible to create much more powerful and effective guardrails around neural nets. At the end of the day, we have full access to the machine running the code, whereas we can't exactly go around sticking electrodes into everyone's brains, and even "just" constant monitoring is prohibitively expensive for most human work. The bad news is that we might be decades away from an understanding of how to create useful guardrails around AI, and AI is doing stuff now.

Precisely: while LLMs fail at complexity, DSLs can represent those divide-and-conquer intermediate levels to provide the most overall value with good accuracy. LLMs should make it easier to build DSLs themselves and to validate their translating code. The onus then is on the intelligent agent to identify and design those DSLs. This would require true and deep understanding of the domain and an ability to synthesize, abstract and codify it. I predict this will be the future job of today's programmer, quite a bit more complicated than what it is today, requiring a wider range of qualities and skills, and pushing those specializing in coding only into irrelevance.

Once AI improves its cost/error ratio enough, the systems you are suggesting for humans will work here also. Maybe Claude/OpenAI will be pair programming and Gemini reviewing the code.

> Once AI improves

That's exactly the problematic mentality. Putting everything in a black box and then saying "problem solved; oh it didn't work? well maybe in the future when we have more training data!"

We're suffering from black-box disease and it's an epidemic.


The training data: the entirety of the internet and every single book we could get our hands on. "Surely we can just somehow give it more and it will be better!"

Also once people stop cargo-culting $trendy_dev_pattern it'll get less impactful.

Every time something new comes along, the same thing happens: people start exploring by putting it absolutely everywhere, no matter whether it makes sense. Add in a huge amount of cash that VCs don't know what to spend on, and you end up with solutions galore but none of them solving any real problems.

Microservices are a good example of a previous $trendy_dev_pattern that is now cooling down; people are starting to at least ask "Do we actually need microservices here?" before design and implementation, something that has been lacking since it became a trendy thing. I'm sure the same will happen with LLMs eventually.


For that to work the error rate would have to be very low. Potentially lower than is fundamentally possible with the architecture.

And you’d have to assume that the errors LLMs make are random and independent.


As I get older I'm realizing a lot of things in this world don't get better. Some do, to be fair, but some don't.

Why does this conflict? Faster people don't negate the requirement for building systems that maintain safety in the face of errors.

> but it does seem fairly obvious to me that directly automating things with AI will probably always carry substantial risk, and that if you involve AI in the process, you get much more assurance by using it to develop a traditional automation.

Sure, but the point is you use it when you don't have the same simple flow: fixed coding for the clear issues, falling back afterwards.


This will drive development of systems that error-correct at scale, and orchestration of agents that feed back into those systems at different levels of abstraction to compensate for those modes of failure.

An AI software company will have to have a hierarchy of different agents, some of them writing code, some of them doing QA, some of them doing coordination and management, others taking into account the marketing angles, and so on, and you can emulate the role of a wide variety of users and skill levels all the way through to CEO-level considerations. It'd even be beneficial to strategize by emulating board members and competitors, and by taking into account market data with a team of emulated quants, and so on.

Right now we use a handful of locally competent agents that augment the performance of single tasks, and we direct them within different frameworks, ranging from vibecoding to diligent, disciplined use of DSL specs and limiting the space of possible errors. Over the next decade, there will be agent frameworks for all sorts of roles, with supporting software and orchestration tools that allow you to use AI with confidence. It won't be one-shot prompts with 15% hallucination rates, but a suite of agents that validate and verify at every stage, following systematic problem solving and domain modeling rules based on the same processes and systems that humans use.

We've got decades' worth of product development even if AI frontier model capabilities were to stall out at current levels. To all appearances, though, we're getting far more bang for our buck and the rate of improvement is still accelerating, so we may get AI so competent that the notion of these extensive agent frameworks for reliable AI companies ends up being as mismatched with market realities as those giant suitcase portable phones, or integrated car phones.


Well I don't see why that's a problem when LLMs are designed to replace the human part, not the machine part. You still need the exact same guardrails that were developed for human behavior because they are trained on human behavior.

Yep, the further we go from highly constrained applications, the riskier it'll always be.

I was wondering if this needs more analysis, because I get this response a lot: people say yeah, AI does things wrong sometimes, but humans do that too, so what? Or that humans are a mechanism for turning natural language into formal language and they get things wrong sometimes (as if you can never write a program that is clear and does what it should be doing), so go easy on AI. Where does this come from? It feels as if it's something psychological.

There's this huge wave of "don't anthropomorphize AI" but LLMs are much easier to understand when you think of them in terms of human psychology rather than a program. Again and again, HackerNews is shocked that AI displays human-like behavior, and then chooses not to see that.

> LLMs are much easier to understand when you think of them in terms of human psychology

Are they? You can reasonably expect from a human that they will learn from their mistake, and be genuinely sorry about it which will motivate them to not repeat the same mistake in the future. You can't have the same expectation from an LLM.

The only thing you should expect from an LLM is that its output is non-deterministic. You can expect the same from a human, of course, but you can fire a human if they keep making (the same) mistake(s).


While the slowness of learning of all ML is absolutely something I recognise, what you describe here:

> You can reasonably expect from a human that they will learn from their mistake, and be genuinely sorry about it which will motivate them to not repeat the same mistake in the future.

Wildly varies depending on the human.

Me? I wish I could learn German from a handful of examples. My embarrassment at my mistakes isn't enough to make it click faster, and it's not simply a matter of motivation here: back when I was commuting 80 minutes each way each day, I would fill the commute with German (app) lessons and (double-speed) podcasts. As the Germans themselves will sometimes say: Deutsche Sprache, schwere Sprache ("German language, hard language").

There's been a few programmers I've worked with who were absolutely certain they knew better than me, when they provably didn't.

One insisted a start-up process in a mobile app couldn't be improved; I turned it from a 20-minute task into a 200 ms task by the next day's standup, but they never at any point showed any interest in improving or learning. (Other problems they demonstrated included not knowing or caring how to use automatic reference counting, not understanding why copy-pasting class files instead of subclassing cannot be excused by the presence of "private" that could just have been replaced with "public", and casually saying that he had been fired from his previous job and blaming this on personalities, without any awareness that, even if true, he was still displaying personality conflicts with everyone around him.)

Another, complaining about too many views on screen, wouldn't even let me speak and threatened to end the call when I tried to say anything, even though I had already demonstrated before the call that even several thousand (20k?) widgets on-screen at the same time would still run at 60 fps, while they were complaining about on the order of 100 widgets.


> Wildly varies depending on the human.

Sure. And the situation.

But the difference is, all humans are capable of it, whether or not they have the tools to exercise that capability in any given situation.

No LLM is capable of it*.

* Where "it" is "recognizing they made a mistake in real time and learning from it on their own", as distinct from "having their human handlers recognize they made 20k mistakes after the fact and running a new training cycle to try to reduce that number (while also introducing fun new kinds of mistakes)".


> But the difference is, all humans are capable of it, whether or not they have the tools to exercise that capability in any given situation.

When they don't have the tools to exercise that capability, it's a distinction without any practical impact.

> Where "it" is "recognizing they made a mistake in real time and learning from it on their own"

"Learn" I agree. But as an immediate output, weirdly not always: they can sometimes recognise they made a mistake and correct it.


> When they don't have the tools to exercise that capability, it's a distinction without any practical impact.

It has huge practical impact.

If a human doesn't currently have the tools to exercise the capability, you can help them get those.

This is especially true when the tools in question are things like "enough time to actually think about their work, rather than being forced to rush through everything" or "enough mental energy in the day to be able to process and learn, because you're not being kept constantly on the edge of a breakdown." Or "the flexibility to screw up once in a while without getting fired." Now, a lot of managers refuse to give their subordinates those tools, but that doesn't mean that there's no practical impact. It means that they're bad managers and awful human beings.

An LLM will just always be nondeterministic. If you're the LLM "worker"'s "boss", there is nothing you can do to help it do better next time.

> they can sometimes recognise they made a mistake and correct it.

...And other times, they "recognize they made a mistake" when they actually had it right, and "correct it" to something wrong.

"Recognizing you made a mistake and correcting it" is a common enough pattern in human language—ie, the training corpus—that of course they're going to produce that pattern sometimes.


> you can help them get those.

A generic "you" might, I personally don't have that skill.

But then, I've never been a manager.

> An LLM will just always be nondeterministic.

This is not relevant, humans are also nondeterministic. At least practically speaking; the theory doesn't matter so much, since we can't duplicate our brains and test ourselves 10 times on the exact same input without each previous input affecting the next.

> If you're the LLM "worker"'s "boss", there is nothing you can do to help it do better next time.

Yes there is, this is what "prompt engineering" (even if "engineering" isn't the right word) is all about: https://en.wikipedia.org/wiki/Prompt_engineering

> "Recognizing you made a mistake and correcting it" is a common enough pattern in human language—ie, the training corpus—that of course they're going to produce that pattern sometimes.

Yes. This means that anthropomorphising them leads to a useful prediction.

For similar reasons, I use words like "please" and "thank you" with these things, even though I don't actually expect these models to have constructed anything resembling a real human emotional qualia within them — humans do better when praised, therefore I have reason to expect that any machine that has learned to copy human behaviour will likely also do better when praised.


> This is not relevant, humans are also nondeterministic.

I mean, I suppose one can technically say that, but, as I was very clearly describing, humans both err in predictable ways, and can be taught not to err. Humans are not nondeterministic in anything like the same way LLMs are. LLMs will just always have some percentage chance of giving you confidently wrong answers. Because they do not actually "know" anything. They produce reasonable-sounding text.

> Yes there is

...And no matter how well you engineer your prompts, you cannot guarantee that the LLM's outputs will be any less confidently wrong. You can probably make some improvements. You can hope that your "prompt engineering" has some meaningful benefit. But not only is that nowhere near guaranteed, every time the models are updated, you run a very high risk that your "prompt engineering" tricks will completely stop working.

None of that is true with humans. Human fallibility is wildly different than LLM fallibility, is very-well-understood overall, and is highly and predictably mitigable.


They can also be told they made a mistake and "correct themselves" by making the same mistake again.

> Are they?

Yes, hugely. Just assume it's like a random person from some specific pool, with certain instructions, whom you've just called on the phone. The idea that you then get a fresh person if you call back is easy to understand.


I'm genuinely wondering if your parent comment is correct and the only reason we don't see the behaviour you describe (i.e. learning and growth) is because of how we do context windows: they're functionally equivalent to someone who has short-term memory loss, think Drew Barrymore's character, or one of the people in that facility she ends up in, in the film 50 First Dates.

Their internal state moves them to a place where they "really intend" to help or change their behaviour (a lot of what I see is really consistent with that), and then they just... forget.


I think it's a fundamental limitation of how context works. Inputting information as context is only ever context; the LLM isn't going to "learn" any meaningful lesson from it.

You can only put information in context; it struggles to learn lessons/wisdom.


Not only, but also. The L in ML is very slow. (By example count required, not wall-clock).

On in-use learning, they act like the failure mode of "we have outsourced to a consultant that gives us a completely different fresh graduate for every ticket, of course they didn't learn what the last one you talked to learned".

Within any given task, the AIs have anthropomorphised themselves, because they're copying humans' outputs. That the models model the outputs with only a best guess as to the interior system that generates those outputs is going to make it useful, but not perfect, to also anthropomorphise the models.

The question is, how "not perfect" exactly? Is it going to be like early Diffusion image generators with the psychological equivalent of obvious Cronenberg bodies? Or the current ones where you have to hunt for clues and miss it on a quick glance?


No, the idea is just stupid.

I just don't understand how anyone who actually uses the models all the time can think this.

The current models themselves can even explain what a stupid idea this is.


Obviously they aren't actually people, so there are many low-hanging differences. But consider this: using words like please and thank you gets better results out of LLMs. This is completely counterintuitive if you treat LLMs like any other machine, because no other machine behaves like that. But it's very intuitive if you approach them with thinking informed by human psychology.

> You can reasonably expect from a human that they will learn from their mistake, and be genuinely sorry about it which will motivate them to not repeat the same mistake in the future.

Have you talked to a human? Like, ever?


Have you?

One day you wake up, and find that you now need to negotiate with your toaster. Flatter it maybe. Lie to it about the urgency of your task to overcome some new emotional inertia that it has suddenly developed.

Only toast can save us now, you yell into the toaster, just to get on with your day. You complain about this odd new state of things to your coworkers and peers, who like yourself are in fact expert toaster-engineers. This is fine they say, this is good.

Toasters need not reliably make toast, they say with a chuckle, it's very old fashioned to think this way. Your new toaster is a good toaster, not some badly misbehaving mechanism. A good, fine, completely normal toaster. Pay it compliments, they say, ask it nicely. Just explain in simple terms why you deserve to have toast, and if from time to time you still don't get any, then where's the harm in this? It's really much better than it was before


It reminds me of the start of Ubik[1], where one of the protagonists has to argue with their subscription-based apartment door. Given also the theme of AI hallucinations, that book has become even more prescient than when it was written.

[1]https://en.wikipedia.org/wiki/Ubik



This comparison is extremely silly. LLMs solve reliably entire classes of problems that are impossible to solve otherwise. For example, show me Russian <-> Japanese translation software that doesn't use AI and comes anywhere close to the performance and reliability of LLMs. "Please close the castle when leaving the office". "I got my wisdom carrot extracted". "He's pregnant." This was the level of machine translation from English before AI; from Japanese it was usually pure garbage.

> LLMs solve reliably entire classes of problems that are impossible to solve otherwise.

Is it really OK to have to negotiate with a toaster if it additionally works as a piano and a phone? I think not. The first step is admitting there is obviously a problem; afterwards you can think of ways to adapt.

FTR, I'm very much in favor of AI, but my enthusiasm especially for LLMs isn't unconditional. If this kind of madness is really the price of working with it in the current form, then we probably need to consider pivoting towards smaller purpose-built LMs and abandoning the "do everything" approach.


We are there in the small already. My old TV had a receiver and a pair of external speakers connected to it. I could decrease and increase the receiver volume with its extra remote. Two buttons, up and down. This was with an additional remote that came with the receiver.

Nowadays, a more capable 5.1 speaker receiver is connected to the TV.

There is only one remote, for both. To increase or decrease the volume after starting the TV now, I have to:

1. wait a few seconds while the internal speakers in the TV starts playing sound

2. the receiver and TV connect to each other, audio switches over to receiver

3. wait a few seconds

4. the TV channel (or Netflix or whatever) switches over to the receiver welcome screen. Audio stops playing, but audio is now switched over to the receiver, but there is no indication of what volume the receiver is set to. It's set to whatever it was last time it was used. It could be level 0, it could be level 100 or anything in between.

5. switch back to TV channel or Netflix. That's at a minimum 3 presses on the remote. (MENU, DOWN, ENTER) or (MENU, DOWN, LEFT, LEFT, ENTER) for instance. Don't press too fast, you have to wait ever so slightly between presses or they won't register.

6. Sorry, you were too impatient and fast when you switched back to TV, the receiver wants to show you its welcome screen again.

7. switch back to TV channel or Netflix. That's at a minimum 3 presses on the remote. (MENU, DOWN, ENTER) or (MENU, DOWN, LEFT, LEFT, ENTER) for instance. Don't press too fast, you have to wait ever so slightly between presses or they won't register.

8. Now you can change volume up and down. Very, very slowly. Hope it's not at night and you don't want to wake anyone up.


Yep, it's a decent analogy: Giving up actual (user) control for the sake of having 1 controller. There's a type of person that finds it convenient. And another type that finds it a sloppy piss-poor interface that isn't showing off any decent engineering or design. At some point, many technologists started to fall into the first category? It's one thing to tolerate a bad situation due to lack of alternatives, but very different to slip into thinking that it must be the pinnacle of engineering excellence.

Around now some wit usually asks if the luddites also want to build circuits from scratch or allocate memory manually? Whatever, you can use a garbage collector! Point is that good technologists will typically give up control tactically, not as a pure reflex, and usually to predictable subsystems that are reliable, are well-understood, have clear boundaries and tolerances.


> predictable subsystems that are reliable, are well-understood, have clear boundaries and tolerances

I'd add with reliability, boundaries, and tolerances within the necessary values.

The problem with the TV remote is that nobody has given a damn about ergonomic needs for decades. The system is reliable, well understood, and has well known boundaries and tolerances; those are just completely outside of the requirements of the problem domain.

But I guess that's a completely off-topic tangent. LLMs fail much earlier.


>LLMs solve reliably entire classes of problems that are impossible to solve otherwise

Great! Agreed! So we're going to restrict LLMs to those classes of problems, right? And not invest trillions of dollars into the infrastructure, because these fields are only billion dollar problems. Right? Right!?



Remember: a phone is a phone, you're not supposed to browse the internet on it.

Not if 1% of the time it turns into a pair of scissors.

> LLMs solve reliably entire classes of problems that are impossible to solve otherwise. For example, [...] Russian <-> Japanese translation

Great! Name another?


I admit Grok is capable of praising Elon Musk way more than any human intelligence could.

BUTTER ROBOT: What is my purpose?

RICK: You pass butter.

BUTTER ROBOT: ... Oh my God.

RICK: Yeah, welcome to the club, pal.

https://youtube.com/watch?v=X7HmltUWXgs


Not surprising to see this so downvoted, but it's very true: it's a great first-order approximation, and yet users here will be continually surprised that they act like people.


