
iandanforth · 2026-06-07T13:11:13 1780837873

Wut? I pilot LLMs all day but there's no way in hell I'd agree to be at the helm of a finance product. That first pillar is still there. Maybe the author isn't aware of the impact they have, but I know, with the evidence of reverted PRs, that when I step outside my area of deep knowledge I can no longer call BS on the agents. Our most capable agent, with access to the same kind of distributed systems the author talks about, is regularly wrong, frequently myopic, and just outright dumb constantly. It's the expertise of engineers on the team that pushes it back on track.

t34t34r43 · 2026-06-07T13:46:14 1780839974

Posting this under a burner so I don't dox myself: I work in FinTech on a regulated product. We have access to Mythos. Mythos identified part of our codebase that it confidently asserted was not compliant with a particular regulation and we were at grave risk by allowing it to operate the way it was.

Except this was not the case, it had of course hallucinated what the regulation actually required (I know this because the code in question had already been reviewed by human counsel). This is (supposedly) the most bleeding-edge model available.

We use a lot of genAI to help us write code, but there is no way in the mid-term we could ever rely on these tools to actually build compliant financial products. We'd have to be totally mad. Yes, lots of Fintech companies are using these agents to accelerate, but anyone who's using them to actually ship product without a human actually digging into it is opening themselves up to a world of risk.

PeterStuer · 2026-06-07T16:21:05 1780849265

I have worked on highly regulated areas in finance (risk). Compliance is a highly creative art, often requiring lots of out-of-the-box thinking and non-obvious solutions. The people I found worst at this were IT. They tend to over-interpret regulation, and super-restrict beyond what is needed for actual de-facto compliance.

My guess is the model makes the same mistakes as the programmers: taking 'rules' literally, unaware of sectoral joint understanding, validated interpretations and habits. (btw. this is often on the non-tech side also a difference between regulatory and legal. The former are much more result oriented while the latter are primarily risk averse.)

davedx · 2026-06-07T18:34:19 1780857259

Ha. I've worked in a fairly strongly regulated sector (energy, in the Netherlands), where I collaborated closely with our head of compliance, and she heavily over-interpreted the regulations while I often tried to find more pragmatic solutions.

I think adherence to regulation and compliance is nothing to do with whether you're a SWE, a risk officer, or C-level, and everything to do with your own principles, ethics, professional attitude, and pragmatism.

jval43 · 2026-06-08T06:26:32 1780899992

I've found two things to matter:

1. experience, i.e. knowing why and how a rule matters (in general, but also to auditors)

2. willingness to think

If these aren't present, you get overly restrictive compliance that at the same time accomplishes nothing.

treszkai · 2026-06-08T09:49:26 1780912166

I'd also add a willingness to follow the (perceived) intent of the rules as opposed to gaming the letter of the rule. An experienced folk might say, "yeah it won't matter legally", and continue with "but it will matter to me because this rule is in place for this-and-that reason".

notarobot123 · 2026-06-08T08:15:29 1780906529

scruples are also often a surface of friction that get in the way of business objectives.

Markstar · 2026-06-08T12:29:27 1780921767

It's not even limited to SWE - there are so many regulations in almost every trade (which sometimes contradict each other) that it often eventually boils down to your experience and your decision which rule to follow and which to 'bend' (for example in "traditional" architecture there are so many rules that the building authority itself has to compromise on following the law by the book).

thewebguyd · 2026-06-07T16:51:53 1780851113

> IT. They tend to over-interpret regulation, and super-restrict beyond what is needed for actual de-facto compliance.

IME this is less the fault of IT and more so bad auditors that won't consider, or just don't understand, what compensating controls are. If it doesn't meet their little checklist exactly, they fail the audit.

antonvs · 2026-06-07T18:22:42 1780856562

> IT. They tend to over-interpret regulation, and super-restrict beyond what is needed for actual de-facto compliance.

This is such a nonsensical claim. If a company is asking someone from IT to read the regulations and implement them, then obviously you’re going to get something that conforms to the written specification they were provided.

But a company that does that is basically delegating both compliance and legal functions to IT. No sane company does that.

VonGallifrey · 2026-06-08T01:31:34 1780882294

> But a company that does that is basically delegating both compliance and legal functions to IT. No sane company does that.

I was a Software Dev in a small (but fully regulated and licensed) stock exchange. We used to have guidance from legal experts, market experts, and traders, but in the last project I worked on, they just dumped 300 pages of laws and regulations on my desk and asked me what needed to be done. Why? Because the experts we used to have were either fired or left. Along with any product managers. I guess company leadership thought they were no longer needed.

Insane is right. I told them that this is not how it is supposed to work. I can't tell them what needs to be done. I am not a legal expert who can just interpret these regulations.

I was forced out of the company after that, but honestly, no one would want to work in such an environment anyway.

thewebguyd · 2026-06-08T01:45:59 1780883159

> But a company that does that is basically delegating both compliance and legal functions to IT

This actually happens scarily often, especially in smaller companies. No F500 is doing this, but there are tons of "mid market" sized non-tech companies (think 80 to 150 employees in size) that basically rely on the IT department of 1 or 2 people, or an MSSP for pretty much everything. No legal team, maybe an attorney they consult with once or twice a year if you're lucky.

mlinhares · 2026-06-08T02:10:37 1780884637

That's actually pretty important, if you're an eng doing compliance work and you don't have legal counsel working side by side with you you might be setting yourself up for legal troubles down the line. I'm glad I can always rely on legal to do their job here when doing this kind of work, I wouldn't want to do work like this just out of my ass.

DaedalusII · 2026-06-08T00:11:10 1780877470

perfect IT response

regulations are written ambiguously and the specifications do not match the industry

I have even seen regulators refuse to follow specific legislated laws because "that's not what the government meant", giving a company the choice of following the law and being fined, or breaking the law to please the regulatory agency

dumah · 2026-06-08T13:00:26 1780923626

Regulators don’t want to provide arbitrary detail on their interpretation or likely judgements on issues that may come up, for many reasons, good and bad.

One good one is that providing concrete razors for compliant and non-compliant behavior accelerates the gaming of the rules.

fragmede · 2026-06-07T21:44:09 1780868649

> compensating controls

How to say you deal with PCI compliance without saying you deal with PCI compliance.

mgh95 · 2026-06-07T23:35:28 1780875328

Compensating controls also come up in the context of BSA/AML.

ValtteriL · 2026-06-08T08:39:44 1780907984

Also in IEC 62443

mgh95 · 2026-06-08T08:54:59 1780908899

Also in OFAC compliance. It just comes up in a lot of places where workflows are compliance heavy.

hparadiz · 2026-06-07T17:07:29 1780852049

It's cause IT never has to live with the consequences of their decisions. Who cares if the other department keeps bleeding talent because you twisted the knobs so hard no one wants to work in your system?

JimBlackwood · 2026-06-07T17:33:42 1780853622

Sounds like communication between departments sucks. If IT develops for them, you’d expect there to be a feedback loop?

hparadiz · 2026-06-07T17:38:03 1780853883

Yes. Exactly. This is not a reflection of where I am now in any way shape or form. Just my observation of previous places I've worked.

lmz · 2026-06-08T03:42:46 1780890166

Why would they care? They take the blame when it gets hacked, but don't really get any upside for bending the rules to make people work easier. CYA rule-following is just to be expected.

raducu · 2026-06-07T19:17:43 1780859863

> The people I found worst at this were IT.

My experience as IT in modern banks was the opposite. The legal department were absolute assholes when it came to software features. And I'm talking absolutely 100% ok features, like paying your bills from the banking application.

The least fun, trigger-happy, cover-their-butts people I've ever seen.

Like all they could ever say was NO. I guess they were heavily incentivized to just say NO to everything.

arethuza · 2026-06-08T08:08:04 1780906084

I remember working with a corporate customer whose in house architecture team was so difficult to work with that I joked that they could be replaced by a rule in their email system that simply replied "No" as the message if I ever emailed them.

ziml77 · 2026-06-08T00:11:58 1780877518

That was my experience too. Legal wanted things locked down hard. IT was more than happy to make things easier for users as long as legal wasn't getting in the way. Usually meant the systems were simpler too if we had fewer rules and controls to enforce.

fireflash38 · 2026-06-07T22:33:11 1780871591

Auditing is assumption of blame or responsibility to another party.

They are incentivized to strike the best balance of certifying (who wants to go to a place that never certifies) and validity (rubber stamp mills reflect the blame).

Yes, it is meant to be adversarial, to a point.

CSSer · 2026-06-07T19:35:37 1780860937

I don’t think you’re going to find a consensus on this because it really just comes down to the quality of the employees in each discipline. Actions speak louder than words. I’ve seen the IT people GP is describing. I’ve also seen yours. In GP’s scenario, they often even mean well but are very overwhelmed because they’ve come to exist in a space where _everything is IT_ because no one else is remotely qualified to fill the specialty gap. When I found myself in your scenario, the opposite was true and it completely matched what you described.

protocolture · 2026-06-08T02:05:50 1780884350

I once worked on a data compliance job, and the auditor would fail everything he possibly could. He was there for data destruction compliance, but like many such people, he came from an engineering background. He would complain about everything. Gaps between pallets. OHS. Whatever he could to justify his decisions. He never found a bit on a disk out of place and he still made our lives hell. Failure because the floor didn't feel solid enough. Failure because he didn't feel comfortable in a warehouse environment. And when the management had had enough and decided to refuse him entry and ask for someone else, we had to hold ourselves to an even higher standard to compensate.

Later I worked in a role, attempting to achieve PCI compliance. The Auditor was a really nice guy, but there was always a short list of 10 things that he wasn't quite happy with. We kept increasing the scope of compliance to keep up with him. Everyone talked about him (Semi famous local celebrity security consultant/researcher/lecturer) and claimed that if we just stuck it out we would be super duper compliant and basically unassailable. Except that it never ended. Went 12 months with the guy. Then they just stopped paying his bills and brought in another auditing firm. Compliant immediately. You never know in a situation like that whether we were actually compliant or if there was graft. But we got there. Knowing that organisation I lean towards graft. They then failed their first audit after achieving compliance.

I have done a few PCI compliance operations since. What I have found is that you can't control for the auditor, so what good IT management does is make every single requirement completely unassailable. If you can't write a very obvious compensating control in 5 sentences, then you just move heaven and earth to comply with the letter of the requirement (even if the project to become compliant is itself a compensating control for a while). If you get an overachieving auditor, you won't spend 200 billable hours arguing about compensating controls. If you have a shit auditor, you know you are compliant even if they aren't being as thorough as they could possibly be. It's the only ethical way to navigate the situation.

jayd16 · 2026-06-07T16:34:01 1780850041

Who gets in trouble if it turns out you are actually held to the literal rule?

PeterStuer · 2026-06-07T16:47:10 1780850830

Contrary to what you indicate rules are not declared in a vacuum, for people to read and then algorithmically 'implement'. There are many ways to interpret regulation, and there will be both accompanying clarifications, as well as compliance departments negotiating with regulators on what is an acceptable and sufficient compliance action. Then there furthermore is a risk that will be calculated vs the cost and opportunity costs etc.

As an enterprise architect, these are all part of the meetings you have with compliance when you are working on major projects. I have had the privilege of working with some excellent compliance officers, and they are the opposite of the nay-saying caricature that is often painted of them. I found these people to be extremely creative and helpful, working together towards solutions rather than stalling or nixing viable progress.

logicalmind · 2026-06-07T17:43:11 1780854191

I also work in finance and my recent experience with regulators is really discouraging. DOGE wiped out a large amount of the regulators in government. It seems like most of the regulators remaining are the inexperienced and low tenure. Within the past few months we've attempted to roll out new financial products. When we attempt to send our proposal to them, they can't even tell us who we're supposed to send it to.

It doesn't feel like we're living in the same world of regulation that existed prior to DOGE.

throwaway2037 · 2026-06-08T10:42:59 1780915379

> I also work in finance... DOGE wiped out a large amount of the regulators in government.

I found an insanely detailed Wiki page about all of the gov't divisions affected by DOGE: https://en.wikipedia.org/wiki/US_federal_agencies_targeted_b...

However, I don't see anything about finance there. I'm confused by your comment. Can you provide more specifics?

logicalmind · 2026-06-09T23:07:21 1781046441

Not sure why it's not covered in that wiki article, but I'm specifically referencing the FDIC and related agencies. They have been so decimated by DOGE that it is possible that multiple of those related agencies will be consolidated into one (FDIC, NCUA, OCC, etc.).

I can't go into too much detail, but for a financial institution to offer certain financial products, you have to submit a proposal to one of the above regulatory bodies to get their approval. We were attempting to do just that and we couldn't even find the proper person at the given agency who should be receiving said proposal. It was even rumored that regulatory agency who would normally review such proposals didn't have the staff to review them. And the review would be done by an entirely different group of regulators who have not done such things historically.

Additionally, these agencies do regular exams of financial institutions to ensure they are complying with regulations and handling fraud properly. These cuts have led to those exams either not happening, or happening at a fraction of the depth they had been previously.

jimbokun · 2026-06-08T00:30:27 1780878627

So the DOGE geniuses failed to remove the regulations they allegedly thought were hampering legitimate businesses, while removing the people capable of verifying whether or not your business is in compliance.

What a win!

jrflowers · 2026-06-08T09:55:06 1780912506

They wiped out anybody that was hampering their businesses. Leaving the rest as an impossible barrier to entry for everybody else is a feature, not a bug.

jayd16 · 2026-06-07T16:55:13 1780851313

The point was about who is on the hook and why they might be less permissive.

I'm not implying anything else. I used your own "literal" wording to refer to the "more strict than yours" interpretation.

I suppose I should have used scare quotes around "literal".

PeterStuer · 2026-06-07T17:01:58 1780851718

'The company' would be on the hook. Inside, it might be the compliance team that signed off on the solution, but it usually is not the sort of blame game at that point. I'm not saying these scapegoat trails do not exist, but they are far less common than you would imagine if you only read about them in the press.

Company politics, feudal wars, fiefdom protections, backstabbing and outright sabotaging, now there's a daily occurrence and many minions are cannon fodder in those skirmishes, but they usually stay clear of regulatory issues minefields.

rectang · 2026-06-07T17:08:05 1780852085

I am skeptical that developers who implement a non-compliant solution that gets a company in trouble get off scot-free.

If the company you work for actually had such a no-fault culture, I doubt you'd be criticizing programmers so aggressively for being sticklers, but would instead be trying to understand and account for the systemic factors (including human factors) behind their behavior.

fauigerzigerk · 2026-06-07T18:12:10 1780855930

>I am skeptical that developers who implement a non-compliant solution that gets a company in trouble get off scot-free.

I don't see why developers should be in trouble. Developers don't make unilateral decisions on non-trivial compliance matters. A finding of non-compliance at a financial institution would typically be the result of an investigation, a disagreement with the regulator or a court ruling. It would come years after the organisation as a whole decided to adopt the interpretation in question.

rectang · 2026-06-07T19:05:49 1780859149

But here we're talking about developers being asked to implement decisions which they don't understand to be compliant.

Engineers are not shielded by their implementer role if they participate in illegal activity. James Robert Liang was a rank-and-file engineer for Volkswagen and he got jailed for his role in the VW emissions scandal[1].

No matter how much an enterprise architect or compliance officer promises "it'll be fine" to the developer, the developer needs documented CYA. An enlightened organization would perhaps find ways to expedite that CYA documentation rather than demonizing programmers as a class.

[1] https://apnews.com/general-news-988ea2ae45694b37b320e68cefe3...

ThrowawayR2 · 2026-06-07T20:56:00 1780865760

> "...don't understand to be compliant."

Liang got prison time because he _did understand_ that the engine wasn't compliant with regulations and chose to build the system to falsify the emissions output during tests anyway. He was not a scapegoat.

"On 9 September 2016, James Robert Liang, a Volkswagen engineer working at Volkswagen's testing facility in Oxnard, California, admitted as part of a plea deal with the US Department of Justice that the defeat device had been purposely installed in US vehicles with the knowledge of his engineering team: 'Liang admitted that beginning in about 2006, he and his co-conspirators started to design a new "EA 189" diesel engine for sale in the United States. ... When he and his co-conspirators realized that they could not design a diesel engine that would meet the stricter US emissions standards, they designed and implemented [the defeat device] software.'" from https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal

rectang · 2026-06-07T22:01:13 1780869673

Yes, and that demonstrates that developers are not immune. And so, developers who suspect they're being asked to do something illegal (but aren't sure) are going to act as sticklers who irritate enterprise architects until you take concrete action to reassure them.

Complain about them, denigrate them, upbraid them for performing analysis outside their primary expertise, fire and replace them.... none of that changes the incentive structure that shunts people in the implementation role towards conservatism out of a perceived need for self-preservation.

md224 · 2026-06-07T23:04:28 1780873468

I think you misread the person you replied to.

"decisions which they don't understand to be compliant" = "decisions which they don't believe to be compliant"

In other words, they understand that the decisions are not compliant. There's no contradiction with what you said.

fauigerzigerk · 2026-06-08T07:55:25 1780905325

You're talking about two very different situations but your wording doesn't make that clear:

a) Engineers don't know and cannot be expected to know whether what they are being asked to implement complies with all regulations. This is completely normal.

b) Engineers know or can be expected to know based on their expertise that they are being asked to cheat. That's when they are on the hook.

VW was a case of (b). It was clear-cut criminal behaviour on a very technical level. But that's not what typically happens in financial services and many other domains.

But if your point is merely that engineers are not automatically in the clear just because someone higher up told them what to do then I agree with you.

kanbankaren · 2026-06-07T17:39:19 1780853959

> There are many ways to interpret regulation,

Then the rules should enumerate all the ways. From your posts, you come across as if programmers don't know what they are doing which is insulting to those who work in mission critical industries like aviation where a programmer could be criminally charged if he/she didn't implement the specs STRICTLY.

habinero · 2026-06-08T09:09:37 1780909777

> Then the rules should enumerate all the ways

It's nice to want things, but rules are much squishier in real life. There's rarely any truly bright line.

patrulek · 2026-06-09T07:42:25 1780990945

It isn't the programmers' fault though.

PeterStuer · 2026-06-07T17:47:11 1780854431

"you come across as if programmers don't know what they are doing"

Is neither what I said nor believe.

scott_w · 2026-06-07T16:40:38 1780850438

That's why you work with your Legal/Compliance Team to make sure you stay in line. They can explain when a rule applies and when it doesn't. This needs the engineering side to be able to explain what's happening, and translate it into the business process as closely as possible, and the legal side to be able to apply the law to the case.

tsunamifury · 2026-06-07T16:35:46 1780850146

If you think rules are literal then you aren’t aware how the world works.

There’s a reason it’s called “judgement”

rectang · 2026-06-07T16:49:48 1780850988

In your world, do subordinates ever get scapegoated for bending the rules at a boss's behest?

jayd16 · 2026-06-07T16:38:50 1780850330

...And that judgement could take them literally. So what is your point?

My point was simply that it's easy to scoff at someone else being careful if it's their neck and not yours.

parineum · 2026-06-07T16:54:12 1780851252

They could but they don't. That's pretty much the whole job. You can also appeal decisions to a more reasonable party if you draw RobotJudge3000 for your trial

patrulek · 2026-06-09T07:21:17 1780989677

> mistakes as the programmers: taking 'rules' literally

Isn't that how we make stable, deterministic and predictable systems? How do you want to create one with ambiguous rules?

wanderlust123 · 2026-06-08T16:47:17 1780937237

It’s not a highly creative art at all, genuinely don’t get this weird compliance glazing here.

ericmcer · 2026-06-07T15:58:57 1780847937

The dynamic of agent codes human reviews does seem like the only sane one for the foreseeable future. Even Anthropic themselves still fall back to this.

The problem is that sucks, even if all software engineers keep their jobs and salaries, the floor is still pulled out from under us. Imagine if a surgeon's job was to supervise robot surgeons from a remote computer, or a woodworker just signs off on work before the machines do all the cutting and assembly. Sure they still have important jobs in their field but the soul & humanity of their skill is gone.

hax0ron3 · 2026-06-07T16:32:00 1780849920

I never found there to be much soul and humanity in the job to begin with. Coding personal projects has soul, but for me at least the demands of high-velocity sprint-based software development to match business needs removed most of the soul and humanity long before AI got good at coding. And I mean, I totally understand why it has to be like that. In most businesses, you do better by shipping decent software fast than by shipping great software slowly. I don't have a problem with that in principle. But it does mean that for me, the software development side of things has never had much soul and humanity to begin with. It was just being a glorified assembly line worker, with the sprints being the assembly line. Of course, others may have had very different experiences, but that has been mine.

For me, AIs have actually made the job more soulful, not less. For one thing, it lets me use the part of my mind that is good at human language, not just the part of my mind that is good at software. This makes the job feel a bit less one-dimensional in terms of what parts of me are engaged while doing it. For another, I find it liberating to no longer have to think much about boilerplate code or to spend time roaming around the Internet looking up documentation of various language syntax and API details, the vast majority of which are arbitrary rather than being based on any kind of mathematical beauty. For me it makes the job more soulful that I can think of the job on a higher level instead of having to spend effort on arbitrary and tedious details.

Of course there is still the question of "will the job even exist in a few years, at least for more than a relatively small number of people?". But that's a separate question. For now at least, I am finding that for me AIs have brought a lot more soul and humanity to the job than it ever had before.

abalashov · 2026-06-07T17:44:23 1780854263

That's an interesting perspective. It's hard for me to relate to it because I haven't worked in a job where I just have to ship code 'for work' in so long. Being a more or less one-man software company, all my work projects, but especially our products, feel like personal projects.

However, if I were just having to do things for the man, I might have a rather different take on all this.

hax0ron3 · 2026-06-07T17:59:41 1780855181

Yup, I can definitely imagine that it's different if you're working directly for customers and have the freedom to do things however you want to do them as long as you still make a living.

abalashov · 2026-06-07T20:59:21 1780865961

The flip side is that if you have that creative control, then LLMs have _definitely_ sucked the joy out of programming, and in the worst way.

auggierose · 2026-06-08T00:23:05 1780878185

I don't think that is true. If you have the creative control, and LLMs suck the joy out of programming, then you and I have very different ideas about what that joy was in the first place. I enjoy programming both on a very high and a very low level, and both are more fun with LLMs. On the low-level, you can use that to create the building blocks that the LLM then just has to combine. And on the high-level, you can use that to steer the design in a way the LLM would never be able to, but with the help of the LLM you can connect that high-level design much faster to the low-level building blocks.

Chu4eeno · 2026-06-08T01:30:40 1780882240

Maybe try using them differently (I tend to use them like static analyzers I can yell at/argue with, and honestly less straining than trying to parse a Coverity report), or just avoid them. Mental health is more important than 20% gain or loss (depending on which study supports your prejudices) in productivity.

oblio · 2026-06-07T19:15:37 1780859737

You're probably 1 in 10 000 programmers. Most programmers are just regular employees, the vast majority in non tech companies.

abalashov · 2026-06-07T20:58:54 1780865934

Yeah. It's easy to forget that sometimes.

odeono · 2026-06-07T16:08:04 1780848484

"Soul and humanity" is doing a lot of work here.

Does the woodworker who shapes using a handsaw use less "soul" than the one who uses a machine?

Does the musician who uses a DAW and VSTs instead of analogue tape recorders create music with less "soul"?

Does the painter who buys acrylic paint instead of synthesizing their own dye from plants use less "soul"?

As technological innovation progresses, the barrier to creation falls. The process of creating something is not to be conflated with the final piece of art itself.

runarberg · 2026-06-07T17:24:20 1780853060

Your analogies are flawed. DAWs and skill saws generate nothing. They take skill to operate, and a novice cannot use these tools at all unless they know the craft.

Compare this to prompting an LLM: “Generate a third-person game with a view from above where you can steal cars, shoot at people, run from the police, etc.” Anybody with access to the tool can do this, and the result is just the uninspiring GTA clone you would imagine.

The latter is more like a carpenter ordering their “work” from alibaba than it is like using a skill saw.

customguy · 2026-06-08T00:04:25 1780877065

> Does the musician who use a DAW and VSTs instead of analogue tape recorders create music with less "soul"?

https://www.youtube.com/watch?v=gkqNWNLKpZg

I'm critical of many "AI" developments but I can't and don't want to argue with this. I say we still need to struggle for humanity and we do need to save our souls, but that "it's a machine" is not where the battlefield is.

computerdork · 2026-06-08T07:12:14 1780902734

Hmm, but AI isn't just a really-good tool, it's also doing creative work too. As an aspiring composer, music generators are in my opinion really quite good, and often match what composers and song writers can do. So if you ask me does a person who creates music with gen AI miss out on the soul of creating music? In my opinion at least, for the most part, yes.

To write a piece of music, you're working at so many different levels, the analytical, emotional, and structural as well as drawing on years of training and experience. When a person with little (or even just a moderate amount of) music training generates a piece of music in a couple hours, are they actually a composer? I personally would say no. I mean it does take a good ear (which is important) to use a music generator well, but still, would say they are more of an editor or an evaluator or a practical critic of music instead of a composer.

Yeah, at what point in a discipline does the increasing skill of AI overtake the need for our contributions to the work being done? In music, it's happened already. Looks like it's happening in coding too.

hatsix · 2026-06-07T16:46:28 1780850788

Does the carpenter who used to build custom fit cabinets with hand and power tools put in the same creativity when he just carries around a scanner, scans the area, the customers use software to select the layout, approve the work, then the CNC cuts out the wood, then all that's left is to put the screws in the holes and go home.

This isn't like the step from hand saws to power saws, and it's disingenuous to pretend like it is. This is what the startup machine has been doing to every industry... finding... "inefficiencies" and "optimizing" them.

TurdF3rguson · 2026-06-07T23:35:10 1780875310

Yes, hand made goods come with a story and a memory of the artisan who created them and those things have value.

It's not the item's soul that's at stake when you stop recognizing that, it's your own.

jadbox · 2026-06-07T16:19:20 1780849160

Not _my_ opinion, but I just wanted to share that many people (in the Midwest) do believe that anything synthetic that is not readily made from simple materials has "less soul". It's a sorta test of "if I dropped you off in the jungle, can you still produce works of soul? Or are you just another cog in the machine.".

ImprobableTruth · 2026-06-07T16:25:54 1780849554

Except it's not just a tool.

It's when a woodworker, musician or painter completely outsources their work and just marks what's wrong, sending those parts back. Yes, the final art piece might be the same, but the artist definitely uses less of their "soul".

afro88 · 2026-06-07T22:51:42 1780872702

> The dynamic of agent codes human reviews does seem like the only sane one for the foreseeable future. Even Anthropic themselves still fall back to this.

Do they? I saw some crazy stat from the guy who built claude code that he was pushing hundreds of PRs a day. There's no way you can human review that much code. It's probably closer to heavily AI assisted review and planning.

davedx · 2026-06-07T18:39:22 1780857562

I don't know if I agree that's the only sane workflow; the problem is, I am way less invested doing code reviews of agents than I am reviewing code by human colleagues.

I would love to be able to say I pay the same amount of attention and am just as diligent and communicate as clearly with an agent, but it wouldn't be honest: I scan agent PRs for obvious mistakes or misinterpretation of what they've implemented.

With human colleagues I usually know them and their style, their way of working, so have a better idea what to look for. You also have a genuine return on providing feedback that helps coworkers learn and improve, whereas with agents, all the feedback you write is gone when the thing gets merged (unless your org has some kind of shared memory for its agents).

I don't have the answer for what the future looks like, but I suspect agent-type-1 reviews agent-type-2 is actually where we'll end up.

lubujackson · 2026-06-07T16:52:56 1780851176

I think there is a big difference between a surgeon, who is performing a specific task with a clear outcome, and a woodworker, who might produce a unique piece of art or a functional chair. I think the surgeon-type tasks will be replaced eventually. More interesting are the woodworker types, which have some similarities to SWEs.

When industrialization hit, we definitely lost a ton of craftsmanship and craftsmen, but a standard Ikea chair is less likely to wobble than the average chair at a much better price (for a random example). Yes, we traded artistry for convenience, but what we really did was bifurcate our needs into "some place stable to sit" and "a beautiful chair for my home". Most people wanted the former more than the latter, and the same applies to software.

If we split the roles into buckets, many woodworkers disappeared, some became artisans, some became designers for industrially-produced products, and some catered to Luddites for a long transitional period. Despite Anthropic's claims, SWEs won't disappear in a year but over a generation or two, no matter how good LLMs become.

Obviously software is much more complicated and integrated into other elements of business, which in a way makes it more vulnerable to AI taking over and in another way will be at the mercy of larger shifts to how businesses organize human roles and responsibilities. What we call "taste" comes down to "intent" - what the hell does a company do? What should it be doing and how should it operate? These will be the only questions that matter and the one thing LLMs can't replace since they will always choose the most default path. So I think humans' roles will be to inject intent/taste at different levels of abstraction throughout an organization.

Melatonic · 2026-06-07T18:54:22 1780858462

I'm not sure your assumption holds at all. If anything the Ikea chair could be argued to be a very efficient use of resources producing a minimally useful chair. But why is it less likely to wobble?

In addition the incentives are misaligned - the "artisan" made chair (in the past) wasn't likely made for aesthetic reasons - it was made to last long term and function. And if it wobbled or had any problems the original woodworker was probably around to fix it.

kuschku · 2026-06-08T06:07:50 1780898870

And yet, I ended up throwing the IKEA coffee table out, and instead making one myself.

I had no prior experience with woodworking, and while it's a little bit wobbly, it's much more robust and it's the exact shape and size I need. It cost the same and it'll last much longer.

And there's emotions and a story attached.

adrianN · 2026-06-07T16:03:22 1780848202

After a couple of years of this their expertise will be gone too and then nobody is qualified to supervise the clankers.

ozgrakkurt · 2026-06-08T07:19:52 1780903192

Only sane dynamic is human writes code, human reviews code, ai also reviews code.

I don't believe there is any point in having ai generate code

sumedh · 2026-06-08T08:08:18 1780906098

> but the soul & humanity of their skill is gone.

If robots make life saving surgery cheaper so that many people can afford it, isn't that a good thing?

rvz · 2026-06-07T14:24:09 1780842249

100%. Unfortunately those not in the depths of mission critical systems or regulated products will continue to believe that producing tons of code quickly using LLMs without humans in these systems is acceptable.

Here's an example of what we will continue to see with folks fully immersed in gen AI psychosis:

"The creator of claude code said that he no longer writes code for about 6 months and now has Claude doing all his work now. He also said recently that he no longer prompts Claude and now has it running in loops and it is self-improving itself and performing better than a human!"

If the code produced by the LLM is perfect, the LLM takes the credit. But when a disaster happens, you cannot blame the LLM and it then falls on the human who did it.

I don't think SWEs heavily vibe-coding with LLMs realize the risk in not understanding what the code the LLM produces is doing, even after generating tests (lol). We will see more of this too. [0]

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

oceanplexian · 2026-06-07T15:09:06 1780844946

Why is it such a dramatic statement for Boris to claim that he no longer writes code?

Are people on HN still typing out functions by hand one character at a time?

It would be like a developer in 2020 claiming that he only writes assembly because compilers can’t be trusted. No one is taking that person seriously. If you chose a career in tech you made a decision to work in one of the fastest moving fields in human history. Now it’s time to get over it, learn the new tools and adapt.

bigstrat2003 · 2026-06-07T15:25:01 1780845901

> Now it’s time to get over it, learn the new tools and adapt.

No, thank you. I have used the new tools, determined that they aren't helpful to me, and set them aside as I would with any other bad tool. I don't feel the need to let hype take the steering wheel.

chipsrafferty · 2026-06-08T03:01:40 1780887700

Let me guess, you used them 12 months ago?

rvz · 2026-06-07T15:31:39 1780846299

> Now it’s time to get over it, learn the new tools and adapt.

Exactly. You are free to use openclaw or a coding agent to build a competing bank, hedge-fund, hospital or even a new airliner because the previous ones were built by humans. Surely an AI can do it better by itself.

So why haven't you done it yet?

matkoniecz · 2026-06-07T16:37:18 1780850238

> Are people on HN still typing out functions by hand one character at a time?

Yes, me. Yes, I tried LLMs for what I am doing and will try again in a few months. No, there was no noticeable or clear improvement over doing it manually.

Yes, I am using some LLMs for some purposes but Claude Code had slight improvement, if any, not worth introducing proprietary dependency.

troupo · 2026-06-07T16:16:48 1780849008

> Why is it such a dramatic statement for Boris to claim that he no longer writes code?

Because we can actually see the disjointed slop that Anthropic produces. And when issues happen, they can't fix them for weeks on end because no one understands what code does anymore, and all of their "hard problems causing issues" they blog about are literally "if we had actual engineers this wouldn't even be an issue to begin with". Like this bullshit they had in spring: https://www.anthropic.com/engineering/april-23-postmortem

> It would be like a developer in 2020 claiming that he only writes assembly because compilers can’t be trusted.

LLMs are not compilers. For a few very obvious reasons I'll leave as an exercise to figure out

lelanthran · 2026-06-07T16:43:25 1780850605

> Now it’s time to get over it, learn the new tools and adapt.

If the AI is producing what you tell it to, why are you needed?

pyth0 · 2026-06-07T19:18:04 1780859884

You answered your own question... you are needed to tell it what to do. Let's not pretend that someone with prior software skills will be able to produce larger scale and/or higher quality work compared to someone with no experience.

camdenreslink · 2026-06-08T00:46:47 1780879607

> Let's not pretend that someone with prior software skills will be able to produce larger scale and/or higher quality work compared to someone with no experience.

This seems like a really confident prediction. It isn't true right now, why do you think it will be true in the future? Right now having knowledge and experience is a huge benefit to steering the LLM (it makes dumb decisions all the time still).

pyth0 · 2026-06-08T11:56:13 1780919773

I totally intended to say "let's not pretend that ... will NOT be able to". Typo and poorly worded, my mistake. I completely agree that the prior experience significantly amplifies what you can do with an LLM over someone without experience.

msm_ · 2026-06-07T16:32:24 1780849944

>Are people on HN still typing out functions by hand one character at a time?

Well I use tab completion, of course. And I copy-paste snippets from LLM more often than from SO now. But otherwise not much has changed in my career in the last 5 years. Is this different for you?

I'm not fundamentally opposed to code generation, and I use LLMs for some tasks, but I don't see myself vibecoding whole pages of production code. I vibecoded a throwaway note-taking app for myself though.

chipsrafferty · 2026-06-08T03:00:46 1780887646

Yes. If you're copy-pasting code from LLMs you're in the minority of people who are "in the past". Most people are generating most of their code in ~1-200 LOC chunks, reviewing it, adjusting as needed (usually with another prompt) and then opening a PR, which gets reviewed both by an LLM and by other teammates.

sensanaty · 2026-06-08T09:34:16 1780911256

Citations very much needed on the "most" here. I'm in a huge corpo with ~1500ish devs and the large majority of code is still very much handwritten, and we have access to all the AI tools imaginable (it's not forced on us because they treat us like professionals that will use the most appropriate tools to do the job, thankfully).

latentsea · 2026-06-08T04:02:25 1780891345

Our whole team has transitioned to agentic engineering around the start of this year. We've been heavily investing in harness engineering to ensure final outputs are production quality and not slop.

These days it very much feels like the task has shifted from "building the system" to "building the system that builds the system".

It only looks to be trending one way.

rjrjrjrj · 2026-06-07T15:59:18 1780847958

C'mon, the LLM/compiler false analogy? In 2026?

solenoid0937 · 2026-06-07T16:18:17 1780849097

It is because HN is contrarian and behind the times.

I work at a big tech company and I don't know a single person that still hand writes code. Most people haven't hand written code for at least half a year now.

I do wonder what sort of bug is making its rounds on HN that people here find this so shocking and unbelievable.

troupo · 2026-06-07T21:03:09 1780866189

The bug is called "applying actual engineering principles and critical thinking".

The absolute vast majority of people who point out AI's downsides have used it and use it. People who uncritically write things like "I work at a big tech company and I don't know a single person that still hand writes code." scare the shit out of us for a good reason.

solenoid0937 · 2026-06-07T23:40:15 1780875615

Using it is not enough. You have to explore its boundaries, see what it can do when cost is not a constraint. This is only really possible at the big tech companies and well funded startups.

Any eng that is only using Claude Code or Codex or whatever, is frankly not entitled to talk about AI's limits since they are using the most basic harnesses. They literally don't know better.

When I see Claude Code or Codex users on HN talking about how coding with AI is risky, it's like watching someone that has only ever seen a catapult argue about how space travel must be impossible.

camdenreslink · 2026-06-08T00:49:14 1780879754

> Any eng that is only using Claude Code or Codex or whatever, is frankly not entitled to talk about AI's limits since they are using the most basic harnesses. They literally don't know better.

I really doubt big tech has much better harnesses than are publicly available. Definitely not "catapult vs space travel". They have the same base models we all have access to.

Chu4eeno · 2026-06-08T01:52:59 1780883579

They are utterly trivial tools, and most of them are extremely over-engineered to the point of absurdity (that was the most embarrassing thing about the Claude Code source map leak imho). That the for loop concatenating LLM output from POST calls to an internal buffer has more code than what is doing training or inference should tell them something. Hermes is the only one that tried to do something novel (low bar), and at least when I looked at it it hadn't succumbed to the vibes yet.

troupo · 2026-06-08T12:33:57 1780922037

You only explore boundaries of an AI if you are actually training and evaluating your own models.

Harnesses aren't testing boundaries of AI. They inject "make no mistakes" in various forms and provide some session management tools.

And, as with any people fully buying into and promoting hype, "my harness is in another castle" (they never show anything they boast about), lathered with huge amounts of crude demagoguery and analogies.

SpicyLemonZest · 2026-06-08T02:57:08 1780887428

Would you care to share the name of a good harness which might qualify someone to talk about AI's limits? There's quite a lot of big tech companies and well funded startups using Claude Code and Codex, although I suppose it's possible that none of them know what they're doing.

arkadiytehgraet · 2026-06-08T11:40:25 1780918825

You got baited by an Anthropic shill, they are not working at any big tech company. See https://news.ycombinator.com/item?id=48270186 for more info.

solenoid0937 · 2026-06-08T03:20:38 1780888838

I'd probably break NDA if I said anything about ours, sadly. I don't know of any publicly available harness on the same level as what big tech companies use internally.

habinero · 2026-06-08T09:33:22 1780911202

Ah, of course. The classic trust me bro.

troupo · 2026-06-08T12:14:20 1780920860

My girlfriend goes to a different school, you wouldn't know her

arkadiytehgraet · 2026-06-08T11:39:34 1780918774

solenoid0937 is not working at any big tech company, instead they are being paid by Anthropic to spread unfounded LLM-hype. They have a history of doing that. See https://news.ycombinator.com/item?id=48270186 for more info.

solenoid0937 · 2026-06-08T13:03:45 1780923825

Ah, my stalker has returned! I was wondering where you were. Yup, I can confirm that Dario personally hands me a check every evening.

trumpdong · 2026-06-07T14:47:35 1780843655

It was my impression that a whole lot of products are only pretending to be compliant, and that it's much more profitable to operate like that.

InsideOutSanta · 2026-06-07T16:00:03 1780848003

I've worked in fintech for 30 years. I've never seen a product that was intentionally "only pretending to be compliant" with laws.

I've seen accidental non-compliance. I've seen what I would call negligent compliance, where a company attempted to be compliant but didn't meet full, correct compliance (one example I've seen is that a company assigned resources to compliance and forgot to increase resources as workload increased, causing them to be increasingly behind on compliance work), but I've never seen a company that just decided to pretend to be compliant knowing that they were not.

lowbloodsugar · 2026-06-07T19:12:45 1780859565

Never seen one or never worked at one? Because gestures broadly

rpicard · 2026-06-07T14:49:42 1780843782

In my experience this is not representative of most fintechs. Of course there are both cases of real intentional noncompliance, and accidental, but by and large it seems like everyone’s trying to innovate within the law.

scott_w · 2026-06-07T16:43:41 1780850621

This makes sense because these companies want to become large companies and contract with large companies. Large companies, by and large, try to follow the law (while trying to bend it to the limit) because they're aware they have a big target on their back and no CEO wants to be on the front page of the papers for tanking a company in such a stupid fashion.

saghm · 2026-06-07T15:14:15 1780845255

Even if that's the case, I feel like accurately knowing which regulations you're in compliance with and which you're not would be kind of important from a risk management perspective. From a "maximize profits" perspective (which I'm not saying is good but what you're saying you thought they operated with), you'd want to know the potential gain from ignoring a given regulation and the likelihood of getting caught (along with the cost of the punishment if that happens). This is the kind of math that I'd expect a finance company to be pretty familiar with, and giving that up for a fuzzy "idk if we're in compliance or not" check seems like a pretty huge liability (unless there's confidence in not being liable for blindly trusting the LLM, which I hope is not the future we're headed for but I guess I can never be totally confident in us not somehow ending up with rules that defy common sense).
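
To make that math concrete, here is a minimal sketch of the expected-value calculation. The function name and every number are made up for illustration, and a real risk model would need far more inputs than these three:

```python
def expected_net_gain(gain, p_caught, penalty):
    """Expected payoff of ignoring a regulation: gain minus expected penalty.

    All three inputs are hypothetical; the model ignores reputational
    damage, repeat offences, and anything else that is hard to price.
    """
    return gain - p_caught * penalty


# Illustrative numbers only: a 100k gain, a 25% chance of getting caught,
# and a 1M fine if that happens.
result = expected_net_gain(gain=100_000, p_caught=0.25, penalty=1_000_000)
print(result)  # -150000.0
```

The calculation only works if the inputs are known. A fuzzy compliance check means you don't even know which regulations the formula applies to.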

sandworm101 · 2026-06-07T15:00:55 1780844455

Companies that are growing tend towards faking compliance. Many financial rules like pci only kick in at certain scales. So a company growing very quickly will often be behind the curve but will do everything to seem like they are compliant. Then they would hire people like me to come in and make them actually compliant. More often than not, making an effort at improvement was enough to keep the ball rolling.

mattmanser · 2026-06-07T15:17:45 1780845465

I think it's the same throughout startup software to be honest. It's just easier to point out when there's clear rules.

Security, GDPR, backups, build pipelines, disaster recovery, most of it will be faked, half-heartedly done once or ignored entirely.

Then there's the more abstract things like scalability, idempotency when integrating with external APIs, error recovery, accessibility, UX, etc.

Almost always that sort of stuff will have been entirely ignored, or there will be a fig leaf over a real mess of misunderstood standards or manual intervention steps.

Startup developers usually have to be generalists as they often wear many hats, so things that need deeper domain knowledge get done to a bare minimum.

habinero · 2026-06-08T09:19:16 1780910356

Nah. There are for-real consequences to that sort of thing and it's so not worth it.

Early stage startups, maybe, but most of those are three failsons in a trenchcoat. That sort generate more PR than revenue.

jimbokun · 2026-06-08T00:31:27 1780878687

In the same sense that selling illegal drugs is a very profitable business to be in, until it isn’t.

IAmGraydon · 2026-06-07T16:05:59 1780848359

Where did you get this impression from?

parineum · 2026-06-07T16:56:26 1780851386

A worldview built on reading comments from news aggregators.

deanc · 2026-06-07T13:50:10 1780840210

I've worked on projects in the airline and health industry which are highly regulated too. The regulations can be incredibly difficult to process and implement, and make sure you adhere to everything correctly. I've been involved in multiple scenarios where people have made false assertions about compliance or lack of. I'd still place a bet that the SOA models make _far_ less mistakes than humans.

genxy · 2026-06-07T13:59:20 1780840760

They might make fewer mistakes, but they aren't evenly distributed. They don't use logic when making mistakes, it is gaps in the training data and how large of a span they have to bridge in the latent space. Just as they aren't smart like humans, they aren't stupid like humans. Don't mistake rate for quality.

Terr_ · 2026-06-07T17:39:21 1780853961

Yeah, this starts to overlap with some autonomous vehicle stuff, where I like to say that the rate of errors is not the shape or distribution of errors.

We have long historical experience and innate tools for detecting and mitigating errors made by humans. If we can't apply those to automation, then even fewer total mistakes may end up being a worse outcome.

sillyfluke · 2026-06-07T14:51:16 1780843876

>I'd still place a bet that the SOA models make _far_ less mistakes than humans.

Genuine question: your top coder seems to be producing the most error-free code from your perspective, has the deepest knowledge of the architecture and codebase, and is faster on the trigger than the others.

But your top coder has proven and verifiable dementia, where they will confidently assume the existence of apis and code that do not exist, mix up the purpose of others and forget other things, and you can't predict when and how they will introduce errors into the system or the severity of such errors.

Are you really comfortable letting this person with dementia generate most of your codebase in the airline and health industry?

I also hope you have an iron-clad agreement that prevents the model provider from doing silent updates because all your evidence of correctness you collected thus far goes out the window in that case.

Another genuine question:

You have witnessed a human coder and the AI you're using make the same important mistake. Assuming you do not have the time and resources to retrain, fine-tune, and test your frontier model:

Who would you trust not to make the same mistake multiple times in the future after you have warned them that their job depends on it, the AI or the human?

deanc · 2026-06-07T15:04:14 1780844654

Your top coder has guard rails in place to prevent him autonomously going free - right? This is how you should approach agentic development with LLMs. Like it or not, we are the final bastion, the gatekeepers. The hallucination thing I think is mostly overblown and from speaking to colleagues it seems to vary wildly depending on which model and harness you are using - always go for SOA. In the last 3 months I can count on one hand where it's done something wrong and that's primarily as I'm operating it with guard rails and giving it context.

sillyfluke · 2026-06-07T15:42:05 1780846925

>Your top coder has guard rails in place to prevent him autonomously going free - right?

The parent is implying they would prefer an AI when working in the airline and health industry because it makes less errors. Read the comment again.

They have not said, "Hey, I work in the airline and health industry and I'd love to use AI for a couple of the bullshit IT UIs we have as long as we can put guardrails on the AI to stay in its lane."

I asked a yes or no question. The guardrails you can put to mitigate errors are the same guardrails pre-AI for the humans (tests, regressions, reviews). If you were wary of employing a top lead engineer with verifiable dementia prior to AI for a mission critical system, logic implies you should think twice giving that much responsibility to an AI as well.

> The hallucination thing I think is mostly overblown

Can you predict when and how the SOTA model will hallucinate? Yes or no. Can you predict the severity impact of that error beforehand? Yes or no.

>from speaking to colleagues it seems to vary wildly depending on which model and harness you are using

You have partially answered my question it would seem.

deanc · 2026-06-07T16:14:21 1780848861

> Can you predict when and how the SOTA model will hallucinate? Yes or no. Can you predict the severity impact of that error beforehand? Yes or no.

No, but the same can be said for your colleagues. You might call what the LLM does hallucinations, I'd call them mistakes. I think we have totally forgotten that humans make them all the time and are confidently wrong too.

Your original question, doesn't really get to the bottom of the point I'm trying to make, and I don't really feel it fairly represents the issue we are talking about here. They are not the same things.

suttontom · 2026-06-07T18:03:38 1780855418

This is such a tired, meaningless argument. I've never seen a human in 10 years of professional software engineering at a large company ever so confidently, consistently create and send out seemingly well-reasoned code that's as wrong as what SOTA models using CC or Codex do. If a human did this, they would be fired or perpetually remain a junior who no one wants to work with.

Also, if a human does this, you can replace them and get a human who will not do it. The default for an LLM is to generate plausible-looking text that may or may not be completely incoherent. That is not the default for a human. Again, if you find that your colleague consistently fabricates APIs, you can hire someone who isn't crazy instead, but you cannot do the same with LLMs.

vor_ · 2026-06-07T18:55:37 1780858537

If a human was hallucinating and polluting a codebase with errors, they would be fired and possibly treated for dementia. Even worse, an LLM is trained to produce plausible-looking results, so it's harder to detect the mistakes.

sillyfluke · 2026-06-07T17:11:10 1780852270

>No, but the same can be said for your colleagues.

That's absolutely false. My colleagues don't routinely and confidently invent apis that are not there, or spectacularly and repeatedly misunderstand the purpose of certain functions or exhibit extreme forgetfulness. Especially when I've warned them. Hallucinations and confabulations in otherwise healthy individuals are mental disorders. When I ask them why they made a certain kind of error, I can expect to get a reasonable answer. No one has uttered the phrase "Bob hallucinated again while writing those tests" when the Bob in question is a human.

deanc · 2026-06-07T17:16:09 1780852569

Well, your experience doesn't align with mine. I have been using, and am part of an organisation that is extensively using, Claude with Opus for everything for about 3 months now and I am not experiencing the problems you describe. We'll have to agree to disagree here.

sillyfluke · 2026-06-07T17:44:35 1780854275

That is fine. "Your experience may vary" is the crux of my argument amusingly. You can't have just realized that people are having different experiences using AI, or even that the same person has different experiences when they change domains or technical contexts. There's been lots of comments littered on this forum to that effect.

Calling hallucinations simply mistakes does not seem to me to be a healthy way to reason about LLMs. I can ask a colleague how well they can program in Ada and adjust my expectations on productivity and bug rates. I can't ask an LLM how well it can code in Ada (just a throwaway example), or even how much of Ada was in its training data. I have to actually spend money and spend time code reviewing before I can even formulate any expectations at all.

rfgplk · 2026-06-07T22:41:56 1780872116

Not only have I never run across a hallucination in the past ~6 months or so; the latest Opus models have gotten to the point where they can emit inline assembly that is _superior_ to what gcc or clang can generate from optimized cpp. Had it rewrite a hot simd loop that took it from ~10 flops/cyc to ~14 by shaving off broadcasts. I _could not_ get any compiler to do this, no matter which flags I tried to use. So I literally have no idea what these people are talking about when they claim that SOA models hallucinate constantly.

shakna · 2026-06-08T04:27:28 1780892848

Last week, Opus gave me a decrement instead of an increment, on one particular line. Where I already had the decrement, but it was changing the width of the datatype everywhere.

And it took "convincing" that it had made a mistake.

CamperBob2 · 2026-06-07T22:37:04 1780871824

> But your top coder has proven and verifiable dementia

Dementia gets worse. AI gets better. Nothing matters except d/dt.

csallen · 2026-06-07T14:52:18 1780843938

For some reason, tons of people seem to be in camps at both extremes. It's either "AI sucks don't trust it!" or "AI is so much better than humans!"

But the most reasonable take, which I'm happy to see reflected in so many comments in this thread, is… use both.

Do an AI pass, and have humans verify, and vice versa. Let the humans drive the AI. Then the unique shortcomings of each party can be covered by the other's strengths.

hammock · 2026-06-07T14:56:33 1780844193

AI review is never going to beat a fully resourced human review.

It might beat an underresourced human review, on time, efficiency, cost metrics. But on the metric of accuracy, throwing unlimited humans at a problem will still beat throwing unlimited AI at it

esafak · 2026-06-07T15:39:42 1780846782

That's an irrelevant comparison because cost is always a constraint, so there are not going to be unlimited AI or humans. The question is how to optimally combine them for a given cost.

bigstrat2003 · 2026-06-07T15:22:34 1780845754

> Do an AI pass, and have humans verify, and vice versa. Let the humans drive the AI.

You can do that, sure. But doing so negates any improvements in speed the LLM brought. And at that point, you may as well just do it yourself to begin with.

jghn · 2026-06-07T16:13:25 1780848805

When Google showed up on the scene I found I no longer needed to memorize basic syntax and other such things. If I couldn't remember on the fly, I'd just do a quick google search and move on. This freed space in my mind to instead focus on bigger & better things.

I use GenAI tools when coding a lot, but I do not vibe code. I go through everything it generated, and we iterate. And yes, it doesn't save me a lot of time. But what it does do is free up mental capacity in a similar manner. But instead of syntax, it's more complicated patterns. Maybe I don't remember how to stitch something together, but i know it can be done. Instead of spending the time to look it up and then code it, I just tell it to do it for me.

klibertp · 2026-06-07T23:55:13 1780876513

> Maybe I don't remember how to stitch something together, but i know it can be done.

That's how I use the current AI, too. I never ask them to do something without specifying how it should be done. I ask questions first, use /plan to let the model ask me questions, then I let it execute the plan while reviewing the results. More and more often, I get something close enough to what I would have written. In the opposite case, I at least know exactly how to rewrite the result, if needed.

I observe the same effect as you: while it does sometimes speed up the implementation a bit, it's not very noticeable; however, it frees me from having to recall all the obscure little details up front. Instead, I can describe them, have the model implement them, and then recognize them (and refresh my memory) when reviewing. The effect is that it's easier to start a task because I don't need to prepare as much to execute it. It's especially notable on things that I haven't touched for some time. I know, more or less, how my Elixir projects are set up, but after ~2 years of not working on them, getting back into them had been a hassle - with AI, it's no longer that. I think the biggest difference comes from the AI lowering the cost of context switching for me - I used to have huge problems with that, and AI certainly helped a lot.

skillina · 2026-06-07T16:22:37 1780849357

Yeah, humans reviewing the AI review can only detect the false positives, where the LLM claims something is non-compliant and flags it for review/correction by a human or another agent. Human review can’t find the false negatives (true deficiencies not flagged) unless you do a full audit yourself to find whatever deficiencies the AI missed.
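
To make that asymmetry concrete, here is a toy sketch in Python. The item names and the ground-truth set are made up for illustration; in real life the ground truth is exactly the thing you don't have without a full audit.

```python
# Toy illustration: reviewing only what the AI flagged can remove false
# positives, but never surfaces false negatives. All data here is made up.

# Ground truth: which items are actually deficient (unknown in real life).
truly_deficient = {"kyc_check", "audit_log"}

# What a hypothetical AI reviewer flagged.
ai_flagged = {"kyc_check", "fee_rounding"}


def human_review_of_flags(flagged, truth):
    """A human checks each flagged item and keeps only the real problems."""
    return {item for item in flagged if item in truth}


confirmed = human_review_of_flags(ai_flagged, truly_deficient)
false_positives = ai_flagged - truly_deficient  # caught by the human pass
false_negatives = truly_deficient - ai_flagged  # invisible to the human pass

print("confirmed:", sorted(confirmed))
print("false positives removed:", sorted(false_positives))
print("still missed:", sorted(false_negatives))
```

The human pass only ever iterates over `ai_flagged`, so anything outside that set never gets looked at.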

coldtea · 2026-06-07T20:09:55 1780862995

>But doing so negates any improvements in speed the LLM brought.

We could do with less speed.

csallen · 2026-06-07T16:14:09 1780848849

I feel like you're missing the point that it's more thorough to use both. Speed isn't the only factor that matters.

BurningFrog · 2026-06-07T16:23:17 1780849397

This makes sense, but a logical next step is to have one AI write code, and then have another AI, instead of humans, verify it.

Or are current AIs too similar for that to be fruitful?

suttontom · 2026-06-07T17:53:39 1780854819

This is commonly known as "LLM-as-a-judge" and anecdotally multiple people I know who write code using OpenRouter or using multiple models say it's surprisingly effective. It's strange that there don't appear to be any major papers on it since ~early 2025, which at this point is basically ancient history.

criticalfault · 2026-06-07T15:23:45 1780845825

Not according to my experience.

Regulation questions, even the simple ones, AI gets wrong all the time. It wasn't Mythos, but other models like Opus.

I can adjust my view on this topic if/when we get access to Mythos.

realusername · 2026-06-07T14:51:12 1780843872

> I'd still place a bet that the SOA models make _far_ less mistakes than humans.

Well too bad, the problem is that they also produce things much faster than humans so errors will compound quicker.

bobkb · 2026-06-07T16:11:19 1780848679

IMHO, even if we are using auditing tools, we must use deterministic tools for critical analysis like this. Such rule- and pattern-based systems may not scale beyond a certain point, but they can be accurate.
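
A minimal sketch of what I mean by a rule- and pattern-based check, in Python. The two rules and the sample snippet are hypothetical and only there to show the shape; real rules would come from reviewed requirements, not from this example.

```python
import re

# Hypothetical rules for illustration only.
RULES = [
    ("no-float-money", re.compile(r"\bfloat\s*\(\s*amount"), "money parsed as float"),
    ("no-plaintext-ssn", re.compile(r"print\(.*\bssn\b"), "SSN written to output"),
]


def scan(source: str):
    """Return (line_number, rule_id, message) for every rule match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern, message in RULES:
            if pattern.search(line):
                findings.append((lineno, rule_id, message))
    return findings


sample = "total = float(amount)\nprint(user.ssn)\nok = Decimal(amount)\n"
findings = scan(sample)
for finding in findings:
    print(finding)
```

The same input always gives the same findings, which is the property I care about here. The obvious cost is that it only catches what someone bothered to write a rule for.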

tenthirtyam · 2026-06-08T09:58:26 1780912706

> anyone who's using them to actually ship product without a human actually digging into it is opening themselves up to a world of risk.

Maybe it's just me, but it seems that companies will happily take existential risks to get a better bottom line short term. Either you're too big to fail or you've already privatised any profits and subsequent losses (due to the risks becoming manifest) are socialised. The motor industry seems to be particularly egregious in this aspect, but also the food industry, construction industry etc.

Seems to me even governments make the same choices in many ways - cut back health-care, policing, education, public transport and let the next government deal with the consequences.

philipallstar · 2026-06-08T10:01:12 1780912872

> Seems to me even governments make the same choices in many ways - cut back health-care, policing, education, public transport and let the next government deal with the consequences.

Definitely - defund the police was an astonishing rallying cry that made the communities it pretended to help much more dangerous for their residents.

But the opposite is far more likely, and far more destructive long-term: it's easier to buy votes by spending more on social programmes and rack up debt and/or inflation to cover it, and then spend even more to fix that problem for enough voters that the people paying for it all can't vote it away, and the people who vote for it over the decades just don't understand why their pot feels so uncomfortably warm all of a sudden.

solenoid0937 · 2026-06-07T16:13:17 1780848797

I use Opus 4.8 and GPT 5.5 and haven't suffered from hallucinations in months. But we also put a lot of effort into our harness.

Aeolun · 2026-06-07T16:27:49 1780849669

Opus 4.8 and GPT constantly hallucinate stuff as well. If you haven’t encountered or caught it, that’s something different. Of course these days it’s mostly confidently asserting a wrong thing.

latentsea · 2026-06-08T03:49:03 1780890543

They said that they hadn't "suffered" from hallucinations in months due to effort they put into harness engineering. It's not quite the same thing as saying hallucinations are never happening for them.

I find hallucinations happen for us with those models, but we've worked on baking in guardrails and fact checking against sources of truth, so that it's less of a problem.

We're engineers. We just try to engineer out the failure modes.
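
As one small example of that kind of guardrail, here is a sketch in Python of a check that a passage the model claims to be quoting actually appears in the source document. The function names and the regulation text are placeholders I made up for illustration, not our actual harness.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_grounded(quote: str, source_text: str) -> bool:
    """True only if the quoted passage appears verbatim in the source text."""
    q = normalize(quote)
    return bool(q) and q in normalize(source_text)


# Placeholder text standing in for a real regulation document.
regulation = "Section 4.2: Records must be retained\nfor at least five years."

claims = [
    "Records must be retained for at least five years.",
    "Records must be encrypted with AES-256.",
]
grounded = [quote_is_grounded(c, regulation) for c in claims]
print(grounded)
```

This only proves the words exist in the source. It says nothing about whether the model interpreted them correctly, so it narrows the failure mode rather than removing it.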

rfgplk · 2026-06-07T22:47:50 1780872470

I primarily use Opus for a Lisp-like DSL codebase (non public, closed source) and it genuinely has _never_ hallucinated. All it pulls from is BNF, language spec + examples. So I have no idea how people are getting it to hallucinate on _popular_ languages.

Chu4eeno · 2026-06-08T01:40:35 1780882835

I think you forget that they really are stochastic (I'd wager it has hallucinated things to you that weren't important so you missed them, or you've just been very lucky), and the people you're arguing with are forgetting that there is a significant difference in when and how often even frontier LLMs hallucinate. I catch Claude every now and then, but aren't the measured hallucination rates down in the low single-digit percent for Claude now?

Loic · 2026-06-07T16:18:43 1780849123

Sometimes the harness can only be a human.

And this is fine. Developing new software with a really smart intern is the same: you, as an expert, need to bring your experience/expertise to the table to get everything right. Because experience needs time.

rfgplk · 2026-06-07T22:44:03 1780872243

Those models simply don't hallucinate if you use them properly in any form. The only way they _might_ hallucinate is if you use the web based chat interface and give them zero context.

vips7L · 2026-06-08T03:41:53 1780890113

Ah the old “you’re holding it wrong” argument.

JoeyJoJoJr · 2026-06-08T05:43:10 1780897390

Sorry, but we aren’t at the “press a button to build a game/app” yet.

DaedalusII · 2026-06-08T00:08:43 1780877323

You should rewrite this comment with ChatGPT if you do not want to be doxxed.

https://www.tomsguide.com/ai/ai-can-now-identify-anonymous-i...

mbbutler · 2026-06-07T15:34:33 1780846473

False-positive rate is so high with Mythos according to friends and other reporting I have seen.

The original Mythos release used ASan to filter false-positives so it was able to maintain a good FPR, but when Mythos moves into domains that don't have a readily available oracle to help filter hits, the result is a deluge of false bullshit.

ilaksh · 2026-06-07T17:00:28 1780851628

3 years max. Maybe 5 if you are lucky. The models will continue to improve. The exponential gains in compute efficiency that have been ongoing for 70+ years will continue and that will result in even smarter models. There are dramatic hardware changes in the pipeline.

But really that particular issue could have been solved by literally just telling it in a markdown file or instructions something like "verify all facts or compliance requirements with web search and include citations in responses".

ofjcihen · 2026-06-07T17:12:05 1780852325

This is akin to “don’t make mistakes”

“Verify all facts and compliance requirements” leaves enormous holes even if you assume the LLM has a concept of facts and requirements (it does not).

What facts? What requirements? For what industry? For what subset of that industry? For what country or countries that you will be doing business in? Are these current “facts” and “requirements” or is the LLM referencing a dusty article from 1992 for which the subject matter has been radically overhauled?

In my job I regularly see small but incredibly important mistakes like this lead to major issues. Some of those are human driven but increasingly the defense of the person responsible has turned into “Claude said it was fine though!”

rfgplk · 2026-06-07T22:51:37 1780872697

> “Verify all facts and compliance requirements”

No. This is a disastrous instruction. Not only is it vague, but it's also meaningless. When giving instructions to an LLM your prompt must be concise and exact. Tell it _exactly_ which requirements need to be followed, ideally have it write or (preferably) pass audited tests to enforce these requirements. You also need to provide it with a hard source of truth it can rely upon. Instead of saying "verify facts", you're better off saying "... make sure [whatever you're doing] matches with data at X.Y.Z, verify by running [instruction/command/program]".

ofjcihen · 2026-06-08T12:34:22 1780922062

I think you might have meant to reply to the parent comment.

zuzululu · 2026-06-08T06:14:20 1780899260

Especially in cybersecurity.

zuzululu · 2026-06-08T06:20:39 1780899639

If someone can’t distinguish between the two, then I honestly wonder what company would be comfortable putting them anywhere near a regulated or security-sensitive workflow, especially someone who condescendingly views their own job as a daycare for people seemingly beneath them.

ilaksh · 2026-06-07T17:24:01 1780853041

It can make mistakes and will sometimes, but what he specifically mentioned was a case where it did not pull up a reference that it needed. So using a web search tool effectively would make a big difference.

ofjcihen · 2026-06-07T17:34:19 1780853659

It still does not rise to the standard he requires, which your response indicated would be easy for the model to achieve with a simple prompt.

Additionally, using a specific tool does not suddenly give the model common sense enough to say “this piece of information doesn’t answer the question of whether this solution fits in this specific industry at this time in this place”.

ilaksh · 2026-06-07T18:08:20 1780855700

A web search tool to pull up the law that is relevant?

kolinko · 2026-06-07T18:54:55 1780858495

Well, you wouldn't just give a human the task "verify all facts and compliance requirements" and expect it to end well either, no?

ofjcihen · 2026-06-07T20:33:36 1780864416

If I was working with someone who had experience in the specific industry then yes, that is in fact what I would do.

If I plucked a random passerby and gave them the task then no, I’d find myself detailing out every specific to them.

You’re equating the LLM to the least qualified candidates. I don’t think your argument is communicating what you intended.

zuzululu · 2026-06-07T21:51:36 1780869096

Of course not. Nobody experienced at their job would or should say that and expect it to be followed through flawlessly, especially in cybersecurity.

I feel like the parent you are replying to literally views their place of work as a daycare, which is very condescending.

ofjcihen · 2026-06-07T22:33:19 1780871599

You’ve managed to contradict yourself between your 1st and 2nd sentence. I’m not sure what point you’re trying to make.

The argument is regarding LLMs and domain knowledge.

zuzululu · 2026-06-07T22:42:15 1780872135

Explain.

vor_ · 2026-06-07T18:46:51 1780858011

> 3 years max. Maybe 5 if you are lucky. The models will continue to improve. The exponential gains in compute efficiency that have been ongoing for 70+ years will continue and that will result in even smarter models. There are dramatic hardware changes in the pipeline.

I remember hearing that 10 years ago about self-driving.

oblio · 2026-06-07T19:13:35 1780859615

60 years ago about flying cars, 40 years ago about cold fusion, the list is long.

We need a lot more basic research into LLMs and also a lot cheaper hardware.

The current batch of LLMs will turn a lot of fields upside down, but not to the tune of $3tn or whatever crazy amounts are being invested right now.

ilaksh · 2026-06-07T20:01:54 1780862514

I mean, basically you and I are effectively living in parallel universes. Waymo has been running for years, and there are other services, including in China, and Tesla, which is not 100% there but is actually very effective.

And the thing he complained about is fixable with a web search, and AI does programming and office work today. So, it's already here. It's just a question of degrees.

habinero · 2026-06-08T09:31:10 1780911070

Waymo heavily relies on real humans to get their robots unstuck. They also rely on extremely detailed mapping data, which is why they're only in a few cities.

Tesla has been a couple years away from FSD for, what, like ten years now?

If you scrape off the glitter, you'll find a lot more duct tape and wire than you think.

DaSHacka · 2026-06-07T19:09:44 1780859384

"Just 2 more weeks guys, and AI will be able to do everything!"

suttontom · 2026-06-07T17:10:11 1780852211

Ah yes, the magical equivalent of "you are a senior software engineer who writes bug-free code".

IME people would benefit greatly from the process, albeit tedious and time-consuming, of testing out the same prompt sequence/session with the exact same model multiple times. It becomes clear extremely quickly how capable but unreliable and inconsistent a model can be even when given the same context. If you have ever completed a long, complicated task with an agent and then lost the session and tried doing the same thing again from scratch you may have had the experience of seeing the subtle changes that come up in the model's thinking which lead it to accept or reject certain paths and ignore or incorporate prompt instructions like the one you've provided.
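
If you want to try this without burning tokens first, here is a rough Python sketch of the measurement. `fake_model` is a made-up stand-in that just samples from a few canned answers; you would swap in your real model call and whatever prompt you care about.

```python
import random
from collections import Counter


def fake_model(prompt: str, rng: random.Random) -> str:
    """Stand-in for a real model call; it just samples one of a few answers."""
    return rng.choices(["compliant", "non-compliant", "unsure"], weights=[6, 3, 1])[0]


def consistency(prompt: str, runs: int, seed: int = 0):
    """Run the same prompt `runs` times and report how often the top answer wins."""
    rng = random.Random(seed)
    answers = Counter(fake_model(prompt, rng) for _ in range(runs))
    top_answer, top_count = answers.most_common(1)[0]
    return top_answer, top_count / runs, dict(answers)


top, agreement, breakdown = consistency("Is module X compliant?", runs=20)
print(top, agreement, breakdown)
```

The number to look at is `agreement`: the fraction of runs that landed on the most common answer. Anything well below 1.0 on a yes/no compliance question is a reminder that a single run tells you less than it feels like it does.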

ilaksh · 2026-06-08T03:45:14 1780890314

Change the temperature to 0 and it will be more consistent.

jppope · 2026-06-07T17:25:13 1780853113

Stuff like that is risk tolerance... it's not strictly codified and it's more akin to probability. Different companies at different stages, in different industries, will all interpret their risk differently... how will a smarter model improve that?

eikenberry · 2026-06-07T17:22:58 1780852978

The classic 3-5 year window for a new technology that is uncertain and requires just a few more breakthroughs to get there...

Upvoter33 · 2026-06-07T19:04:22 1780859062

Written with confidence too. I'm amazed at the levels of confidence people have in predicting the (unclear) future.

latentsea · 2026-06-08T03:54:22 1780890862

LLMs learnt to be confidently wrong from us.

weakfish · 2026-06-07T17:59:48 1780855188

Like full self driving!

tpoacher · 2026-06-07T16:19:02 1780849142

In some sense, you should still act on this, since if an external auditor relies on the same stack, it'll still cause you headaches.

whatevaa · 2026-06-07T16:22:50 1780849370

The models can change at any time and behave differently.

DANmode · 2026-06-08T04:53:21 1780894401

> the code in question had already been reviewed by human counsel

You sure?

galactushonor · 2026-06-07T14:59:57 1780844397

> it had of course hallucinated what the regulation actually required

Did it do the correct job once you put the regulations doc(s) in the context?

loloquwowndueo · 2026-06-07T15:14:43 1780845283

What I usually do when in doubt is challenge the AI. “Please quote the section of regulation the product is non compliant with”. It usually admits it hallucinated the whole thing.

mattmanser · 2026-06-07T15:21:10 1780845670

It sometimes says that even if it hasn't though, so like everything with LLMs, you can't actually rely on that.

Chu4eeno · 2026-06-08T01:58:26 1780883906

You should just hit retry, usually they either latch onto the correct part of the latent space (surprisingly often) or they admit they don't know (or call a tool, depending on the model).

Lionga · 2026-06-07T13:57:01 1780840621

Have you added "Make no mistakes" to the proompt? Mythos can't go wrong then, must be a skill issue.

cheschire · 2026-06-07T14:29:06 1780842546

It's shocking people don't realize you're being ironic.

steveBK123 · 2026-06-07T14:34:20 1780842860

AI cannot fail, it can only be failed