lubujackson's comments | Hacker News

I explored the different mental frameworks for how we use LLMs here: https://yagmin.com/blog/llms-arent-tools/ I think the "software factory" is currently the end state of using LLMs in most people's minds, but I think there is (at least) one more level: LLMs as applications.

Which is more or less creating a customized harness. There is a lot more that is possible once we move past the idea that harnesses are just workflow variations for engineers.


Bit by bit, we need to figure out how to rebuild human contextual understanding in a way that LLMs can understand. One thing that gets overlooked is the problem if incorrect data. You can provide all of the context in the world but LLMs tend to choke on contradictions or, at the minimum, work a whole lot harder to determine how to ignore or work around incorrect facts.

"Forgetting" and "ignoring" are hugely valuable skills when building context.


I can’t help but feel the logical conclusion to such context conundrums is: what if we spoke Haskell to the LLM, and the LLM could also compile Haskell?

And, yeah. Imagine if our concept-words were comprehensible, transmittable, exhaustively checked, and fully defined. Imagine if that type inference extended to computational execution and contradictions had to be formally expunged. Imagine if research showed it was a more efficient way to have a dialog with the LLM (it does, btw, so just as JRPG adherents learn Japanese, we should learn Haskell to talk to LLMs optimally). Imagine if multiple potential outcomes from operations (test fails, test succeeds) could be combined for proper handling in some kind of… I dunno, monad?

Imagine if we had magic wiki-copy chat-bots that could teach us better ways of formalizing and transmitting our taxonomies and ontologies… I bet, if everything worked out, we’d be able to write software one time, one place, that could be executed over and over forever without a subscription. Maybe.


> the problem if incorrect data.

Was the typo intentional? :)


I agree LLMs shouldn't be "compilers" because that implies abstracting away all the decisions embedded in the code. Code is structured decisions, and we will always want access to and control over those decisions. We might not care about many of them, but some we absolutely do. Some might be architectural; some might be that we want the button to always be red.

This is why I think the better goal is an abstraction layer that differentiates human decisions from default (LLM) decisions. A sweeping "compiler" locks humans out of the decision making process.
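
Very roughly, and purely as a hypothetical sketch (the names, the "ledger" shape, and Python itself are just illustration, not anything from the thread), the kind of layer I imagine would record which decisions a human pinned and which were LLM defaults, so a regeneration pass can only re-decide the unpinned ones:

    # Hypothetical sketch of a "decision ledger": human-pinned choices vs.
    # LLM defaults. Names and structure are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class Decision:
        key: str      # e.g. "submit_button.color"
        value: str    # e.g. "red"
        source: str   # "human" or "llm_default"

    ledger = [
        Decision("submit_button.color", "red", source="human"),         # locked by a human
        Decision("retry.backoff", "exponential", source="llm_default"),  # free to regenerate
    ]

    def regenerable(d: Decision) -> bool:
        # An LLM "compiler" may only re-decide what no human has pinned.
        return d.source != "human"

    print([d.key for d in ledger if regenerable(d)])  # ['retry.backoff']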


Have you ever led a project where you had to give the specs to other developers? Have you ever contracted out a complete implementation to a consulting company? Those are just really slow, Mechanical Turk-style human LLMs.

100% agree with the org shift, but I think of things differently. Specialists are important for architectural insight and domain expertise, but are also the byproduct of codebases growing past a certain size.

It starts with "I know/can handle issues from infra to design" and moves to regroupings of focus, often DevOps / code / design. But companies might also split focus by user concerns, like an "Admin console team" vs. an "End user team". That depends on the product and the complexity of the specialist concerns.

I think across the board there is going to be a blurring of management and engineering. We see the value of "product engineers" now, but they are starting to eat some of the PM's lunch. On the flip side, "technical PMs" are more valuable, as they come at things from the other direction. The driver for this change is that both are using a shared context to bridge the gap from "business concerns + product requirements" to code.


I appreciate the detailed response and I certainly haven't studied this, but part of the reason I made the measurement/construction comparison is that information is not equally important, while the errors are more or less equally distributed. And the biggest issue is the lack of ability to know whether something is an error in the first place; failure is only defined by the difference between our intent and the result. Code is how we communicate our intent most precisely.

You're absolutely right. Apologies if I came off as critical, which wasn't my intent.

I was trying to make a connection with random sampling as a way to maybe reduce the inherent uncertainty in how well AI solves problems, but there's still a chance that 10 AIs could come up with the wrong answer and we'd have no way of knowing. Like how wisdom of the crowd can still lead to design by committee mistakes. Plus I'm guessing that AIs already work through several layers of voting internally to reach consensus. So maybe my comment was more of a breadcrumb than an answer.
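
Just as a toy sketch of what I mean (ask_model here is a made-up stand-in that simulates a model, not any real API), the sampling/voting idea would look something like this, with the caveat that a majority vote only helps if the samples' errors are actually independent:

    # Toy "wisdom of the crowd" sampling over an LLM.
    # ask_model() is a fake stand-in; swap in whatever client you use.
    import random
    from collections import Counter

    def ask_model(prompt: str) -> str:
        # Simulates a model that answers "42" 70% of the time and "41" otherwise.
        return "42" if random.random() < 0.7 else "41"

    def sampled_answer(prompt: str, n: int = 10) -> tuple[str, float]:
        # Sample n times and take the majority answer, plus how unanimous the crowd was.
        votes = Counter(ask_model(prompt) for _ in range(n))
        answer, count = votes.most_common(1)[0]
        return answer, count / n

    print(sampled_answer("What is 6 * 7?"))
    # e.g. ('42', 0.8) -- but all ten samples can still agree on a wrong answer.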

Some other related topics might be error correcting codes (like ECC ram), Reed-Solomon error correction, the Condorcet paradox (voting may not be able to reach consensus) and even the halting problem (zero error might not be reachable in limited time).

However, I do feel that AI has reached an MVP status that it never had before. Your post reminded me of something I wrote about in 2011, where I said that we might not need a magic bullet to fix programming, just a sufficiently advanced one:

https://web.archive.org/web/20151023135956/http://zackarymor...

I took my blog(s) down years ago because I was embarrassed by what I wrote (it was during the Occupy Wall Street days but the rich guys won). It always felt so.. sophomoric, no matter how hard I tried to convey my thoughts. But it's interesting how so little has changed in the time since, yet some important things have.

Like, I hadn't used Docker in 2011 (it didn't come out until 2013) so all I could imagine was Erlang orchestrating a bunch of AIs. I thought that maybe a virtual ant colony could be used for hill climbing, similarly to how genetic algorithms evolve better solutions, which today might be better represented by temperature in LLMs. We never got true multicore computing (which still devastates me), but we did get Apple's M line of ARM processors and video cards that reached ludicrous speed.

What I'm trying to say is, I know that it seems like AI is all over the place right now, and it's hard to know if it's correct or hallucinating. Even when starting with the same random seed, it seems like getting two AIs to reach the same conclusion is still an open problem, just like with reproducible builds.

So I just want to say that I view LLMs as a small piece of a much larger puzzle. We can imagine a minimal LLM with less than 1 billion parameters (more likely 1 million) that controls a neuron in a virtual brain. Then it's not so hard to imagine millions or billions of those working together to solve any problem, just like we do. I see AIs like ChatGPT more like logic gates than processors. And they're already good enough to be considered fully reliable, if not better than humans at most tasks already, so it's easy to imagine a society of them with metacognition that couldn't get the wrong answer if it tried. Kind of like when someone's wrong on the internet and everyone lets them know it!


This is very much a "vibe coding can build you the Great Pyramids but it can't build a cathedral" situation, as described earlier today: https://news.ycombinator.com/item?id=46898223

I know this is an impressive accomplishment and is meant to show us the future potential, but it achieves big results by throwing an insane amount of compute at the problem, brute forcing its way to functionality. $20,000 set on fire, at Claude's discounted Max pricing no less.

Linear results from exponential compute is not nothing, but this certainly feels like a dead-end approach. The frontier should be more complexity for less compute, not more complexity from an insane amount more compute.


> $20,000 in API costs

I would interpret this as being at API pricing. At subscription pricing, it's probably at most 5 or 6 Max subscriptions worth.


> $20,000 set on fire

To be fair, that's two weeks of the employer cost of a FAANG engineer's labor. And no human hacks a working compiler in two weeks.

It's a lot of AI compute for a demo, sure. But $20k stunts are hardly unique. Clearly there's value being demonstrated here.


Yes a human can hack together a compiler in two weeks.

If you can't, you should turn off the AI and learn for yourself for a while.

Writing a compiler is not a flex; it's a couple very well understood problems, most of which can be solved using existing libraries.

Parsing is solved with yacc, bison, or sitting down and writing a recursive descent parser (works for most well designed languages you can think of).

Then take your AST and translate it to an IR, and then feed that into anything that generates code. You could use Cranelift (or whatever it's called), or you could roll your own.
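
To make that concrete, here's a toy sketch of that classic pipeline in Python (a made-up little grammar with just integers, +, and *, nothing to do with C or the project in the article): a recursive descent parse to an AST, then lowering the AST to a flat, stack-machine-style IR that any code generator could consume.

    # Toy illustration: source -> recursive descent parse -> AST -> simple IR.
    import re

    TOKEN = re.compile(r"\s*(\d+|[+*()])")

    def tokenize(src):
        pos, out = 0, []
        while pos < len(src):
            m = TOKEN.match(src, pos)
            if not m:
                raise SyntaxError(f"bad input at {pos}")
            out.append(m.group(1))
            pos = m.end()
        return out

    class Parser:
        """Recursive descent for: expr := term ('+' term)*, term := factor ('*' factor)*"""
        def __init__(self, toks):
            self.toks, self.i = toks, 0
        def peek(self):
            return self.toks[self.i] if self.i < len(self.toks) else None
        def eat(self, tok=None):
            cur = self.toks[self.i]
            if tok and cur != tok:
                raise SyntaxError(f"expected {tok}, got {cur}")
            self.i += 1
            return cur
        def expr(self):
            node = self.term()
            while self.peek() == "+":
                self.eat("+")
                node = ("add", node, self.term())
            return node
        def term(self):
            node = self.factor()
            while self.peek() == "*":
                self.eat("*")
                node = ("mul", node, self.factor())
            return node
        def factor(self):
            if self.peek() == "(":
                self.eat("(")
                node = self.expr()
                self.eat(")")
                return node
            return ("const", int(self.eat()))

    def lower(ast, out):
        # Lower the AST to a flat, stack-machine style IR list.
        op, *args = ast
        if op == "const":
            out.append(("push", args[0]))
        else:
            lower(args[0], out)
            lower(args[1], out)
            out.append((op,))
        return out

    ir = lower(Parser(tokenize("2 + 3 * (4 + 1)")).expr(), [])
    print(ir)
    # [('push', 2), ('push', 3), ('push', 4), ('push', 1), ('add',), ('mul',), ('add',)]

Real compilers obviously do vastly more (types, scopes, optimization, ABI details), but the shape of the pipeline is the same.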


Meanwhile:

> I spent a good part of my career (nearly a decade) at Google working on getting Clang to build the linux kernel.

https://news.ycombinator.com/item?id=46905771


Afaik the Linux Kernel strongly depends on GCC extensions and GCC specific behavior, so maybe that's why this is such an interesting part? Also extensions like inline assembly seem wildly complicated to add to an existing compiler WHILE replicating the syntax and semantics of another compiler (which has a different software architecture).

If you spend a decade working on something, you’re not “hacking it”.

> Parsing is solved with yacc, bison, or sitting down and writing a recursive descent parser (works for most well designed languages you can think of).

No human being writes a recursive descent parser for "Linux Kernel C" in two weeks, though. And AFAIK there's no downloadable BNF for that you can hand to an automatic generator either, you have to write it and test it and refine it. And you can't do it in two weeks.

Yes yes, we all know how to write a compiler because we took a class on it. That's like "Elite CS Nerd Basic Admission". We still can't actually do it at the cost being demonstrated, and you know it.


I wrote a couple hobby compilers. The only difficulty with C is the ambiguous syntax.

Now compare the article's setup with a single senior engineer who uses an agent or two at the same time.


> I wrote a couple hobby compilers.

So did most of us, join the club. What you can't do is write such a compiler for $20k if you want to put food on the table, or do it in two weeks (what it costs to buy your time currently until AI eats your job). And let's be honest: it's not going to build something of the complexity of Linux either. Hobby compilers run hobby code. Giant decades-old source trees test edge cases like no one's business.


Is there really value being presented here? Is this codebase a stable enough base to continue developing this compiler, or does it warrant a total rewrite? Honest question; it seems like the author mentioned it being at its limits. This mirrors my own experience with Opus in that it isn't that great at defining abstractions, at least in one shot. Maybe with enough loops it could converge, but I haven't seen definite proof of that in the current generation with these ambitious clickbaity projects.

This is an experiment to see the current limit of AI capabilities. The end result isn't useful, but the fact is established that in Feb 2026, you can spend $20k on AI to get an inefficient but working C compiler.

Of course it's impressive. I am just pointing out that these experiments with the million-line browser and now this C compiler seem to greatly extrapolate conclusions. The researchers claim they prove you can scale agents horizontally for economic benefit. But the products both of these built are of questionable technical quality, and it isn't clear to me they are a stable enough foundation to build on top of. But everyone in the hype crowd just assumes this is true. At least this researcher has sort of promised to pursue this project, whereas Wilson already pretty much gave up on his browser. I hadn't seen a commit in that repo for weeks. Given that, I am not going to immediately assume these agents truly achieved anything of economic value relative to what a smaller set of agents could have achieved.

> The end result isn't useful

Then, as your parent comment asked, is there value in it? $20K, which is more than the yearly minimum wage in several countries in Europe, was spent recreating a worse version of something we already have, just to see if it was possible, using a system which increases inequality and makes climate change—which is causing people to die—worse.


> inefficient but working

FWIW, an inefficient but working product is pretty much the definition of a startup MVP. People are getting hung up on the fact that it doesn't beat gcc and clang, and generalizing to the idea that such a thing can't possibly be useful.

But clearly it can, and is. This builds and boots Linux. A putative MVP might launch someone's dreams. For $20k!

The reflexive ludditism is kinda scary actually. We're beyond the "will it work" phase and the disruption is happening in front of us. I was a luddite 10 months ago. I was wrong.


You are projecting and over-reacting. My response is measured against the insane hype this is getting beyond what was demonstrated. I never said it wasn't impressive.

I'm not hung up on anything. Clearly the project isn't stable because it can't be modified without regression. It can be an MVP, but if it needs someone to rewrite it, or to spend many man-months just to grok the code to add to it, then it's conceivable it isn't an economic win in the long run. Also, they haven't compared this to what a smaller set of agents could accomplish with the same task, and thus I am still not fully sold on the economic viability of horizontally scaling agents at this time (well, at least not on the task that was tested).


If it generates a booting kernel and passes the test suite at 99% it's probably good enough to use, yeah.

The point isn't to replace GCC per se, it's to demonstrate that reasonably working software of equivalent complexity is within reach for $20k to solve whatever problem it is you do have.


> it's probably good enough to use, yeah.

Not for general purpose use, only for demo.

> that reasonably working software of equivalent complexity is within reach for $20k to solve

But if this can't come close to replacing GCC and can't be modified without introducing bugs, then it hasn't proven this yet. I learned some new hacks from the paper and that's great and all, but from my experience of trying to harness even 4 Claude sessions in parallel on a complex task, it just goes off the rails in terms of coherence. I'll try the new techniques, but my intuition is that it's not really as good as you are selling it.


> Not for general purpose use, only for demo.

What does that mean, though? I mean, it's already meeting a very high quality bar by booting at all and passing those tests. No, it doesn't beat existing solutions on all the checkboxes, but that's not what the demo is about.

The point being demonstrated is that if you need a "custom compiler" or something similar for your own new, greenfield requirement, you can have it at pretty-clearly-near-shippable quality in two weeks for $20k.

And if people can't smell the disruption there, I don't know what to say.


Is it really shippable if it is strictly worse than the thing it copied? Do you know anyone who would use a vibe-coded compiler that can't be modified without introducing regressions (as the researcher admitted)?

> you can have it at pretty-clearly-near-shippable quality in two weeks for $20k.

if you spend months writing a tight spec, tests and have a better version of the compiler around to use when everything else fails.


> if you spend months writing a tight spec, tests and have a better version of the compiler around to use when everything else fails.

Doesn't matter because your competitors will have beaten you to market. That's just a simple Darwinian point, no AI magic needed.

No one doubts that things will be different in the coming Claudepocalypse, and new ideas about quality and process will need to happen to manage it. But sticking our heads in the sand and pretending that our stone tools are still better is just a path to early retirement at this point.


I feel like maybe you spend too much time watching hypefluencers. AI tools are great but if they are already super intelligent why haven't you gotten a swarm of agents to build yourself a billion dollar SaaS?

It's hard to separate the bullshit from reality when the hype is just turned to the max everywhere you turn. It feels like I'm in some elaborate psy-op where my experiences with these tools are just an order of magnitude lower than the hype, and I can't even express those thoughts without having a "luddite" patch attached to me. And if you read between the lines of what Karpathy wrote in his famous "anxiety" post, it kind of echoes my point. It's "an alien technology and we can't wield it right," yada yada. Which is an odd way to say "sometimes this thing works magically, but a lot of the time it's total shit, so you aren't as productive as you would like."


Humans can hack a compiler in much less. Stop reading this hype and focus on learning.

I am not against vibe coding at all, I just don't think people understand how shaky the foundation is. Software wants to be modified. With enough modifications, the disconnect between the code as it is imagined and the code as it exists becomes too arduous a distance to bridge.

The current solution is to simply reroll the whole project and let the LLM rebuild everything with new knowledge. This is fine until you have real data, users and processes built on top of your project.

Maybe you can get away with doing that for a while, but tech debt needs to be paid down one way or another. Either someone makes sense of the code, or you build so much natural language scaffolding to keep the ship afloat that you end up putting in more human effort than just having someone codify it.

We are definitely headed toward a future where we have lots of these Frankenstein projects in the wild, pulling down millions in ARR but teetering in the breeze. You can definitely do this, but "a codebase always pays its debts."


This hasn’t been my experience at all working on production codebases with LLMs. What you are describing is more like how it was in the GPT-3.5 era.

Not using LLMs, but using them without ever looking at the code.

As a writer and engineer, I don't see it.

Can AI kludge together a ripping story? Sure. But there is a reason people still write new books and buy new books - we crave the human connection and reflection of our current times and mores.

This isn't just a high art thing. My kids read completely different YA novels than I did, with just a few older canon titles persisting. I can hand them a book I loved as a kid and it just doesn't connect with them anymore.

How I think AI CAN produce art that people want is through careful human curation and guided generation. This is structurally the same as "human-in-the-loop" programming. We can connect to the artistry of the construction, in other words the human behind the LLM that influenced how the plot was structured, the characters developed and all the rest.

This is akin to a bad writer with a really good editor, or maybe the reverse. Either way, I think we will see a bunch of this and wring our hands because AI art is here, but I don't think we can ever take the human out of that equation. There needs to be a seed of "new" for us to give a shit.


Again, this article is not discussing the quality of generative AI. Sanderson clearly believes himself that AI is already able to produce things that are indiscernible from art to his eyes.

What this article is trying to get across is that art is a transformative process for the human who creates it, and that using LLMs to quickly generate results robs the would-be artist of the chance for that transformation to take place. Here's a quote from Sanderson:

"Why did I write White Sand Prime? It wasn’t to produce a book to sell. I knew at the time that I couldn’t write a book that was going to sell. It was for the satisfaction of having written a novel, feeling the accomplishment, and learning how to do it. I tell you right now, if you’ve never finished a project on this level, it’s one of the most sweet, beautiful, and transcendent moments. I was holding that manuscript, thinking to myself, “I did it. I did it."


This is such a sad comment.

This is great, I will definitely make use of this!
