
This is entirely too charitable. Basically all this proves is that the agent could run in a loop for a week or so, did anyone doubt that?

They marketed it as if we were really close to having agents that could build a browser on their own. They rightly deserve the blowback.

This issue matters a great deal because of how much money is being thrown at it, and that affects everyone, not just the "stakeholders". If at some point it does become true that you can ask an agent to build a browser and it actually does, that is very significant.

At this point in time I personally can't predict whether that will happen or not, but the consequences of it happening seem pretty drastic.





> This is entirely too charitable. Basically all this proves is that the agent could run in a loop for a week or so, did anyone doubt that?

yes, every AI skeptic publicly doubted that right up until they started doing it.


I find it hard to believe that after running agents fully autonomously for a week you'd end up with something that actually compiles and at least somewhat functions.

And I'm an optimist, not one of the AI skeptics heavily present on HN.

From the post it sounds like the author would also doubt this when he talks about "glorified autocomplete and refactoring assistants".


You don't run coding agents for a week and THEN compile their code. The best available models would have no chance of that working - you're effectively asking them to one-shot a million lines of code with not a single mistake.

You have the agents compile the code every single step of the way, which is what this project did.
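The shape of that loop is easy to sketch. Below is a minimal, hypothetical harness (the Python, the function names, and the `cargo build` command are my own illustration, not anything taken from the Cursor project): the agent makes one small edit, the harness compiles immediately, and any compiler output is fed straight back into the next step instead of accumulating for a week.

    import subprocess
    from typing import Callable

    def build_ok(workdir: str) -> tuple[bool, str]:
        # Run the project's build; `cargo build` is only an illustration,
        # any compile/test command plays the same role here.
        proc = subprocess.run(["cargo", "build"], cwd=workdir,
                              capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    def agent_loop(propose_edit: Callable[[str], None], workdir: str,
                   max_steps: int = 1000) -> None:
        # Compile after every edit and feed errors straight back,
        # rather than letting a week of mistakes pile up unchecked.
        feedback = "start"
        for _ in range(max_steps):
            propose_edit(feedback)           # one small LLM-driven change
            ok, output = build_ok(workdir)   # immediate verification
            feedback = ("build passed; continue with the next subtask"
                        if ok else "build failed, fix this first:\n" + output)

The `propose_edit` callable stands in for whatever LLM-backed editing step the harness uses; the point is only the loop shape, not any particular vendor's implementation.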


With the agent running autonomously for a long time, I'd have feared it would break my build/verification tasks in an attempt to fix something.

My confidence in running an agent unsupervised for a long time is low, but to be fair that's not something I tried. I worked mostly with the agent in the foreground; at most I had two agents running at once in Antigravity.


It did not compile [1], so your belief was correct.

[1] https://news.ycombinator.com/item?id=46649046


It did compile - the coding agents were compiling it constantly.

It didn't have correctly configured GitHub Actions, so the CI build was broken.


Then you should have no difficulty providing evidence for your claim. Since you have been engaging in language lawyering in this thread, it is only fair that your evidence be held to the same standard: incontrovertible, with zero wiggle room.

Even though I have no burden of proof to debunk claims you have offered no evidence for, I will point out that another commenter [1] indicates there were build errors, and the developer agrees there were build errors [2] that they resolved.

[1] https://news.ycombinator.com/item?id=46627675

[2] https://news.ycombinator.com/item?id=46650998


I mean, I interviewed the engineer for 47 minutes and asked him about this and many other things directly. I think I've done enough homework on this one.

I take back the implication I inadvertently made here that it compiled cleanly the whole time - I know that's not the case, we discussed that in our interview: https://simonwillison.net/2026/Jan/23/fastrender/#intermitte...

I'm frustrated at how many people are carrying around a mental model that the project "didn't even compile", implying the code had never successfully compiled, which clearly isn't true.


Okay, so the evidence you are presenting is that the entity pushing intentionally deceptive marketing with a direct conflict of interest said they were not lying.

I am frustrated at people loudly and proudly "releasing" a system they claim works when it does not. They could have pointed at a specific version that worked, but chose not to, indicating they are either intentionally deceptive or clueless. Arguing they had no opportunity for nuance and thus had no choice but to make false statements for their own benefit is ethical bankruptcy. If they had no opportunity for nuance, then they could have made a statement that erred against their benefit; that is ethical behavior.


See my comment here: https://news.ycombinator.com/context?id=46771405

I do not think Cursor's statements about this project were remotely misleading enough to justify this backlash.

Which of those things would you classify as "false statements"? The use of "from scratch"?


> Arguing they had no opportunity for nuance and thus had no choice but to make false statements for their own benefit is ethical bankruptcy.

absolutely

and clueless managers seeing these headlines will almost certainly lead to people losing their jobs


That is a good point. It is impressive. LLMs from two years ago were impressive, LLMs a year ago were impressive, and LLMs from a month ago are even more impressive.

Still, getting "something" to compile after a week of work is very different from getting the thing you wanted.

What is being sold, and invested in, is the promise that LLMs can accomplish "large things" unaided.

But they can't, as of yet, unless something is happening in one of the SOTA labs that we don't know about.

They can, however, accomplish small things unaided, though there is an upper bound, at least functionally.

I just wish everyone was on the same page about their abilities and their limitations.

To me they understand context well (e.g. the task "build a browser" doesn't need some huge specification because specifications already exist).

They can write code competently (this is my experience anyway)

They can accomplish small tasks (my experience again, "small" is a really loose definition I know)

They cannot understand context that doesn't exist (they can't magically know what you mean, but they can bring to bear considerable knowledge of pre-existing work and conventions that helps them make good assumptions, and the agentic loop prompts them to ask for clarification when needed)

They cannot accomplish large tasks (again my experience)

It seems to me there is something akin to the context window into which a task has to fit. Agents have this compaction feature, which I suspect is where the limitation lies. I.e. a person can't hold an entire browser codebase in their head, but they can build a general top-level map of the whole thing, so they know where to reach, which areas need improvement, how things fit together, and what has and hasn't been implemented. I suspect this compaction doesn't work very well for agents because it is a best-effort, tacked-on feature.
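To make the speculation concrete, here is a rough, hypothetical sketch of what compaction amounts to (the function, its parameters, and the word-count token estimate are my own illustration, not any vendor's actual implementation): once the transcript outgrows its budget, older steps get squashed into a lossy summary, which is exactly the "general top level mapping" step where detail about the codebase can get lost.

    def compact(messages: list[str], summarize, budget: int,
                keep_recent: int = 10) -> list[str]:
        # Squash older messages into one lossy summary once a crude
        # token estimate exceeds the budget; recent messages stay verbatim.
        tokens = sum(len(m.split()) for m in messages)
        if tokens <= budget or len(messages) <= keep_recent:
            return messages
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        summary = summarize("\n".join(old))  # best-effort map of earlier work
        return ["[summary of earlier work]\n" + summary] + recent

Here `summarize` stands in for an LLM call; how much useful structure survives that call is exactly the open question.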

I say all this speculatively, and I am genuinely interested in whether this next level of capability is possible. To me it could go either way.



