santadays's comments | Hacker News

This is entirely too charitable. Basically all this proves is that the agent could run in a loop for a week or so, did anyone doubt that?

They marketed it as if we were really close to having agents that could build a browser on their own. They rightly deserve the blowback.

This is an issue that is very important because of how much money is being thrown at it, and that affects everyone, not just the "stakeholders". If at some point it does become true that you can ask an agent to build a browser and it actually does, that is very significant.

At this point in time I personally can't predict whether that will happen or not, but the consequences of it happening seem pretty drastic.


> This is entirely too charitable. Basically all this proves is that the agent could run in a loop for a week or so, did anyone doubt that?

yes, every AI skeptic publicly doubted that right up until they started doing it.


I find it hard to believe after running agents fully autonomously for a week you'd end up with something that actually compiles and at least somewhat functions.

And I'm an optimist, not one of the AI skeptics heavily present on HN.

From the post it sounds like the author would also doubt this when he talks about "glorified autocomplete and refactoring assistants".


You don't run coding agents for a week and THEN compile their code. The best available models would have no chance of that working - you're effectively asking them to one-shot a million lines of code with not a single mistake.

You have the agents compile the code every single step of the way, which is what this project did.


With the agent running autonomously for a long time, I'd have feared it would break my build/verification tasks in an attempt to fix something.

My confidence in running an agent unsupervised for a long time is low, but to be fair that's not something I tried. I worked mostly with the agent in the foreground, at most I had two agents running at once in Antigravity.


That is a good point. It is impressive. LLMs from two years ago were impressive, LLMs a year ago were impressive, and LLMs from a month ago are even more impressive.

Still, getting "something" to compile after a week of work is very different from getting the thing you wanted.

What is being sold, and invested in, is the promise that LLMs can accomplish "large things" unaided.

But as of yet they cannot, unless something is happening in one of the SOTA labs that we don't know about.

They can, however, accomplish small things unaided, though there is an upper bound, at least functionally.

I just wish everyone was on the same page about their abilities and their limitations.

To me they understand context well (e.g. the task "build a browser" doesn't need some huge specification, because specifications already exist).

They can write code competently (this is my experience anyway)

They can accomplish small tasks (my experience again, "small" is a really loose definition I know)

They cannot understand context that doesn't exist (they can't magically know what you mean, but they can bring to bear considerable knowledge of pre-existing work and conventions that helps them make good assumptions and the agentic loop prompts them to ask for clarification when needed)

They cannot accomplish large tasks (again my experience)

It seems to me there is something akin to the context window into which a task can fit. They have this compaction feature, which I suspect is where this limitation lies. I.e. a person can't hold an entire browser codebase in their head, but they can create a general top-level mapping of the whole thing so they know where to reach, where areas of improvement are necessary, how things fit together, and what has and hasn't been implemented. I suspect this compaction doesn't work super well for agents because it is a best-effort, tacked-on feature.

I say all this speculatively, and I am genuinely interested in whether this next level of capability is possible. To me it could go either way.


It did not compile [1], so your belief was correct.

[1] https://news.ycombinator.com/item?id=46649046


It did compile - the coding agents were compiling it constantly.

It didn't have correctly configured GitHub Actions so the CI build was broken.


Then you should have no difficulty providing evidence for your claim. Since you have been engaging in language lawyering in this thread, it is only fair your evidence be held up to the same standard and must be incontrovertible evidence for your claims with zero wiggle room.

Even though I have no burden of proof to debunk your claims as you have provided no evidence for your claims, I will point out that another commenter [1] indicates there were build errors. And the developer agrees there were build errors [2] that they resolved.

[1] https://news.ycombinator.com/item?id=46627675

[2] https://news.ycombinator.com/item?id=46650998


I mean I interviewed the engineer for 47 minutes and asked him about this and many other things directly. I think I've done enough homework on this one.

I take back the implication I inadvertently made here that it compiled cleanly the whole time - I know that's not the case, we discussed that in our interview: https://simonwillison.net/2026/Jan/23/fastrender/#intermitte...

I'm frustrated at how many people are carrying around a mental model that the project "didn't even compile" implying the code had never successfully compiled, which clearly isn't true.


Okay, so the evidence you are presenting is that the entity pushing intentionally deceptive marketing with a direct conflict of interest said they were not lying.

I am frustrated at people loudly and proudly "releasing" a system they claim works when it does not. They could have pointed at a specific version that worked, but chose not to indicating they are either intentionally deceptive or clueless. Arguing they had no opportunity for nuance and thus had no choice but to make false statements for their own benefit is ethical bankruptcy. If they had no opportunity for nuance, then they could make a statement that errs against their benefit; that is ethical behavior.


See my comment here: https://news.ycombinator.com/context?id=46771405

I do not think Cursor's statements about this project were remotely misleading enough to justify this backlash.

Which of those things would you classify as "false statements"? The use of "from scratch"?


> Arguing they had no opportunity for nuance and thus had no choice but to make false statements for their own benefit is ethical bankruptcy.

absolutely

and clueless managers seeing these headlines will almost certainly lead to people losing their jobs


> I can definitely believe that in 2026 someone at their computer with access to money can send the right emails and make the right bank transfers to get real people to grow corn for you.

I think this is the new Turing test. Once it's been passed we will have AGI and all the Sam Altmans of the world will be proven correct. (This isn't a perfect test obviously, but neither was the Turing test.)

If it fails to pass we will still have what jdthedisciple pointed out

> a non-farmer, is doing professional farmer's work all on his own without prior experience

I am actually curious how many people really believe AGI will happen. There's a lot of talk about it, but when can I ask Claude Code to build me a browser from scratch and get a browser from scratch? Or when can I ask Claude Code to grow corn and it grows corn? Never? In 2027? In 2035? In the year 3000?

HN seems rife with strong opinions on this, but does anybody really know?


Researchers love to reduce everything into formulae, and believe that when they have the right set of formulae, they can simulate something as-is.

Hint: It doesn't work that way.

Another hint: I'm a researcher.

Yes, we have found a great way to compress and remix the information we scrape from the internet, and even with some randomness, it looks like we can emit the right set of tokens which makes sense, or search the internet the right way and emit these search results, but AGI is more than that.

There's so much tacit knowledge and implicit computation coming from experience, emotions, sensory inputs and from our own internal noise. AI models don't work on those. LLMs consume language and emit language. The information embedded in these languages is available to them, but most of the tacit knowledge is just an empty shell of the thing we try to define with a limited set of words.

It's the same with anything where we're trying to replace humans in the real world, in daily tasks (self-driving, compliance check, analysis, etc.).

AI is missing the magic grains we can't put out as words or numbers or anything else. The magic smoke, if you pardon the term. This is why no amount of documentation can replace a knowledgeable human.

...or this is why McLaren Technology Center's aim of "being successful without depending on any specific human by documenting everything everyone knows" is an impossible goal.

Because like it or not, intuition is real, and AI lacks it. Irrelevant of how we derive or build that intuition.


> There's so much tacit knowledge and implicit computation coming from experience, emotions, sensory inputs and from our own internal noise.

The premise of the article is stupid, though...yes, they aren't us.

A human might grow corn, or decide it should be grown. But the AI doesn't need corn, it won't grow corn, and it doesn't need any of the other things.

This is why they are not useful to us.

Put it in science fiction terms. You can create a monster, and it can have super powers, _but that does not make it useful to us_. The extremely hungry monster will eat everything it sees, but it won't make anyone's life better.


The Torment Nexus can't even put a loaf of bread on my table, so it's obvious we have nothing to fear from it!

I agree we don't have much to (physically) fear from it...yet. But the people who can't take "no" for an answer and don't get that it is fundamentally non-human, I can believe they are quite dangerous.

> Hint: It doesn't work that way.

I mean... technically it would work this way but, and this is a big but, reality is extremely complicated and a model that can actually be a reliable formula has to be extremely complicated. There's almost certainly no globally optimal solutions to these types of problems, not to mention that the solution space is constantly changing as the world does. I mean this is why we as humans and all animals work in probabilistic frameworks that are highly adaptable. Human intuition. Human ingenuity. We simply haven't figured out how to make models at that level of sophistication. Not even in narrow domains! What AI has done is undeniably impressive, wildly impressive even. Which is why I'm so confused why we embellish it so much.

It's really easy to think everything is easy when we look at problems from 40k feet. But as you come down to Earth the complexity exponentially increases and what was a minor detail is now a major problem. As you come down resolution increases and you see major problems that you couldn't ever see from 40k feet.

As a researcher, I agree very much with you. And as an AI researcher one of the biggest issues I've noticed with AI is that they abhor detail and nuance. Granted, this is common among humans too (and let's not pretend CS people don't have a stereotype of oversimplification and thinking all things are easy). While people do this frequently, they also don't usually do it in their niche domains, and if they do we call them juniors. You get programmers thinking building bridges is easy[0] while you get civil engineers thinking writing programs is easy. Because each person understands the other's job only at 40k feet and is reluctant to believe they are standing so high[1]. But AI? It really struggles with detail. It really struggles with adaptation. You can get detail out but it often requires significant massaging and it'll still be a roll of the dice[2]. You also can't get the AI to change course, a necessary thing as projects evolve[3]. Anyone who's tried vibe coding knows the best thing to do is just start over. It's even in Anthropic's suggestion guide.

My problem with vibe coding is that it encourages this overconfidence. AI systems still have the exact same problem computer systems do: they do exactly what you tell them to. They are better at interpreting intent, but that blade cuts both ways. The major issue is you can't properly evaluate a system's output unless you were entirely capable of generating the output yourself. The AI misses the details. Doubt me? Look at Proof of Corn! The Farmer Fred page is saying there's an API error[4]. The sensor page doesn't make sense (everything there is fine for an at-home hobby project, but anyone who's worked with those parts knows how unreliable they are. Who's going to do all the soldering? You making PCBs? Where's the circuit to integrate everything? How'd we get to $300? Where's the detail?). Everything discussed is at a 40k foot view.

[0] https://danluu.com/cocktail-ideas/

[1] I'm not sure why people are afraid of not knowing things. We're all dumb as shit. But being dumb as shit doesn't mean we aren't also impressive and capable of genius. Not knowing something doesn't make you dumb, it makes you human. Depth is infinite and we have priorities. It's okay to have shallow knowledge, often that's good enough.

[2] As implied, what is enough detail is constantly up for debate.

[3] No one, absolutely nobody, has everything figured out from the get-go. I'll bet money none of you have written a (meaningful) program start to finish from plans, ending up with exactly what you expect, never making an error, never needing to change course, even in the slightest.

Edit:

[4] The API issue is weird, and the more I look at the code the weirder things get. Like there's a file decision-engine/daily_check.py that has a comment to set a cron job to run every 8 hours. It says to dump data to logs/daily.log, but that file doesn't exist; instead it writes to logs/all_checks.jsonl, which appears to have the data. So why in the world is it reading https://farmer-fred.sethgoldstein.workers.dev/weather?


I think it will happen once we get off LLMs and find something that more closely maps to how humans think, which is still not known AFAIK. So either never, or once the brain is figured out.

I'd agree that LLMs are a dead end to AGI, but I don't think that AI needs to mirror our own brains very closely to work. It'd be really helpful to know how our brains work if we wanted to replicate them, but it's possible that we could find a solution for AI that is entirely different from human brains while still having the ability to truly think/learn for itself.

> ... I don't think that AI needs to mirror our own brains very closely to work.

Mostly agree, with the caveat that I haven't thought this through in much depth. But the brain uses many different neurotransmitter chemicals (dopamine, serotonin, and so on) as part of its processing, it's not just binary on/off signals traveling through the "wires" made of neurons. Neural networks as an AI system are only reproducing a tiny fraction of how the brain works, and I suspect that's a big part of why even though people have been playing around with neural networks since the 1960's, they haven't had much success in replicating how the human mind works. Because those neurotransmitters are key in how we feel emotion, and even how we learn and remember things. Since neural networks lack a system to replicate how the brain feels emotion, I strongly suspect that they'll never be able to replicate even a fraction of what the human brain can do.

For example, the "simple" act of reaching up to catch a ball doesn't involve doing the math in one's head. Rather, it's strongly involved with muscle memory, which is strongly connected with neurotransmitters such as acetylcholine and others. The eye sees the image of the ball changing in direction and subtly changing in size, the brain rapidly predicts where it's going to be when it reaches you, and the muscles trigger to raise the hands into the ball's path. All this happens without any conscious thought beyond "I want to catch that ball": you're not calculating the parabolic arc, you're just moving your hands to where you already know the ball will be, because your brain trained for this since you were a small child playing catch in the yard. Any attempt to replicate this without the neurotransmitters that were deeply involved in training your brain and your muscles to work together is, I strongly suspect, doomed to failure because it has left out a vital part of the system, without which the system does not work.

Of course, there are many other things AIs are being trained for, many of which (as you said, and I agree) do not require mimicking the way the human brain works. I just want to point out that the human brain is way more complex than most people realize (it's not merely a network of neurons, there's so much more going on than that) and we just don't have the ability to replicate it with current computer tech.


This is where it’s a mistake to conflate sentience and intelligence. We don’t need to figure out sentience, just intelligence.

Is there intelligence without sentience?

Nobody can know, but I think it is fairly clearly possible without signs of sentience that we would consider obvious and indisputable. The definition of 'intelligence' is bearing a lot of weight here, though, and some people seem to favour a definition that makes 'non-sentient intelligence' a contradiction.

As far as I know, and I'm no expert in the field, there is no known example of intelligence without sentience. Current AI is basically algorithms and statistics simulating intelligence.

Definitely a definition / semantics thing. If I ask an LLM to sketch the requirements for life support for 46 people, mixed ages, for a 28 month space journey… it does pretty good, “simulated” or not.

If I ask a human to do that and they produce a similar response, does it mean the human is merely simulating intelligence? Or that their reasoning and outputs were similar but the human was aware of their surroundings and worrying about going to the dentist at the same time, so genuinely intelligent?

There is no formal definition to snap to, but I’d argue “intelligence” is the ability to synthesize information to draw valid conclusions. So, to me, LLMs can be intelligent. Though they certainly aren’t sentient.


Can you spell out your definition of 'intelligence'? (I'm not looking to be ultra pedantic and pick holes in it -- just to understand where you're coming from in a bit more detail.) The way I think of it, there's not really a hard line between true intelligence and a sufficiently good simulation of intelligence.

I would say that "true" intelligence will allow someone/something to build a tool that never existed before, while intelligence simulation will only allow someone/something to reproduce tools that are already known. I would draw a distinction between someone able to use all their knowledge to find a solution to a problem using tools they know of, and someone able to discover a new tool while solving the same problem. I'm not sure the latter exists without sentience.

I honestly don't think humans fit your definition of intelligent. Or at least not that much better than LLMs.

Look at human technology history...it is all people doing minor tweaks on what other people did. Innovation isn't the result of individual humans so much as it is the result of the collective of humanity over history.

If humans were truly innovative, should we not have invented, for instance, at least one way of organizing society and economics that was stable, by now? If anything surprises me about humans it is how "stuck" we are in the mold of what other humans do.

Circulate all the knowledge we have over and over, throw in some chance, some reasoning skills of the kind LLMs demonstrate every day in coding, have millions of instances most of whom never innovate anything but some do, and a feedback mechanism -- that seems like human innovation history to me, and does not seem to demonstrate anything LLMs clearly lack. Except of course being plugged into history and the world the way humans are.


We have those eureka moments, when a good idea appears out of nowhere. I would say this "nowhere" is intelligence without sentience.

I think we are closer than most folks would like to admit.

in my wild guess opinion:

- 2027: 10%

- 2030s: 50%

- 2040: >90%

- 3000: 100%

Assuming we don't see an existential event before then, I think it's inevitable, and soon.

I think we are gonna be arguing about the definition of "general intelligence" long after these systems are already running laps around humans at a wide variety of tasks.


This is pretty unlikely for the same reason that India is far from industrialized.

When people aren’t super necessary (aka rare), people are cheap.


"new turing test" indeed!,any farmer worth his salt will smell a sucker and charge acordingly

One definition of analysis is: The process of separating something into its constituent elements.

I think when someone designs a software system, this is the root process: breaking a problem into parts that can be manipulated. Humans do this well, and some humans do this surprisingly well. I suspect there is some sort of neurotransmitter reward when parsimony meets function.

Once we can manipulate those parts we tend to reframe the problem as the definition of those parts, the problem ceases to exist and what is left is only the solution.

With coding agents we end up in a weird place: either we have to just give them the problem, or we have to give them the solution. Giving them the solution means that we have to give them more and more details until they arrive at what we want. Giving an agent the problem, we never really get the satisfaction of the problem dissolving into the solution.

At some level we have to understand what we want. If we don't we are completely lost.

When the problem changes we need to understand it, orient ourselves to it, find which parts still apply and which need to change and what needs to be added, if we had no part in the solution we are that much further behind in understanding it.

I think this, at an emotional level is what developers are responding to.

Assumptions baked into the article are:

You can keep adding features and Claude will just figure it out, sure, but for whom, and will they understand it?

Performance won't demand you prioritize feature A over feature B.

Security (that you don't understand) will be implemented over feature C, because Claude knows better.

Claude will keep getting more intelligent.

The only assumption I think is right, is that Claude will keep getting better. All the other assumptions require you know WTF you are doing (which we do, but for how long will we know what we are doing).


Maybe one day our knee-jerk reactionary outrage will be quelled not by any enlightenment but because we are forced to grow weary of falling prey to phishing attacks.

I'd feel pretty stupid getting worked up about something only to realize that getting worked up about it was used against me.

I'm writing this because for a moment I did get worked up and then had the slow realization it was a phishing attack, slightly before the article got to the point.

Anyways, I think the clickbait is kind of appropriate here because it rather poignantly captures what is going on.


I agree. It can demonstrate the knee-jerk effect in real time for the reader. Someone who reacts strongly to the title of this thread would have experienced a similar reaction if they had received the SendGrid phish email. Never seen clickbait wording actually be appropriate before.


When I see stories that make me want to click, I read HN comments first, and 8 times in ten that saves me from a "won't get fooled again" moment.

There's got to be a way to generalize this for anyone who still cares about the difference between real facts and manipulation.


The effectiveness of these techniques will die off over time as young people are increasingly inoculated against them, in the same way our generations are generally immune to traditional advertising. The memetic filters get better over time as us geezers are replaced by new models.


I've seen the following quote.

"The energy consumed per text prompt for Gemini Apps has been reduced by 33x over the past 12 months."

My thinking is that if Google can give away LLM usage (which is obviously subsidized) it can't be astronomically expensive; it's probably in the realm of what we are paying for ChatGPT. Google has their own TPUs and a company culture oriented towards optimizing energy usage/hardware costs.

I tend to agree with the grandparent on this: LLMs will get cheaper for the level of intelligence we have now, and will get more expensive for SOTA models.


Google is a special case - ever since LLMs came out I've been pointing out that Google owns the entire vertical.

OpenAI, Anthropic, etc are in a race to the bottom, but because they don't own the vertical they are beholden to Nvidia (for chips), they obviously have less training data, they need a constant influx of cash just to stay in that race to the bottom, etc.

Google owns the entire stack - they don't need nvidia, they already have the data, they own the very important user-info via tracking, they have millions, if not billions, of emails on which to train, etc.

Google needs no one, not even VCs. Their costs must be a fraction of the costs of pure-LLM companies.


> OpenAI, Anthropic, etc are in a race to the bottom

There's a bit of nuance hiding in the "etc". OpenAI and Anthropic are still in a race for the top results. MiniMax and GLM are in the race to the bottom while chasing good results - M2.1 is 10x cheaper than Sonnet for example, but practically fairly close in capabilities.


> There's a bit of nuance hiding in the "etc". OpenAI and Anthropic are still in a race for the top results.

That's not what is usually meant by "race to the bottom", is it?

To clarify, in this context I mean that they are all in a race to be the lowest margin provider.

They're at the bottom of the value chain - they sell tokens.

It's like being an electricity provider: if you buy $100 of electricity and produce 100 widgets, which you sell for $1k each, that margin isn't captured by the provider.

That's what being at the bottom of the value chain means.


I get what it means, but it doesn't look to me like they're trying that yet. They don't even care that people buy multiple highest-level plans to rotate them every week, because they don't provide a high enough tier for the existing customers. I don't see any price war happening. We don't know what their real margins are, but I don't see the race there. What signs do you see that Anthropic and OpenAI are in the race to the bottom?


> I don't see any price war happening. What signs do you see that Anthropic and OpenAI are in the race to the bottom?

There don't need to be signs of a race (or a price war), only signs of commodification; all you need is a lack of differentiation between providers for something to turn into a commodity.

When you're buying a commodity, there's no big difference between getting your commodity delivered by $PROVIDER_1 and getting your commodity delivered by $PROVIDER_2.

The models are all converging quality-wise. Right now the number of people who swear by OpenAI models are about the same as the number of people who swear by Anthropic models, which are about the same as the number of people who swear by Google's models, etc.

When you're selling a commodity, the only differentiation is in the customer experience.

Right now, sure, there's no price war, but right now almost everyone who is interested is playing with multiple models anyway. IOW, the target consumers are already treating LLMs as a commodity.


Gmail has 1.8b active users, each with thousands of emails in their inbox. The number of emails they can train on is probably in the trillions.


Email seems like not only a pretty terrible training data set, since most of it is marketing spam with dubious value, but also an invasion of privacy, since information could possibly leak about individuals via the model.


> Email seems like not only a pretty terrible training data set, since most of it is marketing spam with dubious value

Google probably even has an advantage there: filter out everything except messages sent from valid gmail account to valid gmail account. If you do that you drop most of the spam and marketing, and have mostly human-to-human interactions. Then they have their spam filters.


I'd upgrade that "probably" leak to "will absolutely" leak, albeit with some loss of fidelity.

Imagine industrial espionage where someone is asking the model to roleplay a fictional email exchange between named corporate figures in a particular company.


> Google has ... company culture oriented towards optimizing the energy usage/hardware costs.

Google has a company culture of luring you in with freebies and then mining your data to sell ads.


> if Google can give away LLM usage (which is obviously subsidized) it can't be astronomically expensive

There is a recent article by Linus Sebastian (LTT) talking about YouTube: it is almost impossible to support the cost to build a competitor because it is astronomically expensive (vs potential revenue).


I do not disagree they will get cheaper, but I am pointing out that none of this is being reflected in hardware pricing. You state LLMs are becoming more optimized (less expensive). I agree. This should have a knock-on effect on hardware prices, but it is not happening. Where is the disconnect? Are hardware prices a lagging indicator? Is Nvidia still a 5 trillion dollar company if we see another 33x improvement in "energy consumed per text prompt"?


Jevons paradox. As AI gets more efficient its potential scope expands further and the hardware it runs on becomes even more valuable.

BTW, the absolute lowest "energy consumed per logical operation" is achieved with so-called 'neuromorphic' hardware that's dog slow in latency terms but more than compensates with extreme throughput. (A bit like an even more extreme version of current NPU/TPUs.) That's the kind of hardware we should be using for AI training once power use for that workload is measured in gigawatts. Gaming-focused GPUs are better than your average CPU, but they're absolutely not the optimum.


GraalVM supports running JavaScript in a sandbox with a bunch of convenient options for running untrusted code.

https://www.graalvm.org/latest/security-guide/sandboxing/


Oh that looks neat! It appears to have the memory limits I want (engine.MaxIsolateMemory) and a robust CPU limit: sandbox.MaxCPUTime

One catch: the sandboxing feature isn't in the "community edition", so only available under the non-open-source (but still sometimes free, I think?) Oracle GraalVM.


I get this take, but given the state of the world (the US anyways), I find it hard to trust anyone with any kind of profit motive. I feel like any information can't be taken as fact; it can just be rolled into your world view or discarded depending on whether it's useful. If you need to make a decision that can't be backed out of and that has real-world consequences, I think/hope most people are learning to do as much due diligence as is reasonable. LLMs seem at this moment to be trying to give reliable information. When they've been fine-tuned to avoid certain topics it's obvious. This could change, but I suspect it will be hard to fine-tune them too far in a direction without losing capability.

That said, it definitely feels as though keeping a coherent picture of what is actually happening is getting harder, which is scary.


> I feel like any information can't be taken as fact; it can just be rolled into your world view or discarded depending on whether it's useful.

The concern, I think, is that for many that "discard function" is not "Is this information useful?" but rather "Does this information reinforce my existing world view?"

That feedback loop and where it leads is potentially catastrophic at societal scale.


This was happening well before LLMs, though. If anything, I have hope that LLMs might break some people out of their echo chambers if they ask things like "do vaccines cause autism?"


> I have hope that LLMs might break some people out of their echo chambers

Are LLMs "democratized" yet, though? If not, then it's just as likely that LLMs will be steered by their owners to reinforce an echo chamber of their own.

For example, what if RFK Jr launched an "HHS LLM" - what then?


... nobody would take it seriously? I don't understand the question.


> I find it hard to trust anyone with any kind of profit motive.

As much as this is true, and doctors, for example, can certainly profit (here in my country they don't get any type of sponsor money AFAIK, other than having very high rates), there is still accountability.

We have built a society based on rules and laws; if someone does something that can harm you, you can follow the path to at least hold someone accountable (or try).

The same cannot be said about LLMs.


> there is still accountability

I mean there is some if they go wildly off the rails, but in general if the doctor gives a prognosis based on a tiny amount of the total corpus of evidence they are covered. Works well if you have the common issue, but can quickly go wrong if you have the uncommon one.


Comparing anything real professionals do to the endless, unaccountable, unchangeable stream of bullshit from AI is downright dishonest.

This is not the same scale of problem.


I can’t imagine this is not happening. There exists the will, the means and the motivation, with not a small dose of what pg might call naughtiness.


Don't know about Excel, but for Google Sheets you can ask ChatGPT to write you an Apps Script custom function, e.g. CALL_OPENAI. Then you can pass variables into it: =CALL_OPENAI("Classify this survey response as positive, negative, or off-topic: "&A1)
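It's only a dozen or so lines of Apps Script. A rough sketch (the chat completions endpoint and payload shape follow the standard OpenAI API; the model id and stashing the key in Script Properties as OPENAI_API_KEY are just my assumptions, swap in whatever you actually use):

  // Sketch of the custom function. Store your key under
  // Project Settings > Script Properties as OPENAI_API_KEY (assumed name).
  function CALL_OPENAI(prompt) {
    var apiKey = PropertiesService.getScriptProperties().getProperty('OPENAI_API_KEY');
    var payload = {
      model: 'gpt-5-mini', // assumed model id; use whichever cheap model you prefer
      messages: [{ role: 'user', content: String(prompt) }]
    };
    var response = UrlFetchApp.fetch('https://api.openai.com/v1/chat/completions', {
      method: 'post',
      contentType: 'application/json',
      headers: { Authorization: 'Bearer ' + apiKey },
      payload: JSON.stringify(payload),
      muteHttpExceptions: true // show API errors in the cell instead of throwing
    });
    var data = JSON.parse(response.getContentText());
    return data.choices ? data.choices[0].message.content.trim() : 'Error: ' + response.getContentText();
  }

Custom functions are allowed to use the URL Fetch and Properties services, but they re-run whenever the sheet recalculates and URL Fetch is quota-limited, so don't drag it down a huge column.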


Sheets also has an `AI` formula now that you can use to invoke Gemini models directly.


When I tried the Gemini/AI formula it didn't work very well; gpt-5 mini or nano are cheap and generally do what you want if you are asking something straightforward about a piece of content you give them. You can also give a JSON schema to make the results more deterministic.
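Concretely, the JSON schema bit is a response_format block added to the payload in the sketch above. Something like this, as I understand OpenAI's structured-output format (the schema name and fields are made up for the survey example; check the current docs for whatever model you use):

  // Sketch: constrain the reply to a fixed set of labels via a JSON schema.
  var payload = {
    model: 'gpt-5-mini', // assumed model id
    messages: [{ role: 'user', content: 'Classify this survey response: ' + String(prompt) }],
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'survey_classification', // made-up schema name
        strict: true,
        schema: {
          type: 'object',
          properties: {
            label: { type: 'string', enum: ['positive', 'negative', 'off-topic'] }
          },
          required: ['label'],
          additionalProperties: false
        }
      }
    }
  };
  // The model then returns a JSON string, so the cell value would be
  // JSON.parse(data.choices[0].message.content).label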


It seems like there is a bunch of research and working implementations that allow efficient fine-tuning of models. Additionally there are ways to tune the model to outcomes vs. training examples.

Right now the state of the world with LLMs is that they try to predict a script in which they are a happy assistant as guided by their alignment phase.

I'm not sure what happens when they start getting trained in simulations to be goal-oriented, i.e. their token generation is based not on what they think should come next but on what should come next in order to accomplish a goal. Not sure how far away that is, but it is worrying.


That's already happening. It started happening when they incorporated reinforcement learning into the training process.

It's been some time since LLMs were purely stochastic average-token predictors; their later RL fine tuning stages make them quite goal-directed, and this is what has given some big leaps in verifiable domains like math and programming. It doesn't work that well with nonverifiable domains, though, since verifiability is what gives us the reward function.


That makes sense for why they are so much better at writing code than actually following the steps the same code specifies.

Curious, is anyone training in adversarial simulations? In open world simulations?

I think what humans do is align their own survival instinct with surrogate activities and then rewrite their internal schema to be successful in said activities.

