The weird mathemagical language processor that can pretend to be human some of the time with some effectiveness not only can fuck up if asked not to, but has a famous history of doing so.
Sorry - still author's fault. They didn't understand how LLMs work. They thought Cursor implemented some magic "I control every action the LLM takes" thing. It's impossible.
right. But cursor _said_ they had some magic. At some point you have to trust vendors. I don't know exactly how AWS guarantees eleven nines of durability on S3. But I sure hope that they do.
Here is what they say: at the very top they explain that LLMs are inherently unreliable. It looks like they offer security tools and safeguards, but they also provide an auto-run option. A vendor can't really be held responsible for someone shooting themselves in the face. You can argue that they shouldn't provide that option, but that's what people want, so they do, with warnings.
It sounds like this user either didn't use the security controls, approved prompts they didn't understand, or disabled the checks entirely. Having worked in IT/tech for a big chunk of my life and seen all the dumb crap that people who know better do anyway, I would bet my house on that being the most likely scenario rather than Cursor somehow being at fault here.
yeah and when you interview the junior dev who also convinces you they're smart and have something special, and they then delete prod, guess what... not that dev's fault.
You absolutely do not. When someone makes an unbelievable claim, such as having magic guardrails for LLMs that prevent dangerous actions (what would that even mean?!), you don’t have to trust that claim.
If you trust someone’s claim without justification, that’s on you.
> At some point you have to trust vendors. I don't know exactly how AWS guarantees eleven nines of durability on S3. But I sure hope that they do.
Trust is earned, it's built on reputations at the individual, corporate, and industry-wide levels. AWS has 20 years of reputation on which I can judge the value of their promises.
Not only has the LLM industry (it is not "AI" and never will be) absolutely not earned anything like that level of trust, the thing the technology has proven most effective at is in fact scamming. Making up something that looks/sounds convincing, especially if you aren't thinking too hard about it, is what they're best at. Combine that with a lot of money flying around and trust levels should be somewhere around "Elon Musk promises".
At this point there have been so many blatant examples of why you should never give an LLM "agent" control over production systems, but the allure of just giving some vague direction to a chatbot and telling it not to screw things up is just irresistible to some, like Sideshow Bob stepping on rakes [1].
If everyone around you is whacking themselves in the face with the rake, and you know you can avoid it just by using your brain and not stepping on the rake, and avoid it entirely by just keeping your rakes contained, but a rake vendor comes to you saying that they have built a new rake that they swear won't whack you in the face even if you leave it right in your walking path, do you trust them?
Yeah I wasn't clear with "the author is right", I think they are right to be frustrated, but that doesn't clear their own fault in the matter
It's just that it wasn't their fault alone.
This is not a polarizing issue: it's not just the author's fault, or Cursor's fault, or society's fault. It's everyone's, and we all have something to learn from this.
You just have to add a human in the loop for destructive calls.
Add an additional TOTP parameter to destructive calls that's generated from the agent UI that requires a human to click a button, which generates a code that's sent to the model and used in the call.
Having said that - even the categorisation of calls into destructive and non-destructive is inherently not safe, unless you have a very strict OS-level / VM-like setup (everything read-only, world access only through MCPs, so that it is not the LLM deciding which calls are destructive but the MCP, etc.)
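For what it's worth, the TOTP idea above could be sketched roughly like this. It is only a minimal illustration under my own assumptions: `ApprovalGate`, `human_clicked_approve` and `drop_table` are made-up names, not anything Cursor or an MCP server actually provides, and the secret is assumed to live with the UI and tool layer, never in the model's context. The `totp` function follows RFC 6238 with HMAC-SHA1.

```python
import hashlib
import hmac
import secrets
import struct
import time


def totp(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP (HMAC-SHA1) for the time step that contains `at`."""
    counter = struct.pack(">Q", int(at) // step)
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


class ApprovalGate:
    """Hypothetical gate shared by the agent UI and the tool layer."""

    def __init__(self, clock=time.time):
        self._secret = secrets.token_bytes(20)  # never shown to the model
        self._clock = clock
        self._used = set()

    def human_clicked_approve(self) -> str:
        """Called only from the UI button handler; the code is then passed to the model."""
        return totp(self._secret, self._clock())

    def verify(self, code: str) -> bool:
        """Accept a code once, and only within the current time step."""
        if code in self._used:
            return False
        expected = totp(self._secret, self._clock())
        ok = hmac.compare_digest(code.encode(), expected.encode())
        if ok:
            # Simplification: a second approval in the same 30s step
            # produces the same code and would be rejected here.
            self._used.add(code)
        return ok


def drop_table(gate: ApprovalGate, table: str, approval_code: str) -> str:
    """Hypothetical destructive tool: refuses to run without a valid code."""
    if not gate.verify(approval_code):
        raise PermissionError("destructive call needs a fresh human approval code")
    return f"dropped {table}"
```

The model can still ask for `drop_table`, but it cannot produce a valid `approval_code` on its own, so the call only goes through after a human clicks. This does nothing about the categorisation problem: a destructive action that isn't routed through a gated tool is not covered.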
I am confused why you would write a public article about it as a financial company. But I have many things I am confused about here; I cannot really figure what they do that requires 800 people or how 130m tx/y is anything to boast about in itself. But maybe it is fantastic; I don't know.
The harm hasn't been adequately demonstrated though. Whereas we know cigarettes are harmful to everyone.
Alcohol in the UK can be consumed in the house from 5 years old. Which is the point. That's societal norms at work. Everyone knows it's not ok to let your young kids get drunk, but we trust society to let parents decide what is appropriate and when.
A VPN can't get around a cigarette and alcohol ban.
Perhaps children should be given locked down phones, with fines for parents who buy non child safe phones for their kids. It would take time for this to take effect but a social media ban would actually be effective at the end.
> Just like you can't get around a random adult buying for kids. It's just an imperfect deterrent.
This argument feels really weak. Convincing an adult to buy alcohol for kids is dramatically more difficult on average than setting up a VPN.
If you’re on this tech website you should know that it’s not hard to get VPN access even with cash by buying cards at retail. You can also use one of the various free (ad supported or spyware) VPN products.
It’s nothing like trying to involve another adult and asking them to take on the legal liability of that action.
I live in the UK, though not in London. I can count on one hand the number of times a group of children asked me to buy alcohol for them. So it's not that it doesn't happen, but it almost never happens.
Compare standing outside a supermarket, repeatedly begging passers by to commit a crime for you every time you want alcohol, with the one time action of installing a VPN client on your device and it's obvious one law is enforceable while the other is not.
What? - I live in London, if you walk through a high street where there is both a secondary school and a corner store (if you say you don't know what that is, I will assume you are Trump) at around 3-4pm - you either get asked to buy cigarettes; or, refusing to do so you will get asked if you have any cigarettes. Without fail.
You are trying to make it dramatic by saying it is a crime - in this context installing a VPN is as much a crime (arguably with more traces / evidence) as buying cigarettes for teenagers.
It is not. There is no law against circumventing age gates by means of a VPN. It is illegal to promote VPN services to children as a means of circumventing age gates, but the act itself is not illegal.
Yep - It's nearly impossible to assign profit to those things - we have X revenue from Android licenses, what's the cost of an android license? Is it all the R&D that goes to UI or hardware research? What's the cost of a Youtube Ad?
Name one product that Meta created over the last 10 years that mattered - beyond adtech. They can fire everyone in every team and just retain ads (tech and sales) - and some minimal setup for instagram and whatsapp and facebook and their revenue will not take a dent. So, yes, they overhired.
This comment put everything into perspective.
I can't name anything beyond Facebook, Instagram or Whatsapp that Meta's created and I've used in the past 10 years.
I've never even (knowingly) used the Llama models tbh.
If you consider Marketplace its own product, it's a massive win, but they haven't monetized it beyond some very ineffective post boosting and advertising. I honestly think they could charge 10% of list price for items over $50, plus membership levels that reduce or remove listing fees, and make a significant amount of money.
I use marketplace to search for cars, and the algorithm is beyond frustrating. I just want some decent filters. How a company that big can create something so terrible is beyond me.
Oddly enough - facebook groups are not terrible for very niche hobbies. Not sure what makes them attractive, but the groups are there. Thinking about it - there is really no alternative. My Retro Computing group is there, car owners group is there, very niche metal bands' posters group is there.
What is the difference between the two? I treated them as synonyms.
I did not say it cures the root issue; frankly, I don't care that much. I am taking it so that the pain stops and I can function normally.
Paracetamol does not stop pain/hurt. It is like taking nothing and just waiting. It may help with fever or some such, but I am not taking ibuprofen for fever.
Honestly a lot of useful software is ‘unimportant’ in the sense that the consequences of introducing a bug or bad code smell aren’t that significant, and can be addressed if needed. It might well be for many projects the time saved not reviewing is worth dealing with bugs that escape testing. Also, it’s entirely possible for software to be both well engineered and useless.
AI agents do not have agency(!), they have no understanding of consequences. They actually have no understanding. At all.