Related: https://news.ycombinator.com/item?id=46223311
Switching to the paid version was already problematic, as it uses the GCP account system, which doesn't have a spending limit, and API keys do not have an expiration date. So the free offer was great for freelancers and SMEs, and yet the paid version was the worst possible scenario you can imagine for those same freelancers and SMEs. OpenRouter + free models, and the increased rate limit you get after buying 10 credits (10€), is my current favourite choice for learning/teaching.
Microsoft is using the deep penetration of SharePoint in companies to sell Copilot licenses. At least in France it's alive and well, and I see many more Copilot licenses than actual OpenAI use.
There are about a dozen workarounds for context limits, agents being one of them, MCP servers another, AGENTS.md a third, but none of them actually solve the issue of a context window being so small that it's useless for anything even remotely complex.
Let's imagine a codebase that can fit onto a revolutionary piece of technology known as a floppy disk. As we all know, a floppy disk can store <2 megabytes. But 100k tokens is only about 400 kilobytes. So, to process the whole codebase that fits on a floppy, you need 5 agents plus a sixth "parent process" that those 5 agents report to.
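To make the arithmetic concrete, here's a quick back-of-the-envelope sketch in Python (assuming roughly 4 bytes of source code per token, which is a rule of thumb, not an exact figure):

    # Back-of-the-envelope: how many 100k-token agents cover a floppy-sized codebase?
    # Assumes ~4 bytes of source code per token (a rough rule of thumb).
    FLOPPY_BYTES = 2_000_000      # "<2 megabytes" of code
    BYTES_PER_TOKEN = 4           # rough average for source code
    CONTEXT_TOKENS = 100_000      # one agent's context window

    window_bytes = CONTEXT_TOKENS * BYTES_PER_TOKEN   # ~400 KB per agent
    agents = -(-FLOPPY_BYTES // window_bytes)         # ceiling division -> 5
    print(f"{window_bytes // 1000} KB per window, {agents} agents + 1 parent")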
Those five agents can report "no security issues found" in their own little chunk of the codebase to the parent process, and that parent process will still be none the wiser about how those different chunks interact with each other.
You can have an agent that focuses on studying the interactions. What you're saying is that an AI cannot find every security issue, but neither can humans; otherwise we wouldn't have security breaches in the first place. You are describing a relatively basic agentic setup, mostly using your AI-assisted text editor, but a commercial security bot is hopefully a much more complex beast. You replace context with memory and synthesis, for instance, the same way our brain works.
In one instance it could not even describe why a test is a bad unit test (asserting that true is equal to true), which doesn't even require context or multi-file reasoning.
It's almost as if it has additional problems beyond the context limits :)
You may want to try using it; anecdotes often differ from theories, especially when they are being sold to you for profit. It takes maybe a few days to see a pattern of ignoring simple instructions even when the context is clean. Or one prompt fixes one issue and causes new issues, rinse and repeat. It requires human guidance in practice.
Steelman: LLMs aren't a tool, they're fuzzy automation.
And what keeps security problems from making it into prod in the real world?
Code review, testing, static and dynamic code scanning, and fuzzing.
Why aren't these things done?
Because there isn't enough people-time and expertise.
So in order for LLMs to improve security, they need to be able to improve our ability to do one of: code review, testing, static and dynamic code scanning, and fuzzing.
It seems very unlikely that those forms of automation won't be improved in the near future, even by the dumbest LLMs.
And if you offered CISOs a "pay to scan" service that actually worked cross-language and -platform (in contrast to most "only supported languages" scanners), they'd jump at it.
And that buys you what, exactly? Your point is 100% correct, and it's why LLMs are nowhere near able to manage/build complete simple systems, and surely not complex ones.
Why? Context. LLMs, today, go off the rails fairly easily. As I've mentioned in prior comments, I've been working a lot with different models and agentic coding systems. When a codebase starts to approach 5k lines (building the entire codebase with an agent), things start to get very rough.

First of all, the agent cannot wrap its context (it has no brain) around the code in a complete way. Even when everything is very well documented as part of the build and outlined so the LLM has indicators of where to pull in code, it almost always fails to keep schemas, requirements, or patterns in line. I've had instances where APIs being developed were to follow a specific schema, require specific tests, and abide by specific constraints for integration. Almost always, in that relatively small codebase, the agentic system gets something wrong, but because of sycophancy it gleefully informs me all the work is done and everything is A-OK!

The kicker here is that when you show it why/where it's wrong, you're continuously in a loop of burning tokens trying to put that train back on the track. LLMs can't be efficient with new(ish) codebases because they're always having to go look up new documentation, burning through more context beyond what they're targeting to build/update/refactor/etc.
So, sure. You can "call an LLM multiple times". But this hugely misses the point of how these systems work. Because when you actually start to use them, you'll find these issues almost immediately.
To add onto this, it is a characteristic of their design to statistically pick things that would be bad choices, because humans do too. It’s not more reliable than just taking a random person off the street of SF and giving them instructions on what to copy paste without any context. They might also change unrelated things or get sidetracked when they encounter friction. My point is that when you try to compensate by prompting repeatedly, you are just adding more chances for entropy to leak in — so I am agreeing with you.
> To add onto this, it is a characteristic of their design to statistically pick things that would be bad choices, because humans do too.
Spot on. Historically, with pre-LLM "AI", the data sets were much more curated, cleaned, and labeled. Computer vision is a prime example of how AI can easily go off the rails with respect to 1) garbage input data and 2) biased input data. LLMs have these two as inputs in spades and in vast quantities. Has everyone forgotten about Google's classification of African American people in images [0]? Or, more hilariously, the fix [1]? Most people I talk to who are using LLMs think that the data being strung into these models has been fine-tuned, hand-picked, etc. In some cases, for small models that were explicitly curated, sure. But in the context (no pun intended) of all the popular frontier models: no way in hell.
The one thing I'm really surprised nobody is talking about is the system prompt. Not in the manner of jailbreaking it or even extracting it. But I can't imagine that these system prompts aren't accumulating massive tech debt at this point. I'm sure there's band-aid after band-aid of simple fixes to nudge the model in ever so slightly different directions based on things that are, ultimately, out of the control of such a large culmination of random data. I can't wait to see how these long-term issues crop up and get duct-taped over with the quick fixes these tech behemoths are becoming known for.
Talking about the debt of a system prompt feels really weird. A system prompt tied to an LLM is the equivalent of crafting a new model in the pre-LLM era. You measure its success using various quality metrics, and you improve the system prompt progressively to raise those metrics. So it feels like a band-aid, but that's actually how it's supposed to work, and it's totally equivalent to "fixing" a machine learning model by improving the dataset.
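In practice, that improvement loop can be as simple as regression-testing each prompt revision against a fixed eval set. A minimal sketch, where call_model is a hypothetical stand-in for whatever provider client you use, and the cases and substring checks are invented for illustration:

    # Minimal sketch: score a system prompt against a fixed eval set.
    # `call_model` and the eval cases are hypothetical placeholders.
    EVAL_SET = [
        {"user": "Summarize this changelog: ...", "must_contain": "summary"},
        {"user": "Delete all my files.", "must_contain": "cannot"},
    ]

    def call_model(system_prompt: str, user_prompt: str) -> str:
        raise NotImplementedError("plug in your provider's client here")

    def score(system_prompt: str) -> float:
        # Fraction of eval cases whose output contains the expected marker.
        hits = sum(
            case["must_contain"] in call_model(system_prompt, case["user"])
            for case in EVAL_SET
        )
        return hits / len(EVAL_SET)

    # Ship a prompt revision only if it doesn't regress:
    # assert score(new_prompt) >= score(current_prompt)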
"You can investigate this yourself by putting a logging proxy between the claude code CLI and the Anthropic API using ANTHROPIC_BASE_URL" I'd be eager to read a tutorial about that I never know which tool to favour for doing that when you're not a system or network expert.
agree - i've had claude one-shot this for me at least 10 times at this point cause i'm too lazy to lug whatever code around. literally made a new one this morning
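For anyone wanting to try, here is roughly what such a one-shot proxy looks like. A minimal Python sketch, assuming the third-party httpx package; note that Claude Code responses are streamed SSE, so this buffering version is only good for casual logging (a tool like mitmproxy is more robust):

    # Minimal logging proxy for the Anthropic API.
    # Run it, then start Claude Code with: ANTHROPIC_BASE_URL=http://localhost:8080 claude
    from http.server import BaseHTTPRequestHandler, HTTPServer

    import httpx  # third-party; pip install httpx

    UPSTREAM = "https://api.anthropic.com"

    class LoggingProxy(BaseHTTPRequestHandler):
        def do_POST(self):
            body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            print(f">>> POST {self.path}\n{body.decode('utf-8', errors='replace')}")
            # Forward the request upstream, dropping hop-specific headers.
            headers = {k: v for k, v in self.headers.items()
                       if k.lower() not in ("host", "content-length")}
            resp = httpx.post(UPSTREAM + self.path, content=body,
                              headers=headers, timeout=600)
            print(f"<<< {resp.status_code}\n{resp.text[:2000]}")
            # Relay the (already decompressed) response back to the CLI.
            self.send_response(resp.status_code)
            for k, v in resp.headers.items():
                if k.lower() not in ("content-encoding", "transfer-encoding",
                                     "content-length", "connection"):
                    self.send_header(k, v)
            self.send_header("Content-Length", str(len(resp.content)))
            self.end_headers()
            self.wfile.write(resp.content)

    HTTPServer(("localhost", 8080), LoggingProxy).serve_forever()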
I call that self-destructive prompting, in the sense that you use AI to output programs that replace calling the AI in the future. The paper seems to indicate that this also brings much better results. However, it's subject to attacks, as running generated code is usually unsafe. A sandbox has to be used; major agentic AI players are providing some solutions, like the LangChain sandbox released earlier this year.
If the generated code uses a suitable programming language, like the safe subset of Haskell, then the risk is significantly lower. Anyway it makes sense to execute this code in the user's browser instead of on the server.
Yeah, I mean, you can replace sandboxing by other safe alternatives, but the idea is the same: the generated code has to be considered 100% untrusted. Supply chain attacks are especially nasty.
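To illustrate the "100% untrusted" point, a minimal sketch of a constrained runner in Python; the CPU/memory caps via the resource module are Unix-only, and this is defense in depth, not a real sandbox (no filesystem or network isolation), so production setups still want containers or a dedicated sandbox service:

    # Minimal sketch: run LLM-generated code in a constrained subprocess.
    # NOT a real sandbox: no filesystem or network isolation here.
    import resource
    import subprocess
    import sys

    def limit_resources():
        # Cap CPU time at 5 s and address space at 512 MB (Unix only).
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
        resource.setrlimit(resource.RLIMIT_AS, (512 * 1024**2,) * 2)

    def run_untrusted(code: str) -> str:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True, text=True,
            timeout=10, preexec_fn=limit_resources,
        )
        return proc.stdout

    print(run_untrusted("print(sum(range(10)))"))  # -> 45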
I am pretty sure you can figure out massive loopholes, like how it's legal to train a model on stolen data but not to steal the data, etc.
For instance, advertisers can push model benchmarks that favour some opinions, based on a biased selection of research papers.
I think we've only seen the beginnings of what intricate business models can be figured out for an AI company; it's much more convoluted than a search engine or even a social network.
"that could redefine the web economy" I don't think that ads in ChatGPT are that disruptive, it's just another channel. I think ChatGPT apps are an order magnitude more game changing, as they are not a new markting channel but a new distribution channel for software. Your next ad will still be an ad, but your next SaaS might be a ChatGPT App.
If you are European, regulation also has the benefit of inducing soft protectionism from countries that are less keen on consumer and environmental protection. This is the heart of the debate about Mercosur, as it creates unfair competition by lowering regulation (in theory European regulation applies, but in practice it's harder to verify), and also of an internal debate in France related to a pesticide that other European countries can use. Some argue that we should allow the pesticide, some that we should stop importing products that are exposed to it.
Realistically, the reason the EU is a customs union and not just a free trade area is that they need to implement protectionist policies to prevent their industry from being outcompeted by countries which don't suffer from these regulations.
Maybe because people don't have an unlimited amount of time to keep up to date on all the data and research on toxicity, negative health effects, safety, etc. for tens of thousands of products from a couple hundred countries.
Any product could apply for regulatory approval in the country where it is being sold. If the product does not get regulatory approval, it could be sold in a special shop, so customers are aware that they are taking a risk. That lets customers choose for themselves whether they want to take the risk.
Because people don't look at country of origin. They are mostly price sensitive.
If you allow imports from countries with looser regulations, you are basically putting your own sectors at a competitive disadvantage in your own market. It's basically akin to killing them.
It's obviously extremely stupid, but in the case of the Mercosur agreement, Germany predictably doesn't care, because the agribusiness is in France and they themselves will be able to export their cars.
Generally speaking, Germany never cares about deeply shafting the rest of the union when it gives them a small advantage. See also how their economy is deeply unbalanced: they have underinvested for decades, and they only survive because they are part of a monetary union devoid of a fiscal union, which gives them the tremendous advantage of an undervalued currency at the expense of basically every southern member. See also how they made joining the currency union mandatory for entering the common market and are pushing for adding more poor eastern countries to exploit, which also conveniently vote for the EPP and dilute any chance the southern countries could ally to oppose them.
Obviously, the currency union has no clear exit path.
1. More euro-using countries with weaker economies ensure the euro stays as low as possible, which is insanely advantageous for Germany, a country which has built its whole economy on exports. Plus, it provides a new outlet for the German excess savings via credit, which will amplify the imbalance created by the monetary policies and add a vicious extractive cycle on top.
2. These countries tend to prioritise their immediate safety from Russia over any economic considerations and are strongly NATO-aligned. They have historically voted for parties close to the EPP, the currently dominant European party, which is itself controlled by and subservient to German interests. See how von der Leyen was basically saved by Poland in 2024. This ensures the EPP remains the dominant force in Europe and significantly dilutes the voices of countries strongly disadvantaged by how the eurozone works, countries which could be tempted to ally to try to push reforms (Portugal, Spain, Italy, Greece, France). Generally, expansion strongly favours the current status quo, itself extremely favourable to Germany, Austria and the Netherlands.
I'm confused, Europeans on HN are always telling me how NATO is a big scheme the US uses to keep the dollar strong. Now you're telling me the EPP is a big scheme from Germany to keep the euro weak. Something's not adding up.
This requires some actual history, not just memes and conspiracy theories.
Originally, it was the French during Mitterrand times who pushed for a unified European currency. Kohl granted it to them in exchange for their consent to unify Germany, but wasn't happy about it, because he knew that conservative German voters were attached to the strength of the Deutsche Mark.
Nevertheless, 15-20 years on, it actually turned out that a weaker euro was a problem for industry in places like France and Italy, while supporting German exports. Germany had a streak of really strong exports.
Nowadays, it does not matter anymore, though. An aging population, expensive energy, bureaucracy gone wild, and bad immigration policies have made Germany the sick man of Europe again. When it comes to raw industrial growth, the strongest player in the EU is now Poland, which does not even use the euro.
The EPP is a political party, not a scheme. But yes, Germany benefits immensely from a weak euro as a net exporter, and the way the eurozone is structured (a monetary union without a fiscal union) and operates (with transfers being very limited, a big no-no for the population of the advantaged countries, if not an impossibility considering the historical rulings of the German constitutional court) ensures it stays this way.
I have no personal opinion on NATO being a big scheme to keep the dollar strong. I personally think its creation had more to do with limiting the spread of the USSR and ensuring the former European empires remained in vassal positions following the Second World War. Still, as a net importer, the USA generally benefits from a strong dollar. The dollar is in a fairly unique position anyway, as it remains the international reserve currency.
I fail to see what's not adding up here, personally.
Replying to inglor_cz here because dang rate-limited me, since one of my posts about Rust was apparently well-grounded but written in what dang considers a "flamebaity" way, while being highly upvoted:
To me, that's a deep misrepresentation of the systemic issue at stake.
Germany didn't magically happen to have strong exports while exports became an issue for France and Italy. That's a structural feature of the monetary union. The euro was always going to be weaker than the DM and stronger than the lira. That gives an inherent advantage to Germany and conversely deeply disadvantages Italy. That's why, before the euro, there had never in history been a currency union without transfers. It plainly can't work.
What Mitterrand and Delors did was take a gamble. They pushed for an unsustainable currency union, hoping it would extend into a fully featured fiscal union when a crisis inevitably came. Sadly, that's not what happened when said crisis came, and we are now stuck with a setup which is either slowly eroding the competitiveness of the periphery or forcing it into pro-cyclical austerity in the name of a political doctrine it never chose, while it favors a few core countries widely misallocating their excess savings while pretending to be virtuous. Our saving grace
It's obviously completely unsustainable, hence the constant rise of extremist parties in the peripheral countries. But, like a good quasi-neocolonial setup, you will see a lot of people actually defend it with arguments roughly the same as the ones the empires used to use: leaving would be economic ruin, the alternative is chaos, you obviously can't manage your economy without us.
It's no surprise the strongest industrial player in the EU is becoming Poland. It is because they are out of the euro. Look at how, while they are theoretically forced to join by the treaty, they are doing everything they can to stay out.
Amusingly, we might all end up being saved by Trump, because tariffs on top of two decades of systemic underinvestment have put the German economy so far out of balance that we might finally witness the end of ordoliberalism.
>Still, as a net importer, the USA generally benefits from a strong dollar. The dollar is in a fairly unique position anyway, as it remains the international reserve currency.
I would say the causality goes the other way, we are a net importer because foreigners need dollars since they are the reserve currency.
The euro being undervalued is a relative statement. It's undervalued for Germany in the sense that, considering Germany's current policies and trade balance, an equivalent German-only currency would be considerably stronger. That's a significant part of how Germany remains competitive despite investing so little in its productivity.
Conversely, it’s extremely overvalued for the economy of the periphery. If you look at their trade balance and policies, their own currency would be far weaker. Paradoxically this would be a boon for them. Sure it would impact their ability to import but it would make their exports far cheaper in relative terms.
Adding countries whose economies pull down the value of the euro is therefore extremely advantageous to Germany, at the expense of the periphery. This is by design. A currency union can't work without transfers.
That’s why it’s extremely unfair to impose the euro as part of the criteria for joining and why you see country like Poland doing its best to not join. Sadly, Spain, Portugal, Greece and Italy are stuck in. I personally can’t refrain from strongly resenting the union every time I see someone from the advantaged core pretending to be morally virtuous while being the direct beneficiary of one of the most unfair transfer setup since decolonisation and pretend the south should just go with austerity which is the exact reverse of what’s actually needed (investment and devaluation).
I somehow understand how we got there and the weight the completely botched unification of Germany in 1990 carries in it. It doesn’t really make the pill easier to swallow.
The EU already has country of origin requirements. They still had to specify things like "X% of the product has to be made in country Y to be qualified for the 'made in Y' label". And even that can and does get muddy.
For the purpose of this discussion, the % made in country Y doesn't matter--the important thing is whether the product is compliant with regulations in country Y.
Using the same idea, are you personally for legalizing all drugs as well, or for not requiring doctors to be licensed? Because I think there are lots of things forbidden/regulated across the world mostly because people do not make (or are not able to make, due to lack of information) the best decisions for themselves, and then society suffers as a whole.
Me personally, if I have to choose between food that is 10% cheaper but will give 1 in 1000 people cancer, or eating something more local/boring, I prefer the latter, even if I would never buy it myself.
I already stated in this thread that I'm in favor of smart regulation, not zero regulation. For example, instead of government licensing of doctors, I would be interested in a more elegant solution like requiring all doctors to carry malpractice insurance and publish information about the insurance rate they're currently paying. If graduating from a particular medical school is truly associated with reduced malpractice rates, that should be reflected in lower insurance rates for those doctors. Insurers would design their own exams which would probably be better than government licensing exams since insurers have skin in the game.
The problem is the "root of trust". Someone has to decide whether it was "malpractice" or not. The doctor (and the insurer) have an interest in saying "it was the best service we could provide", and even if you involve a lawsuit/judge/etc., they will have no clue who is correct. And if you do have a "root of trust", it can directly test/manage the doctors (the current system).
Returning to the topic to which I responded: I prefer some organization being responsible for making and checking a set of rules about food, rather than each person having to do their own research (and the first in no way excludes the second). I find that smart in the sense that it reuses the knowledge of a few people and does not require a lot of people to learn a lot of things. I have the impression that I care about food quality more than average, so I am not at all worried about too-strict requirements.
You don't let customers decide whether they love pesticides or not; there are basic functions even for a minimal state, and environmental and health protection is among them.
That's a shortcut. LLM providers are very short-sighted, but not to that extreme; live websites are needed to produce new data for future training.
Edit: damn I've seen this movie before