If I have to resort to hole-driven development then it seems like something has gone very wrong? I like to understand the libraries I’m using. Libraries should have good API documentation and intuitive type definitions. I shouldn’t have to carefully study how they’re put together, like solving a math homework problem.
I like how the Go team does things. For example, this is only one part of it, but the Go checksum database seems like a pretty good solution for making sure that a path and version reliably maps to the same source code.
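To illustrate the mechanism (the hash values below are made up for illustration, but the format is real): each `go.sum` line records a module path, a version, and a base64-encoded hash of either the module's file tree (`h1:`) or just its `go.mod`, and the checksum database is a transparency log the toolchain checks those hashes against, so everyone resolving the same path and version must get the same bytes.

```text
golang.org/x/text v0.3.8 h1:EXAMPLEHASHONLYnotARealChecksumValue0000000=
golang.org/x/text v0.3.8/go.mod h1:EXAMPLEHASHONLYnotARealChecksumVal00=
```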
Give enough people enough guns and school shootings are inevitable.
Allow a handful of people to grab the economy and all means of production, and violence will be the result.
At this point in time it is simply cause and effect, the surprising thing to me is how long it is holding together. But at the rate the economy is being wrecked I fail to see how it will do so for much longer.
Effectively the French elites started the French revolution by being a little bit more greedy than the population would have tolerated. That set off an avalanche of what were effectively a series of mini revolutions ultimately resulting in modern France, which is in many ways unlike any other country in the world. The United States had its war of independence (aided by France, by the way), and then its civil war. But it never had a class war - yet - and this article presages that class war.
It could well be that the small number of rich people who are currently effectively a government outside of the government genuinely believe that their wealth and power insulate them from the consequences of pushing their greed and wealth to ridiculous levels. But I suspect the author is right that this is approaching some kind of threshold, and I have no way of seeing across the divide. I'm hoping for another France rather than another Somalia.
The property being tested in this example is “after inserting a row into a database table, the same row can be read back again.”
The insert statement isn’t independent of the database because the table needs to exist and its schema has to allow the inserted values. If the database is generated randomly, you need access to it to generate an insert statement that will work.
This is straightforward to do if the library is designed for it. Using my own TypeScript library [1]:
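A minimal, dependency-free sketch of the idea (this is not the linked library's API; all names here are hypothetical, and an in-memory table stands in for the real database). The key point is that the row generator takes the randomly generated schema as input, so the "insert statement" is always valid for the database it targets:

```typescript
// A randomly generated table schema: column names mapped to types.
type Schema = Record<string, "int" | "text">;
type Row = Record<string, number | string>;

function randInt(max: number): number {
  return Math.floor(Math.random() * max);
}

// Generate a random schema with 1-4 columns.
function genSchema(): Schema {
  const schema: Schema = {};
  const n = 1 + randInt(4);
  for (let i = 0; i < n; i++) {
    schema[`col${i}`] = randInt(2) === 0 ? "int" : "text";
  }
  return schema;
}

// The row generator depends on the schema — the insert cannot be
// generated independently of the database it will run against.
function genRow(schema: Schema): Row {
  const row: Row = {};
  for (const [col, ty] of Object.entries(schema)) {
    row[col] = ty === "int" ? randInt(1000) : `s${randInt(1000)}`;
  }
  return row;
}

// Stand-in for a real database table.
class Table {
  private rows: Row[] = [];
  insert(row: Row): number {
    this.rows.push(row);
    return this.rows.length - 1;
  }
  read(id: number): Row | undefined {
    return this.rows[id];
  }
}

// Property: after inserting a row, the same row can be read back.
for (let trial = 0; trial < 100; trial++) {
  const schema = genSchema();
  const table = new Table();
  const row = genRow(schema);
  const id = table.insert(row);
  if (JSON.stringify(table.read(id)) !== JSON.stringify(row)) {
    throw new Error(`round-trip failed on trial ${trial}`);
  }
}
console.log("property held for 100 random schema/row pairs");
```

A real property-testing library does the same chaining for you: one arbitrary produces the schema, and a dependent arbitrary maps it to conforming rows.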
In the link above he's described 7 very practical ways to use it. No functional jargon, no mathematical jargon. Just practical useful ideas. And the language choice in the book is irrelevant - the concepts translate well.
There is an alternate universe where he would be well known as the top author on software engineering. His website is great as well.
That said, if you do know a bit of the math, his examples introduce commutativity, invertibility, invariance, idempotency, structural recursion & isomorphism - but anyone reading it would never really know and would never need to know. It's all framed as useful applications of tests.
A year ago LLMs weren't good enough to find these security issues. They could have done other stuff. But then again, the big tech companies were already doing other stuff, with bug bounties, fuzzing, rewriting key libraries, and so on.
This initiative probably could have started a few months sooner with Opus and similar models, though.
Nevertheless, the distance between free models and Mythos is not as great as Anthropic's marketing claims, which of course is not surprising.
In general, this is probably also true for other applications. No single model, not even a SOTA one, is equally good at everything, so trying multiple models may be necessary to obtain the best results. With open-weights models, trying many of them may add negligible cost, especially if they are hosted locally.
That's not quite true: even a year ago, LLMs were finding vulnerabilities, especially when paired with an agent harness and lots of compute. And even before that, security researchers had been shouting about systemic fragility.
Mythos certainly represents a big increase in exploitation capability, and we should have seen this coming.
A lot of those bugs were found by seasoned developers and security professionals though. Anthropic claims that Mythos is finding vulns for people who have no security background, who just typed "hey, go find a vulnerability in X", went home for the night, and came back the next morning with a PoC ready. They could definitely be exaggerating, but if it's true that's a very different threat category, and one worth paying attention to.
Previous models have done this just fine. For the last year, whenever a new model has come out I just point it at some of my repos and say something like "scan this entire codebase, look for bugs, overengineering, security flaws etc" and they always find a few useful things. Obviously each new model does this better than the last, though.
Imo that's a big deal primarily because the issue with automatically discovered vulnerabilities has long been a high volume of reports and a very bad signal-to-noise ratio. When an LLM is capable of developing PoC exploits, you finally have a tool that enables meaningfully triaging reports like this.
If you run Opus 4.6 and GPT 5.4 in a loop right now (maybe 100 times) against top XXXX repos, I guarantee you'll find, at the very least, medium-severity vulnerabilities.
> A year ago the LLM's weren't good enough to find these security issues
I know of two F100s that already started using foundation models for SCA in tandem with other products back in 2024. It's noisy, but a false positive is less harmful than an undetected true positive depending on the environment.
No, Opus has found a lot: 112 vulnerabilities were reported to Firefox alone by Opus [0]. But Mythos is uniquely capable of exploiting vulnerabilities, not just finding them.
You've got to admit that crying wolf about how dangerous their new model is for the hundredth time, right when the biggest story about the company was a leak that made them and their internal vibe-coding look totally incompetent, is a bit suspect.
A lesson of the parable about "crying wolf" is that cynicism based on previous events doesn't prove that the next event is fake. The people who ignored the warning may have thought it "most likely," but they were wrong.
I mean sure, they could be lying. It seems like a rather elaborate lie, though, considering that they got several other major companies to go along with it.
If you want to show that there's a risk of disaster you need to do better than making a silly analogy. Companies will often start expensive projects that fail, and then they pick themselves up and move on. Big, profitable companies can afford bigger failures. Google has had a slew of failed projects, and Meta's metaverse stuff tanked, and they're still fine. They can afford to experiment.
So which companies are betting so big that it might actually threaten them? Oracle maybe?
"Google has had a slew of failed projects, and Meta's metaverse stuff tanked, and they're still fine. They can afford to experiment."
Only with the blessing of shareholders. Frankly, Google's search box and ad-tech have been carrying all of its failed bets, but at some point people will start questioning whether Google is returning enough cash given the results of new investments. Google's management does not own the cash; it holds the cash on behalf of the owners.
Which shareholders do you mean? Mark Zuckerberg holds >50% of voting rights for Facebook. Sergey Brin and Larry Page hold >50% of voting rights for Google. That means management gets to do what it wants, within very broad legal limits.
On the other hand, how the stock does will matter to other employees because they’re shareholders and they have a stake in the outcome.
Seems clear to me that OpenAI at this point is a Ponzi scheme waiting to collapse. This is why they are trying to IPO and dump their shares on the public market before they go bankrupt.