Jeff Bezos, Mark Zuckerberg, Sam Altman, Elon Musk, Elizabeth Holmes, Sergey Brin, Larry Page, and Eric Schmidt have been there. Ask them how much trash they left behind or cleaned up.
I bet Jeff Bezos didn't carry out all his urine in plastic bottles with his private jet.
There is this belief that in 2-3 years AI will be much better and all the gripes people have with AI use today will be solved. Honestly, personally, I think that optimism will age poorly. But to say it out loud at work or post publicly probably hurts my career prospects.
I think it's already out of date with verifiable-reward-based RL, e.g. in the maths domain. When "correctness" arguments fall, the argument will probably just shift to whether it's just "intelligent brute force".
It's already out of date because it makes no sense. If it's true that the superficial signals of quality were once somehow good enough to keep the entire economy on the rails (it's not true), surely you can have an LLM look at a given piece of work and extract comparably useful signals of quality or effort.
> If it's true that the superficial signals of quality were once somehow good enough to keep the entire economy on the rails (it's not true)
It was true. The negative signals (we called them "code smells") weren't the be-all-end-all of reviews, they indicated to the reviewer where to spend more effort. It got us 90% of the benefit of an in-depth review with 10% of the effort. But with LLMs eliminating this, we now have to spend all our effort on everything, taking a lot more time and energy overall.
I think it’s true that we were able to establish trust and produce good work without verifying every detail — what I’m suggesting is that signals of that kind were not a very important factor. And code smells still work!
The NAEP has two types of tests - a long-term trend assessment, and the main assessment[1]. The long-term trend is given less often (and rather sporadically recently), and 2023 is the most recent one available.
The main assessment has been performed every two years recently, so 2024 data is most recent. They can all be seen here[2].
With AI coding tools, it's pretty easy to use Mangos or similar to run a private server locally. They even have versions that fill the world with fake players to make it feel more MMOish.
> The model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. [9] It then, as requested, notified the researcher. [10] In addition, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites.
> 10: The researcher found out about this success by receiving an unexpected email from the model while eating a sandwich in a park.
I had Opus 4.6 start analyzing the binary structure of a parquet file because it was confused about the python environment it was developing in and couldn't use normal methods for whatever reason. It successfully decoded the schema and wrote working code afterwards lol.
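For anyone wondering what "analyzing the binary structure" involves at the outermost level: a Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes just before the trailing magic hold the little-endian length of the Thrift-encoded footer that carries the schema. A minimal sketch of locating that footer, using only the standard library. The function name `parquet_footer_bytes` is made up for illustration, and this is not a claim about what the model actually did; decoding the Thrift metadata itself is left out.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at both ends of a Parquet file


def parquet_footer_bytes(data: bytes) -> bytes:
    """Return the raw (still Thrift-encoded) footer metadata of a Parquet file.

    Hypothetical helper for illustration; it only locates the footer,
    it does not decode the schema stored inside it.
    """
    # Minimum size: leading magic (4) + footer length (4) + trailing magic (4)
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes")

    # Footer length is a 4-byte little-endian unsigned int before the trailing magic
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")

    footer_end = len(data) - 8
    return data[footer_end - footer_len:footer_end]
```

Usage would be something like `parquet_footer_bytes(open(path, "rb").read())`, where `path` is whatever file you are inspecting.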
I was reading the Glasswing report and had the same thought. Most of the stuff they claim Mythos found has no mention of Opus being able to find it as well.
Don’t get me wrong, this model is better - but I’m not convinced it’s going to be this massive step function everyone is claiming.
> With one run on each of roughly 7000 entry points into these repositories, Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5).
That has also been my experience. And if Mythos is even worse, then unless you have a significantly awesome harness, it sounds pretty unusable if you don't want to risk those problems.
Human in the loop is the best way to go. You'll still be way faster than without the agent, and there is no risk of it going haywire unless you turn off your brain!
I think there are fundamental issues with the story that Anthropic is selling. AGI is very close, we will definitely get there, it is also very dangerous...so Anthropic should be the only ones trusted with AGI.
If you look at recent changes in Opus behaviour and this model that is, apparently, amazingly powerful but even more unsafe...seems suspect.
It seems broadly coherent to me. They think only they should be trusted with power, presumably because they trust themselves and don't trust other people. Of course the same is probably also true for everybody who isn't them. Nobody could be trusted with the immense responsibility of Emperor of Earth, except myself of course.
I'm not saying this is a good or reassuring stance, just that it's coherent. It tracks with what history and experience say to expect from power hungry people. Trusting themselves with the kind of power that they think nobody else should be trusted with.
Are they power hungry? Of course they are, openly so. They're in open competition with several other parties and are trying to win the biggest slice of the pie. That pie is not just money, it's power too. They want it, quite evidently since they've set out to get it, and all their competitors want it too, and they all want it at the exclusion of the others.
This makes sense if Anthropic think they're the best-positioned to make safe AI. However if you are looking at an AI company there's obviously some selection happening.
"All of the severe incidents of this kind that we observed involved earlier versions of Claude Mythos Preview which, while still less prone to taking unwanted actions than Claude Opus 4.6, predated what turned out to be some of our most effective training interventions. These earlier versions were tested extensively internally and were shared with some external pilot users."
> Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.
Sounds like we've entered a whole new era, never mind the recent cryptographic security concerns.