For me, definitely the worst regression was the system prompt telling Claude to analyze each file to check whether it's malware at every read. That correlates with me also seeing quotas exhausted early and acknowledgments of "not a malware" at almost every step.
It is a horrible error of judgement to insert a complex request for such a basic ability. It is also an error of judgement to make Claude decide whether it wants to improve the code at all.
It is so bad that I stopped working on my current project and went to try other models. So far Qwen is quite promising.
I don't think that's accurate. The malware prompt has been around since Sonnet 3.7. We carefully evaled it for each new model release and found no regression to intelligence, alongside improved scores for cyber risk. That said, we have removed the prompt for Opus 4.6 since it no longer needed it.
I started seeing "not a malware, continuing" in almost every reply around two weeks ago. Maybe you reintroduced it with some regression? This is on Opus 4.6.
I'm happy to provide any other info that can be useful (as long as I'm not sharing any information about the code or tools we use in a public GitHub issue).
1. I've never seen this. Is there a config option to unhide it if it's happening? Is this in Claude Code? Does it have to be set to verbose or something?
2. Can we pay more/do more rigorous KYC to disable it if it's active?
I’m familiar with postal services in both Poland and Japan, and I like the Japanese solution even more - most new buildings have package lockers operated by the building owner and independent from the delivery service. Anyone could put packages there, and my building would notify me about a waiting package when I entered.
That's actually rad, but... it's not that different from making current mailboxes bigger. In PL, in large buildings, those are on the ground floor, next to each other. If you make them bigger, you only need to add notifications to match that.
DPP-4 drugs are also less effective on other metrics. It would be far more interesting to see a comparison of SGLT-2 inhibitors vs GLP-1 agonists.
For some reason GLP-1 drugs are not that popular in Korea (and still not prescribed just for weight loss), so that may explain why these researchers haven't done that.
From experience of traveling to many countries and talking with people about food - not everyone is interested in it. I've heard many bad recommendations or people surprised I know the dishes they have never tried. And that's okay.