I've been using AIs for RE work extensively, and I concur.
The worst AI when it comes to the "safety guardrails" in my experience is ChatGPT. It's far too "safety-pilled" - it brings up "safety" and "legality" in unrelated topics and that makes it require coaxing for some of my tasks. It does weird shit like see a security vulnerability and actively tell me that it's not really a security vulnerability because admitting that an exploitable bug exists is too much for it. Combined with atrocious personality tuning? I really want to avoid it. I know it's capable in some areas, but I only turn to it if I maxed out another AI.
Claude is sharp, doesn't give a fuck, and will dig through questionable disassembled code all day long. I just wish it was cheaper via the API and had higher usage limits. Also, that CBRN filter seriously needs to die. That one time I had a medical device and was trying to figure out its business logic? The CBRN filter just kept killing my queries. I pity the fools who work in biotech and got Claude as their corporate LLM of choice.
Gemini is quite decent, but long context gives it brainrot. Far more so than other models - instruction-following ability decays too fast, it favors earlier instructions over later ones or just gets too loopy.