The niche of GPT-4.5 is lower hallucations than any existing model. Whether that...

energy123 · 2025-02-28T06:20:34 1740723634

Actually, this comment of mine was incorrect, or at least we don't have enough information to conclude this. The metric OpenAI are reporting is the total number of incorrect responses on SimpleQA (and they're being beaten by Claude Haiku on this metric...), which is a deceptive metric because it doesn't account for non-responses: a model can lower its count just by declining to answer more often. A better metric would be the ratio of incorrect responses to the total number of attempted answers.
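
To make the difference concrete, here is a rough sketch of the two metrics. It assumes each response is graded as correct, incorrect, or not attempted. The function names and the counts are made up for illustration; they are not real SimpleQA results for any model.

```python
def incorrect_share_of_all(correct: int, incorrect: int, not_attempted: int) -> float:
    """Incorrect responses as a share of ALL questions (ignores abstentions)."""
    total = correct + incorrect + not_attempted
    if total == 0:
        raise ValueError("no questions")
    return incorrect / total


def incorrect_share_of_attempts(correct: int, incorrect: int) -> float:
    """Incorrect responses as a share of ATTEMPTED questions only."""
    attempted = correct + incorrect
    if attempted == 0:
        raise ValueError("no attempted answers")
    return incorrect / attempted


# Hypothetical counts out of 100 questions (illustrative only).
model_a = {"correct": 60, "incorrect": 40, "not_attempted": 0}
model_b = {"correct": 20, "incorrect": 30, "not_attempted": 50}

for name, m in (("A", model_a), ("B", model_b)):
    share_all = incorrect_share_of_all(**m)
    share_attempts = incorrect_share_of_attempts(m["correct"], m["incorrect"])
    print(name, round(share_all, 2), round(share_attempts, 2))
```

With these made-up counts, model B looks better on the first metric only because it declines half the questions; on the attempts-only ratio it is wrong more often than model A when it does answer.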