It makes the presentation look rushed or thrown together at the last minute. Really bad to see this as the first plot in the whole presentation. Also, I would have loved to see comparisons with Opus 4.1.
Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade in order to remain the leader on this important benchmark.
After reading around, it seems like they probably forgot to update/swap the slides before the presentation. The graphs on their website were correct at launch, but the ones used in the presentation were probably older versions they had forgotten to fix.
https://bsky.app/profile/tylermw.com/post/3lvtac5hues2n