The assumption I make is that "better benchmarks" are going to be 5% better, not 5000% better. LLM capabilities are improving faster than benchmarks are getting better at measuring them.
So, yes, we just aren't going to get anything that's radically better. Just more of the same, and some benchmarks that are less bad. Which is still good. But don't expect a Benchmark Revolution in which everyone suddenly realizes just how Abjectly Terrible the current benchmarks are and replaces them with New Much Better Benchmarks. The advances are going to be incremental, unimpressive, and meaningful only in aggregate.