The whole mess is a good example of why benchmark-driven development has negative consequences.
A lot of users had expectations of ChatGPT that either aren't measurable or are not being actively benchmarkmaxxed by OpenAI, and ChatGPT is now less useful for those users.
I use ChatGPT for a lot of "light" stuff, like suggesting travel itineraries based on what it knows about me. I don't care about this version being 8.243% more precise, but I do miss the warmer tone of 4o.
Because benchmarks are meaningless: despite so many years of development, LLMs become crap at coding or producing anything productive as soon as you move even slightly away from the things being benchmarked.
I wouldn't mind if GPT-5 were 500% better than previous models, but it's a small iterative step from "bad" to "bad but more robotic".