The whole mess is a good example of why benchmark-driven development has negative consequences.
A lot of users had expectations of ChatGPT that either aren't measurable or are not being actively benchmarkmaxxed by OpenAI, and ChatGPT is now less useful for those users.
I use ChatGPT for a lot of "light" stuff, like suggesting travel itineraries based on what it knows about me. I don't care about this version being 8.243% more precise, but I do miss the warmer tone of 4o.
Because benchmarks are meaningless: despite so many years of development, LLMs become crap at coding or producing anything productive as soon as you move even slightly away from the things being benchmarked.
I wouldn't mind if GPT-5 were 500% better than previous models, but it's a small iterative step from "bad" to "bad but more robotic".