The niche of GPT-4.5 is a lower hallucination rate than any existing model. Whether that niche justifies the price tag for a subset of use cases remains to be seen.
Actually, this comment of mine was incorrect, or at least we don't have enough information to conclude it. The metric OpenAI is reporting is the total number of incorrect responses on SimpleQA (and they're being beaten by Claude Haiku on this metric...), which is a deceptive metric because it doesn't account for non-responses: a model can lower its incorrect count simply by declining to answer more often. A better metric would be the ratio of incorrect responses to the total number of attempts, i.e. incorrect / (correct + incorrect).
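To make the difference concrete, here is a rough Python sketch of the two metrics. The function names and the counts for "model A" and "model B" are made up purely for illustration; they are not real SimpleQA results for any model.

```python
def incorrect_share_of_all(correct: int, incorrect: int, not_attempted: int) -> float:
    """Incorrect answers as a share of ALL questions (ignores abstentions)."""
    total = correct + incorrect + not_attempted
    return incorrect / total if total else 0.0


def incorrect_share_of_attempts(correct: int, incorrect: int) -> float:
    """Incorrect answers as a share of questions the model actually attempted."""
    attempted = correct + incorrect
    return incorrect / attempted if attempted else 0.0


# Hypothetical counts out of 100 questions (NOT real benchmark data).
# Model A abstains a lot; model B answers everything.
model_a = {"correct": 20, "incorrect": 30, "not_attempted": 50}
model_b = {"correct": 60, "incorrect": 40, "not_attempted": 0}

for name, m in (("A", model_a), ("B", model_b)):
    overall = incorrect_share_of_all(m["correct"], m["incorrect"], m["not_attempted"])
    per_attempt = incorrect_share_of_attempts(m["correct"], m["incorrect"])
    print(f"model {name}: overall={overall:.2f} per_attempt={per_attempt:.2f}")
```

With these invented numbers, model A looks better on the overall metric (0.30 vs 0.40) but is wrong far more often when it does answer (0.60 vs 0.40), which is exactly the effect that a raw incorrect count hides.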