Hacker News

The results for GPT-4.5 are in for the Kagi LLM benchmark too.

It does crush our benchmark (time to make a new one? ;)), with performance similar to that of reasoning models. It does come at a great price, both in cost and speed.

A monster is what they created. But looking at the tasks it fails, my 9-year-old could solve some of them. It is still in this weird limbo of super knowledge and low intelligence.

It may be remembered as the last of the 'big ones'; I can't imagine this being the path for the future.

https://help.kagi.com/kagi/ai/llm-benchmark.html



Do you have results for gpt-4? I’d be very interested in seeing the lift here from their last “big one”.


Why don't you have Grok?


No API for Grok 3 might be why.


If Gemini 2 is at the top of your benchmark, make sure to re-check your benchmark.


Gemini 2 Pro is actually very impressive (maybe not for coding; I haven't used it for that).

Flash is pretty garbage, but cheap.


Gemini 2.0 Pro is quite good.


Gemini 2 Pro is pretty strong, actually.
