
Isn't that up to the reader/visitor/user to decide? As it stands right now, Cursor is publishing results without saying how they got them, and comparing them against aggregate scores whose underlying results we don't know, and you're saying "it doesn't matter, the tool is better anyways".

Why publish the obscured benchmarks in the first place, then?



No, I said I don’t believe any of the existing benchmarks do well when it comes to using a tool chain. They built a model specifically to be used with their tool chain calls, something that a lot of the models out there struggle with.



