Hacker News

diggan 40 days ago | on: Composer: Building a fast frontier model with RL

Isn't that up to the reader/visitor/user to decide? As it stands right now, Cursor are publishing results without saying how they got them, and comparing them against aggregate scores whose true results we don't know, and you're saying "it doesn't matter, the tool is better anyway". Then why publish the obscured benchmarks in the first place?

infecto 39 days ago

No, I said I don’t believe any of the existing benchmarks do a good job of measuring how well a model uses a tool chain. They built a model specifically to work with their tool-chain calls, something a lot of the models out there struggle with.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact