Curious that you left out FrontierMath's statement that they provided 300 questions plus answers, and a separate holdout set of 50 questions without answers, to allay this concern. [0]
We can assume they're lying too, but at some point "everyone's bad because they're lying, which we know because they're bad" gets a little tired.
1. I said the majority of the problems, and the article I linked also mentioned this. Nothing "curious" about it, but if you thought this additional source adds something more, thanks for adding it here.
2. We know that "open"ai is bad, for many reasons, but that is irrelevant here. I want the process itself not to depend on the goodwill of a corporation to give the intended results. I do not trust benchmarks that first presented themselves as secret and then turned out not to be, regardless of whether the product being benchmarked comes from a company I otherwise trust.
Fair enough. It's hard for me to imagine being so offended by the way they screwed up disclosure that I'd reject empirical data, but I get that it's a touchy subject.
When the data is secret and unavailable to the company before the test, the result doesn't rely on me trusting the company. When the data is not secret and is available to the company, I have to trust that the company did not use that prior knowledge to its advantage. When the company lies and says it did not have access, then later admits that it did have access, that means the data is less trustworthy from my outsider perspective. I don't think "offense" is a factor at all.
If a scientific paper comes out with "empirical data", I will still look at the conflicts-of-interest section. If no conflicts of interest are listed, but it later comes out that there were several, and the authors promise that, although they did not disclose them, the conflicts did not affect the paper, I will be more skeptical. I am not "offended". I am not "rejecting" the data; I am taking those factors into account when determining how confident I can be in its validity.
> When the company lies and says it did not have access, then later admits that it did have access, that means the data is less trustworthy from my outsider perspective.
This isn't what happened? I must be missing something.
AFAIK:
The FrontierMath people self-reported that there was a shared folder, accessible to the OpenAI people, that contained a subset of the questions.
No one denied anything, no one lied about anything, no one said they didn't have access. There was no data obtained under the table.
The motte is "they had data for this one benchmark"; the bailey is "they lied about it and cheated".
[0] https://epoch.ai/blog/openai-and-frontiermath