>Apparently someone here doesn't know how outliers affect a mean.
If the concern is that easy questions distort the mean, then the obvious fix is to reduce the proportion of easy questions, not to invent a convoluted scoring method to compensate for them after the fact. Standardized testing has dealt with this issue for a long time, and there’s a reason most systems do not handle it the way ARC-AGI 3 does. Francois is not smarter than all those people, and certainly neither are you.
How do you define "easy question" for a potential alien intelligence? In my opinion the solution, as with most outlier problems, is to minimize the impact of the outliers.
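For anyone who wants to see the outlier effect in numbers, here is a minimal sketch. The per-question scores are made up for illustration and are not ARC-AGI 3 data; the median stands in as one common robust statistic, not as the method ARC actually uses.

```python
import statistics

# Hypothetical per-question scores: four hard questions with low scores
# and two "easy" questions that were solved perfectly.
scores = [0.02, 0.03, 0.01, 0.04, 1.0, 1.0]

mean_score = statistics.mean(scores)      # pulled up by the two 1.0 values
median_score = statistics.median(scores)  # barely moved by them

print(f"mean={mean_score:.3f} median={median_score:.3f}")
```

Two easy questions out of six are enough to lift the mean roughly tenfold over the median, which is the distortion being argued about here.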
I mean, presumably that's what the preview testing stage would handle, right? It should be clear if there is a class of obviously easy questions. And if that's not clear, then it makes the scoring even worse.
And in some sense, all of these benchmarks are tied to and biased toward human utility.
I don't think ARC would be designed and scored the way it is if accounting for an alien intelligence were a primary concern. In that case, the entire benchmark itself is flawed and too concerned with human spatial priors.
There are many ways to deal with a problem. Not all of them are good. The scoring for 3 is just bad. It does too much and tells too much.
5% could mean it only answered a fraction of the problems, or it answered all of them but with more game steps than the best human score. These are wildly different outcomes with wildly different implications. A scoring methodology that allows for both is simply not a good one.
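To make the ambiguity concrete, here is a rough sketch. The scoring rule is an assumption for illustration only (0 for an unsolved game, otherwise best-human steps divided by agent steps, capped at 1, then averaged over games); it is not the official ARC-AGI 3 formula, and the function names and step counts are hypothetical.

```python
def game_score(solved, agent_steps, best_human_steps):
    # Assumed rule: unsolved games score 0; solved games score the ratio
    # of the best human step count to the agent's step count, capped at 1.
    if not solved:
        return 0.0
    return min(1.0, best_human_steps / agent_steps)


def benchmark_score(results):
    # Plain average of the per-game scores.
    return sum(game_score(*r) for r in results) / len(results)


# Agent A: solves 1 of 20 games at human-level efficiency, fails the rest.
agent_a = [(True, 100, 100)] + [(False, 0, 100)] * 19
# Agent B: solves all 20 games, but with 20x the best human step count.
agent_b = [(True, 2000, 100)] * 20

score_a = benchmark_score(agent_a)
score_b = benchmark_score(agent_b)
print(f"A={score_a:.2%} B={score_b:.2%}")
```

Under this assumed rule both agents land on about 5%, even though one solved a single game and the other solved every game inefficiently.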
This shouldn't be hard to understand.