Maybe the LLM is just doing interpolation rather than regression?
I've played with math examples (a while back, not with recent models) where they make errors but seem to get the magnitude right, so perhaps it's easy to find the closest points (or roughly the closest) to interpolate between.
Interpolation is a legitimate technique for regression problems: it's a special case of the k-nearest-neighbor estimator (which is one of the methods they test against). Lots of related regression techniques involve a tradeoff between global trends and nearby points: KNN, kernel regressions, Gaussian process regressions, generalized additive models, local regressions, and mixed models/generalized linear models with certain covariance structures are all more or less manifestations of the same underlying math. The tree-based techniques (random forests, GBMs, and the like) don't look like interpolations, but the more you zoom in on the leaf nodes of those trees, the more they look like averages of one or more local y-values.
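A minimal sketch of that "local averaging" idea, using plain numpy (the function name and data here are made up for illustration): a KNN regressor just averages the y-values of the nearest training points, which is exactly what a tree leaf does for the points that land in it.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    """Predict y at x_query by averaging the y-values of the k nearest training points."""
    dists = np.abs(x_train - x_query)   # 1-D distances to every training point
    nearest = np.argsort(dists)[:k]     # indices of the k closest points
    return y_train[nearest].mean()      # local average, like a tree leaf

# Noisy samples of roughly y = 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 2.2, 3.9, 6.1, 8.0])

print(knn_predict(x, y, 2.5, k=2))  # averages y at x=2 and x=3 -> 5.0
```

With k=1 this is pure nearest-neighbor lookup; larger k trades locality for smoothness, which is the same global-vs-local tradeoff the other methods make.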
Literal interpolation would be used a lot more often, except you can't practically do it in higher dimensions (even 2 isn't trivial).
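To make the dimensionality point concrete, here's a numpy-only sketch: in 1-D, linear interpolation is just "find the two bracketing points and blend", which `np.interp` does directly; in 2-D and up you first need a triangulation of the sample points (e.g. `scipy.spatial.Delaunay`), and building and querying that structure gets expensive fast as dimension grows.

```python
import numpy as np

# 1-D linear interpolation is trivial: locate the bracketing samples and blend.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 10.0, 14.0])
print(np.interp(1.5, x, y))  # halfway between y(1)=10 and y(2)=14 -> 12.0

# In 2-D there is no unique "bracketing pair": you need a triangulation of
# the scattered sample points before you can interpolate within each simplex,
# and that preprocessing is what stops literal interpolation from scaling.
```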