Maybe the LLM is just doing interpolation rather than regression?
I've played with math examples (a while back, not with recent models) where they make errors but seem to get the magnitude right, so perhaps it's easy to find the closest points (or roughly the closest) to interpolate between.
Interpolation is a legitimate technique for regression problems: it's a special case of the k-nearest-neighbor estimator (which is one of the methods they test against). Lots of related regression techniques involve a tradeoff between global trends and nearby points: KNN, kernel regressions, Gaussian process regressions, generalized additive models, local regressions, and mixed models/generalized linear models with certain covariance structures are all more or less manifestations of the same underlying math. The tree-based techniques (random forests, GBMs, and the like) don't look like interpolations, but the more you zoom in on the leaf nodes of those trees, the more they look like averages of one or more local y-values.
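A minimal sketch of that "local averaging" idea, using plain numpy (the function name and data here are made up for illustration): a KNN regressor just averages the y-values of the nearest training points, which is exactly what a tree leaf does for the points that land in it.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    """Predict y at x_query by averaging the y-values of the k nearest training points."""
    dists = np.abs(x_train - x_query)   # 1-D distances to every training point
    nearest = np.argsort(dists)[:k]     # indices of the k closest points
    return y_train[nearest].mean()      # local average, like a tree leaf

# Noisy samples of roughly y = 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 2.2, 3.9, 6.1, 8.0])

print(knn_predict(x, y, 2.5, k=2))  # averages y at x=2 and x=3 -> 5.0
```

With k=1 this is pure nearest-neighbor lookup; larger k trades locality for smoothness, which is the same global-vs-local tradeoff the other methods make.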
Literal interpolation would be used a lot more often, except you can't practically do it in higher dimensions (even 2 isn't trivial).
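To make the dimensionality point concrete, here's a numpy-only sketch: in 1-D, linear interpolation is just "find the two bracketing points and blend", which `np.interp` does directly; in 2-D and up you first need a triangulation of the sample points (e.g. `scipy.spatial.Delaunay`), and building and querying that structure gets expensive fast as dimension grows.

```python
import numpy as np

# 1-D linear interpolation is trivial: locate the bracketing samples and blend.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 10.0, 14.0])
print(np.interp(1.5, x, y))  # halfway between y(1)=10 and y(2)=14 -> 12.0

# In 2-D there is no unique "bracketing pair": you need a triangulation of
# the scattered sample points before you can interpolate within each simplex,
# and that preprocessing is what stops literal interpolation from scaling.
```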