Unfortunately, we can’t know at this point whether transformers really understand chess, or are just drawing on textual representations of good moves in their training data. They are pretty good players, but far from the quality of specialized chess bots. Can you please explain how we can discern that GPT-2 in this instance really built a model of the board?
Regarding qualia, that’s ok on HN.
Regarding humans - yes, humans also hallucinate. Sounds a bit like whataboutism in this context though.
> Can you please explain how we can discern that GPT-2 in this instance really built a model of the board?
Read the article. It's very clear. To quote it:
"Next, I wanted to see if my model could accurately track the state of the board. A quick overview of linear probes: We can take the internal activations of a model as it’s predicting the next token, and train a linear model to take the model’s activations as inputs and predict board state as output. Because a linear probe is very simple, we can have confidence that it reflects the model’s internal knowledge rather than the capacity of the probe itself."
Thanks for putting these sources together. It’s impressive that they got to this level of accuracy.
And is your argument now that an LLM can capture arbitrary state of the wider world as a general rule, e.g. pretending to be a Swift compiler (or LSP), without overfitting to that one task and thereby making all other usages impossible?
> is your argument now that an LLM can capture arbitrary state of the wider world as a general rule, e.g. pretending to be a Swift compiler (or LSP), without overfitting to that one task and thereby making all other usages impossible?
Overfitting happens, even in humans. Have you ever met a scientist?
My points have been only that 1: language encodes a symbolic model of the world, and 2: training on enough of it results in a representation of that model within the LLM.
The exhaustiveness and accuracy of that internal world model exist on a spectrum, depending on many variables like model size, training corpus and regimen, etc. As is also the case with humans.