In order to predict the next token, it’s doing something more like simulating the writer of the words and the context they were likely to be in while writing them. You cannot make accurate predictions without understanding the world that gave rise to these words.

Consider a detective story with all the clues laid out and then at the end the detective says: “I know who it is. It is: …” Correctly predicting the next “tokens” entails that you incorporate all the previous details. Same goes for math questions, emotive statements, etc.
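
To make the “incorporate all the previous details” point concrete, here’s a minimal sketch. It assumes the Hugging Face transformers library and uses GPT-2 as a small public stand-in (not the model under discussion); the detective snippet is invented. It just inspects the next-token distribution given the whole story so far:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 as a small, public stand-in for "a language model".
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = ("Every clue pointed away from the gardener and towards the butler. "
           "The detective said: 'I know who it is. It is the")
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # (1, sequence_length, vocab_size)

# The prediction for the next token is conditioned on every token before it.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)])!r}: {float(p):.3f}")
```

Swap which clues appear in the context and the distribution over the culprit shifts; that is the sense in which a good prediction has to absorb everything that came before.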

I’d be careful calling it simple. They might be simulating humans, including for example a theory of mind, just as a side effect.



Yes, exactly. All of the dismissive ‘it’s just a fancy next word predictor’ articles can’t see the wood for the trees. The fact that the model’s function is to predict the next word tells us almost nothing about what internal representation of the world it has built in order to generalise. I don’t for a second think it’s currently of comparable complexity to the world model we all have in our brains, but I also don’t think there’s any clearly defined theoretical limit on what could be learned, beyond the need for the internal model to make better predictions and for the optimiser to actually find that more optimal set of weights (which might be a limit in practice as of now).
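
For what it’s worth, the training objective really is that narrow. Here is a toy sketch (made-up architecture and random tokens, purely to show the shape of the loss) of the only signal the optimiser ever sees:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy "next-word predictor": an embedding followed by a linear readout.
# The point is the objective, not the architecture (both are invented here).
vocab_size, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)

tokens = torch.randint(0, vocab_size, (1, 16))    # stand-in for a snippet of text
logits = model(tokens)                            # (1, 16, vocab_size)

# The only training signal: cross-entropy between the prediction at position t
# and the actual token at position t + 1. Whatever internal representation the
# weights end up encoding is an implicit by-product of driving this scalar down.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab_size),
                       tokens[:, 1:].reshape(-1))
loss.backward()
optimiser.step()
```

Everything the model “knows” has to fit into whatever weight configuration drives that one scalar down.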


> You cannot make accurate predictions without understanding the world that gave rise to these words.

I think you must either admit that ChatGPT does exactly this, or else give up our traditional connotation of "understand". ChatGPT has never seen a sunset, felt pain, fallen in love, etc., yet it can discuss them coherently; to call that understanding the world is to implicitly say that the world can be understood solely through reading about it and keeping track of which words follow which. It's amazing that generating text from statistical relationships between tokens in a corpus, an approach that produced nonsensical-but-grammatical sentence fragments at smaller scales, can expand to coherent concepts and essays with more compute, but it is just a difference in scale.
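
For contrast, this is roughly what the "smaller scales" end of that spectrum looks like: a bigram sampler that knows only which words follow which (toy corpus invented for illustration):

```python
import random
from collections import defaultdict

# A tiny bigram model: pure token-follows-token statistics, nothing else.
corpus = ("the sun set over the sea and the sky turned red and "
          "the sea was calm and the red sun was low over the water").split()

followers = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev].append(nxt)

word, output = "the", ["the"]
for _ in range(15):
    candidates = followers[word] or corpus    # fall back if a word has no follower
    word = random.choice(candidates)
    output.append(word)

# Locally plausible, globally meaningless: grammar-ish fragments, no "world".
print(" ".join(output))
```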

> I’d be careful calling it simple.

I'm not calling it simple, I'm calling us simple! I'm saying that ChatGPT is proof that natural language processing is much easier than I previously thought.


OK, but billions upon billions of “statistical relationships”... I mean, at some point the term “simple” loses its meaning. I get your point, though. It is not pure magic.


Yeah, "simple" as in we didn't have to make something that can learn general concepts, and then teach it language. It feels like a hack, doesn't it? Like you were working on a problem you thought was NP-hard and then you stumble over a way to do it in O(n^2).


Yeah, I think we get sidetracked by how it “feels” to us when we learn. We forget that it is just a convenient story that our mind tells itself. We are incapable, or at least severely handicapped, when it comes to raw experience and the knowing of it.

Somehow this approach to ML feels kind of natural to me, but it’s hard to articulate why.


“You cannot make accurate predictions without understanding the world that gave rise to these words.”

This depends on the definition of understanding. There are an infinite number of equations that could describe the trajectory of a thrown ball, and none of them is exactly correct, depending on how deep down the understanding hole one travels.
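
A quick sketch of that point (all numbers invented): an idealised vacuum parabola and a crude drag-corrected model both "describe" the same throw at different depths of fidelity, and neither is exact:

```python
import math

g = 9.81                                   # m/s^2
v0, angle = 20.0, math.radians(45)         # invented launch speed and angle
vy0 = v0 * math.sin(angle)

def height_ideal(t):
    """Closed-form vacuum model: y = v_y0 * t - g * t^2 / 2."""
    return vy0 * t - 0.5 * g * t * t

def height_with_drag(t_end, k=0.05, dt=1e-3):
    """Euler integration with linear air drag (the coefficient k is a guess)."""
    y, vy, t = 0.0, vy0, 0.0
    while t < t_end:
        vy -= (g + k * vy) * dt
        y += vy * dt
        t += dt
    return y

t = 1.5
print(f"ideal: {height_ideal(t):.2f} m   with drag: {height_with_drag(t):.2f} m")
```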


The point is, you need those equations. Their particular form is secondary and indeed up for debate.



