The full comment is right there, we don't need to seance what the rest of it was or remix it.
> Arguably, it is the other way around: they aren’t focused on appealing to those biases, but driven by them, in that the perception of language modeling as a road to real general reasoning is a manifestation of the same bias which makes language capacity be perceived as magical
There's no charitable reading of this that doesn't give the researchers way too little credit, given the results of the direction they've chosen.
This has nothing to do with biases and emotion, and I'm not sure why some people need it to: modalities have progressed in order of how easy their data is to wrangle: text => image => audio => video.
We've seen that training on more tokens improves performance, and we've seen that training on new modalities improves performance on the prior modalities.
It's so needlessly dismissive to act like you have this mystical insight into a grave error these people are making, and that they're just seeking to replicate human language out of folly, when you're ignoring table stakes for their underlying work to start with.
Note that there is only one thing about the research that I have said is arguably influenced by the bias in question, “the perception of language modeling as a road to real general reasoning”. Not the order of progression through modalities. Not the perception that language, image, audio, or video are useful domains.