On video, he says:

1. The next frame is easy, but multiple frames are not.

2. What works for text doesn't work for video.

Then Sora comes out and shows multiple frames, and someone tweets "gotcha."

He then tweets, without admitting he misspoke, and goes on about how the model doesn't understand physics.

And he says his project, V-JEPA, is the best.

He keeps saying stuff about "sucks as a mental model" but doesn't say why that would not apply to text.

Me: If text doesn't need a mental model, I see no reason video needs it. His argument sucks or is badly worded.