
On video, he says:

1. The next frame is easy, but multiple frames are not.

2. What works for text doesn't work for video.

Then Sora comes out and shows multiple frames, and someone tweets a gotcha.

He then tweets without saying he misspoke, and goes on about how the model doesn't understand physics.

And that his project, V-JEPA, is the best.

He keeps saying stuff about "sucks as a mental model" but doesn't say why that would not apply to text.

https://twitter.com/ylecun/with_replies

Me: If text doesn't need a mental model, I see no reason video needs one. His argument either sucks or is badly worded.


