Hacker News

Looks like there is some quality reduction, but nonetheless 2s to generate a 5s video on a 5090 for WAN 2.1 is absolutely crazy. Excited to see more optimizations like this moving into 2026.




Efficient realtime video diffusion will revolutionize the way people use computers even more so than LLMs.

I actually think we are already there with quality, but nobody is going to wait 10 minutes to do a task with video that takes 2 seconds with text.

If Sora/Kling/whatever ran cool locally 24/7 at 60FPS, would anyone ever build a UI? Or a (traditional) OS?

I think it's worth watching the scaling graph.


> If Sora/Kling/whatever ran cool locally 24/7 at 60FPS, would anyone ever build a UI?

I like my buttons to stay where I left them.


Yeah, it’s like asking “why would anyone read a book today when LLMs can generate infinite streams of text?”

Those streams of text are often conditioned on the prompts: people are using them to learn about new concepts, and as a hyperpersonalised version of search. It can not only tell you about tools you didn't know existed, it can show you how to use them.

I do like my buttons to stay where I left them, but that can be conditioned too. Instead of GNOME "designers" telling me the button needs to be wide enough to hit with my left foot, I could tell the system I want this button small and in that corner, and add it to my prompt.


I feel like a lot of the above assumes the user knows what they want or what works best. I want an intelligent designer to figure out the best flow/story/narrative/game and create/present it, because I'm a dumb user who doesn't know what is actually good.

That's called a default. I'm happy for a GNOME designer to "design" the button large enough to hit with my foot while blindfolded, but I'd like the option to adjust it to my workflow rather than adjust my workflow to the button.

I suppose if one only reads self-help books of the “You’re the best, trust your instincts!” kind, then LLMs are an appropriate replacement.

Or indeed, if one has a mind of their own and wants a tool to obey them, rather than submit to the opinions of their "betters".

Please no, please no

That will be Windows 12, and perhaps iOS two generations from now :)


That’s not the actual wall-clock time if you run it; encoding and decoding are extra.

Nevertheless, it does seem that generation will fairly soon become fast enough to extend a video clip in realtime, autoregressively, second by second. Integrated with a multimodal input model, you would be very close to an extremely compelling AI avatar.
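A minimal sketch of that chunked autoregressive idea. Everything here is hypothetical: `generate_chunk` is a stub standing in for a real diffusion sampler, and a real system would also pay the VAE encode/decode cost mentioned above on top of the sampling loop.

```python
# Sketch: extend a video one second at a time, conditioning each new chunk
# on the trailing frames of what has been generated so far.
import numpy as np

FPS = 24
CHUNK_SECONDS = 1
CONTEXT_FRAMES = 8  # trailing frames fed back in as conditioning

def generate_chunk(context: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Placeholder for a video diffusion sampler: returns CHUNK_SECONDS * FPS
    new frames shaped like the context frames. A real model would denoise
    latents conditioned on `context`; we just emit noise frames."""
    h, w, c = context.shape[1:]
    return rng.standard_normal((CHUNK_SECONDS * FPS, h, w, c)).astype(np.float32)

def extend_video(video: np.ndarray, seconds: int) -> np.ndarray:
    rng = np.random.default_rng(0)
    for _ in range(seconds):
        context = video[-CONTEXT_FRAMES:]  # condition on the tail only
        video = np.concatenate([video, generate_chunk(context, rng)])
    return video

clip = np.zeros((FPS, 64, 64, 3), dtype=np.float32)  # 1-second seed clip
extended = extend_video(clip, seconds=4)
print(extended.shape[0] / FPS)  # total duration in seconds -> 5.0
```

The point of the fixed-size context window is that per-chunk cost stays constant no matter how long the video gets, which is what makes "autoregressive by the second" plausible in realtime once sampling a chunk takes under a second.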


