Looks like there's some quality reduction, but nonetheless 2s to generate a 5s video on a 5090 with WAN 2.1 is absolutely crazy. Excited to see more optimizations like this moving into 2026.
Those streams of text are often conditioned on the prompts - people are using them to learn about new concepts, and as a hyperpersonalised version of search. It can not only tell you about tools you didn't know existed, but also show you how to use them.
I do like my buttons to stay where I left them - but that can be conditioned too. Instead of GNOME "designers" telling me the button needs to be wide enough to hit with my left foot, I could tell the system I want this button small and in that corner - and add it to my prompt.
I feel like a lot of the above assumes the user knows what they want or what works best. I want an intelligent designer to figure out the best flow/story/narrative/game and create/present it, because I'm a dumb user who doesn't know what is actually good.
That's called a default - I'm happy for a GNOME designer to "design" the button to be large enough to hit with my foot with a blindfold on, but I'd like the option to change it so it fits my workflow, rather than adjusting my workflow to the button.
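To make the defaults-plus-overrides idea concrete, here's a minimal sketch of how a generative UI could layer a user's stated preferences over designer defaults before folding them into the prompt. Every name and type here is hypothetical - this isn't any real GNOME or model API, just an illustration of "defaults you can override":

    // Hypothetical sketch: designer defaults merged with per-user overrides.
    interface ButtonPrefs {
      size: "small" | "medium" | "large";
      corner: "top-left" | "top-right" | "bottom-left" | "bottom-right";
      pinned: boolean; // stays where the user left it
    }

    // What the designer ships: big, discoverable, foot-sized.
    const designerDefaults: ButtonPrefs = {
      size: "large",
      corner: "bottom-right",
      pinned: false,
    };

    // What the user adds to their prompt/profile; partial on purpose,
    // so anything unspecified falls back to the default.
    const userOverrides: Partial<ButtonPrefs> = {
      size: "small",
      corner: "top-left",
      pinned: true,
    };

    // Merge: the user's preference wins where stated, default otherwise.
    const effective: ButtonPrefs = { ...designerDefaults, ...userOverrides };

    // Fold the result into the prompt the UI generator sees.
    const promptFragment =
      `Render the save button ${effective.size}, in the ${effective.corner} corner` +
      `${effective.pinned ? ", and keep it exactly where the user left it" : ""}.`;

    console.log(promptFragment);
    // -> Render the save button small, in the top-left corner,
    //    and keep it exactly where the user left it.

The whole trick is the spread order: the user's partial preferences override where stated, and the designer's defaults fill in the rest.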
Nevertheless, it does seem that generation will fairly soon become fast enough to extend a video clip in realtime - autoregressive, second by second. Integrated with a multimodal input model, you would be very close to an extremely compelling AI avatar.