One underappreciated / misunderstood aspect of these models is that they use more compute than an equivalently sized autoregressive model.

It’s just that for N tokens, an autoregressive model has to make N sequential steps.

Diffusion, by contrast, does K x N token evaluations: K sequential steps, each covering all N tokens in parallel, with K << N.
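
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes an autoregressive step evaluates one new token (i.e. a KV cache), that each diffusion step re-evaluates all N positions, and it ignores attention cost over the context. The function names and the N = 1024, K = 32 figures are made up for illustration, not taken from any real model.

```python
def autoregressive_cost(n_tokens: int) -> dict:
    """One new token per step, so N sequential steps and N token evaluations."""
    return {"sequential_steps": n_tokens, "token_evaluations": n_tokens}


def diffusion_cost(n_tokens: int, k_steps: int) -> dict:
    """K denoising steps, each touching all N positions in parallel."""
    return {"sequential_steps": k_steps, "token_evaluations": k_steps * n_tokens}


# Hypothetical sizes, chosen only to illustrate K << N.
N, K = 1024, 32
ar = autoregressive_cost(N)
diff = diffusion_cost(N, K)

# How much more total work diffusion does, and how many fewer sequential steps.
compute_ratio = diff["token_evaluations"] / ar["token_evaluations"]
latency_ratio = ar["sequential_steps"] / diff["sequential_steps"]
print(ar, diff, compute_ratio, latency_ratio)
```

Under these assumptions the diffusion model does K times the token evaluations while taking N / K times fewer sequential steps, which is the latency-for-compute trade described above.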

This makes me wonder how well they will scale to many users, since batching requests would presumably saturate the accelerators much faster?

Although I guess it depends on the exact usage patterns.

Anyway, very cool demo nonetheless.