It’s just that for N tokens, autoregressive model has to make N sequential steps.
Where diffusion does K x N, with the N being done in parallel. And for K << N.
This makes me wonder how well they will scale to many users, since batching requests would presumably saturate the accelerators much faster?
Although I guess it depends on the exact usage patterns.
Anyway, very cool demo nonetheless.
It’s just that for N tokens, autoregressive model has to make N sequential steps.
Where diffusion does K x N, with the N being done in parallel. And for K << N.
This makes me wonder how well they will scale to many users, since batching requests would presumably saturate the accelerators much faster?
Although I guess it depends on the exact usage patterns.
Anyway, very cool demo nonetheless.