
DeepFloyd IF uses effectively the same architecture and text encoder as Imagen (https://imagen.research.google/), although that paper doesn't hypothesize why text rendering works out so much better.


Right, I'm aware of the Imagen architecture, just curious to see further research determining which aspect of it is responsible for the improved text rendering.

EDIT: According to the figure in the Imagen paper that FL33TW00D's response referred me to, it looks like text encoder size is the biggest factor in the model's improved performance all around.



