

echelon 11 months ago | on: OmniHuman-1: Human Animation Models

T2V is already I2V if you're enterprising enough to open up the model and play with the latents. The I2V modality is almost just a trick.

liuliu 11 months ago

Yes, the Llava model can encode an image, and you can also encode an image into the 3D VAE latent space. Without fine-tuning the model, though, you either lose fidelity to the original image (if you only use Llava's SigLIP to encode it) or end up with limited motion (if you use the 3D VAE-encoded latents as the first frame and then do vid2vid).
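
To make the second option concrete, here is a toy sketch of a vid2vid-style initialization from a single encoded image. Everything in it is an assumption for illustration: `init_video_latents` is a hypothetical helper, the latent is a flat list of floats standing in for a real 3D VAE encoding, and the linear blend between image latent and noise is just one common noising scheme, not necessarily what any particular T2V model uses.

```python
import random


def init_video_latents(image_latent, num_frames, strength, seed=0):
    """Hypothetical helper: build starting latents for a vid2vid-style pass.

    image_latent: flat list of floats standing in for a VAE-encoded image.
    strength: 0.0 keeps every frame equal to the image latent,
              1.0 replaces every frame with pure Gaussian noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    rng = random.Random(seed)
    frames = []
    for _ in range(num_frames):
        # Assumed noising rule: linear blend of image latent and fresh noise.
        frame = [
            (1.0 - strength) * x + strength * rng.gauss(0.0, 1.0)
            for x in image_latent
        ]
        frames.append(frame)
    return frames


if __name__ == "__main__":
    latent = [0.5, -1.0, 2.0]
    low = init_video_latents(latent, num_frames=4, strength=0.1)
    high = init_video_latents(latent, num_frames=4, strength=0.9)
    print(low[0], high[0])
```

In this toy setup a low `strength` keeps every frame close to the source image, and a high `strength` lets frames drift apart while losing the image. That is one way to read the fidelity versus motion trade-off described above for a model that has not been fine-tuned.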
