
Tencent is releasing a ton of stuff though!

https://aivideo.hunyuan.tencent.com/

GitHub is overflowing with Tencent, Alibaba, and Ant Group models. They're typically licensed as Apache 2.0, and replete with pretrained weights and fine-tuning scripts.



The training process in OmniHuman-1 seems straightforward to replicate once Tencent releases their image-to-video model too.


T2V is already I2V if you're enterprising enough to open up the model and play with the latents. The I2V modality is almost just a trick.
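A minimal sketch of that trick, assuming a standard diffusion setup: keep running the T2V denoiser as usual, but at every step overwrite the first-frame slice of the video latent with a re-noised copy of the reference image's latent. The `denoiser` stub and `image_latent` here are stand-ins (in practice the latent would come from the model's VAE), not any particular model's API:

    import torch

    T, C, H, W = 16, 4, 32, 32           # frames, latent channels, latent H/W
    steps = 50
    alphas = torch.linspace(0.999, 0.001, steps)  # toy noise schedule

    def denoiser(z, step):
        return z * 0.98                  # stub for the real T2V denoiser

    image_latent = torch.randn(C, H, W)  # stand-in for vae.encode(reference_image)

    z = torch.randn(T, C, H, W)          # start from pure noise, exactly as in T2V
    for step, a in enumerate(alphas):
        # Re-noise the reference latent to this step's noise level, then pin it
        # as frame 0 so the denoiser has to stay consistent with the image.
        noise = torch.randn_like(image_latent)
        z[0] = a.sqrt() * image_latent + (1 - a).sqrt() * noise
        z = denoiser(z, step)

    video_latent = z                     # decode with the VAE to get frames

The rest of the frames are free to move; only the anchor frame is repeatedly clamped back toward the input image.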


Yes, the LLaVA model can encode images, and you can encode an image into the 3D VAE space. Without fine-tuning the model, though, you either won't get fidelity to the original (if you only use LLaVA's SigLIP to encode), or you'll end up with video that has limited motion (using the 3D-VAE-encoded latent as the first frame and then doing vid2vid).
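Shape-wise, the two paths look roughly like this (toy sketch; the tensors are stand-ins with plausible dimensions, not the output of a real encoder):

    import torch

    image = torch.randn(1, 3, 512, 512)    # one reference frame

    # Path 1: semantic conditioning via a SigLIP-style encoder. You get a
    # compact embedding: good for "what's in the image", far too lossy to
    # reconstruct it pixel-for-pixel, hence the fidelity loss.
    siglip_embed = torch.randn(1, 1152)     # hypothetical pooled embedding

    # Path 2: treat the image as a 1-frame video and push it through the
    # 3D VAE. With typical 8x spatial compression, (1, 3, 1, 512, 512)
    # maps to something like (1, 16, 1, 64, 64). Fidelity is high, but
    # vid2vid off a static first frame tends to stay static: limited motion.
    video = image.unsqueeze(2)              # (B, 3, T=1, H, W)
    latent = torch.randn(1, 16, 1, 64, 64)  # stand-in for vae3d.encode(video)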



