Can you comment a bit on the tech behind this? I tried something similar with songs: I wanted artist X to sing a song by artist Y. I cleaned up the voices and the audio, but the transfer just didn't work. I didn't annotate the text (it shouldn't be that hard, since all the lyrics are available), but if you could recommend an approach or maybe an open source project, I'd be grateful. Thanks, and great work by the way!
The best results I've seen are from the researcher Ryuichi Yamamoto (r9y9 on GitHub). He continually publishes astonishing results and novel architectures: