Can you comment a bit on the tech on this? I tried something similar with songs:...

echelon · on July 27, 2020

Thanks!

There are a lot of neat research threads going on right now in vocal generation.

Nvidia published Mellotron (code + paper + models), and the results are promising:

https://nv-adlr.github.io/Mellotron

The best results I've seen are from researcher Ryuichi Yamamoto (r9y9 on GitHub). He continually publishes astonishing results and novel architectures:

https://github.com/r9y9

https://github.com/r9y9/nnsvs

https://soundcloud.com/r9y9/sets/dnn-based-singing-voice

These results lead me to believe he's going to have a replacement for Vocaloid soon.

There's lots more stuff out there, and I can come back and edit my post later.

Some folks are getting good results by simply combining Tacotron with autotune:

- https://www.youtube.com/watch?v=3qR8I5zlMHs (Mister Rogers sings Beautiful World; amazing, super charming, and shows the promise of this tech)

- https://www.youtube.com/watch?v=K1jrDgbRs9Q (Tupac, possibly NSFW lyrics)

- https://www.youtube.com/watch?v=QW16_W0K3qU (Tupac with various results, possibly NSFW)
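For anyone curious what the "autotune" half of that combo amounts to, here is a rough sketch of the core idea: take a pitch (f0) contour and snap each voiced frame to the nearest equal-tempered semitone. This is only an illustration, not what the videos above actually used. The function names, the A4 = 440 Hz reference, and the convention that 0 means an unvoiced frame are all my own assumptions, and a real pipeline would also need pitch tracking on the Tacotron output and resynthesis afterwards.

```python
import math

A4_HZ = 440.0  # assumed reference pitch (standard concert tuning)


def snap_to_semitone(freq_hz: float) -> float:
    """Return the equal-tempered pitch nearest to freq_hz (hypothetical helper)."""
    if freq_hz <= 0:
        return 0.0  # assumption: non-positive values mark unvoiced frames
    # Distance from A4 in semitones, rounded to the nearest whole semitone.
    semitones_from_a4 = 12 * math.log2(freq_hz / A4_HZ)
    return A4_HZ * 2 ** (round(semitones_from_a4) / 12)


def autotune_contour(f0_contour):
    """Snap every frame of an f0 contour (in Hz) to the nearest semitone."""
    return [snap_to_semitone(f) for f in f0_contour]


if __name__ == "__main__":
    # A slightly off-pitch contour with one unvoiced frame in the middle.
    print(autotune_contour([438.0, 450.0, 0.0, 455.0]))
```

Hard snapping like this gives the robotic, stepped sound people associate with heavy autotune; smoothing the corrected contour over time would sound more natural.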

A lot of this gets posted to /r/VocalSynthesis and occasionally /r/MediaSynthesis.

101008 · on July 27, 2020

Thank you very much, I will look at them!