Huh, if it's good enough, movies/TV shows dubbed with an AI clone of the original voice would be great (if we can ignore the ethics of using the actor's voice and the loss of work for the dubbing companies and actors).
Yes, have you used OpenAI’s voice model? It uses and reacts to tones.
My favorite conversation has been getting it to tell me about marshmallow vs marshmellow spelling and pronunciation. It became very strict but patient with me.
It can reply in other languages too, but I can't judge dialects well enough to say how good it is.
In my experience, human dubbing never captures the original tone anyway. It probably never can unless it's done by people who are fluent in both the source and target languages and are also good at voice acting. So I have a huge preference for subs, so I can appreciate the nuance in the original voices.
Sometimes, very rarely, the dub is actually better (yes, subjective, but still) than the original. E.g. I find Fear and Loathing in Las Vegas in German absolutely hilarious, start to finish. Then I watched the English version and was surprised by how serious it is in the original voice. It's an entirely different movie in its original tone.
"Wayne's World" in French is also a masterpiece. But it was dubbed by very good people (called "Les Nuls") who understood the jokes and, when needed, created appropriate ones instead of losing them in translation.
When was the last time you experienced human dubbing? They do it with amazingly high quality in some languages today. I actually feel sorry for the dubbers who now have to be actors as well, making every grunt, sigh and laugh of the people they are dubbing.
They are even dubbing reality survival shows, so somebody has to sit in a studio and groan as if they are climbing a slippery hill in Alaska.
Can't speak for the German dubbing, but the Italian version sounds natural to us Italians, while the original English version is hard to relate to and bond with. The dubbing makes it feel "close" to home, if that makes sense. You might feel it's weird because you've grown accustomed to watching the original version while also being immersed in everything that sitcom portrays.
What I found is that, for cross-language use-cases, this often just applies the intonation of the “context” sample to the created sample, which, if they are from different languages, usually gives the wrong result (in the sense that it sounds off).
I realise that dubbing actors (and voice actors in general) have the fewest "potential fallback legal protections" when it comes to automation via generative AI:
Translating? Machine translation is already well established.
Copyright? IIRC the most popular voice generation company (ElevenLabs) uses copyright-safe models, where the sources for the base model had already consented.
Likeness? As you said, just use a synthetic voice.
Replacement of jobs? Not really a legal issue. It's not much worse than self checkouts or driverless cars, for example. The only reason we're talking about it is that it affects white-collar workers rather than blue-collar workers, and voice actors are more likely to be celebrities than cashiers are.
For example, here's how weird Friends is in German: https://www.youtube.com/watch?v=nCoNSZV--z0 . Or in Italian: https://www.youtube.com/watch?v=wO5qTzvyQ1s
Can AI detect the emotional tone of sentences yet, and recreate it in the target language?