Thanks. What we've shared here is a demo tool to show our new speech model, which can clone a voice from a few seconds of audio. You can try it with English or non-English recordings, but the generated voice can only speak English at the moment. If you are looking for high-fidelity cloning, you can sign up and try it in our app here - https://play.ht/voice-cloning/
High-fidelity cloning requires at least 20 mins of good quality audio. The more the better.
We launched the playground.play.ht in beta to share the new speech model we are working on. We've been operating play.ht for a while and have teams from these companies using the platform.
The intention for this playground was to let people try the model. We actually have auto moderation on the user-facing platform (https://play.ht/): malicious text gets blocked and the user gets flagged.
I noticed that when I put in the following text from the BBC as a test, it pronounces 2008 as "two thousand eight", but I believe most people would pronounce it as "two thousand and eight".
Great work
A billionaire's son, who fled to Yemen within hours of the death of a student in London 15 years ago, has admitted his involvement to the BBC.
The body of Martine Vik Magnussen, 23, was discovered under rubble in a Great Portland Street basement in 2008.
Farouk Abdulhak, who is on the Met Police's most wanted list and is the subject of an international arrest warrant, has never spoken about the case before.
I'm based in Europe and a native English speaker, and I thought I was aware of most of the differences between UK/US English. I can't believe I have worked with Americans for decades and never noticed this. Live and learn!
I'm an American, and both methods sound right to my ears. I hear both variations quite a lot from the people around me. I assume it depends on what part of the US you're from.
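For anyone curious how a text-to-speech front end could make this switchable, here is a minimal sketch. It is not Play.ht's normalizer (nothing in this thread describes how theirs works); `cardinal_words` and its `british` flag are hypothetical names, and it only covers 0 to 9999 read as plain cardinals, so a year like 1999 comes out as "one thousand nine hundred ninety-nine" rather than "nineteen ninety-nine".

```python
# Hypothetical helper: spell out a number the way a TTS normalizer might,
# with an optional British-style "and" before the last two digits.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def below_hundred(n: int) -> str:
    """Spell out 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def cardinal_words(n: int, british: bool = False) -> str:
    """Spell out 0..9999; british=True inserts "and" before the tail."""
    if not 0 <= n < 10000:
        raise ValueError("only 0..9999 is supported in this sketch")
    if n < 100:
        return below_hundred(n)
    thousands, rest = divmod(n, 1000)
    hundreds, tail = divmod(rest, 100)
    parts = []
    if thousands:
        parts.append(f"{ONES[thousands]} thousand")
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    head = " ".join(parts)
    if tail == 0:
        return head
    joiner = " and " if british else " "
    return head + joiner + below_hundred(tail)


print(cardinal_words(2008))                # two thousand eight
print(cardinal_words(2008, british=True))  # two thousand and eight
```

Since both readings sound fine to a lot of speakers, exposing this as a per-voice or per-locale setting seems more sensible than hard-coding either one.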
Another note: the share view on the clips doesn't include any way to get the actual link to the file. I imagine most people want the actual link so they can have more control over how and where they share it.
I don't think you understand what I mean. There is a share button, and they generate a link for each clip... You can access it by clicking on the "#1234" button (it's not obvious that it's a button/link), but when you open the share menu, there is no option to just copy the URL there... instead it's just buttons for Facebook, LinkedIn, and Twitter.
You are right, the technology will become ubiquitous. Therefore, at least for platforms like ours, it's a responsibility to have countermeasures and safeguards to prevent abuse and harm to people. There will always be people who find ways to abuse it, but making that more and more difficult, and continuing to evolve those safeguards, seems like the way forward.
We have these measures in place and are working on others to make sure the technology is used towards the betterment of humanity.
1/ Auto moderation on text to block harmful/malicious speech.
2/ As someone pointed out in the comments, we had a manual review process in place where the user is required to read out a consent statement and a member of the Play.ht team reviews it before approving the voice. We're working on improving this and adding it back.
3/ The user facing service is paywalled so we don't allow everyone in.
4/ Users trying to create malicious content are flagged and reviewed.
5/ A classifier to detect AI generated speech
You are right, and unfortunately that is a possibility. We are working on having measures in place to guard against such attempts. We have auto moderation on the input text that will block such audio from being generated, and such users are flagged in the system.
Couldn't agree more with your comment. We are working on countermeasures like manual verification of voice, a classifier to detect cloned speech, etc. As of now we have auto moderation in place that detects and blocks hateful/harmful speech.
I actually have one thousand HackerNews good boy points, so I'm kind of a big deal
I think that a few years from now this tech is going to be ubiquitous, real time, and work on a mobile device. Trying to slam the lid shut on Pandora's Box probably isn't going to work. The best thing at this point would be for the word to get out to everyone that voices can now be doctored the same way photos can.