
hammadh · on March 27, 2023

Thanks. What we've shared here is a demo tool to show our new speech model, which can clone a voice from a few seconds of audio. You can try it with English or non-English recordings, but the generated voice can only speak English at the moment. If you are looking for high-fidelity cloning, you can sign up and try it in our app here - https://play.ht/voice-cloning/

High-fidelity cloning requires at least 20 mins of good quality audio. The more the better.

hammadh · on March 27, 2023

We launched the playground.play.ht in beta to share the new speech model we are working on. We've been operating play.ht for a while and have teams from these companies using the platform.

hammadh · on March 27, 2023

The intention for this playground was to let people try the model. We actually have auto moderation on the user-facing platform (https://play.ht/): malicious text gets blocked and the user gets flagged.

owlbynight · on March 28, 2023

Except this post is 8 hours old and I'm still able to view this link.

SamBam · on March 28, 2023

17 hours old, still there.

jprete · on March 28, 2023

Another 8 hours and I can still see it too.

Apocryphon · on March 28, 2023

How would your auto moderation detect that example is malicious?

hammadh · on March 27, 2023

Thanks, we intended the playground to be merely a testing tool for the new model we're building. We'll improve based on your feedback!

mywacaday · on March 28, 2023

I noticed that when I put in the following text from the BBC as a test, it pronounces 2008 as "two thousand eight", but I believe most people would pronounce it as "two thousand and eight".

Great work

A billionaire's son, who fled to Yemen within hours of the death of a student in London 15 years ago, has admitted his involvement to the BBC.

The body of Martine Vik Magnussen, 23, was discovered under rubble in a Great Portland Street basement in 2008.

Farouk Abdulhak, who is on the Met Police's most wanted list and is the subject of an international arrest warrant, has never spoken about the case before.

iamthemonster · on March 28, 2023

Two thousand eight = American English
Two thousand and eight = UK and Australian English
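To make the difference concrete, here is a minimal sketch of how a text normalizer could spell out a number in either convention. This is not Play.ht's actual normalizer; the function name `number_to_words` and the `british` flag are hypothetical and exist only for illustration, and the sketch only handles 0 to 9999.

```python
# Illustrative only: a toy number-to-words converter, not any real TTS front end.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def below_100(n: int) -> str:
    """Spell out 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def number_to_words(n: int, british: bool = False) -> str:
    """Spell out 0..9999.

    british=False: "two thousand eight"      (common American style)
    british=True:  "two thousand and eight"  (common UK/Australian style)
    """
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    if n < 100:
        return below_100(n)

    thousands, rest = divmod(n, 1000)
    hundreds, tail = divmod(rest, 100)

    parts = []
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    words = " ".join(parts)

    # The only difference between the two styles is the joiner before the tail.
    if tail:
        words += (" and " if british else " ") + below_100(tail)
    return words


print(number_to_words(2008))                # American style
print(number_to_words(2008, british=True))  # UK/Australian style
```

A real system would presumably pick the style from the voice's accent or locale rather than a boolean flag, but that part is a guess.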

mywacaday · on March 28, 2023

I'm based in Europe and a native English speaker, and I thought I was aware of most of the differences between UK/US English. I can't believe I have worked with Americans for decades and never noticed this. Live and learn!

JohnFen · on March 28, 2023

I'm an American, and both methods sound right to my ears. I hear both variations quite a lot from the people around me. I assume it depends on what part of the US you're from.

jnwatson · on March 28, 2023

American English traditionally uses an “and” to separate the whole from the fraction, e.g. two thousand eight and two thirds.

ljlolel · on March 28, 2023

Pretty sure I would say two thousand eight?

the88doctor · on March 28, 2023

I would definitely just say "two thousand eight"

hammadh · on March 27, 2023

Yes, we are working on making the API pay as you go soon. Thanks for the feedback!

MattRix · on March 28, 2023

Another note: the share view on the clips doesn't include any way to get the actual link to the file. I imagine most people want the actual link so they can have more control over how and where they share it.

detrimental-def · on March 28, 2023

The link is not needed (for the tech-savvy crowd). Anyone can share all of the generated demos with the world.

MattRix · on March 29, 2023

I don't think you understand what I mean. There is a share button, and they generate a link for each clip... You can access it by clicking on the "#1234" button (though it's not obvious that it's a button/link), but when you open the share menu, there is no option to just copy the url there... instead it's just buttons for facebook, linkedin, and twitter.

hammadh · on March 27, 2023

We have an API - https://docs.play.ht/reference/api-getting-started

We have a beta streaming endpoint, but the latency is not real time yet (something we're working on), and we are adding an endpoint to create voices.

hammadh · on March 27, 2023

Thanks for sharing this ^

hammadh · on March 27, 2023

You are right, the technology will become ubiquitous. Therefore, at least for platforms like ours, it's a responsibility to have countermeasures and safeguards to prevent abuse and harm to people. There'll always be people who find ways to abuse it, but making that more and more difficult, and continuing to evolve on it, seems like the way forward.

We have these measures in place and are working on others to make sure the technology is used towards the betterment of humanity.

1/ Auto moderation on text to block harmful/malicious speech.
2/ As someone pointed out in the comments, we had a manual review process in place where the user is required to read out a consent and a member from Play.ht would review it before approving the voice. We're working on improving and adding this back.
3/ The user facing service is paywalled so we don't allow everyone in.
4/ Users trying to create malicious content are flagged and reviewed.
5/ A classifier to detect AI generated speech

hammadh · on March 27, 2023

You are right, and unfortunately that is a possibility, and we are working on having measures in place to guard against such attempts. We have auto moderation on the input text that will block such audio from being generated. Such users are flagged in the system.

Avicebron · on March 27, 2023

What are you filtering for in the input text that would block something like a phone scam?

hammadh · on March 27, 2023

Couldn't agree more with your comment. We are working on countermeasures like manual verification of voice, a classifier to detect cloned speech, etc. As of now we have auto moderation in place that detects and blocks hate/harmful speech.

Firmwarrior · on March 27, 2023

The cat's out of the bag, I'd say you guys should just go full steam ahead and make sure it's your names in the headlines

No need for a bunch of onerous kyc or anything IMO

woeirua · on March 27, 2023

Yes, definitely take this advice from some random user on HN. Can't possibly go wrong.

Firmwarrior · on March 27, 2023

I actually have one thousand HackerNews good boy points, so I'm kind of a big deal

I think that a few years from now this tech is going to be ubiquitous, real time, and work on a mobile device. Trying to slam the lid shut on Pandora's Box probably isn't going to work. The best thing at this point would be for the word to get out to everyone that voices can now be doctored the same way photos can