Hacker News | hammadh's comments

Thanks. What we've shared here is a demo tool to show our new speech model, which can clone a voice from a few seconds of audio. You can try it with English or non-English recordings, but the generated voice can only speak English at the moment. If you are looking for high-fidelity cloning, you can sign up and try it in our app here - https://play.ht/voice-cloning/

High-fidelity cloning requires at least 20 minutes of good-quality audio. The more, the better.


We launched playground.play.ht in beta to share the new speech model we are working on. We've been operating play.ht for a while and have teams from these companies using the platform.


The intention for this playground was to let people try the model. We actually have auto moderation on the user-facing platform (https://play.ht/): malicious text gets blocked and the user gets flagged.


Except this post is 8 hours old and I'm still able to view this link.


17 hours old, still there.


Another 8 hours and I can still see it too.


How would your auto moderation detect that the example is malicious?


Thanks, we intended the playground to be merely a testing tool for the new model we're building. We'll improve based on your feedback!


I noticed that when I put in the following text from the BBC as a test, it pronounces 2008 as "two thousand eight", but I believe most people would pronounce it as "two thousand and eight".

Great work

A billionaire's son, who fled to Yemen within hours of the death of a student in London 15 years ago, has admitted his involvement to the BBC.

The body of Martine Vik Magnussen, 23, was discovered under rubble in a Great Portland Street basement in 2008.

Farouk Abdulhak, who is on the Met Police's most wanted list and is the subject of an international arrest warrant, has never spoken about the case before.


Two thousand eight = American English

Two thousand and eight = UK and Australian English


I'm based in Europe and a native English speaker, and I thought I was aware of most of the differences between UK/US English. I can't believe I have worked with Americans for decades and never noticed this. Live and learn!


I'm an American, and both methods sound right to my ears. I hear both variations quite a lot from the people around me. I assume it depends on what part of the US you're from.


American English traditionally uses an “and” to separate the whole from the fraction, e.g. two thousand eight and two thirds.


Pretty sure I would say two thousand eight?


I would definitely just say "two thousand eight"
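
For anyone curious how a text normalizer might handle the two conventions discussed above, here is a minimal sketch. It is purely illustrative: `number_to_words` and its `use_and` flag are hypothetical names, it only covers 0 to 9999, and it says nothing about how the Play.ht model actually expands numbers.

```python
# Illustrative only: a tiny number-to-words helper for 0..9999.
# `use_and` is a hypothetical flag: False gives the "two thousand eight"
# style, True gives the "two thousand and eight" style.

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_hundred(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def number_to_words(n: int, use_and: bool = False) -> str:
    """Spell out an integer in the range 0..9999."""
    if not 0 <= n <= 9999:
        raise ValueError("only 0..9999 is supported in this sketch")
    if n < 100:
        return below_hundred(n)

    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, tail = divmod(rest, 100)
    if thousands:
        parts.append(f"{ONES[thousands]} thousand")
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if tail:
        # The only difference between the two styles is this "and".
        if use_and:
            parts.append("and")
        parts.append(below_hundred(tail))
    return " ".join(parts)


print(number_to_words(2008))                # two thousand eight
print(number_to_words(2008, use_and=True))  # two thousand and eight
```

A real TTS front end would also need to decide when a number is a year at all (many speakers read 1999 as "nineteen ninety-nine"), which this sketch does not attempt.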


Yes, we are working on making the API pay as you go soon. Thanks for the feedback!


Another note: the share view on the clips doesn't include any way to get the actual link to the file. I imagine most people want the actual link so they can have more control over how and where they share it.


The link is not needed (for the tech-savvy crowd). Anyone can share all of the generated demos with the world.


I don't think you understand what I mean. There is a share button, and they generate a link for each clip... You can access it by clicking on the "#1234" button (it is not obvious that it's a button/link), but when you open the share menu, there is no option to just copy the URL there... instead it's just buttons for Facebook, LinkedIn, and Twitter.


We have an API - https://docs.play.ht/reference/api-getting-started

We have a beta streaming endpoint, but the latency is not real-time yet (something we're working on), and we are adding an endpoint to create voices.


Thanks for sharing this ^


You are right, the technology will become ubiquitous. Therefore, at least for platforms like ours, it's a responsibility to have countermeasures and safeguards to prevent abuse and harm to people. There'll always be people who find ways to abuse it, but making that more and more difficult, and continuing to evolve on that front, seems like the way forward.

We have these measures in place and are working on others to make sure the technology is used towards the betterment of humanity.

1/ Auto moderation on text to block harmful/malicious speech.
2/ As someone pointed out in the comments, we had a manual review process in place where the user is required to read out a consent and a member from Play.ht would review it before approving the voice. We're working on improving and adding this back.
3/ The user-facing service is paywalled so we don't allow everyone in.
4/ Users trying to create malicious content are flagged and reviewed.
5/ A classifier to detect AI-generated speech.


You are right, and unfortunately that is a possibility. We are working on having measures in place to guard against such attempts. We have auto moderation on the input text that will block such audio from being generated, and such users are flagged in the system.


What are you filtering for in the input text that would block something like a phone scam?


Couldn't agree more with your comment. We are working on countermeasures like manual verification of voice, a classifier to detect cloned speech, etc. As of now we have auto moderation in place that detects and blocks hate/harmful speech.


The cat's out of the bag, I'd say you guys should just go full steam ahead and make sure it's your names in the headlines

No need for a bunch of onerous KYC or anything IMO


Yes, definitely take this advice from some random user on HN. Can't possibly go wrong.


I actually have one thousand HackerNews good boy points, so I'm kind of a big deal

I think that a few years from now this tech is going to be ubiquitous, real time, and work on a mobile device. Trying to slam the lid shut on Pandora's Box probably isn't going to work. The best thing at this point would be for the word to get out to everyone that voices can now be doctored the same way photos can.

