
Speech generation has gotten really good, but there's simply no way to faithfully recreate someone's vocal idiosyncrasies and cadence with just "a few seconds" of real audio. That's where the models tend to fall short.
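For context, this is roughly what "a few seconds" cloning looks like in practice. A minimal sketch, assuming the open-source Coqui TTS library and its XTTS v2 model (my choice of a representative tool; the comment doesn't name one), which conditions synthesis on a short reference clip:

  # Sketch only: assumes Coqui TTS (pip install TTS) and its XTTS v2 model.
  from TTS.api import TTS

  # Load the multilingual XTTS v2 voice-cloning model.
  tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

  # Synthesize new speech conditioned on a short reference recording.
  tts.tts_to_file(
      text="Hi, it's me. Can you call me back when you get this?",
      speaker_wav="reference_clip.wav",  # hypothetical path to the short sample
      language="en",
      file_path="cloned_output.wav",
  )

Whether the output captures the speaker's idiosyncrasies, rather than just their timbre, is exactly the question being debated here.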


This was my thought as well, but someone pointed out to me that regional accent identification captures a large percentage of cadence and inflection differences (specific word choices and turns of phrase obviously would still not be there).


I don't think it's hard to get more than a few seconds of voice from many people.

'Hi, sorry to call you. I'm Cindy from your insurance, and I'm calling regarding your car crash ...'


Few seconds means less than a minute. That’s not nothing. Look at a clock and talk for a minute — it’s longer than you might think.

Do you think you could give a recording of a minute of someone talking to a talented impressionist and they could impersonate that person to some degree? It doesn’t seem that far fetched to me.


"Few" doesn't mean <60 it typically means ~5 or <10.

Getting high-quality audio of an arbitrary private citizen via public means isn't that easy, especially for folks like me who don't post video on public social media, use automated call screening, and never say a word until the caller has been vetted.



