Hacker News

Reliability needs to be much higher than what we're currently seeing with Google Now and Siri, given the inherent danger of what you're controlling.


For specific domains, like drone control, you can achieve much better accuracy than you typically see in "dictation"-style speech recognition. You can use a statistical language model that represents the things you're most likely to hear.

For example, Google & Siri kind of need to be able to handle anything I throw at them: "What is Ke$ha's new album?" "What year was the Hardy Boys book 'Hunting For Hidden Gold' written?" They may use a language model that favors grammatical language, that is: "What is Ke$ha" is a more likely speech recognition hypothesis than "What hiss kush ball", but they still need a big model to represent that.

For drone control, you have much more constrained language, which helps recognition accuracy significantly. The model can tell the recognizer that if it heard "Go <unsure> 100 feet" that the <unsure> word is most likely to be a direction like forward/back/left/right/up/down, and not "neutrino".
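To make that concrete, here's a minimal sketch of rescoring recognizer hypotheses with a constrained command grammar. All of the names, priors, and acoustic scores below are made up for illustration; a real recognizer would integrate the language model into decoding rather than rescoring after the fact.

```python
# Hypothetical sketch: combine acoustic scores with a tiny command
# grammar for the slot after "Go". Directions dominate the prior;
# everything out of grammar gets a tiny floor probability.
DIRECTION_PRIOR = {
    "forward": 0.15, "back": 0.15, "left": 0.15,
    "right": 0.15, "up": 0.15, "down": 0.15,
}
OOV_PRIOR = 1e-6  # e.g. "neutrino"

def rescore(hypotheses):
    """Pick the best word from (word, acoustic_likelihood) pairs,
    weighting each acoustic score by the grammar prior."""
    def total(pair):
        word, acoustic = pair
        return acoustic * DIRECTION_PRIOR.get(word, OOV_PRIOR)
    return max(hypotheses, key=total)[0]

# The acoustics slightly prefer "neutrino", but the grammar overrules it.
best = rescore([("neutrino", 0.5), ("up", 0.4), ("cup", 0.3)])
```

The point is just that even a strong acoustic match loses to an in-grammar word once the prior is factored in.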

It's a lot like the way that Norvig uses n-grams to illustrate writing a spelling corrector: http://norvig.com/ngrams/ch14.pdf Having a model lets you fix errors in the input.
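In the same spirit as Norvig's corrector, here's a toy version: a word-frequency model picks the most probable known word among candidates one edit away. The corpus and counts are toy assumptions, not anything from the chapter.

```python
# Toy Norvig-style spelling corrector: correct(word) returns the
# known word with the highest corpus frequency among candidates
# zero or one edit away from the input.
from collections import Counter

# Tiny stand-in corpus of drone commands (assumption for illustration).
WORDS = Counter("go forward go back go left go right go up go down".split())

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Prefer the word itself if known, else known words one edit away."""
    candidates = ({word} & WORDS.keys()) or (edits1(word) & WORDS.keys()) or {word}
    return max(candidates, key=WORDS.get)

corrected = correct("forwrd")
```

Having the frequency model is what lets the corrector pick "forward" over the raw misspelling, just as a language model lets the recognizer fix errors in the acoustic input.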

Having constrained language and a good model is often critical to creating a successful speech interface.



