Are these multimodals able to discern the input voice tone? Really curious if they're able to detect sarcasm or emotional content (or even something like mispronunciation?)


Yes, they can, and they should get better at this over time.

There is a demo video where the presenter breathes heavily, and the AI is able to notice it as such when prompted.

It doesn't just detect tone; it also seems to be able to use tone itself.



