The voice modality plays a huge role in how impressive it seems.
When GPT-2/3/3.5/4 came out, it was fairly easy to see, just from reading model outputs, that the models were getting better and better at text. That was pretty amazing, but in a very intellectual way, since reading is typically a very "intellectual," "front-brain" type of activity.
But this voice stuff really does make it much more emotional. I don't know about you, but the first time I used GPT's voice mode I noticed that I felt something -- very un-intellectual, very un-cerebral -- like the feeling that there is a spirit embodying the computer. Of course, with LLMs there always is a spirit embodying the computer (or there never is, depending on your philosophical beliefs).
The Suno demos that popped up recently should have clued us all in that this kind of emotional range was possible with these models. This announcement is not so much a step function in model capabilities as a step function in HCI. People are just not used to their interactions with a computer being this emotional. I'm excited and concerned in equal parts that many people won't be truly prepared for what is coming: an AI companion that really, truly makes you feel things is on the horizon.
We nerds who habitually read text have had that since roughly GPT-3, but now the door has been blown open.
Honestly, as someone who has been using this functionality almost daily for months now, the times that break immersion the most by far are when it does human-like things, such as clearing its throat, pandering, or attaching emotions to its responses.
Very excited about faster response times, auto-interrupt, a cheaper API, and a voice API — but the "emotional range" is actually disappointing to me. Hopefully it doesn't impact the default experience too much, or the memory features get good enough that I can stop it from trying to pretend to be human.