There's usually a very small, purpose built model for hearing the initial starting phrase (Alexa, hey Google, etc). It's called a wake word model or wake word detection. Because it's a separate component, it's then fairly straightforward to disable it during an active conversation.