I tried it with Japanese, and it sounded about as good as in English. Only at one point did it sound unnatural. Japanese two-person conversation uses a lot of backchannelling (aizuchi), that is, semilinguistic sounds made by the listener to indicate attention and emotional reaction. At one point, the female voice said very distinctly "fumu fumu," which is how such aizuchi might be written in a script or manga. In actual speech, though, it would be a continuous sound without syllables and with a rising and/or falling intonation.
That brief TTS-like moment was the only time I was reminded that the voices were not human.
The podcast is about the impact of AI on higher education in Japan. I prompted NotebookLM briefly in Japanese about the topic, and it collected ten sources in Japanese and English that it used as the basis for the audio overview.
Hahahaha. Oh wow, that is comedically out of place in a way that no human would ever try to use it for comedy. That's... gorgeously funny. It's like fumu fumu but so deadpan in the delivery. I think I might just have to try and insert some of these completely out-of-place deadpan fumu fumu into my everyday speech. Too good.