
YT transcripts definitely lack speaker ID. LLMs can infer speakers from context but miss nuance without proper speaker recognition.

I have been tackling this while building VideoToBe.com. My current pipeline is: download video -> Whisper transcription with diarization -> replace speaker tags with AI-generated speaker IDs, with a human fallback.
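A rough sketch of that last step, to make the idea concrete. This is not the actual VideoToBe code: the `Segment` type, the `SPEAKER_00`-style tags, the `guesses` mapping, and the 0.8 confidence threshold are all assumptions made up for illustration. It only shows how generic diarization tags might be swapped for inferred names, with anything uncertain left alone and queued for a human.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    speaker: str  # generic diarization tag, e.g. "SPEAKER_00" (assumed format)
    text: str


def relabel_speakers(
    segments: List[Segment],
    guesses: Dict[str, Tuple[str, float]],
    min_confidence: float = 0.8,  # hypothetical threshold, tune to taste
) -> Tuple[List[Segment], List[str]]:
    """Replace generic tags with inferred names where the guess is confident.

    `guesses` maps a diarization tag to (inferred name, confidence). Tags with
    no guess or a low-confidence guess keep their original label and are
    returned separately so a human can resolve them.
    """
    relabeled: List[Segment] = []
    needs_review = set()
    for seg in segments:
        name, confidence = guesses.get(seg.speaker, ("", 0.0))
        if name and confidence >= min_confidence:
            relabeled.append(Segment(name, seg.text))
        else:
            relabeled.append(seg)  # keep the generic tag
            needs_review.add(seg.speaker)
    return relabeled, sorted(needs_review)


# Hypothetical input: two diarized segments and model-inferred names.
segments = [
    Segment("SPEAKER_00", "Welcome back to the show."),
    Segment("SPEAKER_01", "Thanks for having me."),
]
guesses = {"SPEAKER_00": ("Host", 0.95), "SPEAKER_01": ("Guest", 0.40)}

relabeled, needs_review = relabel_speakers(segments, guesses)
for seg in relabeled:
    print(f"{seg.speaker}: {seg.text}")
print("Needs human review:", needs_review)
```

Keeping the original tag on low-confidence guesses means a wrong name never silently ends up in the transcript; the human fallback only has to look at the tags in the review list.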

Reliable ML speaker identification is still surprisingly hard. For podcast summarization, speaker ID is a game-changer vs basic YT transcripts.


