YT transcripts definitely lack speaker ID. LLMs can infer speakers from context but miss nuance without proper speaker recognition.
I have been tackling this while building VideoToBe.com.
My current pipeline is: download video -> Whisper transcription with diarization -> replace speaker tags with AI-generated speaker IDs, with a human fallback.
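For anyone curious what that last step can look like, here is a rough Python sketch. It is not the actual VideoToBe code. The `Segment` and `Guess` types, the `SPEAKER_00`-style tags, and the 0.8 confidence threshold are all assumptions made up for illustration. The idea is to use the AI-guessed name when it is confident enough, fall back to a human-supplied name otherwise, and keep the raw diarization tag if neither is available.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    speaker: str  # diarization tag (assumed format, e.g. "SPEAKER_00")
    text: str


@dataclass
class Guess:
    name: str          # name proposed by the AI step (hypothetical)
    confidence: float  # 0.0 to 1.0, hypothetical score


def resolve_speakers(
    segments: List[Segment],
    ai_guesses: Dict[str, Guess],
    human_names: Optional[Dict[str, str]] = None,
    min_confidence: float = 0.8,  # arbitrary threshold, tune as needed
) -> List[Segment]:
    """Replace diarization tags with names: AI guess first, human fallback second."""
    human_names = human_names or {}
    resolved = []
    for seg in segments:
        guess = ai_guesses.get(seg.speaker)
        if guess is not None and guess.confidence >= min_confidence:
            name = guess.name                # confident AI guess
        elif seg.speaker in human_names:
            name = human_names[seg.speaker]  # human fallback
        else:
            name = seg.speaker               # leave the raw tag in place
        resolved.append(Segment(name, seg.text))
    return resolved


def needs_review(
    segments: List[Segment],
    ai_guesses: Dict[str, Guess],
    min_confidence: float = 0.8,
) -> List[str]:
    """List the tags a human should look at, in order of first appearance."""
    pending: List[str] = []
    for seg in segments:
        guess = ai_guesses.get(seg.speaker)
        confident = guess is not None and guess.confidence >= min_confidence
        if not confident and seg.speaker not in pending:
            pending.append(seg.speaker)
    return pending
```

Calling `needs_review` first gives the list of tags to show a human, and whatever they fill in goes into `human_names` for `resolve_speakers`.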
Reliable ML speaker identification is still surprisingly hard.
For podcast summarization, speaker ID is a game-changer vs basic YT transcripts.