Interesting! I used Whisper last year to try to build an audio transcription tool, but gave up because of the excessive amount of hallucinated output, no matter which model I used.
It would produce seemingly ok output until you started paying attention.
One example: it insisted that Biggie Smalls sings "Puttin five carrots in my baby girl ear" (it's "carats").
It's apparently not useful in transcription as it don't reason [sic].