WhisperX — Automated Transcripts w/ Timestamps and Speaker Tagging

dgdft@lemmy.world · 1 day ago

irmadlad@lemmy.world · 1 day ago

What would be some use cases for WhisperX? I’m struggling to envision how I would use that in a selfhosting/homelabbing environment.

fatalicus@lemmy.world · 16 hours ago

I’m personally looking at setting up whisper or whisperx with bazarr, to get subtitles for movies and series that I can’t find any to download.
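
For anyone curious what the subtitle step involves, here is a minimal sketch in Python that turns transcript segments into SRT text. It assumes segments shaped like the ones Whisper returns (dicts with `start`, `end`, and `text`); the function names `srt_timestamp` and `segments_to_srt` are made up for illustration. The whisper command line can already write SRT itself via `--output_format srt`, so this only shows what the format looks like, not how bazarr integrates it.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with 'start', 'end', 'text') as SRT text.

    Assumption: the segment layout mirrors Whisper's transcribe() output.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " General remarks follow."},
    ]
    print(segments_to_srt(demo))
```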

TheFogan@programming.dev · edit-2 1 day ago

Half sarcastic, but the overall premise is rigging something into a local voice assistant: when an argument starts, “OK Nabu, record this conversation.” Then two weeks later, during another argument: “OK Nabu, search our last argument for the cabinet.” It would be like having a court transcriber on call.

irmadlad@lemmy.world · 1 day ago

I have a lady friend that does quite a good enough job of that. LOL

‘You remember back in 1979…it was a Friday at 2:11 PM, and you said…’ ‘Babe, I don’t remember what I had for breakfast yesterday.’

onslaught545@lemmy.zip · 1 day ago

Does she do it for her fuckups, though?

irmadlad@lemmy.world · 1 day ago

What kind of stupid-ass question is that? LOL All kidding aside, she’s a good soul. We’re not married, we’ve just known each other for 45+ years. It just kind of clicked. She lives in her house, and I in mine, and we get together as often as possible.

hendrik@palaver.p3x.de · 1 day ago

Hmm… Would be interesting to find out what kind of effect that has on the average marriage or relationship 😅

e0qdk@reddthat.com · 1 day ago

“You love the robot more than me!” 💔️

RaivoKulli@sopuli.xyz · 5 hours ago

“WELL AT LEAST THE ROBOT LISTENS TO ME”

TheFogan@programming.dev · 23 hours ago

I mean, I’d imagine probably not a good one :) Somehow I imagine asking the AI to record a conversation is an instant argument escalator… as is asking it to read the facts back, and usually the topic would be switched rather than one side admitting their fault in the conversation.

Actually, I think there’s a Black Mirror episode on roughly that (not a device for recording audio when asked, but everyone having a chip in their head that automatically records their memories, and a huge fight when a husband discovers his wife deleted a few hours of recordings).

faberfedor@lemmy.world · 3 hours ago

That was a great episode!

hendrik@palaver.p3x.de · 1 day ago

Likely everyday stuff… Meeting minutes, phone or video conferences and such…

irmadlad@lemmy.world · 1 day ago

I guess that’s why I am having difficulty coming up with a use case. I mean, I walk around the lab talking to myself all day long, but I think it’d be a bad idea to have a record of all those conversations. lol

onslaught545@lemmy.zip · 1 day ago

If you don’t have to sit through a bunch of ‘meetings that could have been emails’ on a daily basis, you likely won’t have a use case for it.

But in my last job I was a systems engineer for a web development company. I had to be included on all of the dev calls in case an infrastructure question came up that I needed to answer, and so I was vaguely aware of what the devs were doing.

This software would have been a lifesaver, because my ADHD doesn’t let me listen to stuff like that for a straight hour or two.

hoshikarakitaridia@lemmy.world · 22 hours ago

Long videos or voice notes where you’re usually just looking for a small snippet.

irmadlad@lemmy.world · 10 hours ago

Now that’s an interesting angle. I am a mediocre musician on my best day, but sometimes I incorporate phrases and lyric snippets in a piece. I wonder if I could use WhisperX to find those words or phrases in a stack of songs. For instance, I did a piece that used a line from Jimi Hendrix’s ‘If 6 Was 9’ where he says ‘I’m the one who’s gotta die when it’s time for me to die. So let me live my life the way I want to.’ I wonder if WhisperX could pick that out of a stack of Jimi Hendrix songs.

dgdft@lemmy.world · edit-2 6 hours ago

You should be able to get decent results if you pipe your tracks through demucs first to isolate the vocals.

https://github.com/adefossez/demucs

Vanilla whisper will probably be better than whisperX for that use case though.

Depending on how esoteric your music library is, you can also build a lyrics DB with beets: https://beets.readthedocs.io/en/stable/plugins/lyrics.html
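
A rough sketch of the search step, in Python, in case it helps. Assumptions: vocals have already been isolated (for example with `demucs --two-stems=vocals track.mp3`), the `openai-whisper` package is installed, and segments are dicts with `start`, `end`, and `text` as returned by `model.transcribe()`. The helpers `normalize`, `find_phrase`, and `transcribe_segments` and the 0.8 threshold are made up for illustration. Matching is fuzzy because sung lyrics rarely transcribe word for word.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and split into words."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).split()


def find_phrase(segments, phrase, threshold=0.8):
    """Return (start, end, score, text) for segments that roughly contain phrase.

    Slides a window the length of the phrase over each segment's words and
    keeps the best similarity ratio. The threshold is an arbitrary default.
    """
    target = normalize(phrase)
    joined_target = " ".join(target)
    hits = []
    for seg in segments:
        words = normalize(seg["text"])
        best = 0.0
        for i in range(max(1, len(words) - len(target) + 1)):
            window = " ".join(words[i:i + len(target)])
            best = max(best, SequenceMatcher(None, window, joined_target).ratio())
        if best >= threshold:
            hits.append((seg["start"], seg["end"], round(best, 2), seg["text"]))
    return hits


def transcribe_segments(audio_path, model_name="base"):
    """Transcribe one file with vanilla Whisper and return its segments."""
    import whisper  # imported lazily so the search helpers work without it

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["segments"]


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 4.0, "text": "Some other line entirely"},
        {"start": 4.0, "end": 9.0, "text": "So let me live my life the way I want to"},
    ]
    for start, end, score, text in find_phrase(demo, "let me live my life"):
        print(f"{start:.1f}s to {end:.1f}s (score {score}): {text}")
```

One limitation: a phrase that straddles two segments will be missed, so for real use you would probably search over pairs of neighbouring segments as well.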

irmadlad@lemmy.world · 5 hours ago

I use UVR for vocal isolation. It just works, but that shouldn’t be a problem. I’ll check it out. At the worst, I’ll learn something.

hoshikarakitaridia@lemmy.world · 8 hours ago

It might take a while, but while your PC is working on it, you aren’t, and searching for words might be easier ^^

I’m excited to hear how well it works ^^

irmadlad@lemmy.world · 3 hours ago

I’m always excited to try new stuff. You never know. A use case might develop that you didn’t think of.

GitHub - m-bain/whisperX: WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)