I think a lot of people have heard of OpenAI’s local-friendly Whisper model, but I don’t see enough self-hosters talking about WhisperX, so I’ll hop on the soapbox:

Whisper is extremely good when you have lots of audio with one person talking, but fails hard in a conversational setting with people talking over each other. It’s also hard to sync up transcripts with the original audio.

Enter WhisperX: an improved Whisper implementation that automatically labels who is talking and timestamps each line of speech.
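For anyone who wants to try it, here is a minimal sketch of building a WhisperX command line from Python. The helper name `whisperx_command`, the default model, and the output directory are made up for illustration. The flags are the ones I believe the WhisperX CLI exposes, so check `whisperx --help` on your install. Speaker labelling also needs a Hugging Face token, because the diarization models are gated.

```python
def whisperx_command(audio_path, hf_token, model="large-v2", output_dir="transcripts"):
    """Build an argv list for a diarized WhisperX run (hypothetical helper).

    Assumes the `whisperx` CLI is on PATH and that these flags match
    your installed version.
    """
    return [
        "whisperx", audio_path,
        "--model", model,            # Whisper checkpoint to use
        "--diarize",                 # turn on speaker labelling
        "--hf_token", hf_token,      # needed for the gated diarization models
        "--output_dir", output_dir,
        "--output_format", "json",   # JSON keeps timestamps and speaker labels
    ]

# Example (not executed here):
#   import subprocess
#   subprocess.run(whisperx_command("session.wav", "hf_xxx"), check=True)
```

Returning a list rather than a shell string avoids quoting problems with file names that contain spaces.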

I’ve found it great for DMing TTRPGs: simply record your session with a conference mic, run a transcript with WhisperX, and pass the output to a long-context LLM for easy session summaries. It’s a great way to avoid slowing down the game by taking notes on minor events and NPCs.
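As a rough sketch of the "pass the output to an LLM" step: the snippet below flattens diarized segments into speaker-labelled lines you could paste into a prompt. It assumes each segment is a dict with `start`, `text`, and an optional `speaker` key, which is the shape I'd expect from WhisperX's JSON output. The function names are hypothetical.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def transcript_lines(segments):
    """Turn a list of segment dicts into '[time] SPEAKER: text' lines.

    Assumed segment shape: {"start": float, "text": str, "speaker": str}.
    Segments with no speaker label are marked UNKNOWN; empty ones are skipped.
    """
    lines = []
    for segment in segments:
        text = segment.get("text", "").strip()
        if not text:
            continue
        speaker = segment.get("speaker", "UNKNOWN")
        lines.append(f"[{format_timestamp(segment['start'])}] {speaker}: {text}")
    return lines


# Example (assuming a WhisperX JSON file with a top-level "segments" list):
#   import json
#   with open("transcripts/session.json") as f:
#       print("\n".join(transcript_lines(json.load(f)["segments"])))
```

Keeping the timestamp on every line lets the LLM point back to where in the recording something happened.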

I’ve also used it in a hacky script pipeline to bulk download podcast episodes with yt-dlp, create searchable transcripts, and scrub ads by having an LLM sniff out timestamps to cut with ffmpeg.
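The ad-cutting step could look something like the sketch below. This is one possible approach, not the actual script: it assumes the LLM has already returned ad breaks as `(start, end)` pairs in seconds, merges any overlaps, and builds an ffmpeg command that drops those spans with the `aselect` audio filter. The helper names are invented for illustration.

```python
def merge_ranges(ranges):
    """Sort (start, end) pairs, drop empty ones, and merge overlaps."""
    merged = []
    for start, end in sorted(ranges):
        if end <= start:
            continue  # ignore zero-length or inverted ranges
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ffmpeg_cut_command(src, dst, ad_ranges):
    """Build an ffmpeg argv list that removes the given time ranges from src.

    Uses aselect to keep only audio frames outside the ad ranges, then
    asetpts to rewrite timestamps so the output has no gaps.
    """
    ads = merge_ranges(ad_ranges)
    if not ads:
        # Nothing to cut: plain stream copy, no re-encode.
        return ["ffmpeg", "-i", src, "-c", "copy", dst]
    expr = "+".join(f"between(t,{start:.3f},{end:.3f})" for start, end in ads)
    audio_filter = f"aselect='not({expr})',asetpts=N/SR/TB"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


# Example (not executed here):
#   import subprocess
#   subprocess.run(ffmpeg_cut_command("ep.mp3", "ep_clean.mp3", [(10, 40)]), check=True)
```

Filtering re-encodes the audio, so this trades a little quality and time for frame-accurate cuts. LLM-guessed timestamps can be off, so it is worth spot-checking a few episodes before running it in bulk.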

Privacy-friendly, modest hardware requirements, and good at what it does. WhisperX, apply directly to the forehead.

  • irmadlad@lemmy.world
    1 day ago

    I guess that’s why I am having difficulty coming up with a use case. I mean, I walk around the lab talking to myself all day long, but I think it’d be a bad idea to have a record of all those conversations. lol

    • onslaught545@lemmy.zip
      1 day ago

      If you don’t have to sit through a bunch of ‘meetings that could have been emails’ on a daily basis, you likely won’t have a use case for it.

      But in my last job I was a systems engineer for a web development company. I had to be included on all of the dev calls in case an infrastructure question came up that I needed to answer, and so that I stayed vaguely aware of what the devs were doing.

      This software would have been a lifesaver, because my ADHD doesn’t let me listen to stuff like that for a straight hour or two.