The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

GitHub – ggerganov/whisper.cpp: Port of OpenAI’s Whisper model in C/C++

Posted by jpluimers on 2024/11/20

For future experimentation transcribing voice conversations: [Wayback/Archive] GitHub – ggerganov/whisper.cpp: Port of OpenAI’s Whisper model in C/C++

Whisper (OpenAI’s speech recognition system) usually runs in the cloud (someone else’s computers, often rented for a substantial monthly sum); whisper.cpp is a C/C++ port that runs it on your own machine.

Via

  1. [Wayback/Archive] Jeroen Wiert Pluimers: “Wat is een goede tool voor transcriptie van Nederlandse tekst voor hobbymatig gebruik?…” – Mastodon
  2. [Wayback/Archive] bert hubert 🇺🇦🇪🇺: “@wiert whisper.cpp als je handig bent…” – Fosstodon

Now hopefully Whisper works well with the Dutch language…

I later realised Jeff Geerling mentioned Whisper a while ago as well:

  1. [Wayback/Archive] Jeff Geerling on X: “Since people are asking, I’m using Whisper (github.com/openai/whisper) to transcribe individual video files (which I organized chronologically), then SBERT (github.com/dmmiller612/bert-extractive-summarizer…) to summarize each vlog”
  2. [Wayback/Archive] Jeff Geerling on X: “@NetworkChuck Here’s a quick blog post I did about Whisper earlier this year: jeffgeerling.com/blog/2023/transcribing-recorded-audio-and-video-text-using-whisper-ai-on-mac… It’s freakishly good, even with technobabble.”

    [Wayback/Archive] Transcribing recorded audio and video to text using Whisper AI on a Mac | Jeff Geerling

    [Wayback/Archive] Every YouTube creator should do this (most don’t) – YouTube

and [Wayback/Archive] Jeff Geerling on X: “Here’s how I use Whisper to easily create accurate subtitles for every YouTube video across all three of my channels: … More content creators should do this—can’t speak to non-English languages, but for English, it’s eerily accurate.” / X (edit 20250130: added this Twitter)

and even earlier: [Wayback/Archive] Lior⚡ on X: “You can now transcribe 2.5 hours of audio in 98 seconds, locally…

You can now transcribe 2.5 hours of audio in 98 seconds, locally.

A new implementation called insanely-fast-whisper is blowing up on Github.

It works on Mac or Nvidia GPUs and uses the Whisper + Pyannote library to speed up transcriptions and speaker segmentations.

Here’s how you can use it:

pip install insanely-fast-whisper

insanely-fast-whisper --file-name <FILE NAME or URL> --batch-size 2 --device-id mps --hf_token <HF TOKEN>

[Wayback/Archive] GitHub – Vaibhavs10/insanely-fast-whisper

[Wayback/Archive] Tweet JSON
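
For batch experiments, the insanely-fast-whisper command line quoted above could be assembled from a small Python script. This is only a sketch: the helper name `build_command` and the file name `interview.mp3` are hypothetical, and the flags are copied as spelled in the tweet (including `--hf_token`), so check `insanely-fast-whisper --help` on your installed version before relying on them.

```python
import shlex


def build_command(audio, hf_token=None, device_id="mps", batch_size=2):
    """Build the insanely-fast-whisper argument list quoted in the tweet.

    Hypothetical helper; the flags are taken verbatim from the tweet and
    are not verified against any particular release of the tool.
    """
    cmd = [
        "insanely-fast-whisper",
        "--file-name", str(audio),        # local file name or URL
        "--batch-size", str(batch_size),  # the tweet uses 2
        "--device-id", device_id,         # the tweet uses "mps" (Mac)
    ]
    if hf_token:
        # The tweet passes a Hugging Face token; spelling kept as in the tweet.
        cmd += ["--hf_token", hf_token]
    return cmd


if __name__ == "__main__":
    # "interview.mp3" is a made-up example file name.
    print(shlex.join(build_command("interview.mp3")))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the tool is installed; keeping the arguments as a list rather than one string avoids quoting problems with file names that contain spaces.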

--jeroen