GitHub – ggerganov/whisper.cpp: Port of OpenAI’s Whisper model in C/C++
Posted by jpluimers on 2024/11/20
For future experimentation transcribing voice conversations: [Wayback/Archive] GitHub – ggerganov/whisper.cpp: Port of OpenAI’s Whisper model in C/C++
Whisper (the speech recognition system) usually runs in the cloud (someone else’s computers, often rentable for a substantial monthly sum); whisper.cpp lets you run it locally on your own hardware instead.
Via
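- [Wayback/Archive] Jeroen Wiert Pluimers: “Wat is een goede tool voor transcriptie van Nederlandse tekst voor hobbymatig gebruik?…” – Mastodon (translated: “What is a good tool for transcription of Dutch text for hobby use?…”)
- [Wayback/Archive] bert hubert 🇺🇦🇪🇺: “@wiert whisper.cpp als je handig bent…” – Fosstodon (translated: “@wiert whisper.cpp if you’re handy…”)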
Now hopefully Whisper works well with the Dutch language…
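Recorded conversations rarely come in the format whisper.cpp wants: its README suggests converting audio to 16-bit WAV first, for instance with `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav` (newer builds may accept more formats). A minimal sketch of a sanity check using only Python’s standard `wave` module; the function name `wav_problems` is my own, and the mono check mirrors that ffmpeg recipe rather than a documented hard requirement:

```python
"""Check a WAV file against the 16 kHz, mono, 16-bit PCM ffmpeg recipe."""

import wave

EXPECTED_RATE = 16_000   # 16 kHz sample rate (-ar 16000)
EXPECTED_WIDTH = 2       # 16-bit samples are 2 bytes wide (pcm_s16le)
EXPECTED_CHANNELS = 1    # mono (-ac 1); assumption taken from the ffmpeg recipe


def wav_problems(path: str) -> list[str]:
    """Return reasons why the file differs from 16 kHz mono 16-bit PCM.

    An empty list means the file matches. Non-PCM WAV files make
    wave.open() raise wave.Error before any check runs.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if width != EXPECTED_WIDTH:
        problems.append(f"sample width is {8 * width} bits, expected 16 bits")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

If the list is non-empty, run the ffmpeg conversion above before handing the file to whisper.cpp.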
I later realised Jeff Geerling mentioned Whisper a while ago as well:
- [Wayback/Archive] Jeff Geerling on X: “Since people are asking, I’m using Whisper (github.com/openai/whisper) to transcribe individual video files (which I organized chronologically), then SBERT (github.com/dmmiller612/bert-extractive-summarizer…) to summarize each vlog”
- [Wayback/Archive] Jeff Geerling on X: “@NetworkChuck Here’s a quick blog post I did about Whisper earlier this year: jeffgeerling.com/blog/2023/transcribing-recorded-audio-and-video-text-using-whisper-ai-on-mac… It’s freakishly good, even with technobabble.”
[Wayback/Archive] Transcribing recorded audio and video to text using Whisper AI on a Mac | Jeff Geerling
[Wayback/Archive] Every YouTube creator should do this (most don’t) – YouTube
and [Wayback/Archive] Jeff Geerling on X: “Here’s how I use Whisper to easily create accurate subtitles for every YouTube video across all three of my channels: … More content creators should do this—can’t speak to non-English languages, but for English, it’s eerily accurate.” (edit 20250130: added this tweet)
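Jeff’s subtitle workflow boils down to turning Whisper’s timestamped segments into a subtitle file. The openai-whisper command-line tool can write `.srt` files itself (`--output_format srt`), but when you already have segments in hand, for instance from `model.transcribe(...)` in the openai-whisper Python package, whose `"segments"` entries carry `start` and `end` (in seconds) and `text`, the conversion is small enough to sketch. The function names below are my own, not part of any Whisper API:

```python
"""Turn Whisper-style segments into SubRip (.srt) subtitle text."""

from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build SRT text from mappings with 'start', 'end' (seconds) and 'text'."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        text = seg["text"].strip()  # Whisper segment text often has a leading space
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Consecutive SRT entries are separated by one blank line.
    return "\n".join(blocks)
```

With openai-whisper that would be something like `segments_to_srt(result["segments"])`, written to a file ending in `.srt`.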
and even earlier: [Wayback/Archive] Lior⚡ on X: “You can now transcribe 2.5 hours of audio in 98 seconds, locally…”
You can now transcribe 2.5 hours of audio in 98 seconds, locally.
A new implementation called insanely-fast-whisper is blowing up on GitHub.
It works on Mac or Nvidia GPUs and uses the Whisper + Pyannote libraries to speed up transcription and speaker segmentation.
Here’s how you can use it:
pip install insanely-fast-whisper
insanely-fast-whisper --file-name <FILE NAME or URL> --batch-size 2 --device-id mps --hf_token <HF TOKEN>
[Wayback/Archive] GitHub – Vaibhavs10/insanely-fast-whisper
- [Wayback/Archive] video.twimg.com/ext_tw_video/1730306137642426368/pu/vid/avc1/406×270/nCGTfwa7_IV7YJM-.mp4
- [Wayback/Archive] video.twimg.com/ext_tw_video/1730306137642426368/pu/vid/avc1/542×360/vSQ548Z_wVfJ2ZxU.mp4
- [Wayback/Archive] video.twimg.com/ext_tw_video/1730306137642426368/pu/vid/avc1/966×640/hgsBIk36RbALXN0D.mp4
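To put the quoted claim into perspective, the speed-up over real time works out as follows (the tweet text does not state the hardware behind the 98-second figure, so treat it as a best case):

```latex
2.5\,\text{h} = 2.5 \times 3600\,\text{s} = 9000\,\text{s},
\qquad
\frac{9000\,\text{s}}{98\,\text{s}} \approx 91.8
```

So the audio gets transcribed roughly 92 times faster than it takes to listen to it.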
--jeroen