TL;DR
| Parakeet V3 | Whisper Large V3 | |
|---|---|---|
| Speed | 10× | 1× |
| Supported languages | 25 | 100+ |
| English error rate (WER) | 6.32% | 7.44% |
| Avg error rate, 25 langs (WER) | 12.0% | 12.6% |
| Hallucinations | None | On silence |
| Best for | English & European | Asian, Arabic, 100+ |
* Speed: 35-min audio, Apple Silicon. English WER: Open ASR Leaderboard. 25-lang avg: FLEURS benchmark.
Starting with version 1.3.2, Whisper Notes for Mac ships with NVIDIA Parakeet TDT 0.6B as the default speech engine. It's 10x faster than Whisper Large V3 Turbo for English, and more accurate. Whisper models are still available if you need other languages.
Why We Switched the Default
Whisper is great, but it was designed as a general-purpose model. It handles 100+ languages, translates, generates timestamps — a Swiss Army knife. The trade-off is speed. For English dictation, where you just want words on screen fast, it's overkill.
Here's the thing that bugged me: when using Fn-key system-wide dictation with Whisper, finishing a ~1 minute utterance meant waiting 3–5 seconds for the transcript to appear. That pause breaks the flow. You stop talking, you wait, you stare at the cursor — it kills the magic of voice typing.
Parakeet changed that completely. The speed is so fast that the transcript appears the instant you stop speaking. Speak, and the words are just there. Once you experience that feeling — that seamless, zero-wait flow — it's really hard to go back to Whisper.
How Fast Is Parakeet V3?
Numbers speak louder than words. Here's a real-world comparison using a 35-minute audio file on the same Mac:
| Model | 35 min Audio |
|---|---|
| Whisper Large V3 Turbo | 3 minutes |
| Parakeet TDT 0.6B v3 | 18 seconds |
That's 10x faster. And because the model is smaller (600M vs 800M parameters), it uses less memory and less battery too.
What Makes Parakeet v3 So Fast
Whisper listens to audio the way you'd read a book out loud — word by word, frame by frame, never skipping ahead. Even during silence, it's still processing, still guessing what comes next. That's thorough, but slow.
Parakeet takes a fundamentally different approach. It compresses the audio signal 8x before processing, so the model sees only what matters. Then, instead of grinding through every single frame, it predicts not just what word you said, but how long that word lasts — and jumps ahead. Silence? Skipped entirely. A long vowel? One prediction instead of dozens.
The result is a model that processes speech the way your brain does — focusing on the words, ignoring the gaps. That's why it's 10x faster with fewer parameters and higher accuracy.
Benchmarks: Parakeet v3 vs Whisper
Parakeet v3 matches or beats models 2-4x its size across FLEURS, CoVoST, and MLS benchmarks
On the Hugging Face Open ASR Leaderboard, Parakeet v3 tops the chart with only 600M parameters — less than half of Whisper Large V3's 1.55B:
| Model | Parameters | Avg WER | Speed (RTFx) |
|---|---|---|---|
| Parakeet TDT 0.6B v3 | 0.6B | 6.32% | 3,333x |
| Canary 1B v2 | 1.0B | 7.15% | 749x |
| Whisper Large V3 | 1.55B | 7.44% | 146x |
| Whisper Large V3 Turbo | 0.8B | 7.6% | 350x |
Lower WER = fewer errors. Higher RTFx = faster. Parakeet wins on both. With 600M parameters, it's also the smallest model on that list — which means it runs beautifully on Apple Silicon with minimal memory and battery drain.
Multilingual WER: All 25 Languages
The leaderboard above covers English only. Here's the full picture — how the three models available in Whisper Notes compare across all 25 languages Parakeet supports, measured on the FLEURS benchmark. Lower WER = fewer transcription errors. Best value between Large V3 and Parakeet is highlighted per row:
| Language | Whisper Small | Whisper Large V3 | Parakeet V3 |
|---|---|---|---|
| Bulgarian | 37.3 | 12.9 | 12.6 |
| Croatian | 33.4 | 11.1 | 12.5 |
| Czech | 37.6 | 11.3 | 11.0 |
| Danish | 32.8 | 12.6 | 18.4 |
| Dutch | 16.4 | 5.6 | 7.5 |
| English | 6.1 | 4.3 | 4.9 |
| Estonian | 51.3 | 19.1 | 17.7 |
| Finnish | 24.0 | 7.7 | 13.2 |
| French | 15.0 | 6.3 | 5.2 |
| German | 10.2 | 4.3 | 5.0 |
| Greek | 30.8 | 27.0 | 20.7 |
| Hungarian | 38.9 | 14.1 | 15.7 |
| Italian | 9.8 | 2.3 | 3.0 |
| Latvian | 53.2 | 18.3 | 22.8 |
| Lithuanian | 65.6 | 22.3 | 20.4 |
| Maltese | 92.2 | 68.9 | 20.5 |
| Polish | 14.7 | 4.7 | 7.3 |
| Portuguese | 7.3 | 3.7 | 4.8 |
| Romanian | 29.8 | 8.2 | 12.4 |
| Russian | 11.4 | 4.2 | 5.5 |
| Slovak | 33.3 | 8.4 | 8.8 |
| Slovenian | 49.3 | 19.9 | 24.0 |
| Spanish | 5.6 | 3.1 | 3.5 |
| Swedish | 20.8 | 7.9 | 15.1 |
| Ukrainian | 19.3 | 6.5 | 6.8 |
| Average | 29.8 | 12.6 | 12.0 |
WER (%) on FLEURS. Whisper Small data from Radford et al.; Large V3 and Parakeet V3 data from NVIDIA Canary-1B-v2 paper.
Whisper Large V3 edges ahead on most individual languages — it's 2.5x larger, after all. But Parakeet V3 matches it on average (12.0% vs 12.6%), takes Greek, French, Estonian, and Maltese decisively, and crushes Whisper Small across the board (60% fewer errors on average). The real story isn't a fraction of a percent in WER — it's the total package: Large V3–level accuracy at 23x the speed, with 40% of the memory, zero hallucinations, and everything running locally on your Mac.
No More Hallucinations
If you've used Whisper for dictation, you've probably seen it hallucinate during silence — repeating phrases, inventing words, or spitting out "Subtitles by Amara.org" from nowhere. This happens because Whisper's autoregressive decoder always expects to produce text, even when there's nothing to transcribe.
NVIDIA trained Parakeet on 36,000 hours of pure non-speech audio (background noise, coughs, silence) paired with empty string targets. The model learned what silence sounds like and stays quiet. For "always-on" system-wide dictation, this is a game-changer — no more garbage text appearing when you pause to think.
Languages Parakeet Supports
Parakeet v3 supports 25 languages: Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, and Ukrainian.
That covers most of Europe, but it does not support Chinese, Japanese, Korean, Arabic, or Hindi. That's why we kept the Whisper models as downloadable options. If you dictate in Japanese or Mandarin, pick Whisper Large V3 Turbo from the model picker. For English and European languages, Parakeet v3 is simply the better engine.
Model picker: Parakeet V3 (default), Whisper Small, and Whisper Large V3 Turbo — all running locally
Model Picker in Whisper Notes
Open Settings to switch between models:
- Parakeet V3 (default) — Fastest, best for English & European languages
- Whisper Small — Lightweight, 100+ languages
- Whisper Large V3 Turbo — Most accurate multi-language model
All models run 100% locally on your Mac. No internet, no cloud, no data leaves your device.
What About Parakeet V2?
If you used V2 before, you might wonder how it compares. V2 was an English-only model — and its English accuracy is actually slightly better than V3 (6.05% vs 6.32% WER). V3 trades that tiny margin for 25-language support. Both are significantly more accurate than Whisper.
| Parakeet V2 | Parakeet V3 | Whisper Large V3 | |
|---|---|---|---|
| English WER | 6.05% | 6.32% | 7.44% |
| Languages | English only | 25 | 100+ |
In short: if you only need English, both V2 and V3 are great. V3 is the default in Whisper Notes because multilingual support matters to most users — and the English accuracy difference is negligible.
Try It
Parakeet v3 is available now in the Mac version — just download the latest DMG. Update: Parakeet is now available in the latest iOS version as well.
Questions or feedback? Email support@whispernotes.app.