Voice models take your recorded voice and transcribe it into text. There are two types of voice models: local and cloud-based. Local (or offline) models run on your machine. Cloud models are hosted either by Superwhisper or by another provider.

Superwhisper (Cloud)

These models are hosted in the cloud by Superwhisper. They are optimized for low latency and accuracy, and are available in 100+ languages.
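| Name | Lang | Translate | Speed | Accuracy | Size | License |
| --- | --- | --- | --- | --- | --- | --- |
| S1-Voice (Cloud) | multi | | 10 | 9 | Cloud | Pro |
| Ultra (Cloud) | multi | | 9 | 9 | Cloud | Pro |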

Whisper Models (Local)

These models run on your machine, allowing you to transcribe audio without relying on external services. They are based on the Whisper model series from OpenAI. They run locally using whisper.cpp.
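| Name | Lang | Translate | Speed | Accuracy | Size | License |
| --- | --- | --- | --- | --- | --- | --- |
| Ultra V3 Turbo | all | | 8 | 8 | 1.6 GB | Pro |
| Ultra | all | | 6 | 10 | 3 GB | Pro |
| Ultra V3 Turbo (Chinese) | zh | | 8 | 8 | 1.6 GB | Pro |
| Pro | all | | 7 | 8 | 1.5 GB | Pro |
| Pro (English) | en | | 7 | 8 | 1.5 GB | Pro |
| Standard | all | | 8 | 5 | 500 MB | Free |
| Standard (English) | en | | 8 | 5 | 500 MB | Free |
| Nano | all | | 9 | 3 | 150 MB | Free |
| Nano (English) | en | | 9 | 3 | 150 MB | Free |
| Fast | all | | 10 | 1 | 75 MB | Free |
| Fast (English) | en | | 10 | 1 | 75 MB | Free |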
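
The trade-off between speed, accuracy, and size in the local Whisper lineup can be made concrete with a small selection helper. The following is a minimal sketch, not part of Superwhisper itself: it assumes the Speed and Accuracy columns are scores on a 1-10 scale, converts sizes to MB at 1 GB = 1000 MB, and treats a Lang value of "all" as covering any language. The names `VoiceModel` and `pick_model` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VoiceModel:
    name: str
    lang: str       # "all", "en", or "zh", as listed in the table
    speed: int      # assumed to be a 1-10 score
    accuracy: int   # assumed to be a 1-10 score
    size_mb: int    # size in MB (GB values converted at 1 GB = 1000 MB)
    license: str    # "Free" or "Pro"


# Values transcribed from the local Whisper table above.
LOCAL_WHISPER_MODELS: List[VoiceModel] = [
    VoiceModel("Ultra V3 Turbo", "all", 8, 8, 1600, "Pro"),
    VoiceModel("Ultra", "all", 6, 10, 3000, "Pro"),
    VoiceModel("Ultra V3 Turbo (Chinese)", "zh", 8, 8, 1600, "Pro"),
    VoiceModel("Pro", "all", 7, 8, 1500, "Pro"),
    VoiceModel("Pro (English)", "en", 7, 8, 1500, "Pro"),
    VoiceModel("Standard", "all", 8, 5, 500, "Free"),
    VoiceModel("Standard (English)", "en", 8, 5, 500, "Free"),
    VoiceModel("Nano", "all", 9, 3, 150, "Free"),
    VoiceModel("Nano (English)", "en", 9, 3, 150, "Free"),
    VoiceModel("Fast", "all", 10, 1, 75, "Free"),
    VoiceModel("Fast (English)", "en", 10, 1, 75, "Free"),
]


def pick_model(
    models: List[VoiceModel],
    lang: str,
    max_size_mb: int,
    free_only: bool = False,
) -> Optional[VoiceModel]:
    """Return the most accurate model that fits the constraints, or None."""
    candidates = [
        m for m in models
        if m.lang in ("all", lang)            # "all" is assumed to cover any language
        and m.size_mb <= max_size_mb
        and (not free_only or m.license == "Free")
    ]
    if not candidates:
        return None
    # Prefer higher accuracy; break ties on speed. On a full tie, the
    # model listed first in the table wins.
    return max(candidates, key=lambda m: (m.accuracy, m.speed))
```

For example, `pick_model(LOCAL_WHISPER_MODELS, "en", 600, free_only=True)` selects Standard, while raising the size limit to 5000 MB and allowing Pro models selects Ultra. Ranking by accuracy first is a design choice of this sketch; a latency-sensitive workflow might sort on speed instead.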

Nvidia Parakeet (Local)

These models are based on Nvidia's Parakeet model and run locally (on your laptop) using Argmax’s WhisperKit SDK. They are extremely fast and process long recordings in parallel. Their drawbacks are that they tend to struggle with punctuation and have minor hallucination issues with single-word recordings.
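| Name | Lang | Translate | Speed | Accuracy | Size | License |
| --- | --- | --- | --- | --- | --- | --- |
| Parakeet | en | | 10 | 8 | 476 MB | Pro |
| Parakeet Multilanguage | multi | | 10 | 8 | 494 MB | Pro |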

Deepgram (Cloud)

The Nova series models are cloud models hosted by Deepgram.
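| Name | Lang | Translate | Speed | Accuracy | Size | License |
| --- | --- | --- | --- | --- | --- | --- |
| Nova 3 | multi | | 7 | 8 | Cloud | Pro |
| Nova 2 | multi | | 7 | 7 | Cloud | Pro |
| Nova Medical | en | | 10 | 7 | Cloud | Pro |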