Whisper Model Selection
Choosing a Whisper model means trading transcription accuracy (measured by Word Error Rate, or WER; lower is better) against processing speed. The graph below shows the performance of different Whisper models across four languages and can help in selecting the most appropriate model for a given task.
Word Error Rate (WER) versus processing time for various Whisper models in French, English, Spanish, and German.
Based on the performance data, here are some model selection recommendations for specific languages:
French: large-v3 is recommended for the highest accuracy.
English: the small model performs slightly better than large-v3 while being nearly four times faster.
Spanish: the medium model provides a balance between accuracy and processing speed.
German: large-v3 achieves the lowest error rate among the tested models.
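The recommendations above can be captured in a small lookup so that the model name is chosen from the audio language. This is a minimal sketch, not part of the original project: the two-letter language codes, the function name recommended_model, and the fallback to large-v3 for languages outside the benchmark are all assumptions made for illustration.

```python
# Per-language recommendations taken from the list above.
# The two-letter keys are an illustrative choice, not something the source defines.
RECOMMENDED_MODELS = {
    "fr": "large-v3",  # highest accuracy for French
    "en": "small",     # slightly better than large-v3 and nearly four times faster
    "es": "medium",    # balance between accuracy and speed
    "de": "large-v3",  # lowest error rate among the tested models
}

# Assumption: languages that were not benchmarked fall back to large-v3.
DEFAULT_MODEL = "large-v3"


def recommended_model(language: str) -> str:
    """Return the suggested Whisper model name for a language code."""
    code = language.strip().lower()
    return RECOMMENDED_MODELS.get(code, DEFAULT_MODEL)


if __name__ == "__main__":
    for lang in ("fr", "en", "es", "de"):
        print(f"{lang}: {recommended_model(lang)}")
```

If the openai-whisper package is the one in use, the returned name could then be passed to whisper.load_model(...). The fallback is worth revisiting for any language that has its own benchmark data.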