
Best open source speech-to-text (STT) model in 2026 (with benchmarks)

Blog post from Northflank

Post Details
Company: Northflank
Author: Cristina Bunea
Word Count: 2,330
Language: English
Summary

In 2026, the leading open-source speech-to-text (STT) models include Canary Qwen 2.5B, IBM Granite Speech 3.3 8B, Whisper Large V3, Whisper Large V3 Turbo, Parakeet TDT, and Moonshine. Each excels in a different area: accuracy, multilingual support, real-time processing, or edge deployment. The post compares them on word error rate (WER), real-time factor (RTF), latency, supported languages, and model size, and notes that open-source models offer flexibility and cost advantages over commercial services. Canary Qwen 2.5B is noted for its high English accuracy, IBM Granite Speech for enterprise-grade applications, and Whisper Large V3 for its multilingual capabilities. Parakeet TDT is optimized for ultra-low-latency streaming, while Moonshine is designed for mobile and edge devices. Deploying these models on a platform such as Northflank means weighing model size and VRAM usage against the application's requirements for speed, accuracy, and deployment environment. The choice between open-source and commercial STT solutions often hinges on cost, data privacy, customization needs, and the scale of deployment.
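As a rough illustration of the metrics named above, the sketch below computes WER as word-level edit distance divided by reference length, RTF as processing time divided by audio duration, and a weights-only memory figure from parameter count. The function names, the lowercase-and-split normalization, and the bytes-per-parameter value are assumptions made for this example, not details from the original post; real benchmarks typically apply a more careful text normalizer, and actual VRAM usage is higher than the weights alone.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: simple lowercase + whitespace tokenization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def weight_memory_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Weights-only memory in decimal GB (2 bytes per parameter assumes fp16/bf16).

    This is a lower bound: activations, caches, and runtime overhead are excluded.
    """
    return num_parameters * bytes_per_parameter / 1e9


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(real_time_factor(processing_seconds=3.0, audio_seconds=60.0))
    print(weight_memory_gb(2.5e9))
```

Under these assumptions, a lower WER means fewer transcription errors, an RTF of 0.05 means one minute of audio is transcribed in three seconds, and a 2.5B-parameter model needs about 5 GB for half-precision weights before any runtime overhead.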