Nova-3 Multilingual Speech-To-Text: Improving Multilingual Accuracy at Production Scale
Blog post from Deepgram
The updated Nova-3 Multilingual speech-to-text model improves multilingual accuracy, with a ~34% reduction in mean word error rate (WER) for batch transcription and a ~21% reduction for streaming. The gains are most pronounced in code-switching, where speakers mix languages within a single sentence, and they require no API changes. The update addresses this and other complexities of real-world multilingual speech recognition through retraining on diverse multilingual benchmarks. The model supports languages such as English, Spanish, and Japanese, and offers Keyterm Prompting, which aids domain-specific transcription without custom vocabularies. Fewer transcription errors mean fewer manual corrections and more dependable analytics in applications such as call centers and IVR systems. The model is live and available without changes to existing setups, so developers can use it to build more reliable voice AI solutions.
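To make the WER figures concrete, here is a minimal sketch of how word error rate and a relative reduction in it are commonly computed. It is not Deepgram's evaluation code: the function names, the whitespace tokenization, and the sample transcripts are assumptions made for illustration, and real benchmarks usually normalize casing and punctuation before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which WER dropped, e.g. 0.34 for a 34% reduction."""
    if old_wer <= 0:
        raise ValueError("old_wer must be positive")
    return (old_wer - new_wer) / old_wer


# Hypothetical code-switched utterance (English with Spanish words).
reference = "can you send the factura before mañana please"
# Hypothetical transcript that mishears both Spanish words.
hypothesis = "can you send the factor before manana please"

wer = word_error_rate(reference, hypothesis)  # 2 substitutions over 8 words
```

A reported reduction of ~34% is relative, not absolute: under this definition, a mean WER that falls from 0.25 to 0.165 is a 34% reduction, even though the absolute drop is 8.5 percentage points. Those two values are illustrative only and do not come from the post.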