Introducing Nova-2: The Fastest, Most Accurate Speech-to-Text API

Company

Deepgram

Date Published

Sept. 19, 2023

Author

Josh Fox

Word count

2281

Language

English

Hacker News points

URL

deepgram.com/learn/nova-2-speech-to-text-api

Summary

Deepgram introduces Nova-2, a next-generation speech-to-text model that outperforms alternatives on accuracy, speed, and cost. Nova-2 is 18% more accurate than its predecessor, Nova-1, and offers a 36% relative word error rate (WER) improvement over OpenAI Whisper (large). It delivers an average 30% reduction in WER over competitors for both pre-recorded and real-time transcription, and its pre-recorded inference is 5-40x faster. Nova-2 is priced at $0.0043/min for pre-recorded audio, making it more affordable than other full-functionality providers. The model was trained on a diverse dataset and offers better entity and punctuation accuracy and a lower capitalization error rate than Nova-1. Deepgram's benchmarking methodology uses over 50 hours of human-annotated audio across various domains and compares Nova-2 with other prominent models on the market.
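
The accuracy and pricing figures above rest on two simple calculations: WER itself, and the relative change between two WER values. The sketch below illustrates both, assuming the standard definition of WER as word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function names, the lowercase-and-split normalization, and the sample numbers are hypothetical choices for illustration; they are not Deepgram's benchmarking code, and the summary does not describe its text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace. Real benchmarks usually apply heavier normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (0.36 means 36%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def prerecorded_cost_usd(minutes: float, rate_per_min: float = 0.0043) -> float:
    """Cost of transcribing pre-recorded audio at the per-minute rate quoted above."""
    return minutes * rate_per_min


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
    # Hypothetical WER values chosen to illustrate a 36% relative improvement.
    print(relative_wer_reduction(0.10, 0.064))
    # One hour of pre-recorded audio at $0.0043/min.
    print(prerecorded_cost_usd(60))
```

A relative improvement is measured against the baseline's own error rate, so a 36% relative gain over a model with a 10% WER would correspond to a 6.4% WER, not a 36-point drop. The WER values in the usage lines are made up to show the arithmetic and do not come from Deepgram's benchmark.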