
Wav2Vec to Whisper to Nova-2: The evolution of AI & ASR

What's this blog post about?

The article traces the evolution of AI and Automatic Speech Recognition (ASR) models from Wav2Vec 2.0 to Whisper and Nova-2. It describes how pre-training has become a popular approach in voice technology, with large tech companies investing heavily in training models for Natural Language Processing tasks. The article then compares Wav2Vec 2.0 and Whisper: both are pre-trained models, but they differ in architecture and in their approach to training data. It presents Whisper as a more customizable alternative to Wav2Vec 2.0 that relies on a familiar architecture and fine-tuning process, and that aims to provide an easy-to-use Python package for users working at various levels of abstraction. Nova-2, according to the article, is more accurate, faster, and less expensive than Whisper, the result of a decade's worth of iterations on patented AI architectures that deviate from the classic Transformer architecture. The article concludes that understanding the practical differences between technologies matters more than getting overwhelmed by their research minutiae, and it encourages readers to test Whisper and Nova-2 for themselves.


Date published
Oct. 20, 2023

Authors
Ben Luks, Jose Nicholas Francisco

Word count

Hacker News points
None found.


By Matt Makai. 2021-2024.