Comparing End-To-End Speech Recognition Architectures in 2021

What's this blog post about?

AssemblyAI's research and development efforts focus on improving the accuracy of its Speech-to-Text API. The team is exploring end-to-end speech recognition architectures such as Listen, Attend and Spell (LAS) and Recurrent Neural Network Transducers (RNNT). These models have shown production-level accuracy matching or surpassing that of conventional hybrid DNN-HMM systems. LAS is perceived to be more accurate than RNNT, while RNNT is seen as having more desirable features for production use. Combining LAS and RNNT can achieve better accuracy and feature parity when compared to hybrid DNN-HMM models.

Company
AssemblyAI

Date published
Jan. 1, 2021

Author(s)
Michael Nguyen

Word count
3372

Hacker News points
6

Language
English