Releasing our v8 Transcription Model - 18.72% Better Accuracy

What's this blog post about?

AssemblyAI has released version 8 (v8) of its Speech Recognition model, its most accurate to date. The v8 model delivers significant accuracy improvements across many types of audio and video data and substantially improves proper noun recognition. The company's research team, made up of AI researchers and engineers from leading technology companies, continually researches and improves the models that power its Speech-to-Text API and related features such as Topic Detection. By the end of 2022, AssemblyAI aims to build speech recognition models that approach human-level accuracy on challenging audio and video files with heavy accents and background noise. The v8 improvements include enhanced use of Transformers, Convolutional Neural Network layers interleaved between Transformer layers, improved regularization via Layer Norm, a jointly trained language model, and predictions over word pieces instead of individual characters.
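To make the last point concrete, here is a minimal sketch of how predicting word pieces differs from predicting individual characters. It is not AssemblyAI's tokenizer, which the post does not describe. It assumes a generic greedy longest-match split in the style of WordPiece, and the function name `wordpiece_tokenize`, the toy vocabulary, and the `##` continuation prefix are all illustrative choices.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces.

    Illustrative only: the vocabulary and the "##" continuation prefix are
    assumptions, not details taken from the v8 model.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a non-initial piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position, so the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; a real one is learned from training text.
TOY_VOCAB = {"trans", "##crip", "##tion", "##s", "speech"}

word = "transcriptions"
pieces = wordpiece_tokenize(word, TOY_VOCAB)
print(pieces)
print(len(word), "character-level predictions vs", len(pieces), "word-piece predictions")
```

With this toy vocabulary, the model would emit a handful of multi-character units for the word rather than one prediction per letter, which is the general motivation for word-piece outputs.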

Company
AssemblyAI

Date published
Oct. 19, 2021

Author(s)
Dylan Fox

Word count
787

Hacker News points
None found.

Language
English