Integrating external language models (LMs) into automatic speech recognition (ASR) systems can significantly improve accuracy while maintaining fast runtimes and low memory costs. Recent end-to-end models, such as the recurrent neural network transducer (RNNT), contain an implicit internal language model (ILM) that is trained jointly with the rest of the network. When combining an external LM with an RNNT model, subtracting the ILM score is essential for the best accuracy: applying Bayes' rule yields a log-linear combination in which scale factors λ₁ and λ₂ weight the external LM and ILM contributions, balancing the trade-off between acoustic evidence and prior word-sequence probabilities. The ILM can be approximated in several ways, for example by removing the acoustic contribution to the joint network or by training a separate LSTM to model it. The scale factors λ₁ and λ₂ must be tuned carefully on data matched to the target domain, as they can significantly impact performance. By carefully balancing the contributions of the external LM and the ILM, ASR systems can achieve significant accuracy gains while maintaining fast runtimes.
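The log-linear combination described above can be sketched in a few lines. This is a minimal illustration, not any specific system's implementation: `fused_score` and its arguments are hypothetical names, and the ILM log-probability is assumed to come from whatever approximation method is in use (e.g., zeroed acoustic input or a trained LSTM proxy).

```python
import math


def fused_score(log_p_rnnt: float, log_p_elm: float, log_p_ilm: float,
                lam1: float = 0.5, lam2: float = 0.3) -> float:
    """Score a hypothesis with external-LM fusion and ILM subtraction.

    Bayes'-rule-motivated combination:
        score = log P_RNNT(y|x) + lam1 * log P_ELM(y) - lam2 * log P_ILM(y)

    lam1 scales the external LM; lam2 scales the subtracted ILM estimate.
    Both are tuned on held-out data matched to the target domain.
    """
    return log_p_rnnt + lam1 * log_p_elm - lam2 * log_p_ilm


# Hypothetical rescoring of two beam-search candidates: each tuple holds
# (hypothesis, RNNT log-prob, external-LM log-prob, ILM log-prob).
candidates = [
    ("recognize speech", -1.0, -2.0, -4.0),
    ("wreck a nice beach", -0.9, -6.0, -3.0),
]
best = max(candidates, key=lambda c: fused_score(c[1], c[2], c[3]))
```

With the example numbers above, the first hypothesis wins after fusion (score −0.8 vs. −3.0) even though the second has a slightly higher raw RNNT score, illustrating how the external LM can overrule the acoustic-model ranking.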