How big is the gap between academic and commercial speech recognition systems?

Post Details

Company

Speechmatics

Date Published

Nov. 19, 2020

Author

-

Word Count

915

Company Posts That Month

15

Language

English

Hacker News Points

-

Source URL

www.speechmatics.com/company/articles-and-news/big-gap-academic-commercial-speech-recognition-systems

Summary

Speech recognition has gained momentum in recent years, with large companies launching their own voice assistants, such as Apple Siri, Microsoft Cortana, Amazon Alexa, and Google Assistant. These systems are especially valuable in places with low literacy rates or where speech is the primary means of communication. Speech recognition APIs have also become more accessible through cloud-based services from IBM, Microsoft, Google, and others. Academic research has reported progress toward human parity, but academic and commercial systems differ significantly in their datasets, vocabulary sizes, and error types. Commercial systems often use larger datasets and restricted vocabularies and ignore short function words, whereas academic work focuses on English-language datasets, with limited data for other languages. Real-time factor, robustness, and additional functionality such as diarization and audio segmentation are also essential for real-world applications. The gap between academic and commercial systems highlights the need for a more holistic approach to improving efficiency, and companies are working to bridge it by developing both types of ASR technology.
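
The two quantities the summary leans on, recognition errors and real-time factor, can be made concrete with a short sketch. It assumes word error rate (WER) as the accuracy metric, which the summary does not name explicitly, and the function names `word_error_rate` and `real_time_factor` are hypothetical helpers, not part of any Speechmatics API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Computed as the word-level Levenshtein distance divided by the
    number of words in the reference transcript.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Dropping one short function word ("the") counts as one deletion.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # 30 s of compute for 60 s of audio.
    print(real_time_factor(30.0, 60.0))
```

Under this definition, a system that drops short function words still pays one error per dropped word, which is one reason scoring conventions matter when academic and commercial results are compared.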

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	2	645	208	64	-14%