Company
Date Published
Author
-
Word count
915
Language
English
Hacker News points
None

Summary

Speech recognition has gained momentum in recent years, with large companies launching their own voice assistants, such as Apple Siri, Microsoft Cortana, Amazon Alexa, and Google Assistant. These systems are especially important in places with low literacy rates or where speech is the primary means of communication. Speech recognition APIs have also become more accessible through cloud-based services from IBM, Microsoft, Google, and others. Academic research has made progress toward human parity, but academic and commercial systems differ significantly in their datasets, vocabulary sizes, and error types. Commercial systems often use larger datasets, work with restricted vocabularies, and ignore short function words, whereas academic work focuses on English-language datasets, with limited data for other languages. Real-time factor, robustness, and additional functionality such as diarization and audio segmentation are also essential for real-world applications. The gap between academic and commercial systems highlights the need for a more holistic approach to improving efficiency, and companies are working to bridge this gap by developing both types of ASR technology.
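
Two of the quantities mentioned above can be made concrete with a short sketch. The summary does not say which accuracy metric the article uses, so word error rate (WER), the usual ASR metric, is assumed here: word-level edit distance divided by the number of reference words. Real-time factor is processing time divided by audio duration. The function names and the `FUNCTION_WORDS` stop list are hypothetical and chosen only for illustration.

```python
from typing import Iterable

# Hypothetical stop list, used only to illustrate ignoring short function words.
FUNCTION_WORDS = frozenset({"a", "an", "the", "of", "to", "and"})


def word_error_rate(reference: str, hypothesis: str,
                    ignore: Iterable[str] = ()) -> float:
    """Word-level edit distance divided by the number of reference words."""
    skip = set(ignore)
    ref = [w for w in reference.lower().split() if w not in skip]
    hyp = [w for w in hypothesis.lower().split() if w not in skip]
    if not ref:
        raise ValueError("reference has no words left to score")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "cat sat on a mat"
    print(word_error_rate(ref, hyp))                         # all words scored
    print(word_error_rate(ref, hyp, ignore=FUNCTION_WORDS))  # function words dropped
    print(real_time_factor(2.0, 10.0))
```

In this toy pair, every error falls on a function word, so the same transcript scores very differently depending on whether those words are counted. This is one way reported error rates can diverge between evaluation setups.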