How to Rapidly Train New Languages Using Common Voice and OSCAR

Post Details

Company

Speechmatics

Date Published

Jan. 17, 2023

Author

Steve Kingsley

Word Count

723

Language

English

Hacker News Points

-

Source URL

www.speechmatics.com/company/articles-and-news/how-to-rapidly-train-new-languages-using-common-voice-and-oscar

Summary

Speech-to-text systems benefit from abundant online data for widely spoken languages, but under-resourced languages remain a major challenge because little data is available for them. To address this, companies like Speechmatics turn to existing open datasets such as Common Voice and OSCAR to fill the gaps and rapidly deploy support for new languages. The Common Voice project lets users contribute labeled speech data, while OSCAR provides a multilingual text corpus built from another open-source project, Common Crawl. By making these datasets available, both projects help bring inclusivity and equity to speech-to-text systems, enabling companies like Speechmatics to improve the accuracy of their models and support more languages. Combining the two datasets enables rapid language deployment, reduces bias in content availability, and promotes equality for users worldwide.