Company:
Date Published:
Author: Stuart Wood
Word count: 1973
Language: English
Hacker News points: None

Summary

We've added an open test set to our evaluation suite, benchmarking ourselves against competitors on the publicly available FLEURS dataset. We performed well in languages that are typically underrepresented, outperforming competitors by 8.45% on average and achieving higher accuracy than Amazon, AssemblyAI, and Deepgram in every language we support. However, OpenAI Whisper surpassed us in English by 0.48%.

That result prompted an internal debate about whether to publish our test results at all; some team members argued that doing so could undermine our accuracy claims and damage our brand. We chose transparency instead, prioritizing helping customers make informed decisions.

Testing speech-to-text services is genuinely hard, particularly when datasets are too artificial to reflect real-world scenarios. To address this, we made our test data more realistic by incorporating varied audio from different speakers, accents, and environments. On this data, Speechmatics outperforms OpenAI Whisper in 8 out of 9 tests, making on average 32% fewer errors.

We believe accuracy is not just about transcribing clean audio well, but about producing valuable transcripts regardless of input quality. Building something genuinely useful requires testing with realistic data and acknowledging where we fall short. Ultimately, we encourage customers to try our audio portal for themselves, see the results, and make their own informed decisions.
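The comparisons above rest on word error rate (WER), the standard speech-to-text accuracy metric, and on relative error reduction (the "32% fewer errors" figure). As a rough illustration of how such numbers are computed, here is a minimal sketch; this is not Speechmatics' actual evaluation code, and function names are our own:

```python
# Illustrative WER sketch, not Speechmatics' evaluation pipeline.
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as word-level Levenshtein distance via dynamic programming."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # d[i][j] = edit distance between first i ref words and first j hyp words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

def relative_error_reduction(wer_ours: float, wer_theirs: float) -> float:
    """Fraction fewer errors we make relative to the other system,
    e.g. 0.32 means '32% fewer errors'."""
    return (wer_theirs - wer_ours) / wer_theirs
```

For example, a hypothetical pair of systems scoring 6.8% and 10.0% WER on the same audio would give a relative error reduction of 32%.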