Best-in-class real-time ASR system

Post Details

Company

Speechmatics

Date Published

May 9, 2023

Author

Steve Kingsley

Word Count

1,085

Language

English

Hacker News Points

-

Source URL

www.speechmatics.com/company/articles-and-news/best-in-class-real-time-asr-system

Summary

ASR systems have two common modes of operation: batch and real-time. In batch mode, audio is provided as complete files and a single transcript is returned, which allows higher accuracy. In real-time mode, audio is provided as a stream and short segments of transcription are returned at regular intervals, which introduces a trade-off between latency and accuracy. In a comparison of batch and real-time ASR, Ursa outperforms competitors in accuracy even when speed is prioritized over accuracy, and it reaches near-batch accuracy at low latency settings. Latency is controlled through `max_delay` and `max_delay_mode`, which let users balance timeliness against accuracy. In Ursa's latest release, the relative difference in WER between real-time and batch falls to zero as latency increases to 10s, and Ursa outperforms major vendors such as Amazon, Microsoft, and Google.
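
To make the latency settings and the "relative difference in WER" concrete, here is a minimal Python sketch. Only `max_delay` and `max_delay_mode` come from the summary. The surrounding message layout (`StartRecognition`, `transcription_config`, `language`), the mode values `"fixed"` and `"flexible"`, and the helper names are assumptions made for illustration and should be checked against the vendor's API reference. The WER figures in the usage lines are made-up inputs, not measured results.

```python
import json


def build_start_message(max_delay: float, max_delay_mode: str = "flexible",
                        language: str = "en") -> str:
    """Build a hypothetical session-start message carrying the latency settings.

    Assumption: the message shape and the allowed mode values are illustrative,
    not confirmed by the summary.
    """
    if max_delay <= 0:
        raise ValueError("max_delay must be a positive number of seconds")
    if max_delay_mode not in ("fixed", "flexible"):  # assumed allowed values
        raise ValueError("max_delay_mode must be 'fixed' or 'flexible'")
    message = {
        "message": "StartRecognition",  # assumed message name
        "transcription_config": {
            "language": language,
            "max_delay": max_delay,            # upper bound on latency, in seconds
            "max_delay_mode": max_delay_mode,  # how strictly the bound is applied
        },
    }
    return json.dumps(message)


def relative_wer_difference(realtime_wer: float, batch_wer: float) -> float:
    """Relative gap between real-time and batch WER: (rt - batch) / batch."""
    if batch_wer <= 0:
        raise ValueError("batch_wer must be positive")
    return (realtime_wer - batch_wer) / batch_wer


# Usage with made-up WER percentages (illustrative only).
low_latency_msg = build_start_message(max_delay=2.0)
gap_at_low_latency = relative_wer_difference(realtime_wer=12.0, batch_wer=10.0)
gap_at_high_latency = relative_wer_difference(realtime_wer=10.0, batch_wer=10.0)
```

A relative difference of zero means the real-time transcript is as accurate as the batch transcript for the same audio, which is the behavior the summary describes at a latency of 10s.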