Company:
Date Published:
Author: Steve Kingsley
Word count: 1085
Language: English
Hacker News points: None

Summary

ASR systems have two common modes of operation: batch and real-time. In batch mode, audio is provided as complete files and a single transcript is returned, which allows higher accuracy. In real-time mode, the system receives an audio stream and returns short segments of transcription at regular intervals, so a trade-off between latency and accuracy comes into play. In a comparison of batch and real-time ASR, Ursa outperforms competitors in accuracy even when configured to prioritize speed, achieving near-batch accuracy at low-latency settings. Latency is controlled through `max_delay` and `max_delay_mode`, which let users balance timeliness against accuracy. In Ursa's latest release, the relative difference in word error rate (WER) between real-time and batch transcription falls to zero as latency increases to 10 s, and it outperforms major vendors such as Amazon, Microsoft, and Google.
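
To make the "relative difference in WER" comparison concrete, here is a minimal Python sketch. It assumes the standard word-level edit-distance definition of WER and assumes the relative difference is measured against the batch WER. The summary does not spell out the formula, so both are assumptions, and the function names `word_error_rate` and `relative_wer_difference` are hypothetical helpers, not part of any vendor API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Dynamic-programming Levenshtein distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_difference(realtime_wer: float, batch_wer: float) -> float:
    """Relative gap between real-time and batch WER (assumed definition).

    A value of 0.0 means real-time accuracy matches batch accuracy.
    """
    if batch_wer <= 0:
        raise ValueError("batch WER must be positive to compute a relative difference")
    return (realtime_wer - batch_wer) / batch_wer
```

Under these assumptions, a real-time WER of 0.11 against a batch WER of 0.10 is a relative difference of about 10%. A value of zero, as reported at 10 s latency, would mean the real-time transcript is as accurate as the batch one.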