Company
Date Published
Author
Ana Olssen
Word count
1539
Language
English
Hacker News points
None

Summary

Ursa is Speechmatics' latest speech-to-text system. It transcribes difficult audio with high accuracy across speaker demographics, which is crucial for the quality of downstream tasks. Large language models such as ChatGPT and GPT-4 are trained to predict the next word given the words that came before, and by learning from vast amounts of training data they can perform tasks such as summarization, sentiment analysis, emotion detection, named entity recognition, and question answering. When given a transcript, these models can gloss over some recognition errors and even produce "better than input" answers, but they do so by hallucinating from knowledge in their training data. The accuracy of the automatic speech recognition (ASR) transcript therefore remains crucial to the quality of the output. Ursa produces highly accurate transcripts, particularly on named entities, technical terminology, and difficult audio. Lower-accuracy transcripts, by contrast, cause errors ranging from spelling mistakes to a complete inability to perform the task, as demonstrated by experiments running GPT-4 and ChatGPT on Ursa and Google transcriptions.