One stream, two jobs: introducing SpeakerRevision
Blog post from AssemblyAI
SpeakerRevision is a new streaming message type that revises speaker labels at the end of a live stream, bringing them to the accuracy of asynchronous processing without a separate asynchronous pass to produce a clean final transcript. The revision adds only about 400 milliseconds of latency. It improves diarization error rate (DER) and concatenated minimum-permutation word error rate (cpWER), and it reduces false-alarm speakers by 84%. Because a single streaming pipeline now serves both the live experience and post-call analysis, applications such as AI notetakers, contact center analytics, and voice agents can take their real-time and final transcripts from the same source, with no redundant infrastructure to build or operate.
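To make the "one stream, two jobs" idea concrete, here is a minimal client-side sketch of how live turns might be relabeled when a revision arrives. The post summary does not give the wire format of SpeakerRevision, so everything here is an assumption for illustration: the `Turn` shape, the `turn_id` key, and the idea that a revision arrives as a mapping from turn id to corrected speaker label. It is not AssemblyAI's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    """One finalized turn from the live stream (hypothetical shape)."""
    turn_id: int
    speaker: str
    text: str


class TranscriptBuffer:
    """Keeps live turns and applies an end-of-stream speaker revision."""

    def __init__(self) -> None:
        self._turns: dict[int, Turn] = {}

    def on_turn(self, turn_id: int, speaker: str, text: str) -> None:
        # Live path: store (or overwrite) the turn as it arrives.
        self._turns[turn_id] = Turn(turn_id, speaker, text)

    def on_speaker_revision(self, revisions: dict[int, str]) -> int:
        # Final path: relabel known turns in place.
        # `revisions` maps turn_id -> corrected speaker (assumed format).
        # Returns the number of labels that actually changed.
        changed = 0
        for turn_id, speaker in revisions.items():
            turn = self._turns.get(turn_id)
            if turn is not None and turn.speaker != speaker:
                turn.speaker = speaker
                changed += 1
        return changed

    def final_transcript(self) -> list[str]:
        # Same buffer serves the post-call view, ordered by turn id.
        ordered = sorted(self._turns.values(), key=lambda t: t.turn_id)
        return [f"{t.speaker}: {t.text}" for t in ordered]
```

In this sketch the live UI reads from the buffer as turns arrive, and post-call analysis reads `final_transcript()` after the revision has been applied, so both consumers share one source. A turn that the live labels assigned to a spurious third speaker, for example, would simply be folded back into the correct speaker by `on_speaker_revision`.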