One stream, two jobs: introducing SpeakerRevision

Blog post from AssemblyAI

Post Details
Company: AssemblyAI
Author: Madison Bernstein
Word Count: 982
Language: English
Summary

SpeakerRevision is a new message type for real-time speech streaming. At the end of a live stream, it delivers asynchronous-grade accuracy in speaker labeling, so a separate asynchronous processing pass is no longer needed to produce a clean final transcript. It revises speaker labels with only about 400 milliseconds of added latency, significantly improves diarization error rate (DER) and concatenated minimum-permutation word error rate (cpWER), and reduces false-alarm speakers by 84%.

Because a single streaming pipeline now serves both the live experience and post-call analysis, no redundant infrastructure is required. AI notetakers, contact center analytics, and voice agents can take both the real-time transcript and the final transcript from the same source, which keeps integration simple and accuracy high while reducing operational overhead.
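The client-side idea behind "one stream, two jobs" can be sketched as a single transcript store. Turns are written live with provisional speaker labels, then relabeled in place when a SpeakerRevision message arrives. This is a minimal Python sketch, not AssemblyAI's SDK or wire format. The message shapes (a `"Turn"` dict with `turn_id`, `speaker`, and `text`, and a `"SpeakerRevision"` dict carrying a `revisions` list) and the `TranscriptStore` class are hypothetical stand-ins, and the real streaming schema may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    turn_id: int
    speaker: str
    text: str


@dataclass
class TranscriptStore:
    """One transcript that serves both the live view and the final record."""

    turns: dict[int, Turn] = field(default_factory=dict)

    def handle(self, message: dict) -> None:
        # Hypothetical message shapes; the real AssemblyAI schema may differ.
        kind = message.get("type")
        if kind == "Turn":
            # Live path: store the turn with its provisional speaker label.
            self.turns[message["turn_id"]] = Turn(
                message["turn_id"], message["speaker"], message["text"]
            )
        elif kind == "SpeakerRevision":
            # End-of-stream path: relabel turns already delivered, in place.
            for rev in message["revisions"]:
                turn = self.turns.get(rev["turn_id"])
                if turn is not None:  # ignore revisions for unknown turns
                    turn.speaker = rev["speaker"]

    def final_transcript(self) -> list[tuple[str, str]]:
        # Order by turn_id so the output follows the conversation.
        return [(t.speaker, t.text) for _, t in sorted(self.turns.items())]


# Usage: turn 2 is first labeled as a spurious third speaker "C",
# and the revision folds it back into speaker "B".
store = TranscriptStore()
for msg in [
    {"type": "Turn", "turn_id": 1, "speaker": "A", "text": "Hi, thanks for joining."},
    {"type": "Turn", "turn_id": 2, "speaker": "C", "text": "Happy to be here."},
    {"type": "SpeakerRevision", "revisions": [{"turn_id": 2, "speaker": "B"}]},
]:
    store.handle(msg)

print(store.final_transcript())
```

Keying turns by ID lets a revision patch only the speaker labels without re-sending any text, so the live UI and the post-call pipeline can both read from the same store instead of maintaining a second, asynchronous transcript.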