
Building real-time multilingual ASR with code-switching

Blog post from Gladia

Post Details
Company: Gladia
Date Published:
Author: Bruno Hays
Word Count: 1,538
Language: English
Hacker News Points: -
Summary

Bruno Hays, Lead ML Speech Engineer at Gladia, developed an approach to real-time multilingual automatic speech recognition (ASR) with code-switching. Instead of relying on one large multilingual model, it uses a lightweight, modular ensemble that routes audio between small, specialized models. The fully open-source system has three parts: a Voice Activity Detection (VAD) component that identifies speech boundaries, Streaming Zipformer models that perform ASR, and a Language Identification (LID) system that detects language switches. To reduce language lag, the Asynchronous Rollback Pipeline transcribes audio immediately with the active ASR engine, monitors for language changes, and rolls back and adjusts the transcription when a change is detected.

The approach outperforms larger models on inter-utterance code-switching, reaching a 13% Word Error Rate (WER). It struggles with intra-utterance switching, where it trails cloud APIs while still beating some local models. The results suggest that future ASR systems could benefit from small, specialized models with intelligent routing, a more efficient option for local, on-device multilingual ASR.
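To make the rollback idea concrete, here is a minimal, synchronous toy sketch. It is not Gladia's implementation, and every name in it is hypothetical. Assumptions: an "audio chunk" is a string such as `"fr:bonjour"` (true language, spoken word). `make_stub_asr` stands in for a per-language Streaming Zipformer. `stub_lid` stands in for the LID model and only returns a verdict once enough audio has accumulated, which imitates the lag of an asynchronous LID result. `end_of_speech` plays the role of a VAD boundary. The pipeline emits text right away with the active engine. When LID later disagrees, it switches engines and re-decodes the buffered segment.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Chunk = str  # hypothetical: "lang:word" stands in for a frame of PCM audio
AsrEngine = Callable[[List[Chunk]], List[str]]


def make_stub_asr(lang: str) -> AsrEngine:
    """Hypothetical per-language ASR: decodes chunks in its own language
    and emits '<?>' for chunks spoken in any other language."""
    def transcribe(chunks: List[Chunk]) -> List[str]:
        out = []
        for chunk in chunks:
            chunk_lang, word = chunk.split(":", 1)
            out.append(word if chunk_lang == lang else "<?>")
        return out
    return transcribe


def stub_lid(chunks: List[Chunk], min_chunks: int = 2) -> Optional[str]:
    """Hypothetical LID: majority language of the segment, or None until
    min_chunks of audio exist (simulating a delayed, asynchronous verdict)."""
    if len(chunks) < min_chunks:
        return None
    return Counter(c.split(":", 1)[0] for c in chunks).most_common(1)[0][0]


@dataclass
class RollbackPipeline:
    engines: Dict[str, AsrEngine]
    active_lang: str
    segment: List[Chunk] = field(default_factory=list)  # audio since last VAD boundary
    partial: List[str] = field(default_factory=list)    # text emitted for that audio
    final: List[str] = field(default_factory=list)      # committed text
    rollbacks: int = 0

    def push(self, chunk: Chunk) -> List[str]:
        # 1) Transcribe immediately with the active engine; never wait for LID.
        self.segment.append(chunk)
        self.partial += self.engines[self.active_lang]([chunk])
        # 2) Check whether an LID verdict is available for the segment so far.
        detected = stub_lid(self.segment)
        # 3) Roll back: on a language change, re-decode the buffered segment.
        if detected is not None and detected != self.active_lang:
            self.active_lang = detected
            self.partial = self.engines[detected](self.segment)
            self.rollbacks += 1
        return self.final + self.partial

    def end_of_speech(self) -> List[str]:
        """Hypothetical VAD boundary: commit the segment and reset buffers."""
        self.final += self.partial
        self.segment, self.partial = [], []
        return self.final


if __name__ == "__main__":
    engines = {"en": make_stub_asr("en"), "fr": make_stub_asr("fr")}
    pipe = RollbackPipeline(engines=engines, active_lang="en")
    print(pipe.push("fr:bonjour"))  # ['<?>']  (wrong engine, no LID verdict yet)
    print(pipe.push("fr:tout"))     # ['bonjour', 'tout']  (rolled back to French)
```

In this toy, the LID takes a majority vote over the whole VAD segment. That suits a switch between utterances, but a switch in the middle of an utterance could be outvoted by the preceding language. The toy therefore behaves more like the inter-utterance case than the intra-utterance one.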