Building real-time multilingual ASR with code-switching

Post Details

Company

Gladia

Date Published

June 1, 2026

Author

Bruno Hays

Word Count

1,538

Company Posts That Month

23

Language

English

Hacker News Points

-

Post removed?

No

Source URL

www.gladia.io/blog/building-real-time-multilingual-asr-with-code-switching

Summary

Bruno Hays, Lead ML Speech Engineer at Gladia, developed an approach to real-time multilingual automatic speech recognition (ASR) with code-switching: a lightweight, modular ensemble that routes between small, specialized models instead of relying on one large multilingual model. The system is fully open source and combines three components: Voice Activity Detection (VAD) to identify speech boundaries, Streaming Zipformer models for ASR, and a Language Identification (LID) system to detect language switches. Its Asynchronous Rollback Pipeline reduces language lag by transcribing audio immediately with the active ASR engine, monitoring for language changes, and adjusting the transcription when a switch is detected. The approach outperforms larger models on inter-utterance code-switching, reaching a 13% Word Error Rate (WER). On intra-utterance switching it falls behind cloud APIs, although it still performs better than some local models. The results suggest that small, specialized models with intelligent routing could offer a more efficient solution for local, on-device multilingual ASR.
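
The rollback idea in the summary can be illustrated with a minimal sketch. This is not the post's implementation: the class name RollbackRouter, the callable signatures, and the toy engines and detector are all hypothetical, and the loop runs synchronously for clarity, whereas the described pipeline monitors language asynchronously. Audio chunks are stood in for by tagged strings such as "fr:bonjour".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interfaces: a transcriber maps one audio chunk to text,
# and a language detector maps the chunks seen so far to a language code.
Transcriber = Callable[[str], str]
LanguageDetector = Callable[[List[str]], str]


@dataclass
class RollbackRouter:
    """Transcribe with the active engine; redo the utterance on a language switch."""
    engines: Dict[str, Transcriber]
    detect_language: LanguageDetector
    active: str
    buffer: List[str] = field(default_factory=list)  # chunks of the current utterance
    draft: List[str] = field(default_factory=list)   # provisional text per chunk

    def push(self, chunk: str) -> str:
        # 1. Emit text right away using whichever engine is currently active.
        self.buffer.append(chunk)
        self.draft.append(self.engines[self.active](chunk))
        # 2. Check the language of the audio buffered so far.
        detected = self.detect_language(self.buffer)
        # 3. On a switch, roll back: re-transcribe the buffered utterance
        #    with the engine for the newly detected language.
        if detected != self.active and detected in self.engines:
            self.active = detected
            self.draft = [self.engines[detected](c) for c in self.buffer]
        return " ".join(self.draft)

    def end_utterance(self) -> str:
        # Called at a speech boundary (in the summary, VAD provides these).
        final = " ".join(self.draft)
        self.buffer.clear()
        self.draft.clear()
        return final


# Toy stand-ins (assumptions): a chunk is "lang:word", and an engine only
# "recognizes" words tagged with its own language.
def make_engine(lang: str) -> Transcriber:
    def transcribe(chunk: str) -> str:
        tag, word = chunk.split(":", 1)
        return word if tag == lang else "???"
    return transcribe


def toy_detector(chunks: List[str]) -> str:
    # Reads the language tag of the most recent chunk.
    return chunks[-1].split(":", 1)[0]


router = RollbackRouter(
    engines={"en": make_engine("en"), "fr": make_engine("fr")},
    detect_language=toy_detector,
    active="en",
)
first = router.push("fr:bonjour")   # English engine fails, rollback to French
second = router.push("fr:monde")    # French engine is now active
final = router.end_utterance()
```

In this toy run the first chunk is initially handed to the English engine, the detector reports French, and the buffered utterance is re-transcribed before any further audio arrives. Because rollback here only rewrites the current utterance, the sketch mirrors the inter-utterance case; switches inside an utterance would need finer-grained handling.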

Trends Found in this Post

Trend	Post Mentions	Total Month Mentions	Posts	Companies	MoM
Real-time	14	5,515	1,316	255	-4%
LLM	2	5,954	1,130	235	-34%
