Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

Blog post from HuggingFace

Post Details
Company: HuggingFace
Date Published: -
Author: Shama Gupta, Lindsay Brin, and Fanny Riols
Word Count: 2,621
Language: -
Hacker News Points: -
Summary

This post benchmarks how the automatic speech recognition (ASR) systems behind voice agents handle code-switched speech, where bilingual customers switch languages naturally as they talk. The benchmark covers four language pairs (Spanish-English, French-English, Canadian French-English, and German-English) in enterprise HR and IT scenarios, and scores models on Word Error Rate (WER), Semantic Word Error Rate (SWER), and Answer Error Rate (AER). Transcription accuracy and semantic understanding vary significantly across models, with ElevenLabs Scribe V2, Gemini 3 Flash, and AssemblyAI Universal 3-Pro leading in performance. Code-switching does not make every case uniformly harder: its effect depends on the language pair and context, and it exposes differences in model robustness. The number of language switches within an utterance is a key factor in transcription errors, while the Code-Mixing Index (CMI) influences error magnitude. Errors occur predominantly in the English segments of code-switched utterances, even though English is handled well in monolingual contexts, which suggests that embedded-language segments pose an additional transcription challenge. The study concludes that enterprises should benchmark ASR systems on the specific language pairs their customers speak to ensure code-switched speech is handled effectively.
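
To make the quantities named above concrete, here is a minimal Python sketch of WER, a per-utterance switch count, and a CMI. It is an illustration under stated assumptions, not the benchmark's own scoring code: the summary does not give the post's text normalization, tokenization, or exact CMI definition. The sketch assumes whitespace tokenization, lowercase matching, and that every token carries a language tag. The CMI shown is one common utterance-level formulation, 100 × (1 − dominant-language tokens / total tokens). The function names and the sample utterance are hypothetical.

```python
from collections import Counter
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def count_switches(lang_tags: Sequence[str]) -> int:
    """Number of positions where the language tag differs from the previous one."""
    return sum(1 for a, b in zip(lang_tags, lang_tags[1:]) if a != b)


def code_mixing_index(lang_tags: Sequence[str]) -> float:
    """Assumed CMI: 100 * (1 - share of the dominant language), 0 if no tokens."""
    if not lang_tags:
        return 0.0
    dominant = max(Counter(lang_tags).values())
    return 100.0 * (1.0 - dominant / len(lang_tags))


if __name__ == "__main__":
    # Hypothetical Spanish-English utterance with one embedded English word.
    reference = "necesito resetear mi password por favor"
    hypothesis = "necesito resetear mi pasword por favor"
    tags = ["es", "es", "es", "en", "es", "es"]
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
    print(f"switches: {count_switches(tags)}")
    print(f"CMI: {code_mixing_index(tags):.1f}")
```

In this example a single embedded English word produces two switch points (into English and back out), which shows why switch count and CMI can move independently: an utterance can have a low mixing index and still contain several switches.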