Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

Post Details

Company

HuggingFace

Date Published

June 9, 2026

Authors

Shama Gupta, Lindsay Brin, and Fanny Riols

Word Count

2,621

Source URL

huggingface.co/blog/ServiceNow-AI/code-switching

Summary

This benchmark study examines how the automatic speech recognition (ASR) systems behind voice agents handle code-switched speech, in which bilingual customers naturally switch languages. It covers four language pairs (Spanish-English, French-English, Canadian French-English, and German-English) in enterprise settings such as HR and IT scenarios, and scores models on Word Error Rate (WER), Semantic Word Error Rate (SWER), and Answer Error Rate (AER). Transcription accuracy and semantic understanding vary significantly across models, with ElevenLabs Scribe V2, Gemini 3 Flash, and AssemblyAI Universal 3-Pro leading in performance. Code-switching does not make recognition uniformly harder: its effect depends on the language pair and context, which exposes differences in model robustness. The number of language switches within an utterance is a key factor in transcription errors, while the Code-Mixing Index (CMI) influences error magnitude. Errors occur predominantly in the English segments of code-switched utterances, even though English is handled well in monolingual contexts, which suggests that embedded-language segments pose additional transcription challenges. The study concludes that enterprises should benchmark ASR systems against the specific language pairs relevant to their own customer base to ensure code-switched speech is handled effectively.
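
To make the quantities in the summary concrete, here is a minimal Python sketch of three of them: WER, the number of language switches in an utterance, and CMI. It is an illustration under stated assumptions, not the benchmark's own code. WER is computed the standard way, as word-level edit distance divided by reference length. CMI uses a commonly used utterance-level formulation (100 times one minus the dominant language's share of language-tagged tokens), which may differ from the definition used in the post. The function names, the example utterance, and the hand-assigned per-token language tags are all hypothetical.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def count_switches(tags: list[str]) -> int:
    """Number of points where the language changes between adjacent tokens.

    Tokens tagged "other" (assumed label for language-independent tokens
    such as numbers or names) are skipped.
    """
    langs = [t for t in tags if t != "other"]
    return sum(1 for a, b in zip(langs, langs[1:]) if a != b)


def code_mixing_index(tags: list[str]) -> float:
    """Assumed CMI formulation: 100 * (1 - dominant-language share)."""
    langs = [t for t in tags if t != "other"]
    if not langs:
        return 0.0
    dominant = max(Counter(langs).values())
    return 100.0 * (1.0 - dominant / len(langs))


# Hypothetical Spanish-English utterance with one embedded English word.
reference = "necesito resetear mi password por favor"
hypothesis = "necesito resetear mi pasword por favor"
tags = ["es", "es", "es", "en", "es", "es"]

wer = word_error_rate(reference, hypothesis)
switches = count_switches(tags)
cmi = code_mixing_index(tags)
```

In this hypothetical example the only transcription error falls on the embedded English word, the utterance contains two switch points (into English and back), and five of its six tokens belong to the dominant language. SWER and AER are not sketched because the summary does not describe how they are computed.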