Accent Variability in Restaurant Voice AI: Why the ASR Layer Makes or Breaks Multi-Region Ordering
Blog post from Deepgram
Accent variability is a major obstacle for Automatic Speech Recognition (ASR) in multi-region restaurant voice ordering. McDonald's terminated voice ordering pilot, where accuracy dropped because of accent misinterpretations, illustrates the risk. The ASR layer is critical because every downstream step consumes its transcript: a misheard word becomes a wrong item or a wrong customization, and the order is incorrect before any other component has a chance to catch it. Three strategies are recommended to keep accuracy consistent across accents: increasing speaker diversity in training data, using Keyterm Prompting to correct menu vocabulary at request time, and selecting locale-specific models. Benchmark accuracy often fails to reflect the noisy, complex conditions of a drive-thru, so operational accuracy should be evaluated with audio recorded in each target market under real conditions. Applied together, these techniques can help restaurant chains keep Word Error Rate (WER) stable across regions, so that voice AI works reliably in diverse linguistic settings.
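To make "stable WER across regions" concrete, here is a minimal sketch of how WER could be computed separately for each market. WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function names, region labels, and sample utterances are hypothetical and chosen only for illustration; they are not from the original post or from any Deepgram API.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_region(samples):
    """Aggregate WER per region from (region, reference, hypothesis) tuples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for region, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[region] += e
        words[region] += n
    # Skip regions with no reference words to avoid dividing by zero.
    return {r: errors[r] / words[r] for r in errors if words[r] > 0}


# Hypothetical transcripts: the same order heard in two markets.
samples = [
    ("us-south", "large fries no salt", "large fries no salt"),
    ("uk-north", "large fries no salt", "large flies no salt"),
]
print(wer_by_region(samples))
```

A gap between regions on matched utterances, like the one in this toy data, is the signal to look for when testing with market-specific audio. A single aggregate WER would hide it.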