Separating Speech From Structure: A Guide to skip_patterns in Agora Conversational AI
Blog post from Agora
Agora Conversational AI provides a feature called "skip_patterns" that lets a voice agent leave selected parts of a response out of the speech output while keeping the full text in the transcript for the application to use. This matters when a large language model (LLM) produces output that includes code, tags, or structured data, which the application should process rather than read aloud to the user. Developers designate which bracket types are skipped in audio, so the user hears only the spoken content while the application parses the complete response for UI updates, state synchronization, or code rendering. Use cases include voice coding assistants, live shopping agents, onboarding bots, and support agents, all of which depend on separating spoken text from machine-readable text. With skip_patterns, developers can keep the voice interaction natural and still use structured data to drive dynamic, responsive application behavior.
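To make the separation concrete, here is a minimal Python sketch of what an application might do on its side with a transcript that still contains bracketed content. It is an illustration under assumptions, not Agora's implementation or API: the function name `split_spoken_and_structured`, the choice of square brackets, and the `cart:add` tag format are all hypothetical, and the actual skip_patterns configuration is not shown here.

```python
# Hypothetical bracket pairs for illustration only; the bracket types that
# skip_patterns actually supports are defined by Agora's configuration.
BRACKET_PAIRS = {"[": "]"}


def split_spoken_and_structured(text, pairs=BRACKET_PAIRS):
    """Split text into (spoken, skipped).

    spoken  : the text outside any bracket pair, with whitespace collapsed
    skipped : a list of the bracketed segments, without their delimiters

    Limitation: nested brackets are not handled; a segment ends at the
    first matching closing bracket.
    """
    spoken_chars = []
    skipped = []
    closer = None   # closing bracket we are waiting for, or None if outside
    buffer = []     # characters of the bracketed segment being read

    for ch in text:
        if closer is None:
            if ch in pairs:
                closer = pairs[ch]   # entering a bracketed segment
                buffer = []
            else:
                spoken_chars.append(ch)
        elif ch == closer:
            skipped.append("".join(buffer))   # segment complete
            closer = None
        else:
            buffer.append(ch)

    if closer is not None:
        # Unclosed bracket: keep the remainder out of the spoken text.
        skipped.append("".join(buffer))

    spoken = " ".join("".join(spoken_chars).split())
    return spoken, skipped


if __name__ == "__main__":
    reply = "Your cart is updated. [cart:add sku=123] Anything else?"
    spoken, skipped = split_spoken_and_structured(reply)
    print(spoken)
    print(skipped)
```

In this sketch, `spoken` corresponds to what the user would hear, and `skipped` holds the segments the application could parse to update a cart, sync state, or render code.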