Gladia x Rime | Building better CX agents with STT and TTS

Post Details

Company

Gladia

Date Published

Dec. 23, 2025

Author

Jean-Louis Quéguiner

Word Count

1,163

Language

English

Hacker News Points

-

Source URL

www.gladia.io/blog/building-better-voice-agents-with-stt-and-tts

Summary

In a webinar with Lily Clifford, founder of the TTS-focused company Rime and a speech technology researcher, the speakers noted that despite advances in speech-to-text (STT), text-to-speech (TTS), and large language models, fully autonomous voice assistants have yet to meet real-world expectations. Clifford pointed out that more human-like TTS does not always improve performance: overly expressive voices can sound unnatural to users, especially over the telephone, and can lead to more hang-ups. The discussion also stressed that precision on critical entities matters more than aggregate accuracy in STT and TTS, and that public ASR benchmarks are of limited use in reflecting real-world applications. The speakers contrasted voice agent design for inbound and outbound calls, which call for different strategies to optimize latency and user experience. Successful voice teams run continuous A/B tests to refine voice interactions, focusing on conversational rhythm rather than speed alone. Ultimately, the conversation made the case that effective voice agents must be treated as complete user experiences, which requires ongoing testing and evaluation grounded in real user behavior and business outcomes.
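The summary does not say how teams evaluate their A/B tests. As a minimal sketch, assuming the metric being compared is the hang-up rate of two voice variants, a standard two-proportion z-test can indicate whether an observed difference is likely to be more than noise. The function name `hangup_ab_test` and the call counts in the usage example are hypothetical and do not come from the webinar.

```python
import math


def hangup_ab_test(hangups_a: int, calls_a: int,
                   hangups_b: int, calls_b: int) -> tuple[float, float]:
    """Hypothetical helper: two-proportion z-test on hang-up rates.

    Compares voice variant A against voice variant B and returns
    (z, two_sided_p_value), using the pooled proportion under the
    null hypothesis that both variants have the same hang-up rate.
    """
    p_a = hangups_a / calls_a
    p_b = hangups_b / calls_b
    pooled = (hangups_a + hangups_b) / (calls_a + calls_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / calls_a + 1 / calls_b))
    if se == 0:
        # All calls hung up or none did: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: an expressive voice (A) vs. a plainer voice (B).
z, p = hangup_ab_test(hangups_a=180, calls_a=1000, hangups_b=140, calls_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts, z comes out at roughly 2.44, so the higher hang-up rate for variant A would be unlikely to be chance at the usual 5% level. In practice, teams would also fix the sample size in advance rather than stopping a test as soon as the p-value dips below a threshold.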