Universal-3.5 Pro Realtime: the first streaming STT model that takes the agent's question as input
Blog post from AssemblyAI
Universal-3.5 Pro Realtime is AssemblyAI's latest flagship real-time speech-to-text model, built around better context retention and language handling. A voice agent can pass the question it just asked, along with surrounding context, to the model, and a rolling memory keeps track of the conversation; AssemblyAI reports that this significantly reduces word error rate. The model supports 18 languages with mid-sentence code-switching, and its voice focus feature isolates the primary speaker, which suits noisy environments. AssemblyAI also reports that it outperforms competitors on metrics such as word error rate and entity error rate, and positions it as cost-effective, with diarization and voice isolation available as add-ons. It is designed to drop into existing systems: most users are upgraded automatically, and large-scale deployments run without rate limits.
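
Since the comparison above rests on word error rate, here is a minimal sketch of how that metric is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of reference words. The function name `word_error_rate` and the normalization used here (lowercasing and splitting on whitespace) are assumptions for illustration, not AssemblyAI's evaluation pipeline; published benchmarks usually apply heavier text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: normalization here is just lowercasing and
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("my number is five five five",
                          "my number is five five nine"))
```

Entity error rate follows the same idea but is scored only over the spans that matter to a voice agent, such as names, numbers, and addresses, so a single wrong digit weighs far more than it does in plain WER.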