Announcing the fastest inference for realtime voice AI agents
Blog post from Together AI
Voice interfaces are becoming central to AI-native applications, powering tasks like transcription, speech-to-code, and custom podcasts. Building these experiences, however, typically means stitching together several specialized voice services, which adds complexity, latency, and cost.

Together AI has introduced an expanded set of low-latency, high-performance voice infrastructure to streamline this development, with services that support both real-time and batch processing. Key features include a speech-to-text API, which Together AI positions as the industry's fastest, optimized for rapid transcription and natural conversation flow, and serverless open-source text-to-speech models that deliver professional-quality output with minimal latency.

The infrastructure is designed for production voice agents: it targets accurate transcription, natural-sounding speech, and consistent performance under load, maintaining efficiency and reliability even during high-traffic periods.
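As a rough illustration of how a serverless text-to-speech service like this might be called, here is a minimal sketch using only Python's standard library. The endpoint URL, model name, voice name, and JSON field names below are assumptions in the common OpenAI-compatible style, not details confirmed by this post; check Together AI's API reference before using them.

```python
import json
import urllib.request

# Hypothetical endpoint; verify the real path in Together AI's docs.
TTS_URL = "https://api.together.xyz/v1/audio/speech"

def build_tts_request(api_key: str, text: str, model: str, voice: str):
    """Assemble headers and a JSON body for a text-to-speech call.
    Field names ("model", "input", "voice") are assumed, not confirmed."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"model": model, "input": text, "voice": voice}
    return headers, json.dumps(payload).encode("utf-8")

def synthesize(api_key: str, text: str,
               model: str = "example/tts-model",  # placeholder model name
               voice: str = "default") -> bytes:
    """POST the request and return raw audio bytes from the response."""
    headers, body = build_tts_request(api_key, text, model, voice)
    req = urllib.request.Request(TTS_URL, data=body,
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For a real-time agent you would typically stream the audio as it is generated rather than waiting for the full response, which is where the low time-to-first-byte emphasized in the announcement matters most.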