How to build with the Voice Agent API
Blog post from AssemblyAI
AssemblyAI's Voice Agent API bundles the full voice-agent pipeline behind a single WebSocket connection: speech-to-text (STT), large language model (LLM) reasoning, text-to-speech (TTS), turn detection, and tool calling. It is priced at a flat $4.50 per hour, so developers do not have to stitch together multiple service providers or reconcile multiple invoices, which simplifies both setup and operation. Key features include adaptive turn detection, which adjusts to a user's speaking pace and context; semantic interruption handling, which distinguishes true interruptions from back-channel affirmations such as "uh-huh"; and the ability to call external tools during a conversation. The API supports six input languages and eleven output languages, allowing multilingual interactions. No dedicated SDK is required: developers integrate and customize the API by exchanging JSON messages over a standard WebSocket.
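To make the JSON-over-WebSocket and tool-calling ideas concrete, here is a minimal Python sketch of the client-side dispatch logic. It is an illustration under stated assumptions, not AssemblyAI's actual message schema: the event types `tool_call` and `tool_result`, the fields `call_id`, `name`, and `arguments`, and the `get_weather` tool are all hypothetical names chosen for this example. The network layer is left out on purpose, so the function only turns one incoming text frame into an optional reply.

```python
import json
from typing import Callable, Dict, Optional


def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would query an external service."""
    return {"city": city, "forecast": "sunny"}


# Hypothetical registry mapping tool names to local Python callables.
TOOLS: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def handle_message(raw: str, tools: Dict[str, Callable[..., dict]] = TOOLS) -> Optional[str]:
    """Turn one incoming JSON text frame into a JSON reply, or None.

    Assumed (not documented) message shapes:
      incoming: {"type": "tool_call", "call_id": ..., "name": ..., "arguments": {...}}
      outgoing: {"type": "tool_result", "call_id": ..., "result": {...}}
    """
    msg = json.loads(raw)

    # Anything that is not a tool call (transcripts, audio events, and so on)
    # needs no reply from this handler.
    if msg.get("type") != "tool_call":
        return None

    tool = tools.get(msg["name"])
    if tool is None:
        result = {"error": f"unknown tool: {msg['name']}"}
    else:
        # Pass the JSON arguments through as keyword arguments.
        result = tool(**msg.get("arguments", {}))

    # Echo the call_id so the reply can be matched to the request.
    return json.dumps({"type": "tool_result", "call_id": msg["call_id"], "result": result})
```

In a real client, any standard WebSocket library could call `handle_message` for each text frame it receives and send back every non-`None` reply on the same connection, which is why no dedicated SDK is needed. The actual event names and fields should be taken from AssemblyAI's API reference.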