How to build with the Voice Agent API

Blog post from AssemblyAI

Post Details
Company: AssemblyAI
Author: Kelsey Foster
Word Count: 2,297
Language: English
Summary

AssemblyAI's Voice Agent API runs the entire voice-agent pipeline over a single WebSocket connection: speech-to-text (STT), large language model (LLM) reasoning, text-to-speech (TTS), turn detection, and tool calling. It is priced at a flat rate of $4.50 per hour, so developers do not have to combine multiple service providers or reconcile multiple invoices, which simplifies both setup and operation. Key features include adaptive turn detection, which adjusts to the user's speaking pace and context; semantic interruption handling, which distinguishes true interruptions from back-channel affirmations; and the ability to call external tools during a conversation. The API supports six input languages and eleven output languages, allowing multilingual interactions. No dedicated SDK is required: developers integrate and customize the API from their own applications using standard JSON messages over WebSocket.
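To make the flat pricing and the JSON-over-WebSocket tool-calling idea concrete, here is a minimal Python sketch. Only the $4.50 hourly rate comes from the summary. The message shapes (`type`, `call_id`, `name`, `arguments`, `result`), the `get_weather` tool, and the function names are all hypothetical placeholders chosen for illustration, not the API's documented schema, and the sketch handles already-received text frames rather than opening a real connection.

```python
import json
from typing import Optional

FLAT_RATE_USD_PER_HOUR = 4.50  # flat rate quoted in the summary


def estimate_cost(session_seconds: float) -> float:
    """Return the flat-rate cost in USD for a session of the given length."""
    return round(session_seconds / 3600 * FLAT_RATE_USD_PER_HOUR, 4)


def get_weather(city: str) -> dict:
    """Hypothetical local tool; a real one would call an external service."""
    return {"city": city, "forecast": "sunny"}


# Registry of tools the agent may invoke (names are illustrative).
TOOLS = {"get_weather": get_weather}


def handle_message(raw: str) -> Optional[str]:
    """Decode one JSON text frame and answer tool calls.

    ASSUMPTION: the field names below are invented for this sketch and
    must be replaced with the ones in the official API reference.
    Returns a JSON reply frame for a tool call, or None for other messages.
    """
    message = json.loads(raw)
    if message.get("type") != "tool_call":
        return None  # e.g. transcripts or audio events, ignored here
    tool = TOOLS.get(message["name"])
    if tool is None:
        result = {"error": "unknown tool: " + message["name"]}
    else:
        result = tool(**message.get("arguments", {}))
    return json.dumps(
        {"type": "tool_result", "call_id": message["call_id"], "result": result}
    )
```

In a real client, each text frame received from the WebSocket would be passed to `handle_message`, and any non-`None` reply would be sent back on the same connection. Under the flat rate, `estimate_cost(1800)` gives the cost of a 30-minute session.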