Build a voice assistant app with AssemblyAI’s Voice Agent API
Blog post from AssemblyAI
AssemblyAI's Voice Agent API simplifies building browser-based voice assistants by consolidating speech-to-text, language model reasoning, and text-to-speech into a single WebSocket endpoint. This approach reduces latency and complexity: one API key covers the whole pipeline, temporary tokens secure browser connections, and barge-in handling and tool calling are built in. A working voice assistant app takes fewer than 400 lines of code, split between a browser client and a lightweight Node server that keeps the API key out of the browser. The API expects audio as 16-bit signed little-endian PCM at 24,000 Hz and offers options for choosing a voice and setting the session prompt. Echo cancellation is recommended so that the agent does not hear its own output and interrupt itself, and a fresh temporary token must be issued for each new WebSocket connection. AssemblyAI offers a free tier for development and testing.
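The audio format requirement is the part most likely to trip up a browser client, because the Web Audio API delivers 32-bit float samples, often at 44,100 or 48,000 Hz. The sketch below shows one way to produce 16-bit signed little-endian PCM at 24,000 Hz from such samples. The function names `resampleLinear` and `floatTo16BitPCM` are hypothetical helpers written for this illustration and are not part of AssemblyAI's API; the linear interpolation used here is a simple stand-in with no anti-aliasing filter, not necessarily what the original post uses.

```typescript
// Target sample rate stated in the summary above.
const TARGET_SAMPLE_RATE = 24000;

// Hypothetical helper: resample mono float audio with linear interpolation.
// This is a minimal sketch; it applies no low-pass filter, so downsampling
// can introduce some aliasing.
function resampleLinear(
  input: Float32Array,
  inputRate: number,
  outputRate: number = TARGET_SAMPLE_RATE
): Float32Array {
  if (inputRate === outputRate || input.length === 0) {
    return input;
  }
  const outputLength = Math.floor((input.length * outputRate) / inputRate);
  const output = new Float32Array(outputLength);
  const step = inputRate / outputRate;
  for (let i = 0; i < outputLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    // Blend the two nearest input samples.
    output[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return output;
}

// Hypothetical helper: convert float samples in [-1, 1] to
// 16-bit signed little-endian PCM.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    const scaled = s < 0 ? s * 0x8000 : s * 0x7fff;
    // The final `true` argument selects little-endian byte order.
    view.setInt16(i * 2, Math.round(scaled), true);
  }
  return buffer;
}
```

In a browser, the two helpers would typically be chained on each captured audio chunk, for example `floatTo16BitPCM(resampleLinear(chunk, audioContext.sampleRate))`, before the bytes are sent over the WebSocket. An alternative worth checking is to request the target rate up front with `new AudioContext({ sampleRate: 24000 })`, which lets the browser handle resampling where that rate is supported.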