Using the Voice Agent API alongside an existing voice stack
Blog post from AssemblyAI
This post covers how AssemblyAI's Voice Agent API and Universal-3 Pro Streaming fit into an existing voice stack, with a focus on developers already building on platforms like LiveKit and Pipecat. Its central point is that these developers do not need to overhaul their architecture to benefit from Universal-3 Pro's improved accuracy; a simple endpoint change suffices. AssemblyAI offers two paths: the Voice Agent API, a fully managed solution suited to new builds and quick proofs of concept, and Universal-3 Pro Streaming, which plugs into current frameworks and improves speech-to-text (STT) accuracy at a lower cost. STT accuracy matters because it affects every downstream step, including the language model's reasoning and the text-to-speech output. The post suggests running parallel tests to evaluate the new model's performance without committing to a full migration, especially in industries that require precise entity transcription, such as healthcare and finance. It also notes that AssemblyAI is compliant with HIPAA for processing protected health information, which matters for teams with those industry requirements.
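
The post does not say how to score a parallel test. One common approach is to run the same audio through the current and the candidate STT model, then compare each transcript to a human-verified reference using word error rate (WER). The sketch below is a minimal illustration of that idea in plain Python. The function names, the model labels, and the sample transcripts are all hypothetical, and it calls no AssemblyAI, LiveKit, or Pipecat API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def compare_stt_outputs(reference: str, outputs: dict[str, str]) -> dict[str, float]:
    """Score each model's transcript of the same audio against one reference."""
    return {name: word_error_rate(reference, text) for name, text in outputs.items()}


if __name__ == "__main__":
    # Made-up sample strings, chosen to show an entity error (a drug name).
    reference = "patient takes 20 mg of lisinopril daily"
    outputs = {
        "current_model": "patient takes 20 mg of listen april daily",
        "candidate_model": "patient takes 20 mg of lisinopril daily",
    }
    for name, wer in compare_stt_outputs(reference, outputs).items():
        print(f"{name}: WER = {wer:.3f}")
```

Overall WER can hide the errors that matter most in healthcare and finance, so a real evaluation would likely also track accuracy on specific entities such as drug names, dosages, and account numbers.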