
AssemblyAI vs Deepgram: what's the best voice agent API?

Blog post from AssemblyAI

Post Details
Company: AssemblyAI
Date Published:
Author: Kelsey Foster
Word Count: 1,887
Language: English
Hacker News Points: -
Summary

AssemblyAI and Deepgram both offer voice agent APIs priced at around $4.50 per hour, and both use a cascaded architecture in which separate models handle speech-to-text (STT), the language model (LLM) step, and text-to-speech (TTS). According to the post, AssemblyAI's Universal-3 Pro Streaming model achieves 94.07% word accuracy and a 16.7% missed entity rate, compared with 92.10% word accuracy and a 25.5% missed entity rate for Deepgram's Nova-3. The post argues that this gap affects how often a voice agent can complete a task correctly without asking the user to repeat themselves. On pricing, it describes AssemblyAI's billing as flat per-minute with no concurrency metering, which makes costs easier to predict, whereas Deepgram's concurrency metering can make costs unpredictable during peak usage. It also notes that AssemblyAI's API supports dynamic mid-conversation updates, which suits applications that need real-time changes, and characterizes Deepgram's approach as more conventional. The post recommends AssemblyAI for production environments that prioritize speech accuracy and for healthcare use cases, citing features such as Medical Mode for specialized terminology.
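To put the quoted figures in perspective, the short sketch below converts the two word-accuracy numbers into error rates and computes the relative gap, then turns the roughly $4.50 hourly price into a per-minute rate. It rests on two assumptions not stated in the summary: that word accuracy equals 100% minus word error rate, and that the hourly price applies uniformly per minute. The function names and the 10,000-minute monthly volume are hypothetical, chosen only for illustration.

```python
# Figures quoted in the summary (percentages).
ASSEMBLYAI_WORD_ACCURACY = 94.07
DEEPGRAM_WORD_ACCURACY = 92.10
ASSEMBLYAI_MISSED_ENTITY = 16.7
DEEPGRAM_MISSED_ENTITY = 25.5
HOURLY_RATE_USD = 4.50  # "around $4.50 per hour" for both vendors


def error_rate(accuracy_pct: float) -> float:
    """Assumption: word accuracy is defined as 100% minus word error rate."""
    return 100.0 - accuracy_pct


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


def flat_cost(minutes: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost under flat per-minute billing, with no concurrency surcharge."""
    return minutes * hourly_rate / 60.0


word_error_gap = relative_reduction(
    error_rate(DEEPGRAM_WORD_ACCURACY), error_rate(ASSEMBLYAI_WORD_ACCURACY)
)
entity_gap = relative_reduction(DEEPGRAM_MISSED_ENTITY, ASSEMBLYAI_MISSED_ENTITY)

print(f"Relative reduction in word errors: {word_error_gap:.1%}")
print(f"Relative reduction in missed entities: {entity_gap:.1%}")
print(f"Per-minute rate: ${flat_cost(1):.3f}")
# Hypothetical monthly volume, for illustration only.
print(f"Cost of 10,000 minutes: ${flat_cost(10_000):,.2f}")
```

Under these assumptions, a gap of about two percentage points in word accuracy corresponds to roughly a quarter fewer word errors, and the missed entity gap to roughly a third fewer missed entities. That is why a small headline difference can matter for task completion.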