Node.js voice agent with AssemblyAI Universal-3 Pro Streaming
Blog post from AssemblyAI
In this tutorial, Kelsey Foster shows how to build a real-time voice agent in Node.js with AssemblyAI's Universal-3 Pro Streaming model, which offers low latency, real-time diarization, and anti-hallucination. The agent runs in two modes: a terminal agent that takes microphone input and plays back text-to-speech audio, and a browser mode in which a Node.js WebSocket server backs a web user interface. The guide highlights AssemblyAI's neural turn detection, which combines acoustic and linguistic signals and removes the need for a separate voice activity detection (VAD) library. It covers quick start instructions, turn detection handling, and methods for sending audio, and notes that parameters can be adjusted to tune performance. The setup requires Node.js 18+ and specific npm packages, and the agent can be deployed on platforms such as Railway, Render, or Fly.io. The post also points to resources for exploring AssemblyAI's other capabilities.
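To make the audio-sending step more concrete, here is a minimal sketch of one common approach: buffering raw microphone data and forwarding it in fixed-duration chunks. It is not taken from the tutorial. The audio format (16 kHz, mono, 16-bit PCM), the 50 ms chunk length, and the helper names `chunkSizeBytes` and `createChunker` are all assumptions made for illustration, and the `send` callback stands in for whatever call actually writes to the streaming WebSocket.

```javascript
// Assumed audio format (not stated in the summary): 16 kHz, mono, 16-bit PCM.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2;
const CHUNK_MS = 50; // assumed chunk duration

// Number of bytes in one chunk of `chunkMs` milliseconds.
function chunkSizeBytes(sampleRateHz = SAMPLE_RATE_HZ, chunkMs = CHUNK_MS) {
  return Math.floor((sampleRateHz * chunkMs) / 1000) * BYTES_PER_SAMPLE;
}

// Hypothetical helper: accumulates microphone buffers of any size and
// calls `send` once for every complete fixed-size chunk.
function createChunker(send, size = chunkSizeBytes()) {
  let pending = Buffer.alloc(0);
  return {
    // Add newly captured audio and emit as many full chunks as possible.
    push(data) {
      pending = Buffer.concat([pending, data]);
      while (pending.length >= size) {
        send(pending.subarray(0, size));
        pending = pending.subarray(size);
      }
    },
    // Emit whatever is left (a final, shorter chunk) when capture stops.
    flush() {
      if (pending.length > 0) {
        send(pending);
        pending = Buffer.alloc(0);
      }
    },
  };
}

// Usage: record chunk sizes instead of writing to a real socket.
const sent = [];
const chunker = createChunker((chunk) => sent.push(chunk.length));
chunker.push(Buffer.alloc(1000)); // less than one chunk, nothing sent yet
chunker.push(Buffer.alloc(3000)); // 4000 bytes pending, two full chunks sent
chunker.flush();                  // remaining 800 bytes sent
console.log(sent);
```

In a real agent the `send` callback would typically forward each chunk over the open WebSocket connection. The chunk duration is one of the parameters worth tuning, since shorter chunks tend to lower latency at the cost of more messages.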