
Node.js voice agent with AssemblyAI Universal-3 Pro Streaming

Blog post from AssemblyAI

Post Details
Company: AssemblyAI
Author: Kelsey Foster
Word Count: 800
Language: English
Summary

This tutorial by Kelsey Foster shows how to build a real-time voice agent in Node.js with AssemblyAI's Universal-3 Pro Streaming model, which offers low latency, real-time diarization, and anti-hallucination features. The agent runs in two modes: a terminal agent that takes microphone input and plays back text-to-speech audio, and a browser mode served by a Node.js WebSocket server with a user interface. The guide highlights AssemblyAI's neural turn detection, which combines acoustic and linguistic signals and so removes the need for a separate voice activity detection (VAD) library. It covers quick start instructions, turn detection handling, and methods for sending audio, and notes that parameters can be adjusted to tune performance. The setup requires Node.js 18+ and specific npm packages, and the agent can be deployed on platforms such as Railway, Render, or Fly.io. The post also points to resources for exploring AssemblyAI's other capabilities.
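
The summary mentions audio sending and turn detection handling without showing either, so the following is a minimal Node.js sketch of what those two steps might look like. It is not taken from the tutorial. The 16 kHz sample rate, 50 ms chunk size, helper names (`chunkPcm`, `createTurnHandler`), and the message fields (`type`, `end_of_turn`, `transcript`) are all assumptions made for illustration; check them against the AssemblyAI streaming documentation before relying on them.

```javascript
// Sketch only: the constants and message shape below are assumptions,
// not values confirmed by the tutorial.
const SAMPLE_RATE = 16000;   // assumed: 16 kHz mono input
const BYTES_PER_SAMPLE = 2;  // 16-bit PCM
const CHUNK_MS = 50;         // assumed chunk duration in milliseconds

// Split a Buffer of raw PCM audio into fixed-duration chunks.
// Each chunk could then be written to a streaming WebSocket as a binary frame.
function chunkPcm(pcm, sampleRate = SAMPLE_RATE, chunkMs = CHUNK_MS) {
  const chunkBytes =
    Math.floor((sampleRate * chunkMs) / 1000) * BYTES_PER_SAMPLE;
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    // subarray returns a view on the same memory, so no audio is copied.
    const end = Math.min(offset + chunkBytes, pcm.length);
    chunks.push(pcm.subarray(offset, end));
  }
  return chunks;
}

// Build a handler for incoming JSON messages. It calls onTurnComplete with
// the transcript once the service marks the turn as finished, and returns
// true in that case. The field names here are hypothetical.
function createTurnHandler(onTurnComplete) {
  return function handleMessage(raw) {
    const msg = JSON.parse(raw);
    const finished =
      msg.type === "Turn" &&
      msg.end_of_turn === true &&
      typeof msg.transcript === "string" &&
      msg.transcript.trim().length > 0;
    if (finished) {
      onTurnComplete(msg.transcript.trim());
    }
    return finished;
  };
}
```

In a real agent, each chunk from `chunkPcm` would be sent as it is captured from the microphone, and the callback passed to `createTurnHandler` would forward the finished transcript to the language model and text-to-speech stages. Because turn boundaries come from the service in this design, the client needs no local VAD step.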