Add voice to your agent
Blog post from Cloudflare
Cloudflare has introduced an experimental voice pipeline for its Agents SDK that lets developers add real-time voice to an existing agent without adopting a separate voice framework. With the @cloudflare/voice package, users talk to an agent over a single WebSocket connection, and the agent keeps the same Durable Object infrastructure and SQLite-backed conversation history it uses for text interactions. The package covers both full conversational voice agents and speech-to-text-only use cases, with built-in voice input and output through Workers AI providers such as Deepgram.

The pipeline is designed to be provider-agnostic, so developers can mix and match components for their specific needs, and it is compatible with several telephony and transport options, including WebRTC and Twilio. Keeping the processing on Cloudflare's network is intended to reduce latency. The pipeline also supports dynamic model switching and hooks for intercepting data. Because voice and text turns share the same agent and conversation history, users can move between the two within a single, multimodal conversation.
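
To make the provider-agnostic idea concrete, here is a minimal TypeScript sketch of a voice turn: speech-to-text, then the agent, then text-to-speech, with voice and text turns writing to one shared history. Every name in it (SpeechToText, TextToSpeech, VoiceTurnPipeline, afterTranscribe) is hypothetical and chosen for illustration; none of it is the @cloudflare/voice API. The providers are in-memory stubs and the calls are synchronous for brevity, whereas real providers would be asynchronous and streamed over a WebSocket.

```typescript
// Hypothetical interfaces for illustration only; NOT the @cloudflare/voice API.
interface SpeechToText { transcribe(audio: Uint8Array): string; }
interface TextToSpeech { synthesize(text: string): Uint8Array; }
interface Message { role: "user" | "assistant"; content: string; via: "voice" | "text"; }
interface Hooks { afterTranscribe?: (transcript: string) => string; }
type Responder = (history: readonly Message[]) => string;

class VoiceTurnPipeline {
  // One history shared by voice and text turns.
  readonly history: Message[] = [];
  private stt: SpeechToText;
  private respond: Responder;
  private tts: TextToSpeech;
  private hooks: Hooks;

  constructor(stt: SpeechToText, respond: Responder, tts: TextToSpeech, hooks: Hooks = {}) {
    this.stt = stt;
    this.respond = respond;
    this.tts = tts;
    this.hooks = hooks;
  }

  // Voice turn: audio in, audio out.
  handleAudio(audio: Uint8Array): Uint8Array {
    let transcript = this.stt.transcribe(audio);
    // Assumed hook point: intercept the transcript before the agent sees it.
    if (this.hooks.afterTranscribe) transcript = this.hooks.afterTranscribe(transcript);
    return this.tts.synthesize(this.turn(transcript, "voice"));
  }

  // Text turn: same agent, same history, no audio stages.
  handleText(text: string): string {
    return this.turn(text, "text");
  }

  private turn(content: string, via: "voice" | "text"): string {
    this.history.push({ role: "user", content, via });
    const reply = this.respond(this.history);
    this.history.push({ role: "assistant", content: reply, via });
    return reply;
  }
}

// Stub "audio" codec: one byte per character, standing in for real audio.
function encode(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) bytes[i] = text.charCodeAt(i);
  return bytes;
}
function decode(audio: Uint8Array): string {
  let text = "";
  for (let i = 0; i < audio.length; i++) text += String.fromCharCode(audio[i]);
  return text;
}

// Stub providers; any object with the same shape could be swapped in.
const fakeStt: SpeechToText = { transcribe: (audio) => decode(audio) };
const fakeTts: TextToSpeech = { synthesize: (text) => encode(text) };
const echoAgent: Responder = (history) => `You said: ${history[history.length - 1].content}`;

const pipeline = new VoiceTurnPipeline(fakeStt, echoAgent, fakeTts, {
  afterTranscribe: (transcript) => transcript.trim(),
});

const spoken = decode(pipeline.handleAudio(encode("  hello  "))); // "You said: hello"
const typed = pipeline.handleText("and by text"); // "You said: and by text"
```

Because the pipeline depends only on the two small interfaces, replacing a stub with a different provider does not touch the turn logic, and the voice turn followed by the text turn leaves four messages in the same history.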