Chatterbox Turbo is now available on fal
Blog post from fal
Chatterbox Turbo is an open-source text-to-speech model designed for real-time voice AI applications. It delivers a sub-150 ms time to first sound and supports expressive paralinguistic prompting and instant voice cloning. Built on a streamlined 350M-parameter architecture with a GPT-2 backbone, it is fast enough for natural-sounding live conversations, voice user interfaces, and on-device experiences. The model accepts paralinguistic tags such as [laugh] and [sigh] in the input text to convey emotions and reactions, which makes voice agents and interactive experiences sound more realistic. Chatterbox Turbo also offers zero-shot voice cloning from as little as five seconds of reference audio, preserving the original voice's timbre and style while adding natural paralinguistic cues. For safety and provenance, outputs carry a watermark called PerTh, which makes the generated audio verifiable as AI-produced.
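
To make the tag-based prompting concrete, here is a minimal Python sketch of how a script containing paralinguistic tags might be checked and packaged before it is sent to the model. It is an illustration under stated assumptions, not the model's actual API: the allow-list contains only the two tags named above (the full supported set is not given here), and the helper names and the payload fields `text` and `audio_url` are hypothetical.

```python
import re
from typing import List, Optional

# Only the tags named in the post; the model's full tag set is not listed here,
# so this allow-list is illustrative.
KNOWN_TAGS = {"laugh", "sigh"}

# Matches lowercase bracketed tags such as [laugh] or [sigh].
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(script: str) -> List[str]:
    """Return bracketed tag names in the order they appear in the script."""
    return TAG_PATTERN.findall(script)


def unknown_tags(script: str) -> List[str]:
    """Return tags that are not in KNOWN_TAGS, preserving order."""
    return [tag for tag in find_tags(script) if tag not in KNOWN_TAGS]


def build_request(script: str, reference_audio_url: Optional[str] = None) -> dict:
    """Assemble a hypothetical request payload; the field names are assumptions."""
    bad = unknown_tags(script)
    if bad:
        raise ValueError(f"unrecognized paralinguistic tags: {bad}")
    payload = {"text": script}
    if reference_audio_url is not None:
        # Hypothetical field pointing at a short voice-cloning reference clip.
        payload["audio_url"] = reference_audio_url
    return payload


if __name__ == "__main__":
    print(build_request("That went better than expected [laugh] now back to work [sigh]."))
```

Validating tags on the client side catches typos such as `[laught]` before they reach the model, where an unrecognized tag would presumably be read aloud or ignored rather than rendered as a sound.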