
Chatterbox Multilingual v3: TTS with embedded watermarking for 25 languages

Blog post from Resemble AI

Post Details
Company: Resemble AI
Author: -
Word Count: 3,000
Language: English
Hacker News Points: -
Summary

Chatterbox Multilingual v3 marks a significant step in the evolution of multilingual text-to-speech models. It supports 25 languages and embeds PerTh watermarking in its output to meet regulatory requirements and make generated audio more trustworthy. The release keeps the 0.5B-parameter Llama-based backbone of its predecessor but improves speaker similarity, lowers hallucination rates, and makes conversational speech sound more natural, through refined training data and language-specific specialization.

The PerTh watermark is designed to be imperceptible to listeners yet robust against manipulation of the audio. It aligns with upcoming regulations such as the EU AI Act: it establishes the provenance of the audio and reduces reliance on listener-level detection of synthetic speech, which has become unreliable.

Character Error Rate (CER) evaluations show the model's strengths and weaknesses across languages. Italian and German perform exceptionally well, while Korean and Vietnamese need further improvements to their training data.

The release includes both general-purpose and Single-Language Pack models, reflecting deployment experience that favors dedicated models for high-volume languages. Chatterbox Multilingual v3 is also optimized for enterprise-scale deployment through NVIDIA NIM, with significant latency and throughput improvements. Continued work focuses on language support, subjective quality metrics, and watermarking capabilities.
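The summary does not describe PerTh's internals, so the sketch below does not implement PerTh. It is a toy spread-spectrum watermark, a classic and much simpler technique, included only to illustrate the general idea of a keyed, low-level mark that a detector can still find by correlation after the audio is rescaled and noise is added. The key, strength, threshold, sample rate, and 440 Hz test tone are arbitrary assumptions. A production system would typically shape the mark to the signal to keep it imperceptible and would need synchronization to survive cropping or resampling, neither of which this toy handles.

```python
import math
import random

SAMPLE_RATE = 16_000  # assumed sample rate for the toy signal
STRENGTH = 0.02       # mark amplitude, well below the 0.5-amplitude test tone


def pn_sequence(key: int, n: int) -> list[float]:
    """Keyed pseudo-random +/-1 sequence; only a holder of `key` can regenerate it."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def embed(audio: list[float], key: int, strength: float = STRENGTH) -> list[float]:
    """Add the keyed sequence at low amplitude on top of the host signal."""
    pn = pn_sequence(key, len(audio))
    return [x + strength * c for x, c in zip(audio, pn)]


def detection_score(audio: list[float], key: int, strength: float = STRENGTH) -> float:
    """Normalized correlation with the keyed sequence: near 1.0 for marked
    audio at the original gain, near 0.0 for unmarked audio or a wrong key."""
    pn = pn_sequence(key, len(audio))
    corr = sum(x * c for x, c in zip(audio, pn)) / len(audio)
    return corr / strength


def is_watermarked(audio: list[float], key: int, threshold: float = 0.5) -> bool:
    return detection_score(audio, key) > threshold


# Six seconds of a 440 Hz tone stands in for synthesized speech.
host = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(6 * SAMPLE_RATE)]
marked = embed(host, key=1234)

# A simple manipulation: lower the gain to 80% and add white noise.
noise = random.Random(7)
attacked = [0.8 * x + noise.gauss(0.0, 0.05) for x in marked]

for name, clip in [("host", host), ("marked", marked), ("attacked", attacked)]:
    print(name, round(detection_score(clip, key=1234), 2))
```

The score falls roughly in proportion to any gain change, which is why the detection threshold sits well below 1.0; detection also relies on a long signal, since correlation over more samples averages out the host audio.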
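Character Error Rate is the character-level edit distance between a reference and a hypothesis, divided by the number of reference characters: CER = (S + D + I) / N, counting substitutions, deletions, and insertions. A common way to score TTS intelligibility, assumed here rather than taken from the post, is to transcribe the synthesized audio with an ASR model and compare that transcript with the input text. The minimal sketch below computes CER per utterance and at corpus level; the function names are illustrative, and text normalization (casing, punctuation, script-specific handling) is deliberately omitted because it has to be chosen per language.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions
    that turn `ref` into `hyp` (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distance from "" to each prefix of hyp
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # distance from ref[:i] to ""
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: (S + D + I) / N, with N = number of reference chars."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level CER: total edits over total reference characters,
    so longer utterances carry proportionally more weight."""
    edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    if chars == 0:
        raise ValueError("references must contain at least one character")
    return edits / chars


# Hypothetical example: input text vs. an ASR transcript of the synthesized audio.
print(cer("guten morgen", "gute morgen"))
```

Lower is better, and CER is not capped at 1.0: a hypothesis with many inserted characters (for example, a hallucinated phrase) can produce more edits than there are reference characters.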
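The summary notes that deployment experience favors dedicated Single-Language Pack models for high-volume languages, with a general-purpose model covering the rest. One way a serving layer could act on that is a simple lookup router. This is a hypothetical sketch: the `ModelRouter` class, the model IDs, and the language subset are placeholders, not Resemble AI or NVIDIA NIM identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRouter:
    """Hypothetical request router: use a dedicated Single-Language Pack when
    one is deployed for the requested language, otherwise fall back to the
    general-purpose multilingual model. All model IDs are placeholders."""

    general_model: str
    supported_languages: frozenset[str]
    language_packs: dict[str, str] = field(default_factory=dict)

    def route(self, language: str) -> str:
        lang = language.strip().lower()  # normalize codes such as "DE " -> "de"
        if lang in self.language_packs:
            return self.language_packs[lang]  # dedicated high-volume model
        if lang in self.supported_languages:
            return self.general_model  # general-purpose fallback
        raise ValueError(f"unsupported language: {language!r}")


# Illustrative setup: a language subset and a single pack, chosen arbitrarily.
router = ModelRouter(
    general_model="multilingual-general",  # hypothetical model ID
    supported_languages=frozenset({"de", "en", "it", "ko", "vi"}),
    language_packs={"de": "single-language-pack-de"},  # hypothetical model ID
)
print(router.route("DE"), router.route("ko"))
```

With this design, promoting a language to a dedicated pack only changes `language_packs`; callers keep sending the same language code.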