DramaBox TTS: Saving drama for the performance, not the security review
Blog post from Resemble AI
DramaBox is a prompt-driven text-to-speech (TTS) model from Resemble AI that lets users direct speech in natural language, much as they would direct a human actor. Conventional TTS systems are controlled through rigid, machine-oriented parameters and struggle with expressiveness as a result. DramaBox instead interprets plain-language prompts, distinguishing the literal dialogue to be spoken from the performance directions that describe how to speak it, and generates more human-like, dynamic performances. The model covers a wide tonal range with precise emotional control, and it embeds a watermark in its audio output for ownership verification and for compliance with emerging regulations such as the EU AI Act. DramaBox is an English-only release focused on high-quality, directable speech for applications such as game dialogue, audiobooks, and voice agents, where flat traditional TTS has been a bottleneck and where the ability to prove audio provenance is an added advantage. The release is part of a broader Resemble AI initiative to introduce a series of open-source TTS models, each designed to tackle a specific challenge in the field.