Announcing Sonic: a low‑latency voice model for lifelike speech

Post Details

Company

Cartesia

Date Published

May 31, 2024

Author

Karan Goel

Word Count

867

Language

English

Hacker News Points

-

Source URL

cartesia.ai/blog/sonic

Summary

Cartesia, a company founded to build long-lived real-time intelligence for every device, is pursuing that goal with state space models (SSMs). Its latest release, Sonic, is a low-latency voice model that generates lifelike speech, a step toward AI that can efficiently process any modality in real time across a range of devices. Cartesia's SSMs, including S4 and Mamba, address limitations of current models such as high latency and cost; they are being widely adopted and are influencing new work in language, vision, robotics, and biology. The company's focus is on making intelligence ubiquitous, efficient, and accessible, starting with real-time conversational AI that can understand and interact with users seamlessly. Sonic shows significant improvements in model quality, inference speed, and throughput over traditional Transformer models. It is optimized for low latency and high throughput, and it is available through a web playground and an API for applications such as customer support and entertainment. Cartesia aims to extend these capabilities to real-time multimodal experiences on any device, with support for additional modalities and open-source releases planned for the near future.