Company
Date Published
Author
Philip Kiely
Word count
1350
Language
English
Hacker News points
None

Summary

Canopy Labs has selected Baseten as its preferred inference provider for Orpheus TTS models, enabling developers to run Orpheus in production with optimized performance and scalability on a single H100 MIG GPU. Together, Canopy Labs and Baseten built the world's highest-performance Orpheus inference server on NVIDIA's TensorRT-LLM, supporting 16 concurrent live connections under variable traffic, 24 concurrent live connections under stable traffic, and up to a 60x real-time factor for bulk jobs. The client code example provided by Baseten supports session re-use, which reduces per-request connection overhead and improves time to first byte (TTFB). With this partnership, developers can build fast, configurable, and cost-efficient voice agents on Orpheus TTS.
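
As a rough illustration of the session re-use idea, a client can keep one HTTP session open and send every TTS request over it, paying the TCP/TLS setup cost once instead of per request. This is a minimal sketch, not Baseten's published client code: the endpoint URL, payload fields, voice name, and auth handling below are placeholder assumptions.

```python
import os
import requests

# Hypothetical sketch of session re-use for a TTS endpoint: one requests.Session
# is shared across calls so connection setup is amortized, which helps TTFB.
# The URL, payload fields, and voice name are placeholders, not Baseten's actual API.
BASETEN_URL = "https://model-xxxxxxx.api.baseten.co/environments/production/predict"
API_KEY = os.environ["BASETEN_API_KEY"]

session = requests.Session()
session.headers.update({"Authorization": f"Api-Key {API_KEY}"})


def synthesize(text: str) -> bytes:
    """Send one TTS request over the shared session and return raw audio bytes."""
    response = session.post(BASETEN_URL, json={"prompt": text, "voice": "tara"})
    response.raise_for_status()
    return response.content


if __name__ == "__main__":
    for line in ["Hello!", "Re-using the session avoids a new TLS handshake per request."]:
        audio = synthesize(line)
        print(f"Received {len(audio)} bytes of audio")
```

The same pattern applies to long-lived connections for live voice agents: keeping the transport open between utterances is what makes the concurrency and TTFB figures above achievable in practice.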