SILMA TTS: A Lightweight Open Bilingual Text to Speech Model
Blog post from Hugging Face
SILMA AI has released SILMA TTS v1, a lightweight 150M-parameter bilingual text-to-speech model for Arabic and English, built on the F5-TTS diffusion architecture. The model is open source under the Apache 2.0 license and was pretrained on a large audio dataset, with the goals of high-fidelity speech synthesis, instant voice cloning, and latency low enough for real-time applications. By optimizing the original F5-TTS model and focusing on Arabic, SILMA AI aims to address the scarcity of high-quality Arabic audio data and to overcome previous licensing constraints, so that the model can be used for both research and commercial purposes. Development combined architectural optimizations, extensive pretraining on high-quality data, and targeted fine-tuning for Arabic, which improved text handling and audio quality. The model can be installed with a few commands, and further resources are available on GitHub and Hugging Face.
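
A claim of real-time suitability is commonly checked with the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where values below 1 mean speech is generated faster than it plays back. The sketch below is not taken from the SILMA TTS release and does not use its API. The `synthesize` callable is a hypothetical stand-in for any function that returns a sequence of audio samples, and the sample rate is an assumed parameter that must match the model being measured.

```python
import time
from typing import Callable, Sequence


def audio_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration of a mono clip, given its sample count and sample rate in Hz."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical TTS callable
    text: str,
    sample_rate: int,  # assumed output rate of the model under test
) -> float:
    """Time one call to `synthesize` and return its real-time factor."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_duration_seconds(len(samples), sample_rate))
```

For example, a model that takes 0.5 seconds to produce 2 seconds of audio has an RTF of 0.25. A single measurement is noisy, so averaging over several utterances after a warm-up call gives a more reliable figure.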