Magpie Speech — Applying an LLM Data Synthesis Method to an LLM-Based TTS Model to Synthesize a Speech Dataset
Blog post from HuggingFace
Magpie, a data synthesis method originally designed for instruction tuning of large language models (LLMs), has been applied to the Orpheus-TTS model to create a synthetic speech dataset of approximately 125,000 samples. Because LLM-based text-to-speech (TTS) models are autoregressive, LLM data-synthesis techniques can be reused with minimal adjustments. The process generates text instructions and the corresponding audio tokens, which are then decoded into waveforms. The synthesized data then passes through a series of filtering steps, including deduplication, transcription accuracy checks, and audio quality assessments, to remove low-quality samples. The downstream utility of this dataset for model training has not yet been validated, but the methodology suggests that LLM-style data generation techniques can expand the scope and quality of synthetic speech datasets.
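The three filtering stages named above (deduplication, transcription accuracy, audio quality) can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the post's actual pipeline: the `Sample` fields, the `filter_samples` function, and both thresholds are hypothetical, the transcript is assumed to come from some separate speech recognition step, and word error rate (WER) is used here as one plausible way to check transcription accuracy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    # All fields are hypothetical; the source does not describe its schema.
    text: str        # text the TTS model was meant to speak
    transcript: str  # ASR transcription of the synthesized audio
    quality: float   # audio quality score, higher is better (assumed scale)


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into words."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return cleaned.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def filter_samples(
    samples: list[Sample],
    max_wer: float = 0.1,      # assumed threshold, not from the source
    min_quality: float = 3.0,  # assumed threshold, not from the source
) -> list[Sample]:
    """Apply the three filters in order: dedup, transcription check, quality."""
    seen: set[str] = set()
    kept: list[Sample] = []
    for sample in samples:
        key = " ".join(normalize(sample.text))
        if key in seen:  # exact duplicate after normalization
            continue
        seen.add(key)
        if word_error_rate(sample.text, sample.transcript) > max_wer:
            continue  # audio does not match the intended text closely enough
        if sample.quality < min_quality:
            continue  # audio quality score too low
        kept.append(sample)
    return kept
```

In this sketch deduplication is exact matching on normalized text, so near-duplicates would pass through; a real pipeline might use fuzzy or embedding-based matching instead. Each stage only drops samples, so the stages could also be run as separate passes in order to report how many samples each one removes.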