LLaSA has become a prominent framework for LLM-based speech synthesis, and recent efforts have focused on enhancing its prosody and expressiveness through reinforcement learning, specifically Group Relative Policy Optimization (GRPO). This approach moves away from traditional maximum likelihood estimation, which often yields flat prosody, by training the model to prioritize qualities such as clarity, expressiveness, and rhythm. The GRPO training pipeline generates a group of candidate outputs for each prompt, scores them with a reward model that combines word error rate and negative log-likelihood, and updates the model parameters to favor high-reward sequences. Initial results indicate that GRPO significantly improves semantic consistency and the naturalness of synthesized speech, although speaker-similarity gains are inconsistent and some perceptual aspects of speech remain hard to capture. Future work aims to develop a learned prosody reward model and incorporate human feedback to further improve emotional quality, with the ultimate goal of controllable, emotionally expressive multilingual speech.
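
The sketch below illustrates the group-relative update at the heart of such a pipeline. It is a minimal PyTorch sketch under stated assumptions, not the actual LLaSA training code: the WER and NLL values that would in practice come from an ASR model and a scoring LM are stood in for by random tensors, and the reward weights, clipping threshold, and function names (`group_relative_advantages`, `grpo_loss`, `reward_from_wer_and_nll`) are illustrative choices.

```python
# Hedged sketch of a GRPO-style update for a speech-token LM.
# All names and hyperparameters here are illustrative assumptions,
# not values taken from the LLaSA/GRPO papers.
import torch


def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Normalize rewards within each group of candidates (shape: [batch, group])."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True) + 1e-6
    return (rewards - mean) / std


def reward_from_wer_and_nll(wer, nll, wer_weight=1.0, nll_weight=0.1):
    """Combine ASR word error rate and negative log-likelihood into a scalar
    reward (lower WER/NLL -> higher reward). Weights are placeholders."""
    return -(wer_weight * wer + nll_weight * nll)


def grpo_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped policy-gradient loss over per-sequence log-probabilities.

    logprobs, old_logprobs, advantages: [batch, group] tensors, one entry per
    candidate sequence (summed token log-probabilities for that candidate).
    """
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


# --- Toy usage: random numbers stand in for real model/ASR outputs ---
batch, group = 2, 4                      # 4 candidate utterances per prompt
wer = torch.rand(batch, group)           # would come from ASR on the synthesized audio
nll = torch.rand(batch, group)           # would come from scoring under a language model
rewards = reward_from_wer_and_nll(wer, nll)
advantages = group_relative_advantages(rewards)

logprobs = torch.randn(batch, group, requires_grad=True)          # current policy
old_logprobs = logprobs.detach() + 0.05 * torch.randn(batch, group)  # sampling policy

loss = grpo_loss(logprobs, old_logprobs, advantages)
loss.backward()  # gradients favor above-average candidates within each group
print(float(loss))
```

Because each candidate's advantage is measured relative to the other candidates sampled for the same prompt, the update shifts probability mass toward whichever utterances score best on the combined WER/NLL reward without requiring a separate value network.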