LLaSA has become a prominent framework for LLM-based speech synthesis, and recent efforts have focused on enhancing its prosody and expressiveness through reinforcement learning, specifically Group Relative Policy Optimization (GRPO). This approach moves away from traditional maximum likelihood estimation, which often yields flat prosody, by training the model to prioritize qualities such as clarity, expressiveness, and rhythm. The GRPO training pipeline generates a group of candidate outputs for each prompt, scores them with a reward model that combines word error rate and negative log-likelihood, and updates the model parameters to favor high-reward sequences. Initial results indicate that GRPO significantly improves semantic consistency and the naturalness of synthesized speech, although speaker-similarity gains are inconsistent and some perceptual aspects of speech remain hard to capture. Future work aims to develop a learned prosody reward model and incorporate human feedback to further improve emotional quality, with the ultimate goal of controllable, emotionally expressive multilingual speech.
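
The sketch below illustrates the group-relative update at the heart of such a pipeline. It is a minimal PyTorch sketch under stated assumptions, not the actual LLaSA training code: the WER and NLL values that would in practice come from an ASR model and a scoring LM are stood in for by random tensors, and the reward weights, clipping threshold, and function names (`group_relative_advantages`, `grpo_loss`, `reward_from_wer_and_nll`) are illustrative choices.

```python
# Hedged sketch of a GRPO-style update for a speech-token LM.
# All names and hyperparameters here are illustrative assumptions,
# not values taken from the LLaSA/GRPO papers.
import torch


def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Normalize rewards within each group of candidates (shape: [batch, group])."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True) + 1e-6
    return (rewards - mean) / std


def reward_from_wer_and_nll(wer, nll, wer_weight=1.0, nll_weight=0.1):
    """Combine ASR word error rate and negative log-likelihood into a scalar
    reward (lower WER/NLL -> higher reward). Weights are placeholders."""
    return -(wer_weight * wer + nll_weight * nll)


def grpo_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped policy-gradient loss over per-sequence log-probabilities.

    logprobs, old_logprobs, advantages: [batch, group] tensors, one entry per
    candidate sequence (summed token log-probabilities for that candidate).
    """
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


# --- Toy usage: random numbers stand in for real model/ASR outputs ---
batch, group = 2, 4                      # 4 candidate utterances per prompt
wer = torch.rand(batch, group)           # would come from ASR on the synthesized audio
nll = torch.rand(batch, group)           # would come from scoring under a language model
rewards = reward_from_wer_and_nll(wer, nll)
advantages = group_relative_advantages(rewards)

logprobs = torch.randn(batch, group, requires_grad=True)          # current policy
old_logprobs = logprobs.detach() + 0.05 * torch.randn(batch, group)  # sampling policy

loss = grpo_loss(logprobs, old_logprobs, advantages)
loss.backward()  # gradients favor above-average candidates within each group
print(float(loss))
```

Because each candidate's advantage is measured relative to the other candidates sampled for the same prompt, the update shifts probability mass toward whichever utterances score best on the combined WER/NLL reward without requiring a separate value network.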