MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs
Blog post from Google Cloud
In the evolving field of large language models (LLMs), post-training is what turns a pre-trained model into a specialized assistant or reasoning engine. MaxText now brings post-training to single-host TPU configurations such as v5p-8 and v6e-8, adding both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) built on JAX and the Tunix library.

SFT lets you adapt a model on labeled prompt-response data, with seamless integration with Hugging Face datasets and flexible checkpointing; the core training objective is sketched in the first code example below.

RL targets advanced reasoning and ships with two algorithms: Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO). GRPO samples a group of responses per prompt and scores each one against its own group, which removes the need for a separate value model and keeps training efficient; GSPO instead computes its importance ratio over whole sequences rather than individual tokens, which improves training stability. Both ideas are sketched in the second code example below.

Together, these features give developers a scalable, high-performance path for refining their models today, with a transition to multi-host configurations for larger models and datasets planned for the future.
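To make SFT concrete, here is a minimal sketch of the masked cross-entropy objective that supervised fine-tuning typically optimizes, written in JAX. This illustrates the technique rather than MaxText's actual implementation; the function name, tensor shapes, and the use of Optax are assumptions for the example.

```python
import jax
import jax.numpy as jnp
import optax  # assumption: Optax is used here only for its cross-entropy helper


def sft_loss(logits, targets, loss_mask):
    """Masked next-token cross-entropy, the standard SFT objective.

    logits:    [batch, seq, vocab] model predictions for the next token
    targets:   [batch, seq]        ground-truth token ids (inputs shifted by one)
    loss_mask: [batch, seq]        1.0 on labeled response tokens, 0.0 on
                                   prompt and padding tokens, so gradients
                                   flow only from the supervised completion
    """
    per_token = optax.softmax_cross_entropy_with_integer_labels(logits, targets)
    return jnp.sum(per_token * loss_mask) / jnp.maximum(jnp.sum(loss_mask), 1.0)


# Toy check with random data: 2 sequences of length 5, vocab of 11.
key = jax.random.PRNGKey(0)
logits = jax.random.normal(key, (2, 5, 11))
targets = jnp.zeros((2, 5), dtype=jnp.int32)
mask = jnp.array([[0, 0, 1, 1, 1], [0, 1, 1, 1, 0]], dtype=jnp.float32)
print(sft_loss(logits, targets, mask))
```

Masking the prompt tokens is what separates SFT from plain continued pre-training: the model is evaluated only on how well it reproduces the labeled response.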
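The RL side can be sketched just as compactly. Below is an illustrative JAX snippet of GRPO's group-relative advantage (each sampled response is scored against the mean and standard deviation of its own group, so no value network is needed) and of the sequence-level importance ratio that distinguishes GSPO from token-level methods. Shapes and function names are assumptions; this is a simplified sketch of the two algorithms, not the Tunix or MaxText code.

```python
import jax.numpy as jnp


def grpo_advantages(rewards, eps=1e-6):
    """GRPO advantages: normalize each response's reward within its group.

    rewards: [num_prompts, group_size] scalar reward per sampled response.
    """
    mean = jnp.mean(rewards, axis=-1, keepdims=True)
    std = jnp.std(rewards, axis=-1, keepdims=True)
    return (rewards - mean) / (std + eps)


def gspo_sequence_ratio(logp_new, logp_old, mask):
    """GSPO-style importance ratio, computed over the whole sequence.

    logp_new, logp_old: [batch, seq] per-token log-probs under the current
                        and sampling policies; mask marks response tokens.
    Length-normalizing the summed log-ratio (a geometric mean of per-token
    ratios) is what gives GSPO its sequence-level, lower-variance ratio.
    """
    lengths = jnp.maximum(jnp.sum(mask, axis=-1), 1.0)
    log_ratio = jnp.sum((logp_new - logp_old) * mask, axis=-1) / lengths
    return jnp.exp(log_ratio)


# Example: 2 prompts with 4 sampled responses each.
rewards = jnp.array([[1.0, 0.0, 0.5, 0.0],
                     [0.2, 0.8, 0.8, 0.2]])
print(grpo_advantages(rewards))
```

In a full training loop, these advantages and ratios would feed a clipped policy-gradient objective; the snippet only isolates the two steps that give each algorithm its name.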