Forge: Scalable Agent RL Framework and Algorithm
Blog post from HuggingFace
Balancing system throughput, training stability, and agent flexibility is a central challenge in scaling reinforcement learning (RL) for complex, real-world applications. The Forge RL framework addresses this "impossible triangle" through a flexible system architecture, tailored algorithmic design, and optimized asynchronous scheduling that improves training-inference efficiency. By standardizing interaction protocols, the system supports arbitrary agent scaffolds, enabling large-scale training of the MiniMax M2.5 model across over a hundred thousand real-world agent scaffolds and delivering significant improvements in reward convergence and model capabilities.

Forge's architecture decouples agents from the training infrastructure, integrates Context Management (CM) as a functional action, and supports both white-box and black-box agent architectures to ensure robust generalization across diverse environments. The framework also employs a hybrid scheduling strategy, Prefix Tree Merging, and aggressive inference-acceleration techniques to optimize both training and inference.

On the algorithmic side, the CISPO algorithm and a composite reward framework improve credit assignment and optimization stability, ultimately advancing the mission of "Intelligence with Everyone."
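The post does not detail how Prefix Tree Merging works in Forge, but the general idea behind such techniques is to deduplicate the shared token prefixes of many rollouts (e.g. a common system prompt and task description) so that prefill compute scales with the number of *unique* tokens rather than the total. The following is a minimal, hypothetical sketch of that idea using a token trie; the names and structure are illustrative assumptions, not Forge's actual implementation:

```python
# Hypothetical sketch of prefix-tree merging for rollout prompts.
# Shared prefixes are stored once in a trie, so the number of trie
# nodes equals the number of unique tokens that would need prefill,
# versus the total token count across all rollouts.

class TrieNode:
    def __init__(self):
        self.children = {}  # token id -> TrieNode


def merge_prefixes(sequences):
    """Insert token sequences into a trie.

    Returns (unique_tokens, total_tokens): unique_tokens is the
    number of trie nodes created (deduplicated prefill work),
    total_tokens is the naive per-rollout token count.
    """
    root = TrieNode()
    unique = total = 0
    for seq in sequences:
        node = root
        for tok in seq:
            total += 1
            if tok not in node.children:
                node.children[tok] = TrieNode()
                unique += 1  # this token position is computed only once
            node = node.children[tok]
    return unique, total


# Three rollouts branching from a shared prompt prefix [1, 2, 3].
rollouts = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 6, 7],
    [1, 2, 3, 4, 8],
]
unique, total = merge_prefixes(rollouts)
print(unique, total)  # 8 15 -> only 8 unique tokens instead of 15
```

In a real system the trie nodes would map to cached KV-cache blocks rather than bare token ids, but the savings mechanism is the same: the more rollouts share a prefix, the closer the effective prefill cost gets to a single pass over the shared prompt.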