
Forge: Scalable Agent RL Framework and Algorithm

Blog post from HuggingFace

Post Details

Company: HuggingFace
Author: MiniMax, Hyn, zhi zhang, Jiayuan Song, Da Chen, xkc, Yaoyao, kennyKK, and zpysky1125
Word Count: 3,387
Summary

Balancing system throughput, training stability, and agent flexibility is a central challenge in scaling reinforcement learning (RL) to complex, real-world applications. The Forge RL framework addresses this "impossible triangle" by combining a flexible system architecture, careful algorithmic design, and optimized asynchronous scheduling to improve training-inference efficiency.

The system supports arbitrary agent scaffolds through standardized interaction protocols, enabling large-scale training of the MiniMax M2.5 model, which was trained across more than one hundred thousand real-world agent scaffolds and achieved significant improvements in reward convergence and model capabilities. Forge's architecture decouples agents from the training infrastructure, integrates Context Management (CM) as a functional action, and supports both white-box and black-box agent architectures to ensure robust generalization across diverse environments.

The framework also employs a hybrid scheduling strategy, Prefix Tree Merging, and aggressive inference-acceleration techniques to optimize both training and inference. The CISPO algorithm and a composite reward framework improve credit assignment and optimization stability, ultimately advancing the mission of "Intelligence with Everyone."
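The CISPO algorithm mentioned in the summary comes from MiniMax's earlier work, where it clips the importance-sampling weight itself (treated as a constant) rather than clipping the policy update, so every token keeps contributing a gradient. A minimal sketch of that idea, assuming a numpy setting, per-token log-probabilities, and illustrative clipping bounds (not Forge's actual hyperparameters):

```python
import numpy as np

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Sketch of a CISPO-style per-token loss.

    Unlike PPO's clipped surrogate, which zeroes the gradient for
    tokens whose ratio leaves the trust region, this clips the
    importance-sampling weight and uses it as a fixed coefficient
    on log pi, so all tokens still receive gradient signal.
    All names and default bounds here are illustrative.
    """
    ratio = np.exp(logp_new - logp_old)               # per-token pi_new / pi_old
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # In a real framework `clipped` would be detach()/stop_gradient();
    # with plain numpy it is already constant w.r.t. logp_new.
    return -(clipped * advantages * logp_new).mean()
```

With these bounds, a token whose ratio has drifted to 9x is down-weighted to 1.2x instead of being dropped from the update entirely.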
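Prefix Tree Merging, also named above, exploits the fact that multi-turn agent trajectories sampled from the same task share long common prefixes (system prompt, earlier turns), so shared tokens can be stored and processed once. A minimal sketch of the underlying data structure; the function name and return values are illustrative, not Forge's actual implementation:

```python
def prefix_tree_merge(sequences):
    """Merge token sequences into a prefix tree (trie).

    Trajectories that share a prefix share trie nodes, so the
    number of unique nodes approximates the deduplicated token
    count a prefix-merging scheduler would actually process.
    Returns (root, unique_token_count).
    """
    root = {}
    unique = 0
    for seq in sequences:
        node = root
        for tok in seq:
            if tok not in node:       # first time this prefix extends here
                node[tok] = {}
                unique += 1
            node = node[tok]          # descend along the shared prefix
    return root, unique
```

For example, two 4-token rollouts sharing a 3-token prefix cover 8 tokens in total but only 5 unique trie nodes, which is exactly the saving prefix merging targets.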