Forge: Scalable Agent RL Framework and Algorithm
Blog post from HuggingFace
Balancing system throughput, training stability, and agent flexibility is a central challenge in scaling reinforcement learning (RL) for complex, real-world applications. The Forge RL framework addresses this "impossible triangle" through a flexible system architecture, tailored algorithmic design, and optimized asynchronous scheduling that improves training-inference efficiency. By standardizing interaction protocols, the system supports arbitrary agent scaffolds, enabling large-scale training of the MiniMax M2.5 model across over a hundred thousand real-world agent scaffolds and delivering significant improvements in reward convergence and model capabilities.

Forge's architecture decouples agents from the training infrastructure, integrates Context Management (CM) as a functional action, and supports both white-box and black-box agent architectures to ensure robust generalization across diverse environments. The framework also employs a hybrid scheduling strategy, Prefix Tree Merging, and aggressive inference-acceleration techniques to optimize both training and inference.

On the algorithmic side, the CISPO algorithm and a composite reward framework improve credit assignment and optimization stability, ultimately advancing the mission of "Intelligence with Everyone."
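The post does not detail how Prefix Tree Merging works in Forge, but the general idea behind such techniques is to deduplicate the shared token prefixes of many rollouts (e.g. a common system prompt and task description) so that prefill compute scales with the number of *unique* tokens rather than the total. The following is a minimal, hypothetical sketch of that idea using a token trie; the names and structure are illustrative assumptions, not Forge's actual implementation:

```python
# Hypothetical sketch of prefix-tree merging for rollout prompts.
# Shared prefixes are stored once in a trie, so the number of trie
# nodes equals the number of unique tokens that would need prefill,
# versus the total token count across all rollouts.

class TrieNode:
    def __init__(self):
        self.children = {}  # token id -> TrieNode


def merge_prefixes(sequences):
    """Insert token sequences into a trie.

    Returns (unique_tokens, total_tokens): unique_tokens is the
    number of trie nodes created (deduplicated prefill work),
    total_tokens is the naive per-rollout token count.
    """
    root = TrieNode()
    unique = total = 0
    for seq in sequences:
        node = root
        for tok in seq:
            total += 1
            if tok not in node.children:
                node.children[tok] = TrieNode()
                unique += 1  # this token position is computed only once
            node = node.children[tok]
    return unique, total


# Three rollouts branching from a shared prompt prefix [1, 2, 3].
rollouts = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 6, 7],
    [1, 2, 3, 4, 8],
]
unique, total = merge_prefixes(rollouts)
print(unique, total)  # 8 15 -> only 8 unique tokens instead of 15
```

In a real system the trie nodes would map to cached KV-cache blocks rather than bare token ids, but the savings mechanism is the same: the more rollouts share a prefix, the closer the effective prefill cost gets to a single pass over the shared prompt.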