Scaling OpenEnv: From Free Usage to Thousands of Concurrent Environments
Blog post from Hugging Face
OpenEnv, a collaborative effort by Meta, Unsloth, and Hugging Face, aims to standardize agent execution environments for reinforcement learning (RL). It targets a specific bottleneck in post-training: environment throughput. Environments can start on free infrastructure such as Hugging Face Spaces (HF Spaces), which handle up to 128 concurrent sessions, and scale to multi-node clusters supporting 16,384 sessions.

OpenEnv provides a WebSocket interface for efficient management of concurrent sessions, in contrast to the traditional HTTP interface, which requires a separate container per session. The post outlines scaling strategies and benchmarks across several infrastructure configurations, including local Docker, SLURM nodes, and HF Spaces, and stresses that high per-core efficiency is what keeps costs manageable.

HF Spaces are a practical starting point for single-GPU training and evaluations, while a local Docker deployment is more efficient, sustaining up to 2,048 concurrent sessions on an 8-core machine. For large-scale experiments, the post recommends multi-node clusters with Envoy load balancing, which scale to thousands of parallel rollouts, a requirement for teams working at research-lab or company scale.
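The per-core efficiency argument can be made concrete with a little arithmetic on the figures quoted above. The sketch below is illustrative only: the helper names `sessions_per_core` and `cores_needed` are hypothetical, and the core estimate assumes throughput scales linearly with core count, which the summary does not claim. The cluster's actual core count is not stated, so the extrapolated number is a rough planning figure, not a reported result.

```python
import math

# Figures quoted in the summary.
LOCAL_DOCKER_SESSIONS = 2048  # concurrent sessions on one local Docker host
LOCAL_DOCKER_CORES = 8        # cores on that host
HF_SPACES_SESSIONS = 128      # concurrent sessions on one HF Space
CLUSTER_SESSIONS = 16384      # concurrent sessions on the multi-node cluster


def sessions_per_core(sessions: int, cores: int) -> float:
    """Concurrent sessions sustained per CPU core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return sessions / cores


def cores_needed(target_sessions: int, per_core: float) -> int:
    """Cores required for a target session count.

    ASSUMPTION: linear scaling with core count (not stated in the source).
    """
    if per_core <= 0:
        raise ValueError("per_core must be positive")
    return math.ceil(target_sessions / per_core)


per_core = sessions_per_core(LOCAL_DOCKER_SESSIONS, LOCAL_DOCKER_CORES)
estimated_cores = cores_needed(CLUSTER_SESSIONS, per_core)
spaces_equivalent = CLUSTER_SESSIONS // HF_SPACES_SESSIONS

print(per_core)           # 256.0 sessions per core on local Docker
print(estimated_cores)    # 64 cores for 16,384 sessions, if scaling were linear
print(spaces_equivalent)  # 128 HF Spaces' worth of sessions
```

At 256 sessions per core, the 16,384-session target corresponds to roughly 64 cores under the linear assumption, or 128 times the session capacity of a single HF Space. Real deployments will likely need headroom beyond this estimate for load-balancer and networking overhead.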