Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL
Blog post from Hugging Face
Asynchronous reinforcement learning (async RL) becomes considerably cheaper to run when the trainer sends less data to the inference engine. The conventional approach ships the entire model after every training step, which for frontier models can mean up to a terabyte per sync. Between consecutive RL optimizer steps, however, over 98% of the weights are observed to remain unchanged, so only the weights that did change need to be sent, packaged as a sparse safetensors file. This shrinks the payload from gigabytes to megabytes. The implementation encodes the changes, uploads them to a Hugging Face Bucket, and has vLLM fetch them, so the trainer and the inference engine can run independently on different servers or in different regions. Because no shared cluster or complex networking setup is required, async RL becomes more accessible and cost-effective without giving up efficiency, especially for large-scale models.
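The core idea of sending only the changed weights can be illustrated with a minimal sketch. It uses plain Python lists in place of tensors, and the names `encode_delta` and `apply_delta` are hypothetical helpers invented for this example. It is not the TRL or vLLM API, and it skips the sparse safetensors encoding and the bucket upload and download entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical delta layout for this sketch: name -> (changed indices, new values).
Delta = Dict[str, Tuple[List[int], List[float]]]


def encode_delta(old: Dict[str, List[float]], new: Dict[str, List[float]]) -> Delta:
    """Keep only the positions whose value differs between two snapshots."""
    delta: Delta = {}
    for name, new_values in new.items():
        old_values = old[name]
        changed = [i for i, (a, b) in enumerate(zip(old_values, new_values)) if a != b]
        if changed:  # parameters with no changes are omitted entirely
            delta[name] = (changed, [new_values[i] for i in changed])
    return delta


def apply_delta(weights: Dict[str, List[float]], delta: Delta) -> None:
    """Patch a stale copy of the weights in place."""
    for name, (indices, values) in delta.items():
        target = weights[name]
        for i, value in zip(indices, values):
            target[i] = value


# Trainer side: two consecutive snapshots in which 15 of 1010 values change.
old = {"layer.weight": [0.0] * 1000, "layer.bias": [0.5] * 10}
new = {name: list(values) for name, values in old.items()}
for i in range(15):
    new["layer.weight"][i * 7] = 1.0

delta = encode_delta(old, new)

# Inference side: start from the old weights and apply only the delta.
replica = {name: list(values) for name, values in old.items()}
apply_delta(replica, delta)
assert replica == new

sent = sum(len(indices) for indices, _ in delta.values())
total = sum(len(values) for values in new.values())
print(f"sent {sent} of {total} values")  # sent 15 of 1010 values
```

Here the receiver reconstructs the new weights exactly while only about 1.5% of the values cross the wire, which is the same effect the post describes at the scale of a full model.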