This paper presents a novel approach to training large foundation models in decentralized, heterogeneous environments, where computational "tasklets" are allocated to devices connected by slow networks. The authors propose a scheduling algorithm and a formal cost model to optimize the allocation strategy, achieving substantial speedups over prior state-of-the-art systems. Extensive experiments show that the approach reduces training time by up to 4.8x compared to existing methods, while also providing efficient network compression. By leveraging decentralized and heterogeneous networks, this work aims to make large-scale foundation model training more accessible and cost-effective.
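
To make the idea of a cost-model-driven allocation concrete, here is a minimal, self-contained sketch (not the authors' actual formulation) of how one might score a placement of pipeline "tasklets" onto devices connected by links of very different bandwidths and then search for the cheapest placement. The device count, the `BANDWIDTH` table, `ACTIVATION_GB`, and the helper functions `pipeline_comm_cost` and `best_assignment` are all hypothetical illustrations; the paper's scheduler and cost model are considerably more general.

```python
import itertools

# Hypothetical toy example: 4 devices, two fast "local" links and slow
# cross-site links (bandwidths in GB/s). These numbers are assumptions
# chosen only to illustrate the idea of a communication cost model.
BANDWIDTH = {
    (0, 1): 10.0, (1, 0): 10.0,   # fast local link
    (2, 3): 10.0, (3, 2): 10.0,   # fast local link
    (0, 2): 0.5,  (2, 0): 0.5,    # slow cross-site links
    (0, 3): 0.5,  (3, 0): 0.5,
    (1, 2): 0.5,  (2, 1): 0.5,
    (1, 3): 0.5,  (3, 1): 0.5,
}

ACTIVATION_GB = 2.0  # assumed data volume exchanged between adjacent pipeline stages


def pipeline_comm_cost(assignment):
    """Estimated transfer time summed over consecutive tasklets in the pipeline."""
    cost = 0.0
    for src, dst in zip(assignment, assignment[1:]):
        cost += ACTIVATION_GB / BANDWIDTH[(src, dst)]
    return cost


def best_assignment(num_devices=4):
    """Brute-force search over device orderings (only feasible at this toy scale)."""
    best = min(itertools.permutations(range(num_devices)), key=pipeline_comm_cost)
    return best, pipeline_comm_cost(best)


if __name__ == "__main__":
    order, cost = best_assignment()
    print(f"best tasklet placement: {order}, estimated comm time: {cost:.2f} s")
```

In this toy setting the search keeps stages that exchange large activations on the fast links and crosses the slow links as few times as possible; the paper's contribution is doing this kind of optimization at scale with a formal cost model rather than by exhaustive enumeration.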