Company
Date Published
Author
Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang
Word count
234
Language
English
Hacker News points
None

Summary

CocktailSGD is a novel communication-efficient training framework designed to train large language models (LLMs) over slow networks, such as 500 Mbps connections. It combines three compression techniques (random sparsification, top-K sparsification, and quantization) to achieve far greater compression than any one technique alone. Theoretical analysis justifies the hybrid approach, and empirical results show that CocktailSGD achieves up to 117x compression when fine-tuning LLMs without compromising convergence. Training over a slow network with CocktailSGD incurs only a small slowdown compared to training on a data-center network.

The RedPajama-V2 dataset is conceived as a foundation for creating high-quality datasets; using it requires filtering the data with the quality signals that accompany it, according to the intended application.
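To make the hybrid compression concrete, below is a minimal sketch of chaining random sparsification, top-K sparsification, and low-bit quantization on a gradient tensor. It is not the authors' implementation: the function names (cocktail_compress, cocktail_decompress), the keep ratios, and the bit width are assumptions chosen for illustration, and the real framework includes machinery this sketch omits.

```python
import numpy as np


def cocktail_compress(grad, random_keep=0.1, topk_keep=0.1, num_bits=4, seed=0):
    """Compress a gradient/delta by chaining random sparsification,
    top-K sparsification, and uniform quantization (illustrative only)."""
    flat = grad.ravel()

    # 1) Random sparsification: keep a random subset of coordinates.
    rng = np.random.default_rng(seed)
    n_random = max(1, int(random_keep * flat.size))
    random_idx = rng.choice(flat.size, size=n_random, replace=False)
    subset = flat[random_idx]

    # 2) Top-K sparsification: within that subset, keep the largest-magnitude entries.
    n_topk = max(1, int(topk_keep * subset.size))
    local_topk = np.argpartition(np.abs(subset), -n_topk)[-n_topk:]
    kept_idx = random_idx[local_topk]
    kept_val = subset[local_topk]

    # 3) Quantization: map the surviving values to signed num_bits integers.
    scale = float(np.abs(kept_val).max()) or 1.0
    levels = 2 ** (num_bits - 1) - 1
    quantized = np.round(kept_val / scale * levels).astype(np.int8)

    return kept_idx, quantized, scale


def cocktail_decompress(shape, kept_idx, quantized, scale, num_bits=4):
    """Rebuild a dense tensor from the sparse, quantized representation."""
    levels = 2 ** (num_bits - 1) - 1
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[kept_idx] = quantized.astype(np.float32) / levels * scale
    return flat.reshape(shape)


# Usage: compress and reconstruct a toy gradient tensor.
grad = np.random.randn(1024, 64).astype(np.float32)
idx, q, s = cocktail_compress(grad)
approx = cocktail_decompress(grad.shape, idx, q, s)
```

The keep ratios and bit width above are placeholders; the 117x figure quoted in the summary comes from the paper's own configuration and analysis, not from this sketch.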