Company
Date Published
Author
Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Re, Ce Zhang
Word count
234
Language
English
Hacker News points
None

Summary

CocktailSGD is a novel communication-efficient training framework designed to train large language models (LLMs) over slow networks, such as 500 Mbps connections. It combines three compression techniques (random sparsification, top-K sparsification, and quantization) to achieve far greater compression than any one technique alone. Theoretical analysis justifies the hybrid approach, and empirical results show that CocktailSGD achieves up to 117x compression when fine-tuning LLMs without compromising convergence. Training over a slow network with CocktailSGD incurs only a small slowdown compared to training on a data-center network.

The RedPajama-V2 dataset is conceived as a foundation for creating high-quality datasets; using it requires filtering the data with the quality signals that accompany it, according to the intended application.
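To make the hybrid compression concrete, below is a minimal sketch of chaining random sparsification, top-K sparsification, and low-bit quantization on a gradient tensor. It is not the authors' implementation: the function names (cocktail_compress, cocktail_decompress), the keep ratios, and the bit width are assumptions chosen for illustration, and the real framework includes machinery this sketch omits.

```python
import numpy as np


def cocktail_compress(grad, random_keep=0.1, topk_keep=0.1, num_bits=4, seed=0):
    """Compress a gradient/delta by chaining random sparsification,
    top-K sparsification, and uniform quantization (illustrative only)."""
    flat = grad.ravel()

    # 1) Random sparsification: keep a random subset of coordinates.
    rng = np.random.default_rng(seed)
    n_random = max(1, int(random_keep * flat.size))
    random_idx = rng.choice(flat.size, size=n_random, replace=False)
    subset = flat[random_idx]

    # 2) Top-K sparsification: within that subset, keep the largest-magnitude entries.
    n_topk = max(1, int(topk_keep * subset.size))
    local_topk = np.argpartition(np.abs(subset), -n_topk)[-n_topk:]
    kept_idx = random_idx[local_topk]
    kept_val = subset[local_topk]

    # 3) Quantization: map the surviving values to signed num_bits integers.
    scale = float(np.abs(kept_val).max()) or 1.0
    levels = 2 ** (num_bits - 1) - 1
    quantized = np.round(kept_val / scale * levels).astype(np.int8)

    return kept_idx, quantized, scale


def cocktail_decompress(shape, kept_idx, quantized, scale, num_bits=4):
    """Rebuild a dense tensor from the sparse, quantized representation."""
    levels = 2 ** (num_bits - 1) - 1
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[kept_idx] = quantized.astype(np.float32) / levels * scale
    return flat.reshape(shape)


# Usage: compress and reconstruct a toy gradient tensor.
grad = np.random.randn(1024, 64).astype(np.float32)
idx, q, s = cocktail_compress(grad)
approx = cocktail_decompress(grad.shape, idx, q, s)
```

The keep ratios and bit width above are placeholders; the 117x figure quoted in the summary comes from the paper's own configuration and analysis, not from this sketch.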