The AQ-SGD algorithm reduces communication bottlenecks in decentralized training by compressing activations, yielding significant speedups without compromising model quality. It is designed to work with pipeline parallelism and achieves convergence rates competitive with vanilla SGD; theoretical analysis shows that AQ-SGD converges at a rate of O(1/√T) for non-convex objectives under standard assumptions. Empirical studies demonstrate that AQ-SGD tolerates aggressive quantization, achieving up to a 4.3x speedup over the no-compression baseline and up to 8.5x when combined with QuantizedAdam. Future directions include exploring the effectiveness of AQ-SGD in pre-training workflows and jointly optimizing precision and scheduling.
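
To make the compression idea concrete, below is a minimal Python sketch of activation compression at a pipeline-stage boundary. Everything here is illustrative: the `quantize` helper, the `ActivationCompressor` class, and the per-example buffer are assumptions introduced for exposition, and the delta-style scheme (sending quantized changes relative to a stored copy of the previously communicated activation) stands in for AQ-SGD's actual compressor rather than reproducing the paper's algorithm.

```python
import numpy as np

def quantize(x, num_bits=4):
    # Illustrative uniform quantizer: map x onto 2**num_bits levels and
    # return the dequantized values the receiver would reconstruct.
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo

class ActivationCompressor:
    """Sketch of the sender side of a pipeline-stage boundary.

    A buffer of the previously communicated activation is kept per training
    example; only a quantized *change* relative to that buffer is sent, and
    the buffer is updated with exactly what the receiver reconstructs.
    (Assumed delta-style scheme for illustration only.)
    """
    def __init__(self, num_bits=4):
        self.buffers = {}      # example_id -> last reconstructed activation
        self.num_bits = num_bits

    def compress(self, example_id, activation):
        prev = self.buffers.get(example_id)
        if prev is None:
            # First visit: pass the activation through at full precision.
            reconstructed = activation
        else:
            # Later visits: quantize only the change since the last message.
            reconstructed = prev + quantize(activation - prev, self.num_bits)
        self.buffers[example_id] = reconstructed
        return reconstructed   # what the next pipeline stage receives

# Example: repeated visits to the same example keep the quantization error small.
compressor = ActivationCompressor(num_bits=4)
act = np.random.randn(8, 16).astype(np.float32)
for step in range(3):
    received = compressor.compress(example_id=0, activation=act + 0.01 * step)
    err = np.abs(received - (act + 0.01 * step)).max()
    print(f"step {step}: max abs error = {err:.4f}")
```

The intuition the sketch tries to capture is that activations for the same example change slowly as training progresses, so even aggressive quantization of the change introduces little reconstruction error, which is consistent with the tolerance to aggressive quantization reported above.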