The AQ-SGD algorithm reduces communication bottlenecks in decentralized training by compressing activations, yielding significant speedups without compromising model quality. It is designed to work with pipeline parallelism and achieves convergence rates competitive with vanilla SGD; theoretical analysis shows that AQ-SGD converges at a rate of O(1/√T) for non-convex objectives under standard assumptions. Empirical studies demonstrate that AQ-SGD tolerates aggressive quantization, achieving up to a 4.3x speedup over the no-compression baseline and up to 8.5x when combined with QuantizedAdam. Future directions include exploring the effectiveness of AQ-SGD in pre-training workflows and jointly optimizing precision and scheduling.
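
To make the compression idea concrete, below is a minimal Python sketch of activation compression at a pipeline-stage boundary. Everything here is illustrative: the `quantize` helper, the `ActivationCompressor` class, and the per-example buffer are assumptions introduced for exposition, and the delta-style scheme (sending quantized changes relative to a stored copy of the previously communicated activation) stands in for AQ-SGD's actual compressor rather than reproducing the paper's algorithm.

```python
import numpy as np

def quantize(x, num_bits=4):
    # Illustrative uniform quantizer: map x onto 2**num_bits levels and
    # return the dequantized values the receiver would reconstruct.
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((x - lo) / scale) * scale + lo

class ActivationCompressor:
    """Sketch of the sender side of a pipeline-stage boundary.

    A buffer of the previously communicated activation is kept per training
    example; only a quantized *change* relative to that buffer is sent, and
    the buffer is updated with exactly what the receiver reconstructs.
    (Assumed delta-style scheme for illustration only.)
    """
    def __init__(self, num_bits=4):
        self.buffers = {}      # example_id -> last reconstructed activation
        self.num_bits = num_bits

    def compress(self, example_id, activation):
        prev = self.buffers.get(example_id)
        if prev is None:
            # First visit: pass the activation through at full precision.
            reconstructed = activation
        else:
            # Later visits: quantize only the change since the last message.
            reconstructed = prev + quantize(activation - prev, self.num_bits)
        self.buffers[example_id] = reconstructed
        return reconstructed   # what the next pipeline stage receives

# Example: repeated visits to the same example keep the quantization error small.
compressor = ActivationCompressor(num_bits=4)
act = np.random.randn(8, 16).astype(np.float32)
for step in range(3):
    received = compressor.compress(example_id=0, activation=act + 0.01 * step)
    err = np.abs(received - (act + 0.01 * step)).max()
    print(f"step {step}: max abs error = {err:.4f}")
```

The intuition the sketch tries to capture is that activations for the same example change slowly as training progresses, so even aggressive quantization of the change introduces little reconstruction error, which is consistent with the tolerance to aggressive quantization reported above.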