Company
Date Published
Author
Albert Tseng, Zhaofeng Sun, and Chris De Sa
Word count
2091
Language
English
Hacker News points
None

Summary

YAQA (Yet Another Quantization Algorithm) is a weight-only post-training quantization method for LLMs that rounds weights to directly preserve the original model's outputs. It builds a near-optimal Kronecker-factored approximation of each linear layer's Hessian with respect to the KL divergence to the original model, then uses that approximation in a rounding algorithm that comes with theoretical guarantees. Compared to existing rounding algorithms, YAQA reduces the KL divergence to the original model by over 30% while achieving state-of-the-art performance on downstream tasks. In experiments across a range of models and quantizers, YAQA consistently outperforms existing rounding methods on both of these measures.
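
To make the two ingredients in the summary concrete, below is a minimal, hypothetical sketch (not the authors' released code): it accumulates Kronecker factors A ≈ E[x xᵀ] and B ≈ E[g gᵀ] of a linear layer's Hessian from calibration inputs and output gradients of the KL, then rounds the layer's weights with GPTQ/LDLQ-style error feedback. The function names, the uniform scalar quantizer, and the use of only the input-side factor inside the rounding loop are simplifying assumptions; YAQA's actual algorithm uses both factors and carries theoretical guarantees.

```python
# Hypothetical sketch, not the authors' implementation. Assumes a single
# linear layer with weights W (d_out x d_in), calibration inputs, and
# per-sample gradients of the KL with respect to the layer's outputs.
import torch


def kronecker_hessian_factors(grads_out, acts_in):
    """Accumulate Kronecker factors of the layerwise Hessian.

    grads_out: (n_samples, d_out) gradients of the KL w.r.t. the layer output.
    acts_in:   (n_samples, d_in) layer inputs.
    Returns (A, B): input-side (d_in x d_in) and output-side (d_out x d_out)
    factors whose Kronecker product approximates the layer's Hessian.
    """
    n = acts_in.shape[0]
    A = acts_in.T @ acts_in / n        # E[x x^T]
    B = grads_out.T @ grads_out / n    # E[g g^T]
    return A, B


def round_with_error_feedback(W, A, scale, damp=1e-2):
    """GPTQ/LDLQ-style nearest rounding with error feedback under factor A.

    W: (d_out, d_in) weights; scale: uniform quantization step size.
    This sketch only uses the input-side factor; YAQA's rounding also
    incorporates the output-side factor.
    """
    d_in = W.shape[1]
    H = A + damp * A.diag().mean() * torch.eye(d_in)
    Hinv = torch.linalg.inv(H)
    W = W.clone()
    Q = torch.zeros_like(W)
    for j in range(d_in):
        # quantize column j with a simple uniform scalar quantizer
        Q[:, j] = torch.round(W[:, j] / scale) * scale
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # propagate the rounding error into the not-yet-quantized columns
        W[:, j + 1:] -= err.unsqueeze(1) * Hinv[j, j + 1:].unsqueeze(0)
    return Q


if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(64, 128)
    acts = torch.randn(1024, 128)       # calibration inputs
    grads = torch.randn(1024, 64)       # stand-in for KL gradients
    A, B = kronecker_hessian_factors(grads, acts)
    W_q = round_with_error_feedback(W, A, scale=0.05)
    E = W_q - W
    # proxy loss tr(B E A E^T) under the Kronecker-factored Hessian
    print(torch.trace(B @ E @ A @ E.T).item())
```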