Company
Date Published
Author
Albert Tseng, Zhaofeng Sun, and Chris De Sa
Word count
2091
Language
English
Hacker News points
None

Summary

YAQA (Yet Another Quantization Algorithm) is a weight-only post-training quantization method for LLMs that rounds weights to directly preserve the original model's outputs. It builds a near-optimal Kronecker-factored approximation of each linear layer's Hessian with respect to the KL divergence to the original model, then uses that approximation in a rounding algorithm that comes with theoretical guarantees. Compared to existing rounding algorithms, YAQA reduces the KL divergence to the original model by over 30% while achieving state-of-the-art performance on downstream tasks. In experiments across a range of models and quantizers, YAQA consistently outperforms existing rounding methods on both of these measures.
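
To make the two ingredients in the summary concrete, below is a minimal, hypothetical sketch (not the authors' released code): it accumulates Kronecker factors A ≈ E[x xᵀ] and B ≈ E[g gᵀ] of a linear layer's Hessian from calibration inputs and output gradients of the KL, then rounds the layer's weights with GPTQ/LDLQ-style error feedback. The function names, the uniform scalar quantizer, and the use of only the input-side factor inside the rounding loop are simplifying assumptions; YAQA's actual algorithm uses both factors and carries theoretical guarantees.

```python
# Hypothetical sketch, not the authors' implementation. Assumes a single
# linear layer with weights W (d_out x d_in), calibration inputs, and
# per-sample gradients of the KL with respect to the layer's outputs.
import torch


def kronecker_hessian_factors(grads_out, acts_in):
    """Accumulate Kronecker factors of the layerwise Hessian.

    grads_out: (n_samples, d_out) gradients of the KL w.r.t. the layer output.
    acts_in:   (n_samples, d_in) layer inputs.
    Returns (A, B): input-side (d_in x d_in) and output-side (d_out x d_out)
    factors whose Kronecker product approximates the layer's Hessian.
    """
    n = acts_in.shape[0]
    A = acts_in.T @ acts_in / n        # E[x x^T]
    B = grads_out.T @ grads_out / n    # E[g g^T]
    return A, B


def round_with_error_feedback(W, A, scale, damp=1e-2):
    """GPTQ/LDLQ-style nearest rounding with error feedback under factor A.

    W: (d_out, d_in) weights; scale: uniform quantization step size.
    This sketch only uses the input-side factor; YAQA's rounding also
    incorporates the output-side factor.
    """
    d_in = W.shape[1]
    H = A + damp * A.diag().mean() * torch.eye(d_in)
    Hinv = torch.linalg.inv(H)
    W = W.clone()
    Q = torch.zeros_like(W)
    for j in range(d_in):
        # quantize column j with a simple uniform scalar quantizer
        Q[:, j] = torch.round(W[:, j] / scale) * scale
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # propagate the rounding error into the not-yet-quantized columns
        W[:, j + 1:] -= err.unsqueeze(1) * Hinv[j, j + 1:].unsqueeze(0)
    return Q


if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(64, 128)
    acts = torch.randn(1024, 128)       # calibration inputs
    grads = torch.randn(1024, 64)       # stand-in for KL gradients
    A, B = kronecker_hessian_factors(grads, acts)
    W_q = round_with_error_feedback(W, A, scale=0.05)
    E = W_q - W
    # proxy loss tr(B E A E^T) under the Kronecker-factored Hessian
    print(torch.trace(B @ E @ A @ E.T).item())
```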