Kimi K2 Thinking takes a notable step for large reasoning models by using Quantization-Aware Training (QAT) to run a 1-trillion-parameter model at INT4 precision, achieving state-of-the-art performance with doubled generation speed and what the authors describe as "lossless" accuracy. Post-Training Quantization (PTQ), by contrast, compresses an already-trained model after the fact, and because the weights were never exposed to quantization during training, it often incurs significant accuracy loss. QAT instead folds quantization into the training process itself, so the model learns robustness to reduced precision from the outset.

This matters at INT4 because each weight can take only 16 distinct values, so placing those quantization levels where precision is most needed is critical. Kimi K2 Thinking applies this to the Mixture-of-Experts (MoE) components, where the vast majority of the model's parameters reside. Because the model adapts to INT4 during training rather than being rounded down afterward, it avoids the error accumulation that otherwise compounds over long reasoning sequences, demonstrating that neural networks can maintain high performance even at sharply reduced precision.
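To make the mechanism concrete, below is a minimal sketch of how INT4 fake-quantization is typically simulated during QAT. It assumes weight-only, symmetric per-output-channel scaling and a straight-through estimator (STE); the helper names (`fake_quant_int4`, `QATLinear`) and these specific choices are illustrative assumptions, not Kimi K2 Thinking's published recipe, whose exact scale granularity and layer coverage are not described here.

```python
# Illustrative QAT sketch: weights are rounded to one of 16 signed INT4 levels
# (-8..7) on every forward pass, so the model trains "through" the quantizer.
# Assumptions: weight-only quantization, symmetric per-output-channel scales,
# straight-through estimator for gradients.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    """Simulate INT4 quantization of a weight matrix during training."""
    qmin, qmax = -8, 7  # 2**4 = 16 representable values
    # Per-output-channel scale so the largest weight maps to the range edge.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), qmin, qmax)
    w_q = q * scale  # dequantize back to float for the forward computation
    # Straight-through estimator: forward uses w_q, backward treats it as identity.
    return w + (w_q - w).detach()


class QATLinear(nn.Linear):
    """Linear layer whose weights see simulated INT4 rounding on every forward pass."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quant_int4(self.weight), self.bias)


if __name__ == "__main__":
    # Toy training step: the layer learns weights that behave well on the
    # 16-level INT4 grid, rather than being rounded onto it after training.
    layer = QATLinear(64, 64)
    opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
    x = torch.randn(8, 64)
    loss = layer(x).pow(2).mean()
    loss.backward()
    opt.step()
```

The key contrast with PTQ is visible in the forward pass: the rounding error is present while gradients are still flowing, so the optimizer continuously steers weights toward values that survive the 16-level grid, which is what prevents quantization error from compounding at inference time.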