Cohere's recent work on quantization for large language models (LLMs) highlights the challenges of scaling these models while keeping them efficient to serve. The study examines emergent outlier dimensions in network hidden states, which make quantization increasingly difficult as models grow. To address this, Cohere investigates how optimization hyperparameters such as weight decay, gradient clipping, and residual dropout, along with the half-precision data type used in training (float16 versus bfloat16), affect post-quantization performance. The findings indicate that certain pre-training choices, including training in bfloat16, applying higher weight decay, and using gradient clipping, reduce the accuracy degradation caused by post-training quantization (PTQ). These results were validated across models ranging from 410 million to 52 billion parameters, with Cohere's models retaining accuracy after quantization better than comparable OPT models. The research underscores the influence of optimization choices and hardware on post-quantization performance, suggesting that strategic pre-training can mitigate the challenges posed by emergent properties in large-scale LLMs.
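
To make the outlier problem concrete, here is a minimal sketch (not Cohere's code) of symmetric per-tensor int8 post-training quantization; the function names and the injected outlier value are illustrative assumptions. It shows how a single emergent outlier dimension inflates the quantization scale and degrades precision for every other dimension, which is the failure mode the pre-training choices above aim to avoid.

```python
import numpy as np


def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization to int8: one scale for the whole tensor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to float32 using the stored scale."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
hidden = rng.normal(0.0, 1.0, size=(1, 4096)).astype(np.float32)

# Well-behaved hidden states quantize with small round-trip error.
q, s = quantize_int8(hidden)
print("mean abs error, no outlier: ", np.abs(hidden - dequantize(q, s)).mean())

# An emergent outlier dimension (hypothetical magnitude 60.0) dominates the
# max, so the shared scale grows and all other dimensions lose precision.
hidden_outlier = hidden.copy()
hidden_outlier[0, 0] = 60.0
q, s = quantize_int8(hidden_outlier)
print("mean abs error, with outlier:", np.abs(hidden_outlier - dequantize(q, s)).mean())
```

Running this sketch shows the reconstruction error rising by roughly the factor by which the outlier exceeds the typical activation range, which is why suppressing such outliers during pre-training makes simple PTQ schemes far more reliable.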