Cohere's recent work on quantization for large language models (LLMs) highlights the challenges of scaling these models while keeping them efficient to serve. The study examines emergent outlier dimensions in network hidden states, which make quantization increasingly difficult as models grow. To address this, Cohere investigates how optimization hyperparameters such as weight decay, gradient clipping, and residual dropout, along with the half-precision data type used in training (float16 versus bfloat16), affect post-quantization performance. The findings indicate that certain pre-training choices, including training in bfloat16, applying higher weight decay, and using gradient clipping, reduce the accuracy degradation caused by post-training quantization (PTQ). These results were validated across models ranging from 410 million to 52 billion parameters, with Cohere's models retaining accuracy after quantization better than comparable OPT models. The research underscores the influence of optimization choices and hardware on post-quantization performance, suggesting that strategic pre-training can mitigate the challenges posed by emergent properties in large-scale LLMs.
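
To make the outlier problem concrete, here is a minimal sketch (not Cohere's code) of symmetric per-tensor int8 post-training quantization; the function names and the injected outlier value are illustrative assumptions. It shows how a single emergent outlier dimension inflates the quantization scale and degrades precision for every other dimension, which is the failure mode the pre-training choices above aim to avoid.

```python
import numpy as np


def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization to int8: one scale for the whole tensor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 values back to float32 using the stored scale."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
hidden = rng.normal(0.0, 1.0, size=(1, 4096)).astype(np.float32)

# Well-behaved hidden states quantize with small round-trip error.
q, s = quantize_int8(hidden)
print("mean abs error, no outlier: ", np.abs(hidden - dequantize(q, s)).mean())

# An emergent outlier dimension (hypothetical magnitude 60.0) dominates the
# max, so the shared scale grows and all other dimensions lose precision.
hidden_outlier = hidden.copy()
hidden_outlier[0, 0] = 60.0
q, s = quantize_int8(hidden_outlier)
print("mean abs error, with outlier:", np.abs(hidden_outlier - dequantize(q, s)).mean())
```

Running this sketch shows the reconstruction error rising by roughly the factor by which the outlier exceeds the typical activation range, which is why suppressing such outliers during pre-training makes simple PTQ schemes far more reliable.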