Fireworks emphasizes the importance of tailored quantization techniques for optimizing large language models (LLMs) across different use cases, highlighting the role of Kullback-Leibler (KL) divergence as a precise metric for evaluating quantization quality. The company collaborates with client enterprises to balance speed, cost, and quality, aiming to place their models favorably on the Pareto curve of these factors. They advise against using task-based metrics such as MMLU to assess quantization quality, because such benchmarks are noisy and imprecise, advocating instead for divergence metrics that more directly capture how quantization shifts a model's output distribution. Fireworks' approach has been well received by clients such as Superhuman and Cursor, who report improved performance and cost efficiency. Their commitment to tailored quantization is exemplified by their deployment of Llama 3.1 models, which offer significant improvements in speed and cost efficiency compared to competitors.
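To make the divergence-based evaluation concrete, the sketch below shows one way per-token KL divergence could be computed between a full-precision reference model and its quantized counterpart, starting from next-token logits gathered on the same prompts. This is not Fireworks' actual pipeline; the array shapes, the random logits, and the small perturbation used to stand in for quantization error are illustrative assumptions.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the vocabulary dimension.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def per_token_kl(ref_logits, quant_logits, eps=1e-12):
    """KL(P_ref || P_quant) at each token position.

    ref_logits, quant_logits: arrays of shape (num_tokens, vocab_size)
    holding next-token logits from the full-precision reference model
    and the quantized model on identical inputs.
    """
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    # Sum p * log(p / q) over the vocabulary for each position.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

# Toy example: random logits stand in for real model outputs, and a small
# Gaussian perturbation mimics the distortion introduced by quantization.
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 32000))            # 4 token positions, 32k vocab
quant = ref + rng.normal(scale=0.05, size=ref.shape)

kl = per_token_kl(ref, quant)
print("per-token KL:", kl)
print("mean KL divergence:", kl.mean())
```

Averaging this per-token divergence over a representative evaluation set yields a single, low-noise score: a value near zero means the quantized model's output distribution closely tracks the reference, whereas task accuracies like MMLU can swing by whole points from run-to-run noise alone.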