
How we built the fastest Kimi K2.5 on Artificial Analysis

Blog post from Baseten

Post Details
Company: Baseten
Date Published: -
Author: Tri Dao and 3 others
Word Count: 834
Language: English
Hacker News Points: -
Summary

Kimi K2.5 is a frontier-grade, open-source model with one trillion parameters, making it the largest of its kind on the market. Benchmarked recently by Artificial Analysis, it reached speeds of over 340 tokens per second, with speculative decoding driving much of its efficiency on reasoning tasks. Critical to this performance are NVIDIA Blackwell GPUs, optimized through an INT4-to-NVFP4 conversion that improves latency and throughput, and a custom-built EAGLE-3 speculator model trained on synthetic queries. These optimizations let Kimi K2.5 outperform models like Claude Opus with an 8x cost reduction and a 4.5x speed increase, making it a compelling choice for code generation and agentic tasks. The model's performance also rests on broader inference optimizations, including the Baseten Inference Stack and KV-aware routing, which together keep latency low and efficiency high in real-world applications.
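The speculative decoding mentioned in the summary can be sketched in miniature: a cheap draft model (the role a learned speculator like EAGLE-3 plays) proposes several tokens, and the large target model verifies them, keeping the longest matching prefix. The `target_next` and `draft_next` functions below are invented toy stand-ins for illustration; this is a minimal sketch of the general technique, not Baseten's implementation.

```python
def target_next(seq):
    # Toy stand-in for the large model's greedy next token: sum mod 10.
    return sum(seq) % 10

def draft_next(seq):
    # Toy stand-in draft model: usually agrees with the target, but is
    # deliberately wrong when the last token is 7, to exercise rejection.
    guess = sum(seq) % 10
    return (guess + 1) % 10 if seq[-1] == 7 else guess

def speculative_decode(seq, num_new, k=4):
    seq = list(seq)
    produced = 0
    while produced < num_new:
        # 1. Draft proposes up to k tokens autoregressively (cheap).
        proposal, ctx = [], list(seq)
        for _ in range(min(k, num_new - produced)):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies the proposal; keep the matching prefix.
        accepted, ctx = [], list(seq)
        for t in proposal:
            if target_next(ctx) == t:
                accepted.append(t)
                ctx.append(t)
            else:
                # 3. On the first mismatch, take the target's token instead.
                accepted.append(target_next(ctx))
                break
        seq.extend(accepted)
        produced += len(accepted)
    return seq
```

Because every accepted token is verified against the target model, the output is identical to plain greedy decoding; the speedup comes from verifying a whole draft in one target pass instead of one pass per token.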
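The INT4-to-NVFP4 conversion builds on NVFP4's block-scaled 4-bit floating-point format, whose E2M1 values form a small fixed magnitude grid. As a hedged illustration of the rounding involved, not NVIDIA's kernels or Baseten's conversion code, here is a toy per-block quantizer:

```python
# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    # One shared scale per block maps the largest magnitude onto the
    # grid maximum (6.0); each weight then rounds to the nearest grid value.
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    out = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(mag * scale * (1 if x >= 0 else -1))
    return out, scale
```

Sharing one scale over a small block keeps the dynamic range usable despite only 4 bits per weight, which is how such formats recover latency and throughput without collapsing accuracy.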
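KV-aware routing can likewise be sketched under a simplifying assumption: each request goes to the replica whose KV cache already holds the longest prefix of the prompt, so the fewest prompt tokens must be recomputed. The data structures below are hypothetical, for illustration only, not Baseten's router.

```python
def common_prefix_len(a, b):
    # Length of the shared token prefix between two sequences.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt_tokens, replica_caches):
    # replica_caches: {replica_id: list of token sequences already cached}.
    # Pick the replica with the longest cached prefix of this prompt.
    best_id, best_len = None, -1
    for rid, cached in replica_caches.items():
        hit = max((common_prefix_len(prompt_tokens, c) for c in cached), default=0)
        if hit > best_len:
            best_id, best_len = rid, hit
    return best_id, best_len
```

A real router would also weigh load and cache eviction, but prefix reuse is the core idea: cached prefix tokens skip prefill entirely, cutting time to first token.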