
How we built the fastest Kimi K2.5 on Artificial Analysis

Blog post from Baseten

Post Details
Company: Baseten
Date Published: -
Author: Tri Dao and 3 others
Word Count: 834
Language: English
Hacker News Points: -
Summary

Kimi K2.5 is a frontier-grade, open-source model with one trillion parameters, making it the largest of its kind on the market. Benchmarked recently by Artificial Analysis, it reached speeds of over 340 tokens per second, with speculative decoding driving much of its efficiency on reasoning tasks. Critical to this performance are NVIDIA Blackwell GPUs, optimized through an INT4-to-NVFP4 conversion that improves latency and throughput, and a custom-built EAGLE-3 speculator model trained on synthetic queries. These optimizations let Kimi K2.5 outperform models like Claude Opus with an 8x cost reduction and a 4.5x speed increase, making it a compelling choice for code generation and agentic tasks. The model's performance also rests on broader inference optimizations, including the Baseten Inference Stack and KV-aware routing, which together keep latency low and efficiency high in real-world applications.
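The speculative decoding mentioned in the summary can be sketched in miniature: a cheap draft model (the role a learned speculator like EAGLE-3 plays) proposes several tokens, and the large target model verifies them, keeping the longest matching prefix. The `target_next` and `draft_next` functions below are invented toy stand-ins for illustration; this is a minimal sketch of the general technique, not Baseten's implementation.

```python
def target_next(seq):
    # Toy stand-in for the large model's greedy next token: sum mod 10.
    return sum(seq) % 10

def draft_next(seq):
    # Toy stand-in draft model: usually agrees with the target, but is
    # deliberately wrong when the last token is 7, to exercise rejection.
    guess = sum(seq) % 10
    return (guess + 1) % 10 if seq[-1] == 7 else guess

def speculative_decode(seq, num_new, k=4):
    seq = list(seq)
    produced = 0
    while produced < num_new:
        # 1. Draft proposes up to k tokens autoregressively (cheap).
        proposal, ctx = [], list(seq)
        for _ in range(min(k, num_new - produced)):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies the proposal; keep the matching prefix.
        accepted, ctx = [], list(seq)
        for t in proposal:
            if target_next(ctx) == t:
                accepted.append(t)
                ctx.append(t)
            else:
                # 3. On the first mismatch, take the target's token instead.
                accepted.append(target_next(ctx))
                break
        seq.extend(accepted)
        produced += len(accepted)
    return seq
```

Because every accepted token is verified against the target model, the output is identical to plain greedy decoding; the speedup comes from verifying a whole draft in one target pass instead of one pass per token.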
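The INT4-to-NVFP4 conversion builds on NVFP4's block-scaled 4-bit floating-point format, whose E2M1 values form a small fixed magnitude grid. As a hedged illustration of the rounding involved, not NVIDIA's kernels or Baseten's conversion code, here is a toy per-block quantizer:

```python
# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    # One shared scale per block maps the largest magnitude onto the
    # grid maximum (6.0); each weight then rounds to the nearest grid value.
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    out = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(mag * scale * (1 if x >= 0 else -1))
    return out, scale
```

Sharing one scale over a small block keeps the dynamic range usable despite only 4 bits per weight, which is how such formats recover latency and throughput without collapsing accuracy.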
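KV-aware routing can likewise be sketched under a simplifying assumption: each request goes to the replica whose KV cache already holds the longest prefix of the prompt, so the fewest prompt tokens must be recomputed. The data structures below are hypothetical, for illustration only, not Baseten's router.

```python
def common_prefix_len(a, b):
    # Length of the shared token prefix between two sequences.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt_tokens, replica_caches):
    # replica_caches: {replica_id: list of token sequences already cached}.
    # Pick the replica with the longest cached prefix of this prompt.
    best_id, best_len = None, -1
    for rid, cached in replica_caches.items():
        hit = max((common_prefix_len(prompt_tokens, c) for c in cached), default=0)
        if hit > best_len:
            best_id, best_len = rid, hit
    return best_id, best_len
```

A real router would also weigh load and cache eviction, but prefix reuse is the core idea: cached prefix tokens skip prefill entirely, cutting time to first token.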