The Five Eras of KVCache
Blog post from Modular
The Key–Value Cache (KVCache) has become a crucial component of modern large language model (LLM) serving systems, and its management has evolved through several distinct eras:

- **Pre-transformer:** Deep learning models such as ResNet and YOLO ran purely feed-forward inference and needed no KVCache.
- **2017, contiguous KVCaches:** With the advent of transformers, serving systems needed KVCaches held in contiguous buffers so that attention states could be reused efficiently instead of recomputed.
- **2023, PagedAttention:** Allocating KV in fixed-size pages improved memory utilization, and this approach became the standard for LLM serving.
- **2024, heterogeneous KVCaches:** The diverse needs of modern multimodal models drove more complex cache-management systems.
- **2025, distributed and unified KVCaches:** Distributed KVCaches began addressing the challenge of scaling LLMs across multiple nodes, while the latest developments focus on unifying hybrid KV memory systems to improve composability and efficiency.

These advances highlight the growing complexity of KVCache management, which demands innovation across the AI infrastructure stack to accommodate new models and optimizations.
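To make the contrast between the early contiguous era and the PagedAttention era concrete, here is a minimal sketch in plain Python. All names (`PagedKVCache`, `page_size`, the list-based "tensors") are illustrative stand-ins, not the API of any real serving system; the core idea shown is simply that KV entries are stored in fixed-size pages allocated on demand, rather than in one contiguous, pre-reserved buffer per sequence.

```python
class PagedKVCache:
    """Toy paged KV cache: one page holds up to `page_size` (key, value) pairs.

    Real systems store per-layer, per-head tensors on the GPU; here each
    key/value is just a placeholder Python object.
    """

    def __init__(self, page_size=4):
        self.page_size = page_size
        self.pages = []  # each page is a list of (key, value) tuples

    def append(self, key, value):
        # Allocate a new fixed-size page only when the last one is full,
        # so memory grows in page-sized steps instead of being reserved
        # up front for the maximum sequence length.
        if not self.pages or len(self.pages[-1]) == self.page_size:
            self.pages.append([])
        self.pages[-1].append((key, value))

    def __len__(self):
        return sum(len(page) for page in self.pages)


cache = PagedKVCache(page_size=4)
for t in range(6):  # simulate six decode steps
    cache.append(f"k{t}", f"v{t}")
print(len(cache), len(cache.pages))  # 6 cached tokens across 2 pages
```

The page table (`self.pages`) is what lets a real scheduler pack pages from many sequences into one memory pool, which is the source of PagedAttention's memory-utilization gains.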