DeepSeek-V4: a million-token context that agents can actually use
Blog post from HuggingFace
Released in April 2026, DeepSeek-V4 ships as two models, DeepSeek-V4-Pro and DeepSeek-V4-Flash, both with a 1M-token context window aimed squarely at agentic workloads.

Architecturally, the models combine Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to shrink the KV cache and cut inference FLOPs, which is what makes million-token contexts practical on existing hardware.

On the post-training side, two decisions target agent workflows: reasoning is retained across user turns rather than discarded, and tool calls follow a robust, structured schema. The benchmark results bear this out, with competitive agent performance, particularly on long-horizon tasks.

Training and serving run on DeepSeek Elastic Compute (DSec), the infrastructure layer that underpins both efficient training and real-world deployment.
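To see why KV-cache compression matters at this scale, here is a back-of-the-envelope sizing calculation. All model dimensions below (layer count, KV heads, head size, compression ratio) are illustrative assumptions for the sketch, not published DeepSeek-V4 figures:

```python
# Back-of-the-envelope KV-cache sizing for a 1M-token context.
# Every dimension here is an illustrative assumption, NOT a published
# DeepSeek-V4 number.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """Memory for keys + values across all layers (fp16/bf16 by default)."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem

# Hypothetical dense baseline: 64 layers, 8 KV heads of dim 128.
full = kv_cache_bytes(tokens=1_000_000, layers=64, kv_heads=8, head_dim=128)

# A compressed-attention scheme storing an (assumed) 8x-smaller latent per token:
compressed = full / 8

print(f"uncompressed: {full / 2**30:.1f} GiB")   # ~244.1 GiB
print(f"compressed:   {compressed / 2**30:.1f} GiB")  # ~30.5 GiB
```

Even under these made-up numbers, the point is visible: an uncompressed 1M-token cache would not fit on a single accelerator, while a compressed latent cache can.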
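The post mentions a robust tool-call schema but does not reproduce it, so as a generic illustration only, here is what a structured tool call typically looks like in an agent runtime. The tool name, argument fields, and `id` convention are all hypothetical, not DeepSeek-V4's actual format:

```python
import json

# A generic, hypothetical tool-call payload. Field names and the tool
# itself are illustrative -- this is NOT DeepSeek-V4's actual schema.
tool_call = {
    "name": "search_flights",      # hypothetical tool name
    "arguments": {                 # structured, typed arguments
        "origin": "SFO",
        "destination": "NRT",
        "date": "2026-05-01",
    },
    "id": "call_0001",             # lets the runtime match results to calls
}

# Round-trip through JSON, as a runtime would when dispatching the call
# and feeding the tool's result back into the model's context.
payload = json.dumps(tool_call)
assert json.loads(payload)["name"] == "search_flights"
print(payload)
```

The value of a strict schema is exactly this round-trip: the runtime can parse, validate, and correlate calls with results mechanically instead of scraping free-form text.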