DeepSeek V4 in the wild, and how to run it on Runpod
Blog post from Runpod
V4 is a new Mixture-of-Experts model built around an attention stack that combines Compressed Sparse Attention with Heavily Compressed Attention, significantly reducing compute and memory costs relative to its predecessor, V3.2. It also folds in several training-side changes, including Manifold-Constrained Hyper-Connections and the Muon optimizer. The result is a more compact model with notably improved performance in specific domains such as competitive programming and formal math.

V4's main limitation is that it is text-only: it lacks the multimodal capabilities of models like Gemini or Opus. In practice this can be worked around by pairing it with auxiliary models that handle image or audio inputs.

Where V4 shines is integration and cost. It drops into existing tooling such as Claude Code and OpenCode with little friction, and its pricing makes workflows viable that were previously uneconomical at frontier-lab rates.

Deploying V4 on platforms like Runpod is straightforward. Two caveats are worth noting: long-context retrieval degrades near the model's upper context limit, and as a preview release the model is still evolving.
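As a concrete sketch of what integration looks like: Runpod endpoints (for example, a serverless vLLM worker) expose an OpenAI-compatible chat-completions API, so any OpenAI-style client can talk to them by swapping the base URL and model name. The endpoint URL and the `deepseek-ai/DeepSeek-V4` model identifier below are placeholders, not confirmed values; substitute your own endpoint ID and the published model ID.

```python
import json

# Hypothetical values -- replace with your Runpod endpoint ID and the
# actual model identifier once it is published.
ENDPOINT = "https://api.runpod.ai/v2/<endpoint-id>/openai/v1/chat/completions"
MODEL_ID = "deepseek-ai/DeepSeek-V4"  # placeholder name


def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    Tools like Claude Code and OpenCode send bodies of this shape,
    which is why pointing them at a self-hosted V4 endpoint is mostly
    a base-URL and model-name change.
    """
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.6,
    }


if __name__ == "__main__":
    # POST this JSON to ENDPOINT with your Runpod API key as a Bearer
    # token to get a completion back.
    payload = build_chat_request("Write a binary search in Python.")
    print(json.dumps(payload, indent=2))
```

The same payload works against a dedicated GPU pod running vLLM, since vLLM serves the identical OpenAI-compatible route.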