DeepSeek V4 Pro: Model Overview, Features & Performance Guide

Blog post from DeepInfra

Post Details
Author: Deep
Word Count: 1,108
Language: English
Summary

DeepSeek V4 Pro is a 1.6-trillion-parameter Mixture-of-Experts model developed by DeepSeek and served through the DeepInfra platform. It targets complex reasoning, software engineering, and agentic tasks, and introduces a new architecture that combines hybrid attention with manifold-constrained hyper-connections to improve efficiency and stability. Released alongside the lighter DeepSeek-V4-Flash variant, the model supports a 1-million-token context window and uses mixed-precision training to reduce memory use. It outperforms its predecessor, DeepSeek-V3.2, on benchmarks including MMLU, GSM8K, and HumanEval, and offers configurable reasoning modes that trade latency against analytical depth. Despite this competitive performance, the model shows a strong tendency to hallucinate when it is uncertain of an answer. On DeepInfra, V4 Pro is billed under a usage-based pricing model; Think Max mode is notably more token-intensive, yet the model remains significantly cheaper than comparable models. Developers can integrate the model through the API or self-host it, and should monitor token usage in resource-intensive applications.
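As a rough illustration of the token monitoring mentioned above, the following is a minimal sketch of a per-request usage tracker under usage-based pricing. The class name `UsageTracker`, its fields, and the prices passed to it are hypothetical placeholders chosen for the example; they are not DeepInfra's actual rates or part of any DeepInfra API.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts and estimates spend under per-token pricing.

    All prices are supplied by the caller; the values used in any example
    are placeholders, not real rates for any model or provider.
    """

    input_price_per_million: float   # USD per 1M input tokens (assumed)
    output_price_per_million: float  # USD per 1M output tokens (assumed)
    budget_usd: float                # spending limit to check against
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add the token counts reported for one completed request."""
        if input_tokens < 0 or output_tokens < 0:
            raise ValueError("token counts must be non-negative")
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def cost_usd(self) -> float:
        """Estimated total spend so far, in USD."""
        total = (
            self.input_tokens * self.input_price_per_million
            + self.output_tokens * self.output_price_per_million
        )
        return total / 1_000_000

    def over_budget(self) -> bool:
        """True once the estimated spend exceeds the configured budget."""
        return self.cost_usd > self.budget_usd
```

In use, an application would call `record(...)` with the token counts returned for each request and check `over_budget()` before issuing further calls. A deeper reasoning mode that emits many more output tokens would then show up directly in `cost_usd`.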