DeepSeek-V4 is a new series of large language models from DeepSeek AI, featuring a 1 million-token context window and improved agentic capabilities. It includes two Mixture-of-Experts (MoE) models, DeepSeek-V4-Pro and DeepSeek-V4-Flash, designed for various tasks from complex reasoning to fast responses. The models also introduce a sparse attention architecture for enhanced efficiency.
| Fact | Detail |
|---|---|
| Released by | DeepSeek AI |
| Release date | Not yet disclosed. |
| What it is | A series of large language models with a 1 million-token context window and enhanced agentic capabilities. |
| Who it is for | Developers, enterprises, and researchers working with AI agents, long-context tasks, and complex coding. |
| Where to get it | Hugging Face, DeepSeek API |
| Price | Not yet disclosed. |
- DeepSeek-V4 provides a 1 million-token context window, enabling AI agents to handle significantly longer interactions [3, 4].
- The models feature a sparse attention architecture, reducing computational costs for long contexts [3, 5].
- DeepSeek-V4-Pro uses 27% of the single-token inference FLOPs of DeepSeek-V3.2 at 1M tokens [1, 5].
- DeepSeek-V4-Pro has 1.6 trillion parameters (49 billion activated), while V4-Flash has 284 billion parameters (13 billion activated) [2].
- Both DeepSeek-V4-Pro and V4-Flash support dual modes (Thinking / Non-Thinking) for varied agentic tasks [6].
- The models are designed for enhanced reasoning, coding, and multi-step agent workflows [4, 8].
What is DeepSeek-V4
DeepSeek-V4 is a series of large language models developed by DeepSeek AI, distinguished by its 1 million-token context window [3, 4]. This capability allows the models to process and retain extensive amounts of information within a single interaction. The series includes two main models: DeepSeek-V4-Pro and DeepSeek-V4-Flash [2]. DeepSeek-V4-Pro is a Mixture-of-Experts (MoE) model with 1.6 trillion parameters, activating 49 billion parameters for inference [2]. DeepSeek-V4-Flash is a smaller MoE model with 284 billion parameters, activating 13 billion parameters [2]. Both models are designed to support advanced agentic capabilities and complex reasoning tasks [4, 8]. They also incorporate a new sparse attention architecture for improved efficiency [3].
What is new vs the previous version
DeepSeek-V4 introduces several key advancements over previous versions, particularly in context window size and efficiency.
- Context Window: DeepSeek-V4 offers a 1 million-token context window, a significant increase [3, 4].
- Efficiency: DeepSeek-V4-Pro requires 27% of the single-token inference FLOPs of DeepSeek-V3.2 at 1 million tokens [1, 5]. It also requires only 10% of V3.2's key-value (KV) cache memory at 1 million tokens [5].
- Architecture: DeepSeek-V4 features a new sparse attention architecture [3].
- Agent Capabilities: The V4 models have stronger agentic capabilities [3, 4].
- Model Variants: DeepSeek-V4 includes DeepSeek-V4-Pro (1.6T parameters) and DeepSeek-V4-Flash (284B parameters) [2].
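As a back-of-envelope illustration of the efficiency figures above: only the 27% (FLOPs) and 10% (KV cache) ratios come from the sources; the absolute V3.2 baseline numbers below are made-up placeholders for the arithmetic.

```python
# Illustrative arithmetic for the reported V4-Pro efficiency ratios at a
# 1M-token context. Baseline values are placeholders, not published figures.

def v4_cost(baseline_flops: float, baseline_kv_gb: float) -> tuple[float, float]:
    """Scale an assumed V3.2 baseline by the ratios reported for V4-Pro."""
    FLOPS_RATIO = 0.27     # 27% of V3.2's single-token inference FLOPs [1, 5]
    KV_CACHE_RATIO = 0.10  # 10% of V3.2's KV cache at 1M tokens [5]
    return baseline_flops * FLOPS_RATIO, baseline_kv_gb * KV_CACHE_RATIO

flops, kv_gb = v4_cost(baseline_flops=1.0e12, baseline_kv_gb=400.0)
print(f"V4-Pro per-token FLOPs: {flops:.2e}")    # 2.70e+11
print(f"V4-Pro KV cache:        {kv_gb:.1f} GB")  # 40.0 GB
```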
How does DeepSeek-V4 work
DeepSeek-V4 operates using a Mixture-of-Experts (MoE) architecture, which allows for efficient scaling of model parameters [2].
- Context Processing: The models can process up to 1 million tokens within their context window [3, 4]. This enables them to handle long tool-use trajectories and extensive documents [1, 7].
- Sparse Attention: A new sparse attention architecture is implemented to manage the computational cost of long sequences [3]. This design reduces the FLOPs required for single-token inference and the size of the KV cache [1, 5].
- Parameter Activation: In the MoE setup, DeepSeek-V4-Pro has 1.6 trillion parameters, but only 49 billion are activated during inference [2]. DeepSeek-V4-Flash has 284 billion parameters, with 13 billion activated [2].
- Dual Modes: Both DeepSeek-V4-Pro and DeepSeek-V4-Flash support dual modes: Thinking and Non-Thinking [6]. This allows for flexible use depending on the task’s complexity and desired response speed.
- Agentic Workflows: The architecture is optimized for agentic coding and multi-step tasks, where tool results are appended to the context [1, 6].
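The agentic pattern in the last bullet can be sketched as a loop that appends each tool result back into the growing context, so the model sees the full trajectory on every step. Everything below (`call_model`, `run_tool`) is hypothetical scaffolding to show the control flow, not DeepSeek's actual API.

```python
# Minimal sketch of a multi-step agent loop: tool results are appended to
# the conversation context so the model sees the full trajectory each step.
# call_model and run_tool are hypothetical stand-ins.

def call_model(messages):
    # Stand-in for a real chat-completions call; returns a canned response.
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "done", "tool_call": None}
    return {"role": "assistant", "content": "", "tool_call": {"name": "ls", "args": {}}}

def run_tool(tool_call):
    # Stand-in for executing a tool (shell command, code edit, search...).
    return {"role": "tool", "content": f"output of {tool_call['name']}"}

def agent_loop(task: str, max_steps: int = 8) -> list[dict]:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append(reply)
        if reply["tool_call"] is None:  # model finished without requesting a tool
            break
        messages.append(run_tool(reply["tool_call"]))  # result joins the context
    return messages

history = agent_loop("list the repo files")
print([m["role"] for m in history])
# ['user', 'assistant', 'tool', 'assistant']
```

A large context window matters here because the message list only ever grows: long trajectories with hundreds of tool calls must fit in context end to end.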
Benchmarks and evidence
DeepSeek-V4 demonstrates improved performance and efficiency based on internal testing and comparisons.
| Metric / Feature | DeepSeek-V4-Pro | DeepSeek-V3.2 (for comparison) | Source |
|---|---|---|---|
| Context Window | 1 million tokens | Not yet disclosed. | [3, 4] |
| Single-token Inference FLOPs (at 1M tokens) | 27% of V3.2 | 100% (baseline) | [1, 5] |
| KV Cache Size (at 1M tokens) | 10% of V3.2 | 100% (baseline) | [5] |
| Total Parameters | 1.6 trillion | Not yet disclosed. | [2] |
| Activated Parameters | 49 billion | Not yet disclosed. | [2] |
| Performance vs. GPT-5.2 / Gemini 3.0-Pro | Surpasses them on some benchmarks | Not yet disclosed. | [4] |
| Performance vs. GPT-5.4 | Slightly below on reported benchmarks | Not yet disclosed. | [4] |
Who should care
DeepSeek-V4’s capabilities make it relevant for various stakeholders in the AI ecosystem.
Builders
Builders can leverage DeepSeek-V4 for developing advanced AI agents and long-context applications [1, 7]. The 1 million-token context window supports complex tool-use trajectories and extensive coding tasks [1, 8]. The efficiency improvements reduce computational overhead for long sequences [1, 5]. Builders can access the models via Hugging Face and DeepSeek API [6].
Enterprise
Enterprises can utilize DeepSeek-V4 for tasks requiring deep reasoning and processing large documents [4, 7]. This includes applications like automated customer support, legal document analysis, and complex data extraction. The improved agentic capabilities can streamline multi-step business processes [3].
End users
End users may experience more capable AI assistants and tools that can understand and remember more context [4]. This could lead to more nuanced and helpful interactions in applications powered by DeepSeek-V4. For example, coding assistants could maintain context across hundreds of commands [1].
Investors
Investors should note DeepSeek-V4’s efficiency gains and competitive performance against other leading models [4, 5]. The focus on long-context and agentic capabilities addresses critical needs in the evolving AI market. DeepSeek AI’s progress in model efficiency could indicate strong future market positioning.
How to use DeepSeek-V4 today
DeepSeek-V4 is available for developers to integrate into their applications.
- Access Models: The models, DeepSeek-V4-Pro and DeepSeek-V4-Flash, are available on Hugging Face [2].
- API Integration: Developers can use the DeepSeek API, which supports OpenAI ChatCompletions and Anthropic APIs [6].
- Update Model Name: To use the new models, set the `model` parameter to `deepseek-v4-pro` or `deepseek-v4-flash` in API calls [6]. The `base_url` remains the same.
- Utilize Dual Modes: Experiment with the Thinking and Non-Thinking modes for different task requirements [6]. Thinking mode is suitable for deeper reasoning, while Non-Thinking mode is for faster responses [6].
- Explore Agentic Coding: DeepSeek-V4-Pro is designed for agentic coding and complex workflows [6, 8]. Developers can test its capabilities on SWE-bench tasks or multi-step terminal sessions [1].
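The steps above can be sketched as a minimal OpenAI-style ChatCompletions request body. The model names come from the list above; the exact endpoint URL and any mode-selection field are not shown here because they should be taken from the DeepSeek API docs rather than guessed.

```python
# Sketch of an OpenAI-compatible ChatCompletions request body for
# DeepSeek-V4. Only the payload construction is shown; send it with your
# usual HTTP client or OpenAI SDK pointed at the DeepSeek base_url.

import json

def build_request(prompt: str, model: str = "deepseek-v4-pro") -> str:
    payload = {
        "model": model,  # "deepseek-v4-pro" or "deepseek-v4-flash" [6]
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload)

body = build_request("Summarize this 800k-token log file.")
print(json.loads(body)["model"])  # deepseek-v4-pro
```

Because the API is OpenAI-compatible, switching an existing integration over is mostly a matter of changing the `model` string, as the checklist above notes.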
DeepSeek-V4 vs competitors
DeepSeek-V4 positions itself with a focus on long context and efficiency, comparing favorably against some established models.
| Feature | DeepSeek-V4 | GPT-5.2 / Gemini 3.0-Pro | GPT-5.4 |
|---|---|---|---|
| Context Window | 1 million tokens [3, 4] | Not yet disclosed. | Not yet disclosed. |
| Efficiency (FLOPs at 1M tokens) | 27% of DeepSeek-V3.2 [1, 5] | Not yet disclosed. | Not yet disclosed. |
| Agentic Capabilities | Stronger, designed for complex workflows [3, 8] | Not yet disclosed. | Not yet disclosed. |
| Reasoning Performance | Surpasses some GPT-5.2 / Gemini 3.0-Pro, slightly below GPT-5.4 [4] | Varies (some surpassed by V4) [4] | Slightly above V4 [4] |
| Architecture | MoE, sparse attention [2, 3] | Not yet disclosed. | Not yet disclosed. |
Risks, limits, and myths
- Needle-in-a-Haystack Problem: Aggressive compression of KV cache, while efficient, could potentially increase the risk of missing critical information in long contexts [5].
- Computational Cost at Scale: While efficient compared to previous versions, running 1 million-token contexts still demands significant computational resources, particularly for inference [1].
- Generalization of Agentic Capabilities: While strong for specific tasks, the generalization of agentic capabilities across all possible scenarios requires further testing and development [1].
- Benchmark Superiority: Claims of surpassing other models should be interpreted with caution, as performance can vary significantly across different benchmarks and real-world applications [4].
- Open-Source vs. Proprietary: DeepSeek-V4 offers open-source weights for some components, but the full extent of open-source availability for all features is not fully detailed [3].
FAQ
- What is the maximum context window for DeepSeek-V4? The maximum context window for DeepSeek-V4 models is 1 million tokens [3, 4].
- Which DeepSeek-V4 models are available? DeepSeek-V4 includes DeepSeek-V4-Pro and DeepSeek-V4-Flash [2].
- How many parameters does DeepSeek-V4-Pro have? DeepSeek-V4-Pro has 1.6 trillion parameters, with 49 billion activated during inference [2].
- How efficient is DeepSeek-V4 compared to DeepSeek-V3.2? DeepSeek-V4-Pro requires 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2 at 1 million tokens [1, 5].
- Can DeepSeek-V4 be used for coding? Yes, DeepSeek-V4-Pro is designed for agentic coding and complex agent workflows [6, 8].
- Does DeepSeek-V4 support dual modes? Yes, both DeepSeek-V4-Pro and DeepSeek-V4-Flash support Thinking and Non-Thinking modes [6].
- Where can I access DeepSeek-V4? DeepSeek-V4 is available on Hugging Face and through the DeepSeek API [2, 6].
- Is DeepSeek-V4 open source? DeepSeek-V4 offers open-source weights [3].
Glossary
- Context Window
- The maximum number of tokens an AI model can process and consider at one time [3, 4].
- Mixture-of-Experts (MoE)
- An AI model architecture where different “expert” sub-networks specialize in processing different parts of the input [2].
- Sparse Attention
- An attention mechanism that computes attention scores for only a subset of input tokens, improving efficiency for long sequences [3].
- FLOPs (Floating-Point Operations)
- A count of the floating-point operations a computation requires, used here as a measure of inference cost [1].
- KV Cache (Key-Value Cache)
- A memory component that stores previously computed key and value states in transformer models, reducing redundant calculations [1].
- Agentic Capabilities
- The ability of an AI model to perform multi-step tasks, interact with tools, and adapt its behavior based on feedback [3, 8].
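As a toy illustration of the sparse-attention idea defined above: a sliding-window pattern is only one of many sparse schemes, and DeepSeek's actual sparsity pattern is not detailed in the sources.

```python
# Toy sliding-window sparse attention mask: each token attends only to
# itself and the previous (window - 1) tokens, so the number of attended
# positions grows linearly with sequence length instead of quadratically.
# A generic illustration, not DeepSeek's actual pattern.

def sparse_mask(seq_len: int, window: int):
    """Return a boolean mask: mask[i][j] is True if token i may attend to j."""
    return [
        [max(0, i - window + 1) <= j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sparse_mask(seq_len=6, window=3)
attended = sum(sum(row) for row in mask)
print(attended)  # 15 positions, vs 21 for full causal attention
```

Skipping most key-value pairs in this way is also what shrinks the KV state that must be kept around, which is the mechanism behind the KV-cache reductions discussed earlier.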
Sources
1. DeepSeek-V4: a million-token context that agents can actually use — Hugging Face
2. deepseek-ai/DeepSeek-V4-Pro · Hugging Face
3. DeepSeek-V4 Preview: Million-Token Context & Agent Upgrades — Atlas Cloud Blog
4. DeepSeek rolls out V4 update with 1 million-token context and stronger reasoning — TechXplore
5. DeepSeek V4 Squeezes Million-Token Context Into 10% of V3.2’s Memory, Escalating China’s AI Efficiency War With OpenAI — Wccftech
6. DeepSeek V4 Preview Release | DeepSeek API Docs
7. Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints | NVIDIA Technical Blog
8. r/openclaw on Reddit: DeepSeek V4 hands-on test: 1M-token context + agent coding — is it actually good?