Frontier Signal

DeepSeek-V4 Pro Arrives on Together AI with 1M Context

DeepSeek-V4 Pro is now available on Together AI, offering a 1M-token context window and 1.6T parameters. It supports multiple reasoning modes and competitive pricing.

DeepSeek-V4 Pro is a large-scale Mixture-of-Experts (MoE) language model now available through Together AI, DeepSeek API, DeepInfra, and OpenRouter. It features a 1M-token context window and 1.6 trillion total parameters, with 49 billion active parameters. The model supports multiple reasoning modes and is designed for complex tasks like code agents, document intelligence, and research synthesis, offering competitive pricing for its advanced capabilities.

Category | Detail
Released by | DeepSeek; hosted on Together AI and other providers
Release date | Not yet disclosed
What it is | A large-scale Mixture-of-Experts (MoE) language model with a 1M-token context window
Who it is for | Developers and enterprises needing powerful AI for long-context reasoning, code agents, and document intelligence
Where to get it | Together AI API, DeepSeek API, DeepInfra, OpenRouter
Price | Input: $2.10/1M tokens (Together AI), $0.435/1M tokens (OpenRouter); Output: $4.40/1M tokens (Together AI), $0.87/1M tokens (OpenRouter)
  • DeepSeek-V4 Pro is a large-scale Mixture-of-Experts (MoE) model.
  • It features a 1M-token context window for extensive input processing.
  • The model has 1.6 trillion total parameters and 49 billion active parameters.
  • DeepSeek-V4 Pro offers controllable reasoning modes, including “Thinking” and “Non-Thinking.”
  • It is available via Together AI, DeepSeek API, DeepInfra, and OpenRouter.
  • Pricing includes cached-input options for long-context workloads.
  • DeepSeek-V4 Pro supports use cases like code agents and document intelligence.

What is DeepSeek-V4 Pro

DeepSeek-V4 Pro is a large-scale Mixture-of-Experts (MoE) language model developed by DeepSeek [2, 7]. It features 1.6 trillion total parameters, with 49 billion active parameters [2, 7]. The model supports a 1M-token context window, enabling it to process significantly longer inputs [2, 5, 6, 7]. DeepSeek-V4 Pro offers controllable reasoning modes for diverse applications [5]. It is designed to outperform leading US models in speed and quality [6].
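DeepSeek has not published V4 Pro’s routing internals, but the gap between 1.6T total and 49B active parameters is the hallmark of top-k gated MoE. A minimal, illustrative sketch of that gating (the expert count, scores, and k here are invented for the example):

```python
import math

def top_k_gate(scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their
    routing weights over just those k (standard top-k MoE gating)."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

# Toy router: 8 experts, 2 activated per token. A 1.6T-parameter model
# with 49B active parameters runs only ~3% of its weights per token.
gate_scores = [0.3, 2.1, -0.5, 1.4, 0.0, -1.2, 0.9, 0.2]
experts, weights = top_k_gate(gate_scores, k=2)
print(experts)  # [1, 3]
```

In a real MoE layer the scores come from a learned gate applied to the token’s hidden state; here they are fixed constants to keep the sketch self-contained.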

What is new vs the previous version

DeepSeek-V4 Pro introduces a significantly larger context window and enhanced reasoning capabilities compared to its predecessors.

  • Context Window: DeepSeek-V4 Pro supports a 1M-token context window [2, 5, 6, 7]. This allows for processing much longer inputs than previous versions [6].
  • Reasoning Modes: The model includes dual modes, “Thinking” and “Non-Thinking” [5]. These modes enable controllable reasoning for different tasks [5].
  • Parameter Count: DeepSeek-V4 Pro is an MoE model with 1.6 trillion total parameters [2, 7]. It utilizes 49 billion active parameters [2, 7].
  • Agent Capabilities: DeepSeek-V4 Pro features stronger agent capabilities [4]. This enhances its performance in complex, multi-step tasks [4].

How does DeepSeek-V4 Pro work

DeepSeek-V4 Pro operates as a Mixture-of-Experts (MoE) model, activating only a small subset of its parameters for each token to keep inference efficient.

  1. Expert Activation: DeepSeek-V4 Pro is an MoE model with 1.6 trillion total parameters [2, 7]. Only 49 billion parameters are actively used during inference [2, 7].
  2. Context Processing: The model utilizes a 1M-token context window [2, 5, 6, 7]. This allows it to process and understand very long inputs [6].
  3. Reasoning Modes: DeepSeek-V4 Pro supports “Thinking” and “Non-Thinking” reasoning modes [5]. Users can select the appropriate mode for their task [5].
  4. API Access: Users can access DeepSeek-V4 Pro via Together AI APIs using the endpoint deepseek-ai/DeepSeek-V4-Pro [1]. Authentication requires a Together AI API key [1].
  5. API Compatibility: The model supports OpenAI ChatCompletions and Anthropic APIs [5]. This ensures broad compatibility with existing tools [5].
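Because the model speaks the OpenAI ChatCompletions schema, a request can be sketched as a plain payload. The Together AI endpoint URL reflects Together’s public OpenAI-compatible API and the model name comes from this article; the `reasoning` field used to toggle modes is an assumption, so check the provider docs for the actual parameter:

```python
import json

# Together AI's OpenAI-compatible chat endpoint.
TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(prompt: str, thinking: bool = False) -> dict:
    """Build a ChatCompletions-style payload for DeepSeek-V4 Pro."""
    return {
        "model": "deepseek-ai/DeepSeek-V4-Pro",  # endpoint name from the article
        "messages": [{"role": "user", "content": prompt}],
        # Hypothetical mode switch -- the real parameter name may differ.
        "reasoning": "thinking" if thinking else "non-thinking",
    }

payload = build_request("Audit this contract for termination clauses.", thinking=True)
print(json.dumps(payload, indent=2))
```

To run it, POST the payload to TOGETHER_URL with an `Authorization: Bearer <your Together AI API key>` header.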

Benchmarks and evidence

Independent benchmark scores are not yet cited for DeepSeek-V4 Pro; the table below summarizes its published specifications and stated performance goal.

Feature | Specification | Source
Total Parameters | 1.6 trillion (MoE) | [2, 7]
Active Parameters | 49 billion | [2, 7]
Context Window | 1 million tokens | [2, 5, 6, 7]
Reasoning Modes | “Thinking” / “Non-Thinking” | [5]
API Compatibility | OpenAI ChatCompletions & Anthropic | [5]
Performance Goal | Outperform leading US models in speed and quality | [6]

Who should care

DeepSeek-V4 Pro offers significant advantages for various stakeholders in the AI ecosystem.

Builders

Builders can leverage DeepSeek-V4 Pro for developing advanced applications. Its 1M-token context window is ideal for complex code agents [6]. The model supports function calling and JSON mode [3]. Builders can access it via Together AI’s API [1].
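The function calling and JSON mode mentioned above follow the OpenAI tool-use schema the model is compatible with. A sketch of a builder-style request (the `get_repo_issues` tool is hypothetical, invented for illustration):

```python
# Hypothetical tool a code agent might expose to the model.
issues_tool = {
    "type": "function",
    "function": {
        "name": "get_repo_issues",
        "description": "Fetch open issues for a repository.",
        "parameters": {
            "type": "object",
            "properties": {"repo": {"type": "string"}},
            "required": ["repo"],
        },
    },
}

request = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "Summarize open issues in foo/bar as JSON."}],
    "tools": [issues_tool],                      # function calling
    "response_format": {"type": "json_object"},  # JSON mode
}
print(request["tools"][0]["function"]["name"])  # get_repo_issues
```

When the model decides the tool is needed, the response carries a structured call (tool name plus JSON arguments) instead of free text; your code executes it and feeds the result back.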

Enterprise

Enterprise users benefit from DeepSeek-V4 Pro’s capabilities in document intelligence and research synthesis [6]. The model’s controllable reasoning modes can enhance business process automation [5]. Its competitive pricing for long-context workloads offers cost efficiency [3].

End users

End users will experience more coherent and contextually rich outputs [6]. The model’s stronger agent capabilities can improve interactive AI experiences [4]. Its ability to process longer inputs leads to better understanding [6].

Investors

Investors should note DeepSeek-V4 Pro’s stated goal of outperforming leading US models [6]. Its availability across multiple platforms signals broad distribution [1, 2, 3, 7]. The model’s advanced features position it for growth in the AI market [4].

How to use DeepSeek-V4 Pro today

DeepSeek-V4 Pro is accessible through several platforms, including Together AI, DeepSeek API, DeepInfra, and OpenRouter.

To use DeepSeek-V4 Pro via Together AI:

  1. Obtain API Key: Acquire a Together AI API key [1].
  2. Authenticate: Use your API key in request headers for authentication [1].
  3. Endpoint: Access the model using the endpoint deepseek-ai/DeepSeek-V4-Pro [1].
  4. Select Mode: Choose between “Non-Thinking” for fast responses or “Thinking” for detailed reasoning [1, 5].
  5. Integrate: Utilize the model for tasks like code agents or document intelligence [6].

For DeepSeek API access, update your model to deepseek-v4-pro [5]. The DeepSeek API supports OpenAI ChatCompletions and Anthropic APIs [5].
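Per the article, migrating an existing DeepSeek API integration is a one-line change of the model identifier. A sketch of that swap (`deepseek-chat` is shown only as an example starting point; your current model name may differ):

```python
def upgrade_to_v4_pro(payload: dict) -> dict:
    """Return a copy of an existing ChatCompletions payload retargeted
    at DeepSeek-V4 Pro, leaving the original request untouched."""
    upgraded = dict(payload)
    upgraded["model"] = "deepseek-v4-pro"  # identifier from the article
    return upgraded

old_request = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}],
}
print(upgrade_to_v4_pro(old_request)["model"])  # deepseek-v4-pro
```

Everything else in the request (messages, tools, sampling parameters) carries over unchanged thanks to the OpenAI ChatCompletions compatibility noted above.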

DeepSeek-V4 Pro vs competitors

DeepSeek-V4 Pro differentiates itself with its large context window and MoE architecture.

Feature | DeepSeek-V4 Pro | DeepSeek V4 Flash
Model Type | MoE | Not yet disclosed
Total Parameters | 1.6 trillion | Not yet disclosed
Active Parameters | 49 billion | 284 billion
Context Window | 1 million tokens | 1 million tokens
Input Price (Together AI, per 1M tokens) | $2.10 | Not yet disclosed
Output Price (Together AI, per 1M tokens) | $4.40 | Not yet disclosed
Reasoning Modes | “Thinking” / “Non-Thinking” | “Thinking” / “Non-Thinking”
Function Calling | Yes | Yes
JSON Mode | Yes | Not yet disclosed

Risks, limits, and myths

  • Myth: DeepSeek-V4 Pro is a dense model. Fact: DeepSeek-V4 Pro is a Mixture-of-Experts (MoE) model, not a dense model [2, 7].
  • Limit: While supporting a 1M-token context, managing such large contexts efficiently can be complex [6]. Developers need to optimize prompts and data handling.
  • Risk: The cost for extensive long-context usage can accumulate, despite cached-input pricing [3]. Users should monitor token consumption carefully.
  • Myth: DeepSeek-V4 Pro only works with DeepSeek’s native API. Fact: It supports OpenAI ChatCompletions and Anthropic APIs for broader integration [5].
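The cost risk above is easy to quantify. A back-of-envelope estimator using the Together AI list prices quoted in this article ($2.10 input, $4.40 output per 1M tokens); cached-input discounts are ignored, so treat the result as an upper bound:

```python
INPUT_USD_PER_M = 2.10   # Together AI input price quoted in this article
OUTPUT_USD_PER_M = 4.40  # Together AI output price quoted in this article

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Worst-case (uncached) cost in USD for a single request."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# One full-context call: 1M tokens in, 4k tokens out.
print(f"${estimate_cost(1_000_000, 4_000):.4f}")  # $2.1176
```

At roughly $2 per full-context call, an agent loop that re-sends a large context on every step accumulates cost quickly, which is exactly where cached-input pricing matters.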

FAQ

  • What is the context window size of DeepSeek-V4 Pro?

    DeepSeek-V4 Pro has a 1M-token context window [2, 5, 6, 7].

  • How many parameters does DeepSeek-V4 Pro have?

    DeepSeek-V4 Pro has 1.6 trillion total parameters and 49 billion active parameters [2, 7].

  • What reasoning modes does DeepSeek-V4 Pro offer?

    DeepSeek-V4 Pro offers “Thinking” and “Non-Thinking” reasoning modes [5].

  • Where can I access DeepSeek-V4 Pro?

    You can access DeepSeek-V4 Pro via Together AI, DeepSeek API, DeepInfra, and OpenRouter [1, 2, 3, 4, 7].

  • Is DeepSeek-V4 Pro an MoE model?

    Yes, DeepSeek-V4 Pro is a Mixture-of-Experts (MoE) model [2, 7].

  • What are the pricing details for DeepSeek-V4 Pro on Together AI?

    On Together AI, input is $2.10 per 1M tokens and output is $4.40 per 1M tokens [3].

  • Does DeepSeek-V4 Pro support function calling?

    Yes, DeepSeek-V4 Pro supports function calling [3].

  • What applications is DeepSeek-V4 Pro suitable for?

    DeepSeek-V4 Pro is suitable for code agents, document intelligence, and research synthesis [6].

Glossary

Mixture-of-Experts (MoE)
An AI model architecture where different “expert” sub-networks specialize in processing different types of data, with a gating network determining which experts to use for a given input [2, 7].
Context Window
The maximum number of tokens an AI model can process and consider at one time when generating a response [2, 6].
Token
A unit of text or code used by AI models, which can be a word, part of a word, or a punctuation mark [6].
Function Calling
A capability of language models to identify when a user’s intent can be fulfilled by calling an external tool or API, and then generating the appropriate function call [3].
Cached-Input Pricing
A pricing model where the cost for processing the initial input in a long conversation or document is reduced or handled differently, often to encourage long-context use [3].

Explore the DeepSeek-V4 Pro API on Together AI to integrate its advanced long-context capabilities into your applications today.

Author

  • Siegfried Kamgo

    Founder and editorial lead at FrontierWisdom. Engineer turned operator-analyst writing about AI systems, automation infrastructure, decentralised stacks, and the practical economics of frontier technology. Focus: turning fast-moving releases into durable, implementation-ready playbooks.
