Frontier Signal

DeepSeek-V4 Pro Arrives on Together AI with 1M Context

DeepSeek-V4 Pro is now available on Together AI, offering a 1M-token context window and 1.6T parameters. It supports multiple reasoning modes and competitive pricing.

DeepSeek-V4 Pro is a large-scale Mixture-of-Experts (MoE) language model now available through Together AI, DeepSeek API, DeepInfra, and OpenRouter. It features a 1M-token context window and 1.6 trillion total parameters, with 49 billion active parameters. The model supports multiple reasoning modes and is designed for complex tasks like code agents, document intelligence, and research synthesis, offering competitive pricing for its advanced capabilities.

Category | Detail
Released by | DeepSeek; hosted on Together AI and other providers
Release date | Not yet disclosed
What it is | A large-scale Mixture-of-Experts (MoE) language model with a 1M-token context window
Who it is for | Developers and enterprises needing powerful AI for long-context reasoning, code agents, and document intelligence
Where to get it | Together AI API, DeepSeek API, DeepInfra, OpenRouter
Price | Input: $2.10/1M tokens (Together AI), $0.435/1M tokens (OpenRouter); Output: $4.40/1M tokens (Together AI), $0.87/1M tokens (OpenRouter)
  • DeepSeek-V4 Pro is a large-scale Mixture-of-Experts (MoE) model.
  • It features a 1M-token context window for extensive input processing.
  • The model has 1.6 trillion total parameters and 49 billion active parameters.
  • DeepSeek-V4 Pro offers controllable reasoning modes, including “Thinking” and “Non-Thinking.”
  • It is available via Together AI, DeepSeek API, DeepInfra, and OpenRouter.
  • Pricing includes cached-input options for long-context workloads.
  • DeepSeek-V4 Pro supports use cases like code agents and document intelligence.

What is DeepSeek-V4 Pro

DeepSeek-V4 Pro is a large-scale Mixture-of-Experts (MoE) language model developed by DeepSeek [2, 7]. It features 1.6 trillion total parameters, with 49 billion active parameters [2, 7]. The model supports a 1M-token context window, enabling it to process significantly longer inputs [2, 5, 6, 7]. DeepSeek-V4 Pro offers controllable reasoning modes for diverse applications [5]. It is designed to outperform leading US models in speed and quality [6].
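DeepSeek has not published V4 Pro’s routing internals, but the gap between 1.6T total and 49B active parameters is the hallmark of top-k gated MoE. A minimal, illustrative sketch of that gating (the expert count, scores, and k here are invented for the example):

```python
import math

def top_k_gate(scores, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their
    routing weights over just those k (standard top-k MoE gating)."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

# Toy router: 8 experts, 2 activated per token. A 1.6T-parameter model
# with 49B active parameters runs only ~3% of its weights per token.
gate_scores = [0.3, 2.1, -0.5, 1.4, 0.0, -1.2, 0.9, 0.2]
experts, weights = top_k_gate(gate_scores, k=2)
print(experts)  # [1, 3]
```

In a real MoE layer the scores come from a learned gate applied to the token’s hidden state; here they are fixed constants to keep the sketch self-contained.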

What is new vs the previous version

DeepSeek-V4 Pro introduces a significantly larger context window and enhanced reasoning capabilities compared to its predecessors.

  • Context Window: DeepSeek-V4 Pro supports a 1M-token context window [2, 5, 6, 7]. This allows for processing much longer inputs than previous versions [6].
  • Reasoning Modes: The model includes dual modes, “Thinking” and “Non-Thinking” [5]. These modes enable controllable reasoning for different tasks [5].
  • Parameter Count: DeepSeek-V4 Pro is an MoE model with 1.6 trillion total parameters [2, 7]. It utilizes 49 billion active parameters [2, 7].
  • Agent Capabilities: DeepSeek-V4 Pro features stronger agent capabilities [4]. This enhances its performance in complex, multi-step tasks [4].

How does DeepSeek-V4 Pro work

DeepSeek-V4 Pro operates as a Mixture-of-Experts (MoE) model, activating only a small subset of its parameters for each token to keep inference efficient.

  1. Expert Activation: DeepSeek-V4 Pro is an MoE model with 1.6 trillion total parameters [2, 7]. Only 49 billion parameters are actively used during inference [2, 7].
  2. Context Processing: The model utilizes a 1M-token context window [2, 5, 6, 7]. This allows it to process and understand very long inputs [6].
  3. Reasoning Modes: DeepSeek-V4 Pro supports “Thinking” and “Non-Thinking” reasoning modes [5]. Users can select the appropriate mode for their task [5].
  4. API Access: Users can access DeepSeek-V4 Pro via Together AI APIs using the endpoint deepseek-ai/DeepSeek-V4-Pro [1]. Authentication requires a Together AI API key [1].
  5. API Compatibility: The model supports OpenAI ChatCompletions and Anthropic APIs [5]. This ensures broad compatibility with existing tools [5].
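Because the model speaks the OpenAI ChatCompletions schema, a request can be sketched as a plain payload. The Together AI endpoint URL reflects Together’s public OpenAI-compatible API and the model name comes from this article; the `reasoning` field used to toggle modes is an assumption, so check the provider docs for the actual parameter:

```python
import json

# Together AI's OpenAI-compatible chat endpoint.
TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(prompt: str, thinking: bool = False) -> dict:
    """Build a ChatCompletions-style payload for DeepSeek-V4 Pro."""
    return {
        "model": "deepseek-ai/DeepSeek-V4-Pro",  # endpoint name from the article
        "messages": [{"role": "user", "content": prompt}],
        # Hypothetical mode switch -- the real parameter name may differ.
        "reasoning": "thinking" if thinking else "non-thinking",
    }

payload = build_request("Audit this contract for termination clauses.", thinking=True)
print(json.dumps(payload, indent=2))
```

To run it, POST the payload to TOGETHER_URL with an `Authorization: Bearer <your Together AI API key>` header.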

Benchmarks and evidence

Independent benchmark scores are not yet cited for DeepSeek-V4 Pro; the table below summarizes its published specifications and stated performance goal.

Feature | Specification | Source
Total Parameters | 1.6 trillion (MoE) | [2, 7]
Active Parameters | 49 billion | [2, 7]
Context Window | 1 million tokens | [2, 5, 6, 7]
Reasoning Modes | “Thinking” / “Non-Thinking” | [5]
API Compatibility | OpenAI ChatCompletions & Anthropic | [5]
Performance Goal | Outperform leading US models in speed and quality | [6]

Who should care

DeepSeek-V4 Pro offers significant advantages for various stakeholders in the AI ecosystem.

Builders

Builders can leverage DeepSeek-V4 Pro for developing advanced applications. Its 1M-token context window is ideal for complex code agents [6]. The model supports function calling and JSON mode [3]. Builders can access it via Together AI’s API [1].
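The function calling and JSON mode mentioned above follow the OpenAI tool-use schema the model is compatible with. A sketch of a builder-style request (the `get_repo_issues` tool is hypothetical, invented for illustration):

```python
# Hypothetical tool a code agent might expose to the model.
issues_tool = {
    "type": "function",
    "function": {
        "name": "get_repo_issues",
        "description": "Fetch open issues for a repository.",
        "parameters": {
            "type": "object",
            "properties": {"repo": {"type": "string"}},
            "required": ["repo"],
        },
    },
}

request = {
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [{"role": "user", "content": "Summarize open issues in foo/bar as JSON."}],
    "tools": [issues_tool],                      # function calling
    "response_format": {"type": "json_object"},  # JSON mode
}
print(request["tools"][0]["function"]["name"])  # get_repo_issues
```

When the model decides the tool is needed, the response carries a structured call (tool name plus JSON arguments) instead of free text; your code executes it and feeds the result back.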

Enterprise

Enterprise users benefit from DeepSeek-V4 Pro’s capabilities in document intelligence and research synthesis [6]. The model’s controllable reasoning modes can enhance business process automation [5]. Its competitive pricing for long-context workloads offers cost efficiency [3].

End users

End users will experience more coherent and contextually rich outputs [6]. The model’s stronger agent capabilities can improve interactive AI experiences [4]. Its ability to process longer inputs leads to better understanding [6].

Investors

Investors should note DeepSeek-V4 Pro’s stated goal of outperforming leading US models [6]. Its availability across multiple platforms signals broad distribution [1, 2, 3, 7]. The model’s advanced features position it for growth in the AI market [4].

How to use DeepSeek-V4 Pro today

DeepSeek-V4 Pro is accessible through several platforms, including Together AI, DeepSeek API, DeepInfra, and OpenRouter.

To use DeepSeek-V4 Pro via Together AI:

  1. Obtain API Key: Acquire a Together AI API key [1].
  2. Authenticate: Use your API key in request headers for authentication [1].
  3. Endpoint: Access the model using the endpoint deepseek-ai/DeepSeek-V4-Pro [1].
  4. Select Mode: Choose between “Non-Thinking” for fast responses or “Thinking” for detailed reasoning [1, 5].
  5. Integrate: Utilize the model for tasks like code agents or document intelligence [6].

For DeepSeek API access, update your model to deepseek-v4-pro [5]. The DeepSeek API supports OpenAI ChatCompletions and Anthropic APIs [5].
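Per the article, migrating an existing DeepSeek API integration is a one-line change of the model identifier. A sketch of that swap (`deepseek-chat` is shown only as an example starting point; your current model name may differ):

```python
def upgrade_to_v4_pro(payload: dict) -> dict:
    """Return a copy of an existing ChatCompletions payload retargeted
    at DeepSeek-V4 Pro, leaving the original request untouched."""
    upgraded = dict(payload)
    upgraded["model"] = "deepseek-v4-pro"  # identifier from the article
    return upgraded

old_request = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}],
}
print(upgrade_to_v4_pro(old_request)["model"])  # deepseek-v4-pro
```

Everything else in the request (messages, tools, sampling parameters) carries over unchanged thanks to the OpenAI ChatCompletions compatibility noted above.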

DeepSeek-V4 Pro vs competitors

DeepSeek-V4 Pro differentiates itself with its large context window and MoE architecture.

Feature | DeepSeek-V4 Pro | DeepSeek V4 Flash
Model Type | MoE | Not yet disclosed
Total Parameters | 1.6 trillion | Not yet disclosed
Active Parameters | 49 billion | 284 billion
Context Window | 1 million tokens | 1 million tokens
Input Price (Together AI, per 1M tokens) | $2.10 | Not yet disclosed
Output Price (Together AI, per 1M tokens) | $4.40 | Not yet disclosed
Reasoning Modes | “Thinking” / “Non-Thinking” | “Thinking” / “Non-Thinking”
Function Calling | Yes | Yes
JSON Mode | Yes | Not yet disclosed

Risks, limits, and myths

  • Myth: DeepSeek-V4 Pro is a dense model. Fact: DeepSeek-V4 Pro is a Mixture-of-Experts (MoE) model, not a dense model [2, 7].
  • Limit: While supporting a 1M-token context, managing such large contexts efficiently can be complex [6]. Developers need to optimize prompts and data handling.
  • Risk: The cost for extensive long-context usage can accumulate, despite cached-input pricing [3]. Users should monitor token consumption carefully.
  • Myth: DeepSeek-V4 Pro only works with DeepSeek’s native API. Fact: It supports OpenAI ChatCompletions and Anthropic APIs for broader integration [5].
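The cost risk above is easy to quantify. A back-of-envelope estimator using the Together AI list prices quoted in this article ($2.10 input, $4.40 output per 1M tokens); cached-input discounts are ignored, so treat the result as an upper bound:

```python
INPUT_USD_PER_M = 2.10   # Together AI input price quoted in this article
OUTPUT_USD_PER_M = 4.40  # Together AI output price quoted in this article

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Worst-case (uncached) cost in USD for a single request."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# One full-context call: 1M tokens in, 4k tokens out.
print(f"${estimate_cost(1_000_000, 4_000):.4f}")  # $2.1176
```

At roughly $2 per full-context call, an agent loop that re-sends a large context on every step accumulates cost quickly, which is exactly where cached-input pricing matters.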

FAQ

  • What is the context window size of DeepSeek-V4 Pro?

    DeepSeek-V4 Pro has a 1M-token context window [2, 5, 6, 7].

  • How many parameters does DeepSeek-V4 Pro have?

    DeepSeek-V4 Pro has 1.6 trillion total parameters and 49 billion active parameters [2, 7].

  • What reasoning modes does DeepSeek-V4 Pro offer?

    DeepSeek-V4 Pro offers “Thinking” and “Non-Thinking” reasoning modes [5].

  • Where can I access DeepSeek-V4 Pro?

    You can access DeepSeek-V4 Pro via Together AI, DeepSeek API, DeepInfra, and OpenRouter [1, 2, 3, 4, 7].

  • Is DeepSeek-V4 Pro an MoE model?

    Yes, DeepSeek-V4 Pro is a Mixture-of-Experts (MoE) model [2, 7].

  • What are the pricing details for DeepSeek-V4 Pro on Together AI?

    On Together AI, input is $2.10 per 1M tokens and output is $4.40 per 1M tokens [3].

  • Does DeepSeek-V4 Pro support function calling?

    Yes, DeepSeek-V4 Pro supports function calling [3].

  • What applications is DeepSeek-V4 Pro suitable for?

    DeepSeek-V4 Pro is suitable for code agents, document intelligence, and research synthesis [6].

Glossary

Mixture-of-Experts (MoE)
An AI model architecture where different “expert” sub-networks specialize in processing different types of data, with a gating network determining which experts to use for a given input [2, 7].
Context Window
The maximum number of tokens an AI model can process and consider at one time when generating a response [2, 6].
Token
A unit of text or code used by AI models, which can be a word, part of a word, or a punctuation mark [6].
Function Calling
A capability of language models to identify when a user’s intent can be fulfilled by calling an external tool or API, and then generating the appropriate function call [3].
Cached-Input Pricing
A pricing model where the cost for processing the initial input in a long conversation or document is reduced or handled differently, often to encourage long-context use [3].

Explore the DeepSeek-V4 Pro API on Together AI to integrate its advanced long-context capabilities into your applications today.

Author

  • Siegfried Kamgo

    Founder and editorial lead at FrontierWisdom. Engineer turned operator-analyst writing about AI systems, automation infrastructure, decentralised stacks, and the practical economics of frontier technology. Focus: turning fast-moving releases into durable, implementation-ready playbooks.
