DeepSeek V4
- Provider: DeepSeek
- Status: Current
- Context: 1,000,000 tokens
- SWE-bench Verified: 80.6%
- Price (V4-Pro, input / output): $0.435 / $0.87 per MTok
DeepSeek V4 is the current flagship model from the Chinese lab DeepSeek, released as a preview on 24 April 2026, and it is the strongest open-weight model on the market for cost-to-quality. It ships in two Mixture-of-Experts sizes — V4-Pro (1.6 trillion parameters, 49 billion active) and V4-Flash (284 billion, 13 billion active) — both with a 1-million-token context window, both under the permissive MIT licence. V4-Pro scores 80.6% on SWE-bench Verified (vendor-reported) and costs $0.87 per million output tokens after a permanent 75% price cut, roughly 34 times cheaper than GPT-5.5.
The honest summary is that V4 is the best open-weight model on price and one of the best on capability, but it is not the absolute coding king. It is tied at the top of the open-weight field on the older SWE-bench Verified (level with Kimi K2.6), yet on the harder, memorisation-resistant SWE-bench Pro it sits behind Kimi K2.6 and GLM-5.1, and the US government’s NIST evaluation puts it about eight months behind the American frontier. What V4 wins decisively is value: frontier-adjacent results, a 1M-token context, MIT-licensed open weights, and the lowest hosted price of any major model.
Quick specs
| Spec | Value |
|---|---|
| Provider | DeepSeek |
| Released | 24 April 2026 (preview; web, app and API) |
| Variants | V4-Pro (1.6T / 49B active), V4-Flash (284B / 13B active) |
| Architecture | Mixture-of-Experts with sparse attention |
| Context window | 1,000,000 tokens |
| Max output | 384,000 tokens |
| Knowledge cutoff | Not publicly disclosed |
| Licence | Open weights, MIT |
| Input price (V4-Pro) | $0.435 / MTok |
| Output price (V4-Pro) | $0.87 / MTok |
| V4-Flash price (input / output) | $0.14 / $0.28 per MTok |
| SWE-bench Verified | 80.6% (vendor) |
| SWE-bench Pro | 55.4% (vendor); ~55% (CAISI) |
| Artificial Analysis Index | 52 (top open tier; behind Kimi K2.6) |
| Best for | Cost-sensitive coding agents, self-hosting, long-context work on a budget |
| Limitations | Trails Kimi K2.6/GLM-5.1 on SWE-bench Pro; ~8 months behind the frontier (CAISI) |
What’s new in DeepSeek V4
V4 is a substantial step up from the V3 line, and the changes cluster in four areas (DeepSeek API docs, MIT Technology Review).
Two sizes, one architecture
DeepSeek split V4 into V4-Pro (1.6 trillion parameters, 49 billion active) and V4-Flash (284 billion, 13 billion active). Both are sparse Mixture-of-Experts models that activate only a fraction of their parameters per token, which is how DeepSeek keeps inference cheap at trillion-parameter scale. V4-Pro is the quality flagship; V4-Flash is built for fast, high-volume and agentic workloads where cost per call matters most.
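To make "activate only a fraction of their parameters per token" concrete, the sketch below first computes the active share for each variant from the figures above (about 3% for V4-Pro, under 5% for V4-Flash), then shows a toy top-k router: score every expert, keep the k best, softmax their scores into gate weights and run only those experts. This is a generic, scalar illustration of sparse MoE routing; DeepSeek's actual expert count, value of k, shared experts and load-balancing scheme are not described here, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence

# Parameter counts from the text above: (total, active per token).
VARIANTS = {"V4-Pro": (1.6e12, 49e9), "V4-Flash": (284e9, 13e9)}


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that run for a single token."""
    return active / total


def top_k_route(logits: Sequence[float], k: int) -> dict[int, float]:
    """Toy router: keep the k highest-scoring experts and softmax their
    scores so the k gate weights sum to 1. All other experts are skipped."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def moe_forward(x: float, experts: Sequence[Callable[[float], float]],
                logits: Sequence[float], k: int) -> float:
    """Gate-weighted sum of the selected experts' outputs only."""
    gates = top_k_route(logits, k)
    return sum(w * experts[i](x) for i, w in gates.items())


for name, (total, active) in VARIANTS.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active per token")
```

The key design point is that compute per token scales with the k selected experts, not with the total expert count, which is why a 1.6T-parameter model can be served at the cost profile of a much smaller dense one.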
A 1M-token context agents can use
Both variants carry a 1-million-token context window, up from 128K on V3, paired with sparse attention so the long context stays affordable rather than just nominal. DeepSeek’s framing is that the window is large enough for whole-repository and whole-corpus agentic work, not a headline number that degrades in practice.
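Why sparse attention matters at this length is mostly arithmetic. Full attention computes a score for every query-key pair, so the work grows with the square of the context; a top-k sparse scheme lets each query attend to at most k selected keys, so it grows linearly. The sketch below counts scores per head per layer for both cases. The key budget k = 2,048 is a hypothetical value chosen for illustration, not a published V4 setting, and the cost of selecting the k keys is deliberately left out.

```python
def dense_scores(n: int) -> int:
    """Query-key scores per head per layer for full attention.
    Ignores causal masking, which roughly halves the count."""
    return n * n


def topk_sparse_scores(n: int, k: int) -> int:
    """Scores when each query attends to at most k selected keys.
    The cost of choosing those keys is not counted here."""
    return n * min(k, n)


n = 1_000_000   # V4's context window in tokens
k = 2_048       # hypothetical per-query key budget, not a published V4 value
dense, sparse = dense_scores(n), topk_sparse_scores(n, k)
print(f"dense: {dense:.2e}  top-k: {sparse:.2e}  reduction: {dense / sparse:.0f}x")
```

At a full 1M tokens the dense count is 10^12 scores, roughly 488 times the top-k count under this assumed budget, which is the gap that separates an affordable long context from a nominal one.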
Thinking modes
V4 is a hybrid reasoning model with three modes: Non-think (fast, intuitive answers), Think High (chain-of-thought reasoning) and Think Max (maximum-depth reasoning for the hardest problems). Think Max can generate very long reasoning traces, which is why DeepSeek recommends keeping a large context window available for it. This puts V4’s reasoning control in the same family as Claude’s effort levels and OpenAI’s reasoning settings, inside a single open-weight model.
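A minimal sketch of choosing a thinking mode per request, assuming the mode is passed as an extra JSON field. The field name `thinking` and its values are hypothetical placeholders, since the exact switch is not documented here; `extra_body` is the OpenAI Python SDK's standard way to forward non-standard fields. The helper also sizes `max_tokens` so a long Think Max trace fits, assuming reasoning tokens count against the output limit, using the 1M context and 384K max output from the spec table.

```python
from typing import Any

# Hypothetical mapping from V4's three thinking modes to request fields.
# The "thinking" key and its values are illustrative assumptions only.
MODE_FIELDS: dict[str, dict[str, Any]] = {
    "non-think": {"thinking": "off"},
    "think-high": {"thinking": "high"},
    "think-max": {"thinking": "max"},
}

CONTEXT_WINDOW = 1_000_000  # tokens, from the spec table
MAX_OUTPUT = 384_000        # tokens, from the spec table


def build_request(prompt: str, mode: str, prompt_tokens: int) -> dict[str, Any]:
    """Assemble kwargs for client.chat.completions.create(**kwargs).

    Gives the reasoning trace plus answer as much room as possible, capped
    by both the max-output limit and what is left of the context window.
    """
    if mode not in MODE_FIELDS:
        raise ValueError(f"unknown mode: {mode}")
    room = CONTEXT_WINDOW - prompt_tokens
    if room <= 0:
        raise ValueError("prompt already fills the context window")
    return {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": min(MAX_OUTPUT, room),
        # extra_body forwards non-standard fields through the OpenAI SDK.
        "extra_body": MODE_FIELDS[mode],
    }
```

Usage would be `client.chat.completions.create(**build_request(prompt, "think-max", n_prompt_tokens))`, with `n_prompt_tokens` coming from whatever tokenizer count you trust.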
Agentic and vision gains
DeepSeek positions V4 as an agentic model first, with stronger tool use and function calling than V3, and adds vision (image understanding) to a line that was previously text-only at its core. Some independent analyses describe V4 as moving toward native multimodal processing; DeepSeek’s own preview notes lead on agentic and reasoning gains rather than multimodality, so treat the broader multimodal claims as vendor- or analyst-sourced until tested.
The DeepSeek V4 family
V4 is a family, not a single model. The split between Pro and Flash is the main decision a user makes.
| Variant | Parameters | Active | Best for | API price (in / out) |
|---|---|---|---|---|
| V4-Pro | 1.6T | 49B | Top-quality coding, reasoning, long-context work | $0.435 / $0.87 |
| V4-Flash | 284B | 13B | Fast, high-volume and agentic loops on a budget | $0.14 / $0.28 |
Both share the 1M-token context, the 384K max output, the three thinking modes and the MIT licence. Reasoning is also available as a separate dedicated line — the DeepSeek R-series — which powers the DeepThink mode in the DeepSeek app; V4 folds lighter reasoning into the same model via its thinking modes.
Benchmark performance
DeepSeek’s published numbers are strong, but the independent picture is more nuanced. The headline figures below are vendor-reported unless marked otherwise.
Coding
| Benchmark | V4-Pro | Note |
|---|---|---|
| SWE-bench Verified | 80.6% | Vendor; older Python-only variant — reads as a ceiling |
| SWE-bench Pro | 55.4% (vendor) / ~55% (CAISI) | Harder, memorisation-resistant; behind Kimi K2.6 and GLM-5.1 |
| LiveCodeBench | 93.5% | Vendor |
V4-Pro is tied at the top of the open-weight field on SWE-bench Verified at 80.6% (level with Kimi K2.6’s ~80.2%), within striking distance of proprietary leaders and among the top open scores on the best AI models board. But SWE-bench Verified is the older, contaminated benchmark; on the harder SWE-bench Pro, V4-Pro scores about 55%, and NIST’s CAISI evaluation places it behind Kimi K2.6 (58.6%), GLM-5.1 (58.4%) and Claude Opus 4.7 (64.3%). The two-benchmark gap is the clearest sign that V4 is excellent value rather than the outright coding leader. See best AI for coding for the full field.
Reasoning and knowledge
| Benchmark | V4-Pro | Note |
|---|---|---|
| GPQA Diamond | 90.1% | Vendor; graduate-level science |
| MMLU-Pro | 87.5% | Vendor; broad knowledge |
| Artificial Analysis Intelligence Index | 52 | Independent; second only to Kimi K2.6 (54) among open weights |
On general reasoning and knowledge, V4-Pro is genuinely strong: 90.1% on GPQA Diamond and 87.5% on MMLU-Pro put it in the frontier-adjacent band. The most useful independent read is the Artificial Analysis Intelligence Index, where V4-Pro scores 52, second only to Kimi K2.6 (54) among open-weight models. That ordering matters: it confirms V4 is a top open model, not the single best.
The independent verdict (CAISI)
The US government’s Center for AI Standards and Innovation (CAISI), part of NIST, evaluated V4-Pro across nine benchmarks in five domains (cyber, software engineering, natural sciences, abstract reasoning and maths) using Item Response Theory. Its conclusion: V4-Pro is the most capable Chinese model evaluated to date, but lags the leading US models by about eight months (NIST). DeepSeek’s own technical report puts the gap closer to three-to-six months; both can be true depending on the benchmark.
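The CAISI method is described above only as Item Response Theory. As a rough picture of what that involves, the sketch below uses the textbook two-parameter logistic (2PL) model: each benchmark item has a difficulty b and a discrimination a, each model has a latent ability theta, and ability is estimated by maximum likelihood from pass/fail outcomes (here with a simple grid search). Whether CAISI used 2PL or another IRT variant, and how it turns ability into a months-behind figure, is not stated here, so treat this as a generic example rather than CAISI's procedure.

```python
import math
from typing import Sequence, Tuple


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability a model of ability theta solves an item
    with discrimination a and difficulty b (0.5 when theta == b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta: float, items: Sequence[Tuple[float, float]],
                   outcomes: Sequence[bool]) -> float:
    """Log-likelihood of the observed pass/fail outcomes at ability theta."""
    ll = 0.0
    for (a, b), passed in zip(items, outcomes):
        p = p_correct(theta, a, b)
        ll += math.log(p if passed else 1.0 - p)
    return ll


def estimate_ability(items: Sequence[Tuple[float, float]], outcomes: Sequence[bool],
                     lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    """Grid-search maximum-likelihood estimate of theta in [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, outcomes))


# Three items of rising difficulty; the model solves the two easier ones.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
print(round(estimate_ability(items, [True, True, False]), 2))
```

The appeal of IRT for cross-model comparison is that abilities land on one shared scale even when models were scored on different mixes of easy and hard items.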
Pricing breakdown
V4’s pricing is its sharpest weapon. In May 2026 DeepSeek made a 75% cut to V4-Pro pricing permanent — the discount became the list price (Apidog).
| Model | Input (per MTok) | Output (per MTok) | Cache hit (input) |
|---|---|---|---|
| V4-Pro | $0.435 | $0.87 | ~$0.0036 |
| V4-Flash | $0.14 | $0.28 | lower still |
On top of the API, the model is free two more ways: the DeepSeek app is free with no paid tier, and the MIT-licensed weights are free to self-host. For data-sensitive users, self-hosting is also the route around the app’s China data residency (see the DeepSeek provider page).
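For budgeting API use, the per-token rates in the pricing table translate directly into a workload estimate. The sketch below uses those rates; because the table only says V4-Flash's cache-hit price is "lower still", the sketch bills Flash's cached input at the full input rate, which gives an upper bound. The workload sizes in the example are hypothetical.

```python
from typing import Optional

# USD per million tokens, from the pricing table above.
RATES = {
    "V4-Pro": {"input": 0.435, "output": 0.87, "cache_hit": 0.0036},
    # Flash's cache-hit rate is not given, so cached input falls back to
    # the full input rate below (a conservative upper bound).
    "V4-Flash": {"input": 0.14, "output": 0.28, "cache_hit": None},
}


def estimate_cost(variant: str, input_tokens: int, output_tokens: int,
                  cache_hit_ratio: float = 0.0) -> float:
    """Estimated USD cost; cache_hit_ratio is the share of input tokens
    served from the prompt cache (0.0 to 1.0)."""
    r = RATES[variant]
    hit_price: Optional[float] = r["cache_hit"]
    if hit_price is None:
        hit_price = r["input"]
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    return (cached * hit_price + uncached * r["input"]
            + output_tokens * r["output"]) / 1_000_000


# Hypothetical month: 2B input tokens and 200M output tokens.
for variant in RATES:
    print(variant, round(estimate_cost(variant, 2_000_000_000, 200_000_000), 2))
print("V4-Pro, 80% cache hits:",
      round(estimate_cost("V4-Pro", 2_000_000_000, 200_000_000, 0.8), 2))
```

With these inputs V4-Pro comes to about $1,044 without caching and about $354 at an 80% cache-hit rate, which shows why agent loops that resend the same long prefix benefit so much from caching.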
Cost comparison with contemporaries
| Model | Input | Output | Notes |
|---|---|---|---|
| DeepSeek V4-Pro | $0.435 | $0.87 | Frontier-adjacent quality at the lowest price |
| DeepSeek V4-Flash | $0.14 | $0.28 | Cheapest capable model on the best models board |
| Claude Opus 4.7 | $5.00 | $25.00 | ~29x more expensive on output than V4-Pro |
| GPT-5.4 | $2.50 | $15.00 | The value frontier pick; ~17x V4-Pro on output |
V4-Pro output is roughly 34x cheaper than GPT-5.5 and ~29x cheaper than Claude Opus 4.7, while landing within striking distance of both on knowledge and coding benchmarks. That ratio, not the absolute benchmark score, is the reason to choose V4.
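The multiples in the comparison table can be reproduced from the listed output prices. GPT-5.5's price is not listed in the table, so the sketch covers only the models that are.

```python
V4_PRO_OUTPUT = 0.87  # USD per million output tokens

# Output prices (USD per million tokens) from the comparison table above.
OUTPUT_PRICES = {
    "Claude Opus 4.7": 25.00,
    "GPT-5.4": 15.00,
    "DeepSeek V4-Flash": 0.28,
}


def output_multiple(price: float, base: float = V4_PRO_OUTPUT) -> float:
    """How many times V4-Pro's output price a given output price is."""
    return price / base


for name, price in OUTPUT_PRICES.items():
    print(f"{name}: {output_multiple(price):.1f}x V4-Pro's output price")
```

This gives about 28.7x for Claude Opus 4.7 and 17.2x for GPT-5.4, matching the ~29x and ~17x in the table; V4-Flash comes out at about a third of V4-Pro's output price.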
How to access DeepSeek V4
Via API
V4 is available with no waitlist through the DeepSeek API, which is OpenAI-compatible and supports prompt caching to cut costs further. Current rates are on the pricing page.
```python
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible, so the official OpenAI SDK works
# once base_url points at DeepSeek.
client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-v4-pro",  # V4-Pro
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(resp.choices[0].message.content)
```
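To check whether prompt caching is actually kicking in, you can read cache counters from the response's usage data. The sketch assumes the usage object exposes `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens`, the field names used in DeepSeek's earlier API documentation; whether V4 keeps those names is an assumption to confirm, and the helper returns `None` when they are absent. Cache hits depend on repeating an identical prompt prefix, so keep stable content (system prompt, shared documents) at the start of each request.

```python
from types import SimpleNamespace
from typing import Any, Optional


def cache_hit_ratio(usage: Any) -> Optional[float]:
    """Share of prompt tokens served from DeepSeek's context cache.

    Assumes the usage object carries prompt_cache_hit_tokens and
    prompt_cache_miss_tokens (names from DeepSeek's earlier API docs).
    Returns None if either field is missing or no prompt tokens were counted.
    """
    hit = getattr(usage, "prompt_cache_hit_tokens", None)
    miss = getattr(usage, "prompt_cache_miss_tokens", None)
    if hit is None or miss is None or hit + miss == 0:
        return None
    return hit / (hit + miss)


# With a live response you would pass resp.usage; a stand-in object
# keeps this sketch runnable offline.
demo_usage = SimpleNamespace(prompt_cache_hit_tokens=900, prompt_cache_miss_tokens=100)
print(cache_hit_ratio(demo_usage))
```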
Via the DeepSeek app
V4 powers the free DeepSeek app across web, iOS and Android — no paid tier, no message caps advertised. The app exposes the thinking modes and a web-search toggle.
Via open weights
The weights for V4-Pro and V4-Flash are published under the MIT licence on Hugging Face, so the model can be self-hosted, fine-tuned and used commercially. For quality, serve in bf16 rather than fp8 quantisation. V4 is also widely available through third-party inference providers.
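Before self-hosting, it helps to size the weights. In a straightforward MoE deployment every expert must stay resident even though only a few run per token, so memory follows total rather than active parameters. The sketch below computes weight memory at 2 bytes per parameter (bf16) and 1 byte (fp8) from the parameter counts in this article; it deliberately excludes the KV cache, activations and runtime overhead, which add substantially at long context.

```python
# Bytes per parameter for each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}
# Total parameter counts from this article.
TOTAL_PARAMS = {"V4-Pro": 1_600_000_000_000, "V4-Flash": 284_000_000_000}


def weight_memory_gb(params: int, dtype: str) -> int:
    """Memory for the weights alone, in decimal GB (10**9 bytes).
    Uses total parameters because all experts must be loaded."""
    return params * BYTES_PER_PARAM[dtype] // 10**9


for name, params in TOTAL_PARAMS.items():
    sizes = ", ".join(f"{d}: {weight_memory_gb(params, d):,} GB" for d in BYTES_PER_PARAM)
    print(f"{name} weights -> {sizes}")
```

V4-Pro needs about 3,200 GB for bf16 weights versus 1,600 GB in fp8, and V4-Flash about 568 GB versus 284 GB, which puts the bf16-for-quality advice in terms of hardware cost.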
How DeepSeek V4 compares
V4’s June-2026 contemporaries are the other Chinese open-weight leaders — Kimi K2.6 and Qwen 3.6 — plus OpenAI’s value pick GPT-5.4.
vs Kimi K2.6
Kimi K2.6, from Moonshot AI, is V4’s closest open-weight rival. The two are closely matched: they are near-tied on SWE-bench Verified (80.6% vs ~80.2%), with V4-Pro ahead on LiveCodeBench and Codeforces, while Kimi K2.6 edges ahead on the harder SWE-bench Pro (58.6% vs ~55%) and tops the Artificial Analysis open-weight index (54 vs 52). Choose V4-Pro for cheaper coding agents and broad knowledge; Kimi K2.6 for the hardest agentic-coding tasks and very long Chinese-language context.
vs Qwen 3.6
Alibaba’s Qwen line (Qwen 3.6 / Qwen3.7 Max) sits in the same open-weight cluster, within a fraction of a point of V4 on SWE-bench Verified. Qwen’s edge is multilingual breadth and vision; V4’s edge is price and agentic coding. For a multilingual or vision-heavy workload, Qwen is the stronger open pick; for cost-sensitive coding, V4.
vs GPT-5.4
GPT-5.4 is OpenAI’s value-frontier model and a useful proprietary yardstick. On BenchLM’s composite, V4-Pro (87) is close behind GPT-5.4 (88), and both trail Gemini 3.1 Pro (93). GPT-5.4 is more capable on the hardest reasoning and better supported in tooling; V4-Pro is roughly 17x cheaper on output and open-weight. For most production coding, V4 is the cost-rational choice; for the hardest frontier reasoning, GPT-5.4.
The practical verdict
V4 is the best-value model on the market and the leading open-weight model on cost-to-quality, though on capability it is narrowly the second-best open model, behind Kimi K2.6. If price or self-hosting matters, V4 wins; if you need the single strongest open model on the hardest coding, look at Kimi K2.6. See best AI models and best AI for coding for where V4 sits across the field.
Known limitations
Not the top open model on hard coding. On the memorisation-resistant SWE-bench Pro, V4-Pro (~55%) trails Kimi K2.6 (58.6%) and GLM-5.1 (58.4%). Its open-weight lead is on the older SWE-bench Verified.
Behind the frontier. NIST’s CAISI puts V4-Pro about eight months behind the leading US models; DeepSeek says three-to-six. Either way it trails Claude Opus 4.8, GPT-5.5 and Gemini 3.1 Pro on the hardest tasks.
Vendor-reported headline numbers. SWE-bench Verified, LiveCodeBench, GPQA and MMLU-Pro figures are DeepSeek-run; the independent CAISI and Artificial Analysis figures are the ones to anchor on.
Hosted-app privacy. Used through the China-hosted app, V4 carries data-residency and regulatory caveats; the open weights avoid them.
Multimodal claims are soft. Vision (image understanding) is supported, but broader native-multimodal claims come from analysts rather than DeepSeek’s preview notes.
Community reception
Reception was strongly positive on value and open-weight access, and measured on absolute capability. Independent reviewers consistently framed V4 as the model that made frontier-adjacent quality genuinely cheap and self-hostable, while noting it does not top the hardest benchmarks. The Hugging Face write-up highlighted the 1M-token context as one “agents can actually use,” and developer coverage focused on the permanent price cut as the bigger story than any single benchmark. The CAISI “eight months behind” framing drew debate, with several analysts arguing the gap is narrower on cost-adjusted terms.
Version history
| Version | Released | Key changes |
|---|---|---|
| DeepSeek V4 | 24 Apr 2026 | V4-Pro and V4-Flash; 1M context; thinking modes; vision; MIT; permanent 75% price cut (May 2026) |
| DeepSeek V3.2 | 2025 | Previous general model; long-context chat; text-first |
| DeepSeek V3 / V3.1 | Dec 2024 – 2025 | The V3 line that established DeepSeek’s efficiency reputation |
DeepSeek’s reasoning successor, R2, had not been released as of June 2026; the company is reportedly holding it back rather than shipping it below its quality bar, so the R-series reasoning line still centres on R1.
FAQ
What is DeepSeek V4?
DeepSeek V4 is the current flagship model from the Chinese lab DeepSeek, released on 24 April 2026. It is an open-weight (MIT) Mixture-of-Experts model in two sizes — V4-Pro (1.6T parameters, 49B active) and V4-Flash (284B, 13B active) — both with a 1-million-token context window and three thinking modes.
How much does DeepSeek V4 cost?
V4-Pro costs $0.435 per million input tokens and $0.87 per million output tokens after a permanent 75% price cut; V4-Flash is cheaper at $0.14 / $0.28. The DeepSeek app is free, and the MIT-licensed weights are free to self-host. V4-Pro output is roughly 34x cheaper than GPT-5.5.
Is DeepSeek V4 the best open-weight model?
It is the best on value and among the best on capability. V4-Pro is tied at the top of the open-weight field on SWE-bench Verified (80.6%, level with Kimi K2.6’s ~80.2%), but Kimi K2.6 edges ahead on the harder SWE-bench Pro (58.6% vs ~55%) and tops the Artificial Analysis open-weight index (54 vs 52).
How good is DeepSeek V4 at coding?
Very good for the price. It scores 80.6% on SWE-bench Verified (vendor) and 93.5% on LiveCodeBench, but on the harder SWE-bench Pro it scores about 55%, behind Kimi K2.6, GLM-5.1 and Claude Opus 4.7. It is the cost-rational choice for coding agents rather than the outright leader.
Does DeepSeek V4 have a reasoning mode?
Yes. V4 is a hybrid reasoning model with three modes — Non-think, Think High and Think Max — selectable per request. For dedicated deliberate reasoning, DeepSeek also offers the separate R-series, which powers the app’s DeepThink mode.
What is the context window of DeepSeek V4?
Both V4-Pro and V4-Flash have a 1-million-token context window and up to 384,000 output tokens, paired with sparse attention to keep long-context use affordable.
Is DeepSeek V4 open source?
The weights are released under the permissive MIT licence, so V4 can be downloaded, self-hosted, fine-tuned and used commercially. Weights for V4-Pro and V4-Flash are on Hugging Face.
How does DeepSeek V4 compare to GPT-5.4?
On BenchLM’s composite, V4-Pro (87) is just behind GPT-5.4 (88), with both trailing Gemini 3.1 Pro (93). GPT-5.4 is stronger on the hardest reasoning and better supported in tooling; V4-Pro is roughly 17x cheaper on output and open-weight, making it the cost-rational choice for most production work.