THE AI RANKINGS

DeepSeek V4

Provider: DeepSeek
Status: Current
Context: 1,000,000 tokens
SWE-bench Verified: 80.6%
Price (input / output): $0.435 / $0.87 per MTok

DeepSeek V4 is the current flagship model from the Chinese lab DeepSeek, released as a preview on 24 April 2026, and it is the strongest open-weight model on the market for cost-to-quality. It ships in two Mixture-of-Experts sizes — V4-Pro (1.6 trillion parameters, 49 billion active) and V4-Flash (284 billion, 13 billion active) — both with a 1-million-token context window, both under the permissive MIT licence. V4-Pro scores 80.6% on SWE-bench Verified (vendor-reported) and costs $0.87 per million output tokens after a permanent 75% price cut, roughly 34 times cheaper than GPT-5.5.

The honest summary is that V4 is the best open-weight model on price and one of the best on capability, but it is not the absolute coding king. It is tied at the top of the open-weight field on the older SWE-bench Verified (level with Kimi K2.6), yet on the harder, memorisation-resistant SWE-bench Pro it sits behind Kimi K2.6 and GLM-5.1, and the US government’s NIST evaluation puts it about eight months behind the American frontier. What V4 wins decisively is value: frontier-adjacent results, a 1M-token context, native open weights, and the lowest hosted price of any major model.

Quick specs

Provider: DeepSeek
Released: 24 April 2026 (preview; web, app and API)
Variants: V4-Pro (1.6T / 49B active), V4-Flash (284B / 13B active)
Architecture: Mixture-of-Experts with sparse attention
Context window: 1,000,000 tokens
Max output: 384,000 tokens
Knowledge cutoff: Not publicly disclosed
Licence: Open weights, MIT
Input price (V4-Pro): $0.435 / MTok
Output price (V4-Pro): $0.87 / MTok
V4-Flash price (input / output): $0.14 / $0.28 per MTok
SWE-bench Verified: 80.6% (vendor)
SWE-bench Pro: 55.4% (vendor); ~55% (CAISI)
Artificial Analysis Intelligence Index: 52 (top open tier; behind Kimi K2.6)
Best for: Cost-sensitive coding agents, self-hosting, long-context work on a budget
Limitations: Trails Kimi K2.6/GLM-5.1 on SWE-bench Pro; ~8 months behind the frontier (CAISI)

What’s new in DeepSeek V4

V4 is a substantial step up from the V3 line, and the changes cluster in four areas (DeepSeek API docs, MIT Technology Review).

Two sizes, one architecture

DeepSeek split V4 into V4-Pro (1.6 trillion parameters, 49 billion active) and V4-Flash (284 billion, 13 billion active). Both are sparse Mixture-of-Experts models that activate only a fraction of their parameters per token, which is how DeepSeek keeps inference cheap at trillion-parameter scale. V4-Pro is the quality flagship; V4-Flash is built for fast, high-volume and agentic workloads where cost per call matters most.
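The sparsity described above is easy to quantify from the parameter counts quoted in this article. The sketch below is illustrative only: the dictionary and function names are made up for the example, and it treats the quoted counts as exact.

```python
# Parameter counts quoted in this article: (total, active per token).
VARIANTS = {
    "V4-Pro": (1.6e12, 49e9),
    "V4-Flash": (284e9, 13e9),
}


def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of parameters used for each token."""
    return active_params / total_params


for name, (total, active) in VARIANTS.items():
    print(f"{name}: {active_fraction(total, active):.2%} of parameters active per token")
```

On these figures V4-Pro activates about 3% of its parameters per token and V4-Flash about 4.6%, which is the arithmetic behind the cheap-inference claim.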

A 1M-token context agents can use

Both variants carry a 1-million-token context window, up from 128K on V3, paired with sparse attention so the long context stays affordable rather than just nominal. DeepSeek’s framing is that the window is large enough for whole-repository and whole-corpus agentic work, not a headline number that degrades in practice.

Thinking modes

V4 is a hybrid reasoning model with three modes: Non-think (fast, intuitive answers), Think High (chain-of-thought reasoning) and Think Max (maximum-depth reasoning for the hardest problems). Think Max can generate very long reasoning traces, which is why DeepSeek recommends keeping a large context window available for it. This puts V4’s reasoning control in the same family as Claude’s effort levels and OpenAI’s reasoning settings, inside a single open-weight model.

Agentic and vision gains

DeepSeek positions V4 as an agentic model first, with stronger tool use and function calling than V3, and adds vision (image understanding) to a line that was previously text-only at its core. Some independent analyses describe V4 as moving toward native multimodal processing; DeepSeek’s own preview notes lead on agentic and reasoning gains rather than multimodality, so treat the broader multimodal claims as vendor- or analyst-sourced until tested.

The DeepSeek V4 family

V4 is a family, not a single model. The split between Pro and Flash is the main decision a user makes.

Variant | Parameters | Active | Best for | API price (input / output, per MTok)
V4-Pro | 1.6T | 49B | Top-quality coding, reasoning, long-context work | $0.435 / $0.87
V4-Flash | 284B | 13B | Fast, high-volume and agentic loops on a budget | $0.14 / $0.28

Both share the 1M-token context, the 384K max output, the three thinking modes and the MIT licence. Reasoning is also available as a separate dedicated line — the DeepSeek R-series — which powers the DeepThink mode in the DeepSeek app; V4 folds lighter reasoning into the same model via its thinking modes.

Benchmark performance

DeepSeek’s published numbers are strong, but the independent picture is more nuanced. The headline figures below are vendor-reported unless marked otherwise.

Coding

Benchmark | V4-Pro | Note
SWE-bench Verified | 80.6% | Vendor; older Python-only variant; reads as a ceiling
SWE-bench Pro | 55.4% (vendor) / ~55% (CAISI) | Harder, memorisation-resistant; behind Kimi K2.6 and GLM-5.1
LiveCodeBench | 93.5% | Vendor

V4-Pro is effectively tied at the top of the open-weight field on SWE-bench Verified at 80.6% (Kimi K2.6 scores ~80.2%), within striking distance of proprietary leaders and among the top open scores on the best AI models board. But SWE-bench Verified is the older benchmark and the one more exposed to contamination; on the harder SWE-bench Pro, V4-Pro scores about 55%, and NIST’s CAISI evaluation places it behind Kimi K2.6 (58.6%), GLM-5.1 (58.4%) and Claude Opus 4.7 (64.3%). The gap between the two benchmarks is the clearest sign that V4 is excellent value rather than the outright coding leader. See best AI for coding for the full field.

Reasoning and knowledge

Benchmark | V4-Pro | Note
GPQA Diamond | 90.1% | Vendor; graduate-level science
MMLU-Pro | 87.5% | Vendor; broad knowledge
Artificial Analysis Intelligence Index | 52 | Independent; second only to Kimi K2.6 (54) among open weights

On general reasoning and knowledge, V4-Pro is genuinely strong: 90.1% on GPQA Diamond and 87.5% on MMLU-Pro put it in the frontier-adjacent band. The most useful independent read is the Artificial Analysis Intelligence Index, where V4-Pro scores 52, second only to Kimi K2.6 (54) among open-weight models. That ordering matters: it confirms V4 is a top open model, not the single best.

The independent verdict (CAISI)

The US government’s Center for AI Standards and Innovation (CAISI), part of NIST, evaluated V4-Pro across nine benchmarks in five domains (cyber, software engineering, natural sciences, abstract reasoning and maths) using Item Response Theory. Its conclusion: V4-Pro is the most capable Chinese model evaluated to date, but lags the leading US models by about eight months (NIST). DeepSeek’s own technical report puts the gap closer to three-to-six months; both can be true depending on the benchmark.

Pricing breakdown

V4’s pricing is its sharpest weapon. In May 2026 DeepSeek made a 75% cut to V4-Pro pricing permanent — the discount became the list price (Apidog).

Model | Input (per MTok) | Output (per MTok) | Cache hit (input, per MTok)
V4-Pro | $0.435 | $0.87 | ~$0.0036
V4-Flash | $0.14 | $0.28 | lower still
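To turn the per-MTok rates into a per-request figure, a minimal sketch follows. It assumes the rates in the table are current and that cached input tokens are billed at the cache-hit rate in place of the normal input rate. The keys "v4-pro" and "v4-flash" are labels for this example, not API model IDs, and because the article gives no cache-hit figure for V4-Flash, the sketch bills its cached tokens at the full input rate as an upper bound.

```python
# USD per million tokens, taken from the pricing table in this article.
# "cache_hit" is None for V4-Flash because no figure is given for it.
PRICES = {
    "v4-pro": {"input": 0.435, "output": 0.87, "cache_hit": 0.0036},
    "v4-flash": {"input": 0.14, "output": 0.28, "cache_hit": None},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    cached_tokens is the part of input_tokens served from the prompt cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    p = PRICES[model]
    # Without a published cache rate, bill cached tokens at the full input
    # rate, which gives an upper bound on the real cost.
    cache_rate = p["cache_hit"] if p["cache_hit"] is not None else p["input"]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * p["input"]
        + cached_tokens * cache_rate
        + output_tokens * p["output"]
    )
    return total / 1_000_000


# A 200K-token prompt with 150K tokens cached and a 4K-token reply on V4-Pro.
print(f"${request_cost('v4-pro', 200_000, 4_000, cached_tokens=150_000):.5f}")
```

Under these assumptions the example request costs roughly 2.6 cents, and most of that is the 50K uncached input tokens.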

On top of the API, the model is free in two more ways: the DeepSeek app is free with no paid tier, and the MIT-licensed weights are free to self-host. For data-sensitive users, self-hosting is also the route around the app’s China data residency (see the DeepSeek provider page).

Cost comparison with contemporaries

Model | Input (per MTok) | Output (per MTok) | Notes
DeepSeek V4-Pro | $0.435 | $0.87 | Frontier-adjacent quality at the lowest price
DeepSeek V4-Flash | $0.14 | $0.28 | Cheapest capable model on the best models board
Claude Opus 4.7 | $5.00 | $25.00 | ~29x more expensive on output than V4-Pro
GPT-5.4 | $2.50 | $15.00 | The value frontier pick; ~17x V4-Pro on output

V4-Pro output is roughly 34x cheaper than GPT-5.5 and ~29x cheaper than Claude Opus 4.7, while landing within striking distance of both on knowledge and coding benchmarks. That ratio, not the absolute benchmark score, is the reason to choose V4.
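The multipliers for the models in the comparison table can be reproduced from the listed output prices. This is a small sketch with made-up names; it covers only models whose prices appear in the table, so the GPT-5.5 figure is not recomputed here.

```python
# Output prices in USD per million tokens, from the comparison table.
OUTPUT_PRICE = {
    "DeepSeek V4-Pro": 0.87,
    "DeepSeek V4-Flash": 0.28,
    "Claude Opus 4.7": 25.00,
    "GPT-5.4": 15.00,
}


def output_price_ratio(model, baseline="DeepSeek V4-Pro"):
    """Return how many times more the model costs per output token than the baseline."""
    return OUTPUT_PRICE[model] / OUTPUT_PRICE[baseline]


for name in ("Claude Opus 4.7", "GPT-5.4"):
    print(f"{name}: {output_price_ratio(name):.1f}x V4-Pro on output")
```

The results, about 28.7x and 17.2x, round to the ~29x and ~17x quoted in the table.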

How to access DeepSeek V4

Via API

V4 is available with no waitlist on the DeepSeek API, which is OpenAI-compatible, with prompt caching that cuts cost further. Current rates are on the pricing page.

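from openai import OpenAI

# The DeepSeek API is OpenAI-compatible, so the standard OpenAI client works
# once base_url points at DeepSeek. Replace YOUR_KEY with a real API key.
client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

# Send a single-turn chat request to V4-Pro.
resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Your prompt here"}],
)
# The reply text is on the first choice.
print(resp.choices[0].message.content)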

Via the DeepSeek app

V4 powers the free DeepSeek app across web, iOS and Android — no paid tier, no message caps advertised. The app exposes the thinking modes and a web-search toggle.

Via open weights

The weights for V4-Pro and V4-Flash are published under the MIT licence on Hugging Face, so the model can be self-hosted, fine-tuned and used commercially. For quality, serve in bf16 rather than fp8 quantisation. V4 is also widely available through third-party inference providers.
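The bf16-versus-fp8 choice also sets the hardware budget for self-hosting. The sketch below is a rough, weights-only estimate under stated assumptions: every parameter is stored at the same precision (2 bytes for bf16, 1 byte for fp8), all experts are kept in memory even though only a fraction is active per token, and the KV cache, activations and runtime overhead are ignored. The function and dictionary names are illustrative.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

# Total (not active) parameter counts quoted in this article.
TOTAL_PARAMS = {"V4-Pro": 1.6e12, "V4-Flash": 284e9}


def weight_memory_gb(variant, precision):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return TOTAL_PARAMS[variant] * BYTES_PER_PARAM[precision] / 1e9


for variant in TOTAL_PARAMS:
    for precision in BYTES_PER_PARAM:
        print(f"{variant} @ {precision}: ~{weight_memory_gb(variant, precision):,.0f} GB")
```

On these assumptions V4-Pro needs about 3,200 GB for bf16 weights against 1,600 GB at fp8, and V4-Flash about 568 GB against 284 GB, so the quality gain from bf16 comes at double the weight memory.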

How DeepSeek V4 compares

V4’s June-2026 contemporaries are the other Chinese open-weight leaders — Kimi K2.6 and Qwen 3.6 — plus OpenAI’s value pick GPT-5.4.

vs Kimi K2.6

Kimi K2.6, from Moonshot AI, is V4’s closest open-weight rival. The two are closely matched: they are near-tied on SWE-bench Verified (80.6% vs ~80.2%), with V4-Pro ahead on LiveCodeBench and Codeforces, while Kimi K2.6 edges ahead on the harder SWE-bench Pro (58.6% vs ~55%) and tops the Artificial Analysis open-weight index (54 vs 52). Choose V4-Pro for cheaper coding agents and broad knowledge; Kimi K2.6 for the hardest agentic-coding tasks and very long Chinese-language context.

vs Qwen 3.6

Alibaba’s Qwen line (Qwen 3.6 / Qwen3.7 Max) sits in the same open-weight cluster, within a fraction of a point of V4 on SWE-bench Verified. Qwen’s edge is multilingual breadth and vision; V4’s edge is price and agentic coding. For a multilingual or vision-heavy workload, Qwen is the stronger open pick; for cost-sensitive coding, V4.

vs GPT-5.4

GPT-5.4 is OpenAI’s value-frontier model and a useful proprietary yardstick. On BenchLM’s composite, V4-Pro (87) is close behind GPT-5.4 (88), and both trail Gemini 3.1 Pro (93). GPT-5.4 is more capable on the hardest reasoning and better supported in tooling; V4-Pro is roughly 17x cheaper on output and open-weight. For most production coding, V4 is the cost-rational choice; for the hardest frontier reasoning, GPT-5.4.

The practical verdict

V4 is the best-value model on the market and the leading open-weight model on cost-to-quality. On capability alone it is narrowly the second-best open model, behind Kimi K2.6. If price or self-hosting matters, V4 wins; if you need the single strongest open model on the hardest coding, look at Kimi K2.6. See best AI models and best AI for coding for where V4 sits across the field.

Known limitations

Not the top open model on hard coding. On the memorisation-resistant SWE-bench Pro, V4-Pro (~55%) trails Kimi K2.6 (58.6%) and GLM-5.1 (58.4%). Its open-weight lead is on the older SWE-bench Verified.

Behind the frontier. NIST’s CAISI puts V4-Pro about eight months behind the leading US models; DeepSeek says three-to-six. Either way it trails Claude Opus 4.7, GPT-5.5 and Gemini 3.1 Pro on the hardest tasks.

Vendor-reported headline numbers. SWE-bench Verified, LiveCodeBench, GPQA and MMLU-Pro figures are DeepSeek-run; the independent CAISI and Artificial Analysis figures are the ones to anchor on.

Hosted-app privacy. Used through the China-hosted app, V4 carries data-residency and regulatory caveats; the open weights avoid them.

Multimodal claims are soft. Vision (image understanding) is supported, but broader native-multimodal claims come from analysts rather than DeepSeek’s preview notes.

Community reception

Reception was strongly positive on value and open-weight access, and measured on absolute capability. Independent reviewers consistently framed V4 as the model that made frontier-adjacent quality genuinely cheap and self-hostable, while noting it does not top the hardest benchmarks. The Hugging Face write-up highlighted the 1M-token context as one “agents can actually use,” and developer coverage focused on the permanent price cut as the bigger story than any single benchmark. The CAISI “eight months behind” framing drew debate, with several analysts arguing the gap is narrower on cost-adjusted terms.

Version history

Version | Released | Key changes
DeepSeek V4 | 24 Apr 2026 | V4-Pro and V4-Flash; 1M context; thinking modes; vision; MIT; permanent 75% price cut (May 2026)
DeepSeek V3.2 | 2025 | Previous general model; long-context chat; text-first
DeepSeek V3 / V3.1 | Dec 2024 – 2025 | The V3 line that established DeepSeek’s efficiency reputation

DeepSeek’s reasoning successor, R2, has not been released as of June 2026 — the company is reportedly holding it back rather than ship below its bar, so the R-series reasoning line still centres on R1.

FAQ

What is DeepSeek V4?

DeepSeek V4 is the current flagship model from the Chinese lab DeepSeek, released on 24 April 2026. It is an open-weight (MIT) Mixture-of-Experts model in two sizes — V4-Pro (1.6T parameters, 49B active) and V4-Flash (284B, 13B active) — both with a 1-million-token context window and three thinking modes.

How much does DeepSeek V4 cost?

V4-Pro costs $0.435 per million input tokens and $0.87 per million output tokens after a permanent 75% price cut; V4-Flash is cheaper at $0.14 / $0.28. The DeepSeek app is free, and the MIT-licensed weights are free to self-host. V4-Pro output is roughly 34x cheaper than GPT-5.5.

Is DeepSeek V4 the best open-weight model?

It is the best on value and among the best on capability. V4-Pro is tied at the top of the open-weight field on SWE-bench Verified (80.6%, level with Kimi K2.6’s ~80.2%), but Kimi K2.6 edges ahead on the harder SWE-bench Pro (58.6% vs ~55%) and tops the Artificial Analysis open-weight index (54 vs 52).

How good is DeepSeek V4 at coding?

Very good for the price. It scores 80.6% on SWE-bench Verified (vendor) and 93.5% on LiveCodeBench, but on the harder SWE-bench Pro it scores about 55%, behind Kimi K2.6, GLM-5.1 and Claude Opus 4.7. It is the cost-rational choice for coding agents rather than the outright leader.

Does DeepSeek V4 have a reasoning mode?

Yes. V4 is a hybrid reasoning model with three modes — Non-think, Think High and Think Max — selectable per request. For dedicated deliberate reasoning, DeepSeek also offers the separate R-series, which powers the app’s DeepThink mode.

What is the context window of DeepSeek V4?

Both V4-Pro and V4-Flash have a 1-million-token context window and up to 384,000 output tokens, paired with sparse attention to keep long-context use affordable.

Is DeepSeek V4 open source?

The weights are released under the permissive MIT licence, so V4 can be downloaded, self-hosted, fine-tuned and used commercially. Weights for V4-Pro and V4-Flash are on Hugging Face.

How does DeepSeek V4 compare to GPT-5.4?

On BenchLM’s composite, V4-Pro (87) is just behind GPT-5.4 (88), with both trailing Gemini 3.1 Pro (93). GPT-5.4 is stronger on the hardest reasoning and better supported in tooling; V4-Pro is roughly 17x cheaper on output and open-weight, making it the cost-rational choice for most production work.