DeepSeek-R1
- Provider
- DeepSeek
- Status
- Superseded
- Context
- 128,000 tok
- Price
- $0.55 / $2.19 /MTok
DeepSeek-R1 is the open-weight reasoning model that DeepSeek released on 20 January 2025 — and arguably the single most consequential AI release of the decade so far. It matched the quality of OpenAI’s o1 on maths, code and reasoning, was given away under the permissive MIT licence, and was trained largely through reinforcement learning at a fraction of a Western training run’s reported cost. Its launch triggered a global tech sell-off in which Nvidia lost close to $600 billion in market value in a single day (CNBC).
It is now a historic model rather than a current one. DeepSeek never shipped an “R2”: after R1 and its R1-0528 update, the company folded reasoning into its main line, and the current best DeepSeek reasoning option is DeepSeek V4-Pro in thinking mode. The hosted deepseek-reasoner (R1) endpoint is being retired after 24 July 2026 (DeepSeek), though the open weights live on. This page documents R1 — what it was, why it mattered, and what replaced it.
Quick specs
| Provider | DeepSeek |
| Released | 20 January 2025 (R1-0528 update: May 2025) |
| Status | Superseded — by DeepSeek V4; API retires 24 July 2026 |
| Architecture | Mixture-of-Experts — 671B total / 37B active |
| Context window | 128,000 tokens |
| Modalities | Text in, text out |
| Licence | MIT (open weights, self-hostable) |
| Training | Large-scale reinforcement learning (GRPO) |
| AIME 2024 | 79.8% (≈ OpenAI o1) |
| Best for | Historic reference; distilled variants for cheap local reasoning |
| Now use instead | DeepSeek V4-Pro thinking mode |
What DeepSeek-R1 was
DeepSeek-R1 was a Mixture-of-Experts model — 671 billion total parameters with 37 billion active, built on the DeepSeek-V3 base — with a 128K-token context window, released as MIT open weights (Hugging Face). Its significance was less the architecture than the training method: DeepSeek showed that strong reasoning could emerge from large-scale reinforcement learning with minimal supervised data.
The companion model DeepSeek-R1-Zero was trained by pure RL with no supervised fine-tuning, and exhibited an “aha moment” — spontaneously learning to re-check its work and adapt strategy mid-solution. The full R1 added a cold-start dataset and alternating RL and supervised-fine-tuning stages, using DeepSeek’s Group Relative Policy Optimization (GRPO) algorithm. The work was later published in Nature (September 2025) after independent peer review — a first for a major LLM (Nature).
R1 also shipped six distilled variants (on Qwen2.5 at 1.5B/7B/14B/32B and Llama 3 at 8B/70B), which put o1-class reasoning onto hardware as small as a laptop and remain widely used today.
Why it mattered
R1’s release was a genuine inflection point in the AI industry, for three reasons:
- The market shock. On 27 January 2025, news that a Chinese lab had matched o1 cheaply and openly wiped roughly $589–600 billion off Nvidia’s market cap in one day — the largest single-day loss in stock-market history — and dragged the whole semiconductor complex down (CNBC).
- The efficiency narrative. DeepSeek’s claim that the underlying model cost only a few million dollars to train reframed the AI race around efficiency rather than raw spend — though those cost figures are DeepSeek’s own and were never independently audited.
- Open weights at the frontier. By releasing a frontier-class reasoning model under MIT, R1 accelerated the open-weight movement and set the template that Alibaba’s Qwen, Moonshot’s Kimi and Zhipu’s GLM now follow.
Benchmark performance
DeepSeek positioned R1 at parity with OpenAI’s o1 at launch; the May 2025 R1-0528 update pushed it further.
| Benchmark | DeepSeek-R1 | R1-0528 |
|---|---|---|
| AIME | 79.8% (2024) | 87.5% (2025) |
| MATH-500 | 97.3% | — |
| GPQA Diamond | 71.5% | 81.0% |
| Codeforces (rating) | 2029 | — |
| MMLU | 90.8% | — |
| SWE-bench Verified | — | 57.6% |
These were frontier-class in 2025 but are now well behind the current field. For context, DeepSeek V4-Pro — R1’s effective successor — scores around 80.6% on SWE-bench Verified and sits near the top of the open-weight tier on the independent Artificial Analysis index. See best AI models for current standings.
What replaced it
DeepSeek never released a “DeepSeek-R2.” Instead, the reasoning capability pioneered in R1 was merged into the main line:
- DeepSeek-V3.2 (December 2025) integrated thinking directly with tool use.
- DeepSeek V4 (April 2026) — V4-Pro and V4-Flash — made thinking a per-request setting (
reasoning_effortup tomax), unifying the general and reasoning lineages in one model.
As a result, the standalone deepseek-reasoner (R1) endpoint now routes to V4-Flash and will be fully retired after 24 July 2026 (DeepSeek). If you want DeepSeek reasoning today, use V4-Pro in thinking mode; R1’s open weights and distills remain on Hugging Face for self-hosting and research.
Controversies
- Distillation allegations. OpenAI and Microsoft alleged in 2025 that R1 was partly trained on ChatGPT outputs, a claim OpenAI escalated to US lawmakers in 2026; DeepSeek denies it (Rest of World).
- Government bans and data residency. Because the hosted app stores data on servers in China, the US (federal agencies and several states), Italy, Australia, South Korea, the Czech Republic and others restricted or banned it (BankInfoSecurity). Self-hosting the open weights avoids this.
- Content controls. The hosted model aligns with Chinese content rules and avoids politically sensitive topics.
FAQ
What is DeepSeek-R1?
DeepSeek-R1 is the open-weight (MIT) reasoning model DeepSeek released in January 2025. A 671B-parameter Mixture-of-Experts model trained largely by reinforcement learning, it matched OpenAI’s o1 on maths, code and reasoning and became a landmark in open AI.
Is there a DeepSeek-R2?
No. DeepSeek never released an R2. After R1 and the R1-0528 update, the company folded reasoning into its main line, and the current best DeepSeek reasoning model is DeepSeek V4-Pro in thinking mode.
Can I still use DeepSeek-R1?
The hosted deepseek-reasoner endpoint is being retired after 24 July 2026 and currently routes to DeepSeek V4-Flash. The MIT open weights and the six distilled variants remain freely available on Hugging Face for self-hosting.
Why did DeepSeek-R1 crash Nvidia’s stock?
Its January 2025 release showed a Chinese lab matching OpenAI’s o1 cheaply and openly, which made investors question the need for massive AI hardware spend. Nvidia lost close to $600 billion in market value in a single day — the largest one-day loss in stock-market history.
How was DeepSeek-R1 trained?
Largely through reinforcement learning. The R1-Zero variant used pure RL with no supervised fine-tuning; full R1 added cold-start data and alternating RL and supervised stages, using DeepSeek’s GRPO algorithm. The work was peer-reviewed and published in Nature in September 2025.
Last verified 19 June 2026. R1’s architecture, MIT licence, 128K context and launch benchmarks are confirmed via DeepSeek’s Hugging Face model card and the Nature paper; the endpoint-retirement date and V4 succession are from DeepSeek’s official V4 launch notes. R1 is a historic, superseded model — for current DeepSeek reasoning use V4-Pro. Benchmark numbers are DeepSeek-reported from 2025 and should be read as period figures.