GPT-5.2
Specifications
| Spec | Value |
|---|---|
| Model ID | gpt-5.2 |
| Provider | OpenAI |
| Architecture | transformer |
| Context Window | 400K tokens |
| Max Input | 272K tokens |
| Max Output | 128K tokens |
| Knowledge Cutoff | 2025-08-31 |
| License | proprietary |
| Open Weights | No |
Variants
| VARIANT | API ID |
|---|---|
| GPT-5.2 Thinking | gpt-5.2 |
| GPT-5.2 Instant | gpt-5.2-chat-latest |
| GPT-5.2 Pro | gpt-5.2-pro |
| GPT-5.2 (Dated Snapshot) | gpt-5.2-2025-12-11 |
API Pricing
40% price increase over GPT-5.1. GPT-5.2 Pro available only via Responses API. Distillation supported.
ChatGPT Access
| TIER | PRICE |
|---|---|
| Free | Free |
| Plus | $20/mo |
| Pro | $200/mo |
| Business | Custom |
| Enterprise | Custom |
Benchmarks
Coding
| Benchmark | Score |
|---|---|
| SWE-bench Verified | 80% |
Reasoning
| Benchmark | Score |
|---|---|
| GPQA Diamond | 92.4% |
| ARC-AGI-2 | 52.9% |
Math
| Benchmark | Score |
|---|---|
| AIME 2025 | 100% |
SWE-bench Pro is contamination-resistant and more industrially relevant than SWE-bench Verified. ARC-AGI-2 jumped 35 points from GPT-5.1's 17.6%. GDPval is OpenAI's proprietary benchmark awaiting independent validation. Third-party Vals.ai shows 75.40% on SWE-bench.
GPT-5.2 is OpenAI’s response to competitive pressure from Google’s Gemini 3, released December 11, 2025—just four weeks after GPT-5.1 and days after CEO Sam Altman declared an internal “code red.” Internally codenamed “Garlic,” the model targets professional knowledge work with a 400,000-token context window, 38% fewer errors than its predecessor, and a new three-tier architecture (Instant, Thinking, Pro) designed for enterprise workflows.
At 80.0% on SWE-bench Verified, GPT-5.2 nearly matches Claude Opus 4.5’s industry-leading 80.9%—but achieves state-of-the-art on SWE-bench Pro (55.6%) and ARC-AGI-2 abstract reasoning (52.9%), a 35-point jump from GPT-5.1’s 17.6%. The trade-off: a 40% price increase and widespread complaints about “robotic” tone in casual conversations.
Quick specs
| Spec | Value |
|---|---|
| Provider | OpenAI |
| Released | December 11, 2025 |
| Context window | 400K tokens (128K output) |
| Knowledge cutoff | August 31, 2025 |
| Input price | $1.75 / MTok |
| Output price | $14.00 / MTok |
| Cached input | $0.175 / MTok (90% discount) |
| SWE-bench Verified | 80.0% |
| SWE-bench Pro | 55.6% (state-of-the-art) |
| ARC-AGI-2 | 52.9% (up from 17.6%) |
| AIME 2025 | 100% (perfect score) |
| Best for | Enterprise coding, complex reasoning, professional documents |
| Limitations | 40% more expensive, slower Thinking mode, stricter guardrails |
What’s new in GPT-5.2
GPT-5.2 represents OpenAI’s strategic pivot toward enterprise-grade reasoning and professional workflows. The improvements fall into four categories.
Three-tier architecture
The model now ships as three distinct variants rather than a unified system:
GPT-5.2 Instant (gpt-5.2-chat-latest) handles everyday tasks—writing, translation, quick questions—with low-latency responses. It trades reasoning depth for speed.
GPT-5.2 Thinking (gpt-5.2) provides deep chain-of-thought reasoning for coding, maths, and multi-step projects. Supports five reasoning effort levels: none, low, medium, high, and the new xhigh setting for maximum compute.
GPT-5.2 Pro (gpt-5.2-pro) deploys maximum compute for tasks where “failure is not an option.” Processing can take up to 30 minutes for complex problems. Available only via the Responses API.
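The three-tier split can be expressed as a small routing helper. The model IDs below come from the variant table; the task categories and the router itself are purely illustrative, not an OpenAI API:

```python
# Hypothetical router mapping task categories to the GPT-5.2 variant IDs
# from the variant table. The categories are illustrative assumptions.
VARIANTS = {
    "everyday": "gpt-5.2-chat-latest",  # Instant: low-latency chat
    "reasoning": "gpt-5.2",             # Thinking: chain-of-thought
    "critical": "gpt-5.2-pro",          # Pro: max compute, Responses API only
}

def pick_model(task_kind: str, reproducible: bool = False) -> str:
    """Return the API model ID for a task category."""
    if reproducible:
        # The dated snapshot pins behaviour for reproducible outputs
        return "gpt-5.2-2025-12-11"
    return VARIANTS.get(task_kind, "gpt-5.2")

print(pick_model("critical"))  # gpt-5.2-pro
print(pick_model("chat"))      # unknown category falls back to gpt-5.2
```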
Dramatic reasoning improvements
The headline number: GPT-5.2 achieves 52.9% on ARC-AGI-2, up from GPT-5.1’s 17.6%—a 35-point improvement in abstract reasoning. GPT-5.2 Pro became the first model to cross 90% on ARC-AGI-1.
On FrontierMath (research-level mathematics), GPT-5.2 scores 40.3% versus GPT-5.1’s 31.0%. AIME 2025 (competition mathematics) hits a perfect 100% without tools.
Error reduction and accuracy
GPT-5.2 Thinking produces 38% fewer errors than GPT-5.1, with response error rate dropping from 8.8% to 6.2%. Hallucinations decreased approximately 30%. On OpenAI’s GDPval benchmark measuring professional knowledge work across 44 occupations, GPT-5.2 achieves a 70.9% win rate against industry professionals—up from 38.8% for GPT-5.
New developer tools
Several API features target agentic workflows:
- `apply_patch` tool — Structured diffs for iterative code editing without full file rewrites
- `local_shell` tool — Command-line interaction for autonomous agents
- `/compact` endpoint — Context compaction for effective extension beyond 400K tokens
- Preambles — Brief explanations before tool invocations for better chain-of-thought visibility
- Concise reasoning summaries — Improved visibility into model thinking
The GPT-5.2 model family
| Variant | API Identifier | Purpose | Notes |
|---|---|---|---|
| GPT-5.2 Instant | gpt-5.2-chat-latest | Low-latency everyday tasks | Writing, translation, quick questions |
| GPT-5.2 Thinking | gpt-5.2 | Deep reasoning with chain-of-thought | Coding, maths, complex work |
| GPT-5.2 Pro | gpt-5.2-pro | Maximum compute, highest accuracy | Responses API only, up to 30 min |
| GPT-5.2 (Dated) | gpt-5.2-2025-12-11 | Reproducible outputs | Dated snapshot |
A GPT-5.2 Codex variant optimised for agentic coding is expected “in the coming weeks.” Until then, GPT-5.1-Codex-Max remains recommended for Codex environments.
Benchmark performance
GPT-5.2 delivers genuine state-of-the-art results on several benchmarks while narrowly trailing Claude Opus 4.5 on others. Independent testing shows smaller improvements than OpenAI’s marketing in some areas.
Coding benchmarks
| Model | SWE-bench Verified | SWE-bench Pro | Notes |
|---|---|---|---|
| Claude Opus 4.5 | 80.9% | — | Industry leader on Verified |
| GPT-5.2 | 80.0% | 55.6% | State-of-the-art on Pro |
| Claude Sonnet 4.5 | 77.2% | — | Best value for coding |
| GPT-5.1 | 76.3% | 50.8% | — |
| Gemini 3 Pro | 76.2% | 43.4% | — |
Critical context: SWE-bench Pro is more contamination-resistant and “industrially relevant” than SWE-bench Verified. GPT-5.2’s leadership here matters for real-world enterprise coding. However, third-party Vals.ai shows GPT-5.2 at 75.40% on SWE-bench—below OpenAI’s claimed 80%.
Science and maths
| Benchmark | GPT-5.2 Thinking | GPT-5.2 Pro | Best in class |
|---|---|---|---|
| GPQA Diamond | 92.4% | 93.2% | Gemini 3 Deep Think (93.8%) |
| AIME 2025 | 100% | 100% | Perfect score, no tools |
| FrontierMath | 40.3% | — | Up from 31.0% |
| Humanity’s Last Exam | 34.5% | 36.6% | Gemini Deep Think (41.0%) |
GPT-5.2’s perfect AIME 2025 score demonstrates frontier-level mathematical reasoning. FrontierMath improvement (+9.3 points) shows genuine progress on research-level problems.
Abstract reasoning
| Benchmark | GPT-5.2 Thinking | GPT-5.2 Pro | GPT-5.1 |
|---|---|---|---|
| ARC-AGI-1 | 86.2% | 90.5% | — |
| ARC-AGI-2 | 52.9% | 54.2% | 17.6% |
The ARC-AGI-2 improvement is the release’s most striking result—a 35-point jump suggesting major advances in abstract reasoning. GPT-5.2 Pro became the first model to cross 90% on ARC-AGI-1.
Visual reasoning
| Benchmark | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|
| MMMU-Pro | 86.5% | 80.7% | — |
| Video-MMMU | 90.5% | — | 87.6% |
GPT-5.2 leads on visual reasoning benchmarks despite no new image generation capabilities. Chart reasoning and UI understanding error rates dropped approximately 50%.
Pricing breakdown
GPT-5.2 carries a 40% price increase over GPT-5.1. OpenAI claims greater token efficiency offsets the increase.
| Tier | Input (per MTok) | Output (per MTok) |
|---|---|---|
| GPT-5.2 Thinking | $1.75 | $14.00 |
| GPT-5.2 Pro | $21.00 | $168.00 |
| Cached input | $0.175 | — |
| Batch API | $0.875 | $7.00 |
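As a quick sanity check on these rates, here is a minimal cost estimator. The per-MTok prices are taken from the table above; the helper function itself is an illustrative sketch, not an OpenAI utility:

```python
# Illustrative cost estimator using the GPT-5.2 per-MTok rates above.
RATES = {  # (input, output) in USD per million tokens
    "gpt-5.2": (1.75, 14.00),
    "gpt-5.2-pro": (21.00, 168.00),
}
CACHED_INPUT_DISCOUNT = 0.90  # cached input billed at 10% of the normal rate

def estimate_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Rough USD cost for one request, splitting cached vs fresh input."""
    inp, out = RATES[model]
    fresh = input_tokens - cached_input_tokens
    cost = (
        fresh * inp
        + cached_input_tokens * inp * (1 - CACHED_INPUT_DISCOUNT)
        + output_tokens * out
    ) / 1_000_000
    return round(cost, 4)

# 100K fresh input + 50K output on Thinking:
print(estimate_cost("gpt-5.2", 100_000, 50_000))  # 0.875
```

Swapping in `gpt-5.2-pro` for the same token counts lands at $10.50, which makes the 12x Thinking-to-Pro multiplier concrete.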
Cost comparison with competitors
| Model | Input | Output | Notes |
|---|---|---|---|
| GPT-5.2 | $1.75 | $14.00 | 40% more than GPT-5.1 |
| GPT-5.1 | $1.25 | $10.00 | Remains available ~3 months |
| Claude Opus 4.5 | $5.00 | $25.00 | 2.9x more expensive input |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Best coding value |
| Gemini 3 Pro | ~$2.00 | ~$12.00 | Competitive |
GPT-5.2 undercuts Claude on input tokens but sits mid-market on output pricing. For cost-sensitive deployments, GPT-5.1 remains available and represents better value unless you need the reasoning improvements.
Hidden cost: reasoning tokens
Be aware that hidden reasoning tokens can inflate costs significantly for complex tasks. GPT-5.2’s Thinking mode generates internal “thinking” tokens that count against your output quota but aren’t visible in responses. The new xhigh reasoning effort level uses substantially more compute.
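A sketch of how those hidden tokens surface in billing. Recent OpenAI chat responses report a `usage.completion_tokens_details.reasoning_tokens` count; whether GPT-5.2 uses exactly this shape is an assumption here, and the sample numbers are invented:

```python
# Sample usage payload (invented numbers) in the shape recent OpenAI chat
# responses use; reasoning tokens are billed as output but never shown.
usage = {
    "prompt_tokens": 2_000,
    "completion_tokens": 9_000,  # includes hidden reasoning tokens
    "completion_tokens_details": {"reasoning_tokens": 7_500},
}

OUTPUT_RATE = 14.00 / 1_000_000  # USD per output token for gpt-5.2

reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
visible = usage["completion_tokens"] - reasoning      # tokens you actually see
billed = usage["completion_tokens"] * OUTPUT_RATE     # what you pay for
hidden_share = reasoning / usage["completion_tokens"]

print(f"visible tokens: {visible}")        # 1500
print(f"output bill: ${billed:.3f}")       # $0.126
print(f"hidden share: {hidden_share:.0%}") # 83%
```

In this invented example, five-sixths of the output bill is reasoning the user never sees, which is why xhigh effort can be far more expensive than the visible response suggests.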
How to access GPT-5.2
Via API
GPT-5.2 is generally available with no waitlist. Basic usage:
```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Your prompt here"}],
    reasoning_effort="medium",  # none, low, medium, high, xhigh
)
```
For GPT-5.2 Pro (Responses API only):
```python
response = client.responses.create(
    model="gpt-5.2-pro",
    input="Your complex problem here",
)
```
Rate limits scale with spend: 500K tokens/minute at Tier 1 ($5 spent), up to 40M tokens/minute at Tier 5 ($1,000+ spent).
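With per-minute token limits at lower tiers, handling 429 responses gracefully is worth the few extra lines. This generic exponential-backoff wrapper is an illustration, not an official SDK feature; the retry policy and the `'429'` error check are assumptions:

```python
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff schedule (seconds) for rate-limit retries."""
    return [min(cap, base * 2**attempt) for attempt in range(max_retries)]

def call_with_retries(make_request, max_retries=5):
    """Run make_request(), sleeping through the schedule on rate-limit errors.

    make_request should raise an exception containing '429' when throttled;
    this wrapper is illustrative and not part of the OpenAI SDK.
    """
    for delay in backoff_delays(max_retries):
        try:
            return make_request()
        except Exception as exc:
            if "429" not in str(exc):
                raise  # non-rate-limit errors propagate immediately
            time.sleep(delay)
    return make_request()  # final attempt, letting any error propagate

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Production code would typically honour the `Retry-After` header when present and add jitter, but the shape above is enough for Tier 1 workloads.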
Via ChatGPT
Access varies by subscription:
| Tier | Context | Rate limit | Cost |
|---|---|---|---|
| Free | 8K tokens | 10 msgs / 5 hours | $0 |
| Plus | 32K tokens | 160 msgs / 3 hours | $20/mo |
| Pro | 400K tokens | Unlimited | $200/mo |
| Business/Enterprise | 400K tokens | Unlimited | Custom |
Free and Plus users fall back to GPT-mini when limits are reached. Pro subscribers get full access to GPT-5.2 Pro.
Via integrations
- GitHub Copilot — Public preview via BYOK (bring your own key)
- Azure AI Foundry — Available
- Microsoft 365 Copilot — Integration underway
How GPT-5.2 compares
vs Claude Opus 4.5
Claude narrowly leads on SWE-bench Verified coding (80.9% vs 80.0%) and offers superior prompt injection resistance (~95%). Claude can sustain 30+ hour autonomous operation for complex projects.
However, GPT-5.2 dominates on abstract reasoning (ARC-AGI-2: 54.2% vs 37.6%) and offers 2x the context window (400K vs 200K standard). GPT-5.2 also leads on SWE-bench Pro (55.6%), where Claude has no published score.
Choose Claude for: complex refactoring, multi-file codebases, long-running autonomous agents, when code quality matters more than cost.
Choose GPT-5.2 for: abstract reasoning tasks, large document analysis, professional knowledge work, when you need maximum context.
vs Gemini 3 Pro
Gemini leads on Humanity’s Last Exam (41.0% vs 36.6%) and offers a 1 million token context window—2.5x GPT-5.2’s capacity. Gemini’s multimodal capabilities (native video and audio processing) surpass GPT-5.2’s text+image limitation.
However, GPT-5.2 outperforms on SWE-bench Pro (55.6% vs 43.4%) and ARC-AGI-2 reasoning (54.2% vs 45.1%).
Choose Gemini for: massive document processing, multimodal workflows with video/audio, when you need 1M+ context.
Choose GPT-5.2 for: enterprise coding, abstract reasoning, professional document generation.
The practical consensus
From community feedback: GPT-5.2 excels at full-codebase analysis, spreadsheet generation, and professional document work. Claude Opus 4.5 leads for iterative coding and long-running autonomous agents. Gemini 3 Pro dominates for massive document processing and multimodal workflows. No single model dominates across all dimensions.
Known limitations
Independent testing and community reports reveal several weaknesses:
Speed penalties in Thinking mode. Developer Matt Shumer noted it’s “very slow for most questions.” GPT-5.2 Pro can take up to 30 minutes for complex problems.
Instant mode quality concerns. Multiple sources report GPT-5.2 Instant feels “bland,” refuses more requests, and “sounds like someone who just finished corporate compliance training.” OpenAI’s own system card acknowledges “regressions in certain modes.”
Stricter content filtering. Developers report frustration with guardrails triggering on mundane conversations. Reddit threads describe conversations as less “open and free” compared to earlier GPT-5 releases.
Hallucination rate with low thinking effort. GPT-5.2 with low thinking shows an 8.4% hallucination rate—higher than DeepSeek V3.2’s 6.3%.
Missing features. No image generation improvements despite competitive pressure. Canvas is unavailable with GPT-5.2 Pro. No audio modality support.
Context handling imperfections. Real-world testing found the model “ignored repeated information when earlier context contradicted later details” in messy documents.
Community reception
The developer community response is sharply divided between enterprise users and casual consumers.
The positives
Enterprise and developer users report concrete wins:
- Aaron Levie (Box CEO): GPT-5.2 performs “7 points better than GPT-5.1” on reasoning tests, with complex extraction tasks dropping from 46 seconds to 12 seconds
- Dan Shipper (Every CEO): The model “worked for two hours straight” analysing a P&L statement with accurate results
- Augment Code: Selected GPT-5.2 specifically for their code review agent, citing “substantially stronger deep code capabilities”
The negatives
Consumer and creative users tell a different story. Reddit threads with thousands of upvotes describe GPT-5.2 as “boring,” “robotic,” and “everything I hate about 5 and 5.1, but worse.” Common complaints:
- Rigid tone and excessive markdown formatting (one user received “58 bullets and numbered points” for a simple question)
- Stricter content filtering than predecessors
- Loss of conversational warmth from GPT-4o era
Developer Mehul Gupta reported the model “invented APIs that didn’t exist” and missed document clause references even after explicit correction. Katie Parrott (Every) found GPT-5.2 “less resourceful” than Claude Opus 4.5—when asked to deduce information from email data, “Opus 4.5 thought to search his email and nailed it. GPT-5.2 didn’t think to try.”
The verdict
The consensus: GPT-5.2 optimises for enterprise power users rather than casual chat. It’s genuinely better at professional knowledge work but worse at being a conversational companion.
Version history
| Version | Released | Key changes |
|---|---|---|
| GPT-5.2 | Dec 11, 2025 | Three-tier architecture, 38% fewer errors, ARC-AGI-2 breakthrough |
| GPT-5.1-Codex-Max | Nov 19, 2025 | Flagship agentic coding, 24+ hour tasks |
| GPT-5.1 | Nov 12-13, 2025 | Adaptive reasoning, 24h caching, 2-3x speed |
| GPT-5 | Aug 2025 | Initial GPT-5 release |
| GPT-4.1 | Apr 2025 | 1M context, agentic focus |
| GPT-4o | Nov 2024 | Multimodal, 128K context |
GPT-5.1 remains available for approximately 3 months post-launch.
FAQ
Is GPT-5.2 better than Claude Opus 4.5?
For abstract reasoning and large document analysis—yes. GPT-5.2 leads on ARC-AGI-2 (54.2% vs 37.6%) and offers 2x the context window. For coding, Claude narrowly leads on SWE-bench Verified (80.9% vs 80.0%) but GPT-5.2 leads on SWE-bench Pro (55.6%).
How much does GPT-5.2 cost?
$1.75 per million input tokens, $14.00 per million output tokens. That’s 40% more than GPT-5.1. Cached inputs drop to $0.175/MTok (90% off). GPT-5.2 Pro costs $21.00/$168.00.
Can I use GPT-5.2 for free?
Yes, but with tight limits: 10 messages per 5 hours on ChatGPT Free, then you fall back to GPT-mini. API access requires payment.
What’s the difference between GPT-5.2 Thinking and Pro?
Thinking mode handles most reasoning tasks with configurable effort levels (none through xhigh). Pro mode uses maximum compute for highest accuracy but can take up to 30 minutes and costs 12x more. Pro is only available via the Responses API.
Is GPT-5.2 worth the 40% price increase over GPT-5.1?
If you need the abstract reasoning improvements (ARC-AGI-2 jumped 35 points) or require state-of-the-art SWE-bench Pro performance—yes. For general use, GPT-5.1 remains excellent value and stays available for 3 months.
When will GPT-5.2 Codex be available?
OpenAI says “in the coming weeks.” Until then, GPT-5.1-Codex-Max remains recommended for agentic coding in Codex environments.
Why do people say GPT-5.2 feels “robotic”?
OpenAI optimised GPT-5.2 for professional knowledge work, prioritising accuracy and structured outputs. The trade-off is a more formal, less conversational tone that many casual users find less engaging than earlier models.
Official links
| Resource | URL |
|---|---|
| Announcement | openai.com/index/introducing-gpt-5-2 |
| Science & Math | openai.com/index/gpt-5-2-for-science-and-math |
| API Documentation | platform.openai.com/docs/guides/latest-model |
| Model Reference | platform.openai.com/docs/models/gpt-5.2-pro |
| ChatGPT | chat.openai.com |
| Pricing | openai.com/api/pricing |
| GitHub Copilot Preview | github.blog/changelog/2025-12-11 |