GPT-5.2
Specifications
| Spec | Value |
|---|---|
| Model ID | gpt-5.2 |
| Provider | OpenAI |
| Architecture | transformer |
| Context Window | 400K tokens |
| Max Input | 272K tokens |
| Max Output | 128K tokens |
| Knowledge Cutoff | 2025-08-31 |
| License | proprietary |
| Open Weights | No |
Variants
| VARIANT | API ID |
|---|---|
| GPT-5.2 Thinking | gpt-5.2 |
| GPT-5.2 Instant | gpt-5.2-chat-latest |
| GPT-5.2 Pro | gpt-5.2-pro |
| GPT-5.2 (Dated Snapshot) | gpt-5.2-2025-12-11 |
API Pricing
40% price increase over GPT-5.1. GPT-5.2 Pro available only via Responses API. Distillation supported.
ChatGPT Access
| TIER | PRICE |
|---|---|
| Free | Free |
| Plus | $20/mo |
| Pro | $200/mo |
| Business | Custom |
| Enterprise | Custom |
Benchmarks
Coding
| Benchmark | Score |
|---|---|
| SWE-bench Verified | 80% |
Reasoning
| Benchmark | Score |
|---|---|
| GPQA Diamond | 92.4% |
| ARC-AGI-2 | 52.9% |
Math
| Benchmark | Score |
|---|---|
| AIME 2025 | 100% |
SWE-bench Pro is contamination-resistant and more industrially relevant than SWE-bench Verified. ARC-AGI-2 jumped 35 points from GPT-5.1's 17.6%. GDPval is OpenAI's proprietary benchmark awaiting independent validation. Third-party Vals.ai shows 75.40% on SWE-bench.
GPT-5.2 is OpenAI’s response to competitive pressure from Google’s Gemini 3, released December 11, 2025—just four weeks after GPT-5.1 and days after CEO Sam Altman declared an internal “code red.” Internally codenamed “Garlic,” the model targets professional knowledge work with a 400,000-token context window, 38% fewer errors than its predecessor, and a new three-tier architecture (Instant, Thinking, Pro) designed for enterprise workflows.
At 80.0% on SWE-bench Verified, GPT-5.2 nearly matches Claude Opus 4.5’s industry-leading 80.9%—but achieves state-of-the-art on SWE-bench Pro (55.6%) and ARC-AGI-2 abstract reasoning (52.9%), a 35-point jump from GPT-5.1’s 17.6%. The trade-off: a 40% price increase and widespread complaints about “robotic” tone in casual conversations.
Quick specs
| Spec | Value |
|---|---|
| Provider | OpenAI |
| Released | December 11, 2025 |
| Context window | 400K tokens (128K output) |
| Knowledge cutoff | August 31, 2025 |
| Input price | $1.75 / MTok |
| Output price | $14.00 / MTok |
| Cached input | $0.175 / MTok (90% discount) |
| SWE-bench Verified | 80.0% |
| SWE-bench Pro | 55.6% (state-of-the-art) |
| ARC-AGI-2 | 52.9% (up from 17.6%) |
| AIME 2025 | 100% (perfect score) |
| Best for | Enterprise coding, complex reasoning, professional documents |
| Limitations | 40% more expensive, slower Thinking mode, stricter guardrails |
What’s new in GPT-5.2
GPT-5.2 represents OpenAI’s strategic pivot toward enterprise-grade reasoning and professional workflows. The improvements fall into four categories.
Three-tier architecture
The model now ships as three distinct variants rather than a unified system:
GPT-5.2 Instant (gpt-5.2-chat-latest) handles everyday tasks—writing, translation, quick questions—with low-latency responses. It trades reasoning depth for speed.
GPT-5.2 Thinking (gpt-5.2) provides deep chain-of-thought reasoning for coding, maths, and multi-step projects. Supports five reasoning effort levels: none, low, medium, high, and the new xhigh setting for maximum compute.
GPT-5.2 Pro (gpt-5.2-pro) deploys maximum compute for tasks where “failure is not an option.” Processing can take up to 30 minutes for complex problems. Available only via the Responses API.
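The three-tier split can be expressed as a small routing helper. The model IDs below come from the variant table; the task categories and the router itself are purely illustrative, not an OpenAI API:

```python
# Hypothetical router mapping task categories to the GPT-5.2 variant IDs
# from the variant table. The categories are illustrative assumptions.
VARIANTS = {
    "everyday": "gpt-5.2-chat-latest",  # Instant: low-latency chat
    "reasoning": "gpt-5.2",             # Thinking: chain-of-thought
    "critical": "gpt-5.2-pro",          # Pro: max compute, Responses API only
}

def pick_model(task_kind: str, reproducible: bool = False) -> str:
    """Return the API model ID for a task category."""
    if reproducible:
        # The dated snapshot pins behaviour for reproducible outputs
        return "gpt-5.2-2025-12-11"
    return VARIANTS.get(task_kind, "gpt-5.2")

print(pick_model("critical"))  # gpt-5.2-pro
print(pick_model("chat"))      # unknown category falls back to gpt-5.2
```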
Dramatic reasoning improvements
The headline number: GPT-5.2 achieves 52.9% on ARC-AGI-2, up from GPT-5.1’s 17.6%—a 35-point improvement in abstract reasoning. GPT-5.2 Pro became the first model to cross 90% on ARC-AGI-1.
On FrontierMath (research-level mathematics), GPT-5.2 scores 40.3% versus GPT-5.1’s 31.0%. AIME 2025 (competition mathematics) hits a perfect 100% without tools.
Error reduction and accuracy
GPT-5.2 Thinking produces 38% fewer errors than GPT-5.1, with response error rate dropping from 8.8% to 6.2%. Hallucinations decreased approximately 30%. On OpenAI’s GDPval benchmark measuring professional knowledge work across 44 occupations, GPT-5.2 achieves a 70.9% win rate against industry professionals—up from 38.8% for GPT-5.
New developer tools
Several API features target agentic workflows:
- `apply_patch` tool — Structured diffs for iterative code editing without full file rewrites
- `local_shell` tool — Command-line interaction for autonomous agents
- `/compact` endpoint — Context compaction for effective extension beyond 400K tokens
- Preambles — Brief explanations before tool invocations for better chain-of-thought visibility
- Concise reasoning summaries — Improved visibility into model thinking
The GPT-5.2 model family
| Variant | API Identifier | Purpose | Notes |
|---|---|---|---|
| GPT-5.2 Instant | gpt-5.2-chat-latest | Low-latency everyday tasks | Writing, translation, quick questions |
| GPT-5.2 Thinking | gpt-5.2 | Deep reasoning with chain-of-thought | Coding, maths, complex work |
| GPT-5.2 Pro | gpt-5.2-pro | Maximum compute, highest accuracy | Responses API only, up to 30 min |
| GPT-5.2 (Dated) | gpt-5.2-2025-12-11 | Reproducible outputs | Dated snapshot |
A GPT-5.2 Codex variant optimised for agentic coding is expected “in the coming weeks.” Until then, GPT-5.1-Codex-Max remains recommended for Codex environments.
Benchmark performance
GPT-5.2 delivers genuine state-of-the-art results on several benchmarks while narrowly trailing Claude Opus 4.5 on others. Independent testing shows smaller improvements than OpenAI’s marketing in some areas.
Coding benchmarks
| Model | SWE-bench Verified | SWE-bench Pro | Notes |
|---|---|---|---|
| Claude Opus 4.5 | 80.9% | — | Industry leader on Verified |
| GPT-5.2 | 80.0% | 55.6% | State-of-the-art on Pro |
| Claude Sonnet 4.5 | 77.2% | — | Best value for coding |
| GPT-5.1 | 76.3% | 50.8% | — |
| Gemini 3 Pro | 76.2% | 43.4% | — |
Critical context: SWE-bench Pro is more contamination-resistant and “industrially relevant” than SWE-bench Verified. GPT-5.2’s leadership here matters for real-world enterprise coding. However, third-party Vals.ai shows GPT-5.2 at 75.40% on SWE-bench—below OpenAI’s claimed 80%.
Science and maths
| Benchmark | GPT-5.2 Thinking | GPT-5.2 Pro | Best in class |
|---|---|---|---|
| GPQA Diamond | 92.4% | 93.2% | Gemini 3 Deep Think (93.8%) |
| AIME 2025 | 100% | 100% | Perfect score, no tools |
| FrontierMath | 40.3% | — | Up from 31.0% |
| Humanity’s Last Exam | 34.5% | 36.6% | Gemini Deep Think (41.0%) |
GPT-5.2’s perfect AIME 2025 score demonstrates frontier-level mathematical reasoning. FrontierMath improvement (+9.3 points) shows genuine progress on research-level problems.
Abstract reasoning
| Benchmark | GPT-5.2 Thinking | GPT-5.2 Pro | GPT-5.1 |
|---|---|---|---|
| ARC-AGI-1 | 86.2% | 90.5% | — |
| ARC-AGI-2 | 52.9% | 54.2% | 17.6% |
The ARC-AGI-2 improvement is the release’s most striking result—a 35-point jump suggesting major advances in abstract reasoning. GPT-5.2 Pro became the first model to cross 90% on ARC-AGI-1.
Visual reasoning
| Benchmark | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|
| MMMU-Pro | 86.5% | 80.7% | — |
| Video-MMMU | 90.5% | — | 87.6% |
GPT-5.2 leads on visual reasoning benchmarks despite no new image generation capabilities. Chart reasoning and UI understanding error rates dropped approximately 50%.
Pricing breakdown
GPT-5.2 carries a 40% price increase over GPT-5.1. OpenAI claims greater token efficiency offsets the increase.
| Tier | Input (per MTok) | Output (per MTok) |
|---|---|---|
| GPT-5.2 Thinking | $1.75 | $14.00 |
| GPT-5.2 Pro | $21.00 | $168.00 |
| Cached input | $0.175 | — |
| Batch API | $0.875 | $7.00 |
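As a quick sanity check on these rates, here is a minimal cost estimator. The per-MTok prices are taken from the table above; the helper function itself is an illustrative sketch, not an OpenAI utility:

```python
# Illustrative cost estimator using the GPT-5.2 per-MTok rates above.
RATES = {  # (input, output) in USD per million tokens
    "gpt-5.2": (1.75, 14.00),
    "gpt-5.2-pro": (21.00, 168.00),
}
CACHED_INPUT_DISCOUNT = 0.90  # cached input billed at 10% of the normal rate

def estimate_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Rough USD cost for one request, splitting cached vs fresh input."""
    inp, out = RATES[model]
    fresh = input_tokens - cached_input_tokens
    cost = (
        fresh * inp
        + cached_input_tokens * inp * (1 - CACHED_INPUT_DISCOUNT)
        + output_tokens * out
    ) / 1_000_000
    return round(cost, 4)

# 100K fresh input + 50K output on Thinking:
print(estimate_cost("gpt-5.2", 100_000, 50_000))  # 0.875
```

Swapping in `gpt-5.2-pro` for the same token counts lands at $10.50, which makes the 12x Thinking-to-Pro multiplier concrete.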
Cost comparison with competitors
| Model | Input | Output | Notes |
|---|---|---|---|
| GPT-5.2 | $1.75 | $14.00 | 40% more than GPT-5.1 |
| GPT-5.1 | $1.25 | $10.00 | Remains available ~3 months |
| Claude Opus 4.5 | $5.00 | $25.00 | 2.9x more expensive input |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Best coding value |
| Gemini 3 Pro | ~$2.00 | ~$12.00 | Competitive |
GPT-5.2 undercuts Claude on input tokens but sits mid-market on output pricing. For cost-sensitive deployments, GPT-5.1 remains available and represents better value unless you need the reasoning improvements.
Hidden cost: reasoning tokens
Be aware that hidden reasoning tokens can inflate costs significantly for complex tasks. GPT-5.2’s Thinking mode generates internal “thinking” tokens that count against your output quota but aren’t visible in responses. The new xhigh reasoning effort level uses substantially more compute.
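A sketch of how those hidden tokens surface in billing. Recent OpenAI chat responses report a `usage.completion_tokens_details.reasoning_tokens` count; whether GPT-5.2 uses exactly this shape is an assumption here, and the sample numbers are invented:

```python
# Sample usage payload (invented numbers) in the shape recent OpenAI chat
# responses use; reasoning tokens are billed as output but never shown.
usage = {
    "prompt_tokens": 2_000,
    "completion_tokens": 9_000,  # includes hidden reasoning tokens
    "completion_tokens_details": {"reasoning_tokens": 7_500},
}

OUTPUT_RATE = 14.00 / 1_000_000  # USD per output token for gpt-5.2

reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
visible = usage["completion_tokens"] - reasoning      # tokens you actually see
billed = usage["completion_tokens"] * OUTPUT_RATE     # what you pay for
hidden_share = reasoning / usage["completion_tokens"]

print(f"visible tokens: {visible}")        # 1500
print(f"output bill: ${billed:.3f}")       # $0.126
print(f"hidden share: {hidden_share:.0%}") # 83%
```

In this invented example, five-sixths of the output bill is reasoning the user never sees, which is why xhigh effort can be far more expensive than the visible response suggests.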
How to access GPT-5.2
Via API
GPT-5.2 is generally available with no waitlist. Basic usage:
```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Your prompt here"}],
    reasoning_effort="medium",  # none, low, medium, high, xhigh
)
```
For GPT-5.2 Pro (Responses API only):
```python
response = client.responses.create(
    model="gpt-5.2-pro",
    input="Your complex problem here",
)
```
Rate limits scale with spend: 500K tokens/minute at Tier 1 ($5 spent), up to 40M tokens/minute at Tier 5 ($1,000+ spent).
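With per-minute token limits at lower tiers, handling 429 responses gracefully is worth the few extra lines. This generic exponential-backoff wrapper is an illustration, not an official SDK feature; the retry policy and the `'429'` error check are assumptions:

```python
import time

def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff schedule (seconds) for rate-limit retries."""
    return [min(cap, base * 2**attempt) for attempt in range(max_retries)]

def call_with_retries(make_request, max_retries=5):
    """Run make_request(), sleeping through the schedule on rate-limit errors.

    make_request should raise an exception containing '429' when throttled;
    this wrapper is illustrative and not part of the OpenAI SDK.
    """
    for delay in backoff_delays(max_retries):
        try:
            return make_request()
        except Exception as exc:
            if "429" not in str(exc):
                raise  # non-rate-limit errors propagate immediately
            time.sleep(delay)
    return make_request()  # final attempt, letting any error propagate

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Production code would typically honour the `Retry-After` header when present and add jitter, but the shape above is enough for Tier 1 workloads.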
Via ChatGPT
Access varies by subscription:
| Tier | Context | Rate limit | Cost |
|---|---|---|---|
| Free | 8K tokens | 10 msgs / 5 hours | $0 |
| Plus | 32K tokens | 160 msgs / 3 hours | $20/mo |
| Pro | 400K tokens | Unlimited | $200/mo |
| Business/Enterprise | 400K tokens | Unlimited | Custom |
Free and Plus users fall back to GPT-mini when limits are reached. Pro subscribers get full access to GPT-5.2 Pro.
Via integrations
- GitHub Copilot — Public preview via BYOK (bring your own key)
- Azure AI Foundry — Available
- Microsoft 365 Copilot — Integration underway
How GPT-5.2 compares
vs Claude Opus 4.5
Claude narrowly leads on SWE-bench Verified coding (80.9% vs 80.0%) and offers superior prompt injection resistance (~95%). Claude can sustain 30+ hour autonomous operation for complex projects.
However, GPT-5.2 dominates on abstract reasoning (ARC-AGI-2: 54.2% vs 37.6%) and offers 2x the context window (400K vs 200K standard). GPT-5.2 also leads on SWE-bench Pro (55.6%), where Claude has no published score.
Choose Claude for: complex refactoring, multi-file codebases, long-running autonomous agents, when code quality matters more than cost.
Choose GPT-5.2 for: abstract reasoning tasks, large document analysis, professional knowledge work, when you need maximum context.
vs Gemini 3 Pro
Gemini leads on Humanity’s Last Exam (41.0% vs 36.6%) and offers a 1 million token context window—2.5x GPT-5.2’s capacity. Gemini’s multimodal capabilities (native video and audio processing) surpass GPT-5.2’s text+image limitation.
However, GPT-5.2 outperforms on SWE-bench Pro (55.6% vs 43.4%) and ARC-AGI-2 reasoning (54.2% vs 45.1%).
Choose Gemini for: massive document processing, multimodal workflows with video/audio, when you need 1M+ context.
Choose GPT-5.2 for: enterprise coding, abstract reasoning, professional document generation.
The practical consensus
From community feedback: GPT-5.2 excels at full-codebase analysis, spreadsheet generation, and professional document work. Claude Opus 4.5 leads for iterative coding and long-running autonomous agents. Gemini 3 Pro dominates for massive document processing and multimodal workflows. No single model dominates across all dimensions.
Known limitations
Independent testing and community reports reveal several weaknesses:
Speed penalties in Thinking mode. Developer Matt Shumer noted it’s “very slow for most questions.” GPT-5.2 Pro can take up to 30 minutes for complex problems.
Instant mode quality concerns. Multiple sources report GPT-5.2 Instant feels “bland,” refuses more requests, and “sounds like someone who just finished corporate compliance training.” OpenAI’s own system card acknowledges “regressions in certain modes.”
Stricter content filtering. Developers report frustration with guardrails triggering on mundane conversations. Reddit threads describe conversations as less “open and free” compared to earlier GPT-5 releases.
Hallucination rate with low thinking effort. GPT-5.2 with low thinking shows an 8.4% hallucination rate—higher than DeepSeek V3.2’s 6.3%.
Missing features. No image generation improvements despite competitive pressure. Canvas is unavailable with GPT-5.2 Pro. No audio modality support.
Context handling imperfections. Real-world testing found the model “ignored repeated information when earlier context contradicted later details” in messy documents.
Community reception
The developer community response is sharply divided between enterprise users and casual consumers.
The positives
Enterprise and developer users report concrete wins:
- Aaron Levie (Box CEO): GPT-5.2 performs “7 points better than GPT-5.1” on reasoning tests, with complex extraction tasks dropping from 46 seconds to 12 seconds
- Dan Shipper (Every CEO): The model “worked for two hours straight” analysing a P&L statement with accurate results
- Augment Code: Selected GPT-5.2 specifically for their code review agent, citing “substantially stronger deep code capabilities”
The negatives
Consumer and creative users tell a different story. Reddit threads with thousands of upvotes describe GPT-5.2 as “boring,” “robotic,” and “everything I hate about 5 and 5.1, but worse.” Common complaints:
- Rigid tone and excessive markdown formatting (one user received “58 bullets and numbered points” for a simple question)
- Stricter content filtering than predecessors
- Loss of conversational warmth from GPT-4o era
Developer Mehul Gupta reported the model “invented APIs that didn’t exist” and missed document clause references even after explicit correction. Katie Parrott (Every) found GPT-5.2 “less resourceful” than Claude Opus 4.5—when asked to deduce information from email data, “Opus 4.5 thought to search his email and nailed it. GPT-5.2 didn’t think to try.”
The verdict
The consensus: GPT-5.2 optimises for enterprise power users rather than casual chat. It’s genuinely better at professional knowledge work but worse at being a conversational companion.
Version history
| Version | Released | Key changes |
|---|---|---|
| GPT-5.2 | Dec 11, 2025 | Three-tier architecture, 38% fewer errors, ARC-AGI-2 breakthrough |
| GPT-5.1-Codex-Max | Nov 19, 2025 | Flagship agentic coding, 24+ hour tasks |
| GPT-5.1 | Nov 12-13, 2025 | Adaptive reasoning, 24h caching, 2-3x speed |
| GPT-5 | Aug 2025 | Initial GPT-5 release |
| GPT-4.1 | Apr 2025 | 1M context, agentic focus |
| GPT-4o | Nov 2024 | Multimodal, 128K context |
GPT-5.1 remains available for approximately 3 months post-launch.
FAQ
Is GPT-5.2 better than Claude Opus 4.5?
For abstract reasoning and large document analysis—yes. GPT-5.2 leads on ARC-AGI-2 (54.2% vs 37.6%) and offers 2x the context window. For coding, Claude narrowly leads on SWE-bench Verified (80.9% vs 80.0%) but GPT-5.2 leads on SWE-bench Pro (55.6%).
How much does GPT-5.2 cost?
$1.75 per million input tokens, $14.00 per million output tokens. That’s 40% more than GPT-5.1. Cached inputs drop to $0.175/MTok (90% off). GPT-5.2 Pro costs $21.00/$168.00.
Can I use GPT-5.2 for free?
Yes, but with tight limits: 10 messages per 5 hours on ChatGPT Free, then you fall back to GPT-mini. API access requires payment.
What’s the difference between GPT-5.2 Thinking and Pro?
Thinking mode handles most reasoning tasks with configurable effort levels (none through xhigh). Pro mode uses maximum compute for highest accuracy but can take up to 30 minutes and costs 12x more. Pro is only available via the Responses API.
Is GPT-5.2 worth the 40% price increase over GPT-5.1?
If you need the abstract reasoning improvements (ARC-AGI-2 jumped 35 points) or require state-of-the-art SWE-bench Pro performance—yes. For general use, GPT-5.1 remains excellent value and stays available for 3 months.
When will GPT-5.2 Codex be available?
OpenAI says “in the coming weeks.” Until then, GPT-5.1-Codex-Max remains recommended for agentic coding in Codex environments.
Why do people say GPT-5.2 feels “robotic”?
OpenAI optimised GPT-5.2 for professional knowledge work, prioritising accuracy and structured outputs. The trade-off is a more formal, less conversational tone that many casual users find less engaging than earlier models.
Official links
| Resource | URL |
|---|---|
| Announcement | openai.com/index/introducing-gpt-5-2 |
| Science & Math | openai.com/index/gpt-5-2-for-science-and-math |
| API Documentation | platform.openai.com/docs/guides/latest-model |
| Model Reference | platform.openai.com/docs/models/gpt-5.2-pro |
| ChatGPT | chat.openai.com |
| Pricing | openai.com/api/pricing |
| GitHub Copilot Preview | github.blog/changelog/2025-12-11 |