Specifications

Model ID gpt-5.2
Provider OpenAI
Architecture transformer
Context Window 400K tokens
Max Input 272K tokens
Max Output 128K tokens
Knowledge Cutoff 2025-08-31
License proprietary
Open Weights No

Capabilities

Modalities

Input: text, images
Output: text

Reasoning

Type: agentic reasoning model
Effort Levels: none, low, medium, high, xhigh

Features

function-calling, json-mode, structured-outputs, vision, streaming, parallel-tool-calling, apply-patch-tool, local-shell-tool, reasoning-effort-control, prompt-caching-24h, compact-endpoint, preambles, distillation

Variants

| Variant | API ID | Description |
| --- | --- | --- |
| GPT-5.2 Thinking | gpt-5.2 | Deep reasoning with chain-of-thought for coding, math, and complex work |
| GPT-5.2 Instant | gpt-5.2-chat-latest | Low-latency responses for everyday tasks: writing, translation, quick questions |
| GPT-5.2 Pro | gpt-5.2-pro | Maximum compute for highest-accuracy responses, up to 30 min processing |
| GPT-5.2 (Dated Snapshot) | gpt-5.2-2025-12-11 | Dated snapshot for reproducible outputs |

API Pricing

Input $1.75 per 1M tokens
Output $14 per 1M tokens
Cached Input $0.175 per 1M tokens
Batch Input $0.875 per 1M tokens
Batch Output $7 per 1M tokens

40% price increase over GPT-5.1. GPT-5.2 Pro available only via Responses API. Distillation supported.

ChatGPT Access

| Tier | Price | Context | Rate limit |
| --- | --- | --- | --- |
| Free | $0 | 8K | 10 messages per 5 hours |
| Plus | $20/mo | 32K | 160 messages per 3 hours |
| Pro | $200/mo | 400K | Unlimited (abuse guardrails) |
| Business | Custom | 400K | Unlimited |
| Enterprise | Custom | 400K | Unlimited |

Benchmarks

Coding

SWE-bench Verified 80%

Reasoning

GPQA Diamond 92.4%
ARC-AGI-2 52.9%

Math

AIME 2025 100%

SWE-bench Pro is more contamination-resistant and industrially relevant than SWE-bench Verified. ARC-AGI-2 jumped 35 points from GPT-5.1's 17.6%. GDPval is OpenAI's proprietary benchmark and still awaits independent validation. Third-party testing by Vals.ai reports 75.40% on SWE-bench, below OpenAI's claimed 80%.

GPT-5.2 is OpenAI’s response to competitive pressure from Google’s Gemini 3, released December 11, 2025—just four weeks after GPT-5.1 and days after CEO Sam Altman declared an internal “code red.” Internally codenamed “Garlic,” the model targets professional knowledge work with a 400,000-token context window, 38% fewer errors than its predecessor, and a new three-tier architecture (Instant, Thinking, Pro) designed for enterprise workflows.

At 80.0% on SWE-bench Verified, GPT-5.2 nearly matches Claude Opus 4.5’s industry-leading 80.9%—but achieves state-of-the-art on SWE-bench Pro (55.6%) and ARC-AGI-2 abstract reasoning (52.9%), a 35-point jump from GPT-5.1’s 17.6%. The trade-off: a 40% price increase and widespread complaints about “robotic” tone in casual conversations.

Quick specs

Provider: OpenAI
Released: December 11, 2025
Context window: 400K tokens (128K output)
Knowledge cutoff: August 31, 2025
Input price: $1.75 / MTok
Output price: $14.00 / MTok
Cached input: $0.175 / MTok (90% discount)
SWE-bench Verified: 80.0%
SWE-bench Pro: 55.6% (state-of-the-art)
ARC-AGI-2: 52.9% (up from 17.6%)
AIME 2025: 100% (perfect score)
Best for: Enterprise coding, complex reasoning, professional documents
Limitations: 40% more expensive, slower Thinking mode, stricter guardrails


What’s new in GPT-5.2

GPT-5.2 represents OpenAI’s strategic pivot toward enterprise-grade reasoning and professional workflows. The improvements fall into four categories.

Three-tier architecture

The model now ships as three distinct variants rather than a unified system:

GPT-5.2 Instant (gpt-5.2-chat-latest) handles everyday tasks—writing, translation, quick questions—with low-latency responses. It trades reasoning depth for speed.

GPT-5.2 Thinking (gpt-5.2) provides deep chain-of-thought reasoning for coding, maths, and multi-step projects. It supports five reasoning effort levels: none, low, medium, high, and the new xhigh setting for maximum compute.

GPT-5.2 Pro (gpt-5.2-pro) deploys maximum compute for tasks where “failure is not an option.” Processing can take up to 30 minutes for complex problems. Available only via the Responses API.

Dramatic reasoning improvements

The headline number: GPT-5.2 achieves 52.9% on ARC-AGI-2, up from GPT-5.1’s 17.6%—a 35-point improvement in abstract reasoning. GPT-5.2 Pro became the first model to cross 90% on ARC-AGI-1.

On FrontierMath (research-level mathematics), GPT-5.2 scores 40.3% versus GPT-5.1’s 31.0%. AIME 2025 (competition mathematics) hits a perfect 100% without tools.

Error reduction and accuracy

GPT-5.2 Thinking produces 38% fewer errors than GPT-5.1, with response error rate dropping from 8.8% to 6.2%. Hallucinations decreased approximately 30%. On OpenAI’s GDPval benchmark measuring professional knowledge work across 44 occupations, GPT-5.2 achieves a 70.9% win rate against industry professionals—up from 38.8% for GPT-5.

New developer tools

Several API features target agentic workflows:

  • apply_patch tool — Structured diffs for iterative code editing without full file rewrites
  • local_shell tool — Command-line interaction for autonomous agents
  • /compact endpoint — Context compaction for effective extension beyond 400K tokens
  • Preambles — Brief explanations before tool invocations for better chain-of-thought visibility
  • Concise reasoning summaries — Improved visibility into model thinking
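OpenAI has not published the full request schema for these tools here, so as a rough sketch only: assuming apply_patch and local_shell are exposed as built-in tool types on the Responses API (the `type` strings and payload shape below mirror the feature names above but are assumptions, not a confirmed schema), a request might be assembled like this:

```python
# Hypothetical request payload for GPT-5.2's agentic tools via the
# Responses API. The tool type names ("apply_patch", "local_shell")
# echo the feature names above and are assumptions, not confirmed API.
request = {
    "model": "gpt-5.2",
    "reasoning": {"effort": "high"},  # none / low / medium / high / xhigh
    "tools": [
        {"type": "apply_patch"},   # structured diffs instead of full-file rewrites
        {"type": "local_shell"},   # command-line access for autonomous agents
    ],
    "input": "Fix the failing unit test in tests/test_parser.py",
}

# With the official SDK this would be sent roughly as:
#   client.responses.create(**request)
```

Check the API documentation linked at the end of this page for the authoritative tool schema before relying on these field names.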

The GPT-5.2 model family

| Variant | API Identifier | Purpose | Notes |
| --- | --- | --- | --- |
| GPT-5.2 Instant | gpt-5.2-chat-latest | Low-latency everyday tasks | Writing, translation, quick questions |
| GPT-5.2 Thinking | gpt-5.2 | Deep reasoning with chain-of-thought | Coding, maths, complex work |
| GPT-5.2 Pro | gpt-5.2-pro | Maximum compute, highest accuracy | Responses API only, up to 30 min |
| GPT-5.2 (Dated) | gpt-5.2-2025-12-11 | Reproducible outputs | Dated snapshot |

A GPT-5.2 Codex variant optimised for agentic coding is expected “in the coming weeks.” Until then, GPT-5.1-Codex-Max remains recommended for Codex environments.

Benchmark performance

GPT-5.2 delivers genuine state-of-the-art results on several benchmarks while narrowly trailing Claude Opus 4.5 on others. Independent testing shows smaller improvements than OpenAI’s marketing in some areas.

Coding benchmarks

| Model | SWE-bench Verified | SWE-bench Pro | Notes |
| --- | --- | --- | --- |
| Claude Opus 4.5 | 80.9% | — | Industry leader on Verified |
| GPT-5.2 | 80.0% | 55.6% | State-of-the-art on Pro |
| Claude Sonnet 4.5 | 77.2% | — | Best value for coding |
| GPT-5.1 | 76.3% | 50.8% | |
| Gemini 3 Pro | 76.2% | 43.4% | |

Critical context: SWE-bench Pro is more contamination-resistant and “industrially relevant” than SWE-bench Verified. GPT-5.2’s leadership here matters for real-world enterprise coding. However, third-party Vals.ai shows GPT-5.2 at 75.40% on SWE-bench—below OpenAI’s claimed 80%.

Science and maths

| Benchmark | GPT-5.2 Thinking | GPT-5.2 Pro | Best in class |
| --- | --- | --- | --- |
| GPQA Diamond | 92.4% | 93.2% | Gemini 3 Deep Think (93.8%) |
| AIME 2025 | 100% | 100% | Perfect score, no tools |
| FrontierMath | 40.3% | — | Up from 31.0% |
| Humanity’s Last Exam | 34.5% | 36.6% | Gemini Deep Think (41.0%) |

GPT-5.2’s perfect AIME 2025 score demonstrates frontier-level mathematical reasoning. FrontierMath improvement (+9.3 points) shows genuine progress on research-level problems.

Abstract reasoning

| Benchmark | GPT-5.2 Thinking | GPT-5.2 Pro | GPT-5.1 |
| --- | --- | --- | --- |
| ARC-AGI-1 | 86.2% | 90.5% | — |
| ARC-AGI-2 | 52.9% | 54.2% | 17.6% |

The ARC-AGI-2 improvement is the release’s most striking result—a 35-point jump suggesting major advances in abstract reasoning. GPT-5.2 Pro became the first model to cross 90% on ARC-AGI-1.

Visual reasoning

| Benchmark | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
| --- | --- | --- | --- |
| MMMU-Pro | 86.5% | 80.7% | — |
| Video-MMMU | 90.5% | 87.6% | — |

GPT-5.2 leads on visual reasoning benchmarks despite no new image generation capabilities. Chart reasoning and UI understanding error rates dropped approximately 50%.

Pricing breakdown

GPT-5.2 carries a 40% price increase over GPT-5.1. OpenAI claims greater token efficiency offsets the increase.

| Tier | Input (per MTok) | Output (per MTok) |
| --- | --- | --- |
| GPT-5.2 Thinking | $1.75 | $14.00 |
| GPT-5.2 Pro | $21.00 | $168.00 |
| Cached input | $0.175 | — |
| Batch API | $0.875 | $7.00 |
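To make these rates concrete, here is a small illustrative helper (not an official SDK utility) that estimates per-request cost from the GPT-5.2 Thinking prices above:

```python
# Estimate request cost from the GPT-5.2 Thinking prices above,
# in USD per million tokens. Illustrative helper only.
PRICES = {
    "input": 1.75,
    "cached_input": 0.175,
    "output": 14.00,
}

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Return the estimated USD cost of one GPT-5.2 Thinking request."""
    fresh = input_tokens - cached_tokens  # tokens billed at the full input rate
    return (
        fresh * PRICES["input"]
        + cached_tokens * PRICES["cached_input"]
        + output_tokens * PRICES["output"]
    ) / 1_000_000

# A 100K-token prompt with no cache hits and a 10K-token answer:
print(request_cost(100_000, 10_000))  # 0.315
```

The same arithmetic shows why the 90% cache discount matters: a fully cached 1M-token prompt costs $0.175 instead of $1.75.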

Cost comparison with competitors

| Model | Input | Output | Notes |
| --- | --- | --- | --- |
| GPT-5.2 | $1.75 | $14.00 | 40% more than GPT-5.1 |
| GPT-5.1 | $1.25 | $10.00 | Available for ~3 months |
| Claude Opus 4.5 | $5.00 | $25.00 | 2.9x more expensive input |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Best coding value |
| Gemini 3 Pro | ~$2.00 | ~$12.00 | Competitive |

GPT-5.2 undercuts Claude on input tokens but sits mid-market on output pricing. For cost-sensitive deployments, GPT-5.1 remains available and represents better value unless you need the reasoning improvements.

Hidden cost: reasoning tokens

Be aware that hidden reasoning tokens can inflate costs significantly for complex tasks. GPT-5.2’s Thinking mode generates internal “thinking” tokens that count against your output quota but aren’t visible in responses. The new xhigh reasoning effort level uses substantially more compute.
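As an illustration of the effect, reasoning tokens are billed at the output rate even though they never appear in the response. The token counts below are invented for illustration; only the $14/MTok output price comes from the table above:

```python
# Illustrative only: how hidden reasoning tokens inflate billed output cost.
# Token counts are made up; the output price is from the pricing table above.
OUTPUT_PRICE = 14.00 / 1_000_000  # USD per output token

visible_tokens = 800        # text the user actually sees
reasoning_tokens = 6_000    # hidden chain-of-thought, billed as output

visible_cost = visible_tokens * OUTPUT_PRICE
billed_cost = (visible_tokens + reasoning_tokens) * OUTPUT_PRICE

print(f"visible-only: ${visible_cost:.4f}")    # $0.0112
print(f"actually billed: ${billed_cost:.4f}")  # $0.0952
```

In the Chat Completions API the hidden count is reported under the response's usage details (as a reasoning-token field), so it is worth logging per-request usage when budgeting for xhigh effort.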

How to access GPT-5.2

Via API

GPT-5.2 is generally available with no waitlist. Basic usage:

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Your prompt here"}],
    reasoning_effort="medium"  # none, low, medium, high, xhigh
)

For GPT-5.2 Pro (Responses API only):

response = client.responses.create(
    model="gpt-5.2-pro",
    input="Your complex problem here"
)

Rate limits scale with spend: 500K tokens/minute at Tier 1 ($5 spent), up to 40M tokens/minute at Tier 5 ($1,000+ spent).

Via ChatGPT

Access varies by subscription:

| Tier | Context | Rate limit | Cost |
| --- | --- | --- | --- |
| Free | 8K tokens | 10 msgs / 5 hours | $0 |
| Plus | 32K tokens | 160 msgs / 3 hours | $20/mo |
| Pro | 400K tokens | Unlimited | $200/mo |
| Business/Enterprise | 400K tokens | Unlimited | Custom |

Free and Plus users fall back to GPT-mini when limits are reached. Pro subscribers get full access to GPT-5.2 Pro.

Via integrations

  • GitHub Copilot — Public preview via BYOK (bring your own key)
  • Azure AI Foundry — Available
  • Microsoft 365 Copilot — Integration underway

How GPT-5.2 compares

vs Claude Opus 4.5

Claude narrowly leads on SWE-bench Verified coding (80.9% vs 80.0%) and offers superior prompt injection resistance (~95%). Claude can sustain 30+ hour autonomous operation for complex projects.

However, GPT-5.2 dominates on abstract reasoning (ARC-AGI-2: 54.2% vs 37.6%) and offers 2x the context window (400K vs 200K standard). GPT-5.2 also wins on SWE-bench Pro (55.6% vs unavailable).

Choose Claude for: complex refactoring, multi-file codebases, long-running autonomous agents, when code quality matters more than cost.

Choose GPT-5.2 for: abstract reasoning tasks, large document analysis, professional knowledge work, when you need maximum context.

vs Gemini 3 Pro

Gemini leads on Humanity’s Last Exam (41.0% vs 36.6%) and offers a 1 million token context window—2.5x GPT-5.2’s capacity. Gemini’s multimodal capabilities (native video and audio processing) surpass GPT-5.2’s text+image limitation.

However, GPT-5.2 outperforms on SWE-bench Pro (55.6% vs 43.4%) and ARC-AGI-2 reasoning (54.2% vs 45.1%).

Choose Gemini for: massive document processing, multimodal workflows with video/audio, when you need 1M+ context.

Choose GPT-5.2 for: enterprise coding, abstract reasoning, professional document generation.

The practical consensus

From community feedback: GPT-5.2 excels at full-codebase analysis, spreadsheet generation, and professional document work. Claude Opus 4.5 leads for iterative coding and long-running autonomous agents. Gemini 3 Pro dominates for massive document processing and multimodal workflows. No single model dominates across all dimensions.

Known limitations

Independent testing and community reports reveal several weaknesses:

Speed penalties in Thinking mode. Developer Matt Shumer noted it’s “very slow for most questions.” GPT-5.2 Pro can take up to 30 minutes for complex problems.

Instant mode quality concerns. Multiple sources report GPT-5.2 Instant feels “bland,” refuses more requests, and “sounds like someone who just finished corporate compliance training.” OpenAI’s own system card acknowledges “regressions in certain modes.”

Stricter content filtering. Developers report frustration with guardrails triggering on mundane conversations. Reddit threads describe conversations as less “open and free” compared to earlier GPT-5 releases.

Hallucination rate with low thinking effort. GPT-5.2 with low thinking shows an 8.4% hallucination rate—higher than DeepSeek V3.2’s 6.3%.

Missing features. No image generation improvements despite competitive pressure. Canvas is unavailable with GPT-5.2 Pro. No audio modality support.

Context handling imperfections. Real-world testing found the model “ignored repeated information when earlier context contradicted later details” in messy documents.

Community reception

The developer community response is sharply divided between enterprise users and casual consumers.

The positives

Enterprise and developer users report concrete wins:

  • Aaron Levie (Box CEO): GPT-5.2 performs “7 points better than GPT-5.1” on reasoning tests, with complex extraction tasks dropping from 46 seconds to 12 seconds
  • Dan Shipper (Every CEO): The model “worked for two hours straight” analysing a P&L statement with accurate results
  • Augment Code: Selected GPT-5.2 specifically for their code review agent, citing “substantially stronger deep code capabilities”

The negatives

Consumer and creative users tell a different story. Reddit threads with thousands of upvotes describe GPT-5.2 as “boring,” “robotic,” and “everything I hate about 5 and 5.1, but worse.” Common complaints:

  • Rigid tone and excessive markdown formatting (one user received “58 bullets and numbered points” for a simple question)
  • Stricter content filtering than predecessors
  • Loss of conversational warmth from GPT-4o era

Developer Mehul Gupta reported the model “invented APIs that didn’t exist” and missed document clause references even after explicit correction. Katie Parrott (Every) found GPT-5.2 “less resourceful” than Claude Opus 4.5—when asked to deduce information from email data, “Opus 4.5 thought to search his email and nailed it. GPT-5.2 didn’t think to try.”

The verdict

The consensus: GPT-5.2 optimises for enterprise power users rather than casual chat. It’s genuinely better at professional knowledge work but worse at being a conversational companion.

Version history

| Version | Released | Key changes |
| --- | --- | --- |
| GPT-5.2 | Dec 11, 2025 | Three-tier architecture, 38% fewer errors, ARC-AGI-2 breakthrough |
| GPT-5.1-Codex-Max | Nov 19, 2025 | Flagship agentic coding, 24+ hour tasks |
| GPT-5.1 | Nov 12-13, 2025 | Adaptive reasoning, 24h caching, 2-3x speed |
| GPT-5 | Aug 2025 | Initial GPT-5 release |
| GPT-4.1 | Apr 2025 | 1M context, agentic focus |
| GPT-4o | Nov 2024 | Multimodal, 128K context |

GPT-5.1 remains available for approximately 3 months post-launch.

FAQ

Is GPT-5.2 better than Claude Opus 4.5?

For abstract reasoning and large document analysis—yes. GPT-5.2 leads on ARC-AGI-2 (54.2% vs 37.6%) and offers 2x the context window. For coding, Claude narrowly leads on SWE-bench Verified (80.9% vs 80.0%) but GPT-5.2 leads on SWE-bench Pro (55.6%).

How much does GPT-5.2 cost?

$1.75 per million input tokens, $14.00 per million output tokens. That’s 40% more than GPT-5.1. Cached inputs drop to $0.175/MTok (90% off). GPT-5.2 Pro costs $21.00/$168.00.

Can I use GPT-5.2 for free?

Yes, but with tight limits: 10 messages per 5 hours on ChatGPT Free, then you fall back to GPT-mini. API access requires payment.

What’s the difference between GPT-5.2 Thinking and Pro?

Thinking mode handles most reasoning tasks with configurable effort levels (none through xhigh). Pro mode uses maximum compute for highest accuracy but can take up to 30 minutes and costs 12x more. Pro is only available via the Responses API.

Is GPT-5.2 worth the 40% price increase over GPT-5.1?

If you need the abstract reasoning improvements (ARC-AGI-2 jumped 35 points) or require state-of-the-art SWE-bench Pro performance—yes. For general use, GPT-5.1 remains excellent value and stays available for 3 months.

When will GPT-5.2 Codex be available?

OpenAI says “in the coming weeks.” Until then, GPT-5.1-Codex-Max remains recommended for agentic coding in Codex environments.

Why do people say GPT-5.2 feels “robotic”?

OpenAI optimised GPT-5.2 for professional knowledge work, prioritising accuracy and structured outputs. The trade-off is a more formal, less conversational tone that many casual users find less engaging than earlier models.

| Resource | URL |
| --- | --- |
| Announcement | openai.com/index/introducing-gpt-5-2 |
| Science & Math | openai.com/index/gpt-5-2-for-science-and-math |
| API Documentation | platform.openai.com/docs/guides/latest-model |
| Model Reference | platform.openai.com/docs/models/gpt-5.2-pro |
| ChatGPT | chat.openai.com |
| Pricing | openai.com/api/pricing |
| GitHub Copilot Preview | github.blog/changelog/2025-12-11 |