THE AI RANKINGS

GPT-5.1

Provider: OpenAI
Status: Current
Context: 400,000 tokens
SWE-bench: 76.3%
Price: $1.25 input / $10 output per MTok
Knowledge cutoff: 2024-09-30

GPT-5.1 is OpenAI’s mid-cycle refinement to the GPT-5 family, released November 12, 2025—three months after GPT-5’s August debut. Rather than a capability leap, it’s an optimisation release focused on speed, efficiency, and developer experience. The model introduces adaptive reasoning that dynamically allocates compute based on query complexity, delivering 2-3x faster responses on straightforward tasks while using 88% fewer tokens on easy queries.

At 76.3% on SWE-bench Verified, GPT-5.1 trails Claude Opus 4.5’s industry-leading 80.9% but costs 4x less per input token. The release also introduced GPT-5.1-Codex-Max, OpenAI’s most ambitious coding model—the first trained to operate coherently across millions of tokens and sustain autonomous work for 24+ hours.

Quick specs

Provider: OpenAI
Released: November 12, 2025 (ChatGPT) / November 13, 2025 (API)
Context window: 400K tokens (272K input / 128K output)
Knowledge cutoff: September 30, 2024
Input price: $1.25 / MTok
Output price: $10.00 / MTok
Cached input: $0.125 / MTok (90% discount)
SWE-bench Verified: 76.3% (77.9% Codex-Max)
GPQA Diamond: ~87%
MMMU: 85.4%
Best for: Fast iteration, production apps, price-sensitive deployments
Limitations: Coding trails Claude, abstract reasoning trails Gemini

What’s new in GPT-5.1

GPT-5.1 addresses GPT-5’s most criticised shortcomings: speed and cost efficiency. The improvements fall into three categories.

Adaptive reasoning architecture

The model now operates in two modes. Instant mode (gpt-5.1-chat-latest) handles everyday queries with a warmer, more conversational tone and 2-3x faster response times. Thinking mode (gpt-5.1) engages deeper reasoning for complex problems, allocating twice the compute on difficult tasks while using 88% fewer tokens on easy ones. An automatic routing layer selects between modes based on query complexity.

A new reasoning_effort parameter gives developers fine-grained control with four levels: none, low, medium, and high. The none setting effectively transforms GPT-5.1 into a non-reasoning model for latency-sensitive applications.
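As a rough illustration of how the four levels might be used, the sketch below maps a task label to a reasoning_effort value and builds the keyword arguments for a chat completion call. Only the parameter name and its four values come from the text above; the task labels, the mapping, and the `build_request` helper are hypothetical choices, not OpenAI guidance.

```python
# Hypothetical mapping from task type to reasoning_effort; tune for your workload.
EFFORT_BY_TASK = {
    "autocomplete": "none",   # latency-sensitive, skip the reasoning pass
    "summarise": "low",
    "code_review": "medium",
    "proof_check": "high",
}

VALID_EFFORTS = ("none", "low", "medium", "high")


def build_request(task: str, prompt: str, model: str = "gpt-5.1") -> dict:
    """Return keyword arguments suitable for client.chat.completions.create()."""
    # Unknown task labels fall back to "medium" (an assumption of this sketch).
    effort = EFFORT_BY_TASK.get(task, "medium")
    if effort not in VALID_EFFORTS:
        raise ValueError(f"unsupported reasoning_effort: {effort}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }


request = build_request("autocomplete", "Complete: def add(a, b):")
print(request["reasoning_effort"])  # prints "none"
```

The returned dictionary can be unpacked into the API call with `**request`, which keeps the effort policy in one place rather than scattered across call sites.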

Extended prompt caching

The standout infrastructure improvement is 24-hour prompt caching (enabled via prompt_cache_retention='24h'), a massive upgrade from the previous few-minute retention. Early adopters report 73% average cache hit rates, making the 90% input discount practical for production workloads rather than theoretical.
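To see what a 73% hit rate means in practice, the following sketch blends the standard and cached input prices quoted in this article. It assumes the hit rate applies uniformly to input tokens, which is a simplification; real workloads cache only the shared prefix of each prompt.

```python
STANDARD_INPUT_PER_MTOK = 1.25   # USD, standard input price from this article
CACHED_INPUT_PER_MTOK = 0.125    # USD, cached input price (90% discount)


def effective_input_price(cache_hit_rate: float) -> float:
    """Blended USD price per million input tokens for a given cache hit rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return (cache_hit_rate * CACHED_INPUT_PER_MTOK
            + (1.0 - cache_hit_rate) * STANDARD_INPUT_PER_MTOK)


price = effective_input_price(0.73)
saving = 1.0 - price / STANDARD_INPUT_PER_MTOK
print(f"${price:.3f} per MTok, {saving:.0%} below the standard rate")
# prints: $0.429 per MTok, 66% below the standard rate
```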

New developer tools

Two tools target agentic coding workflows. The apply_patch tool enables surgical code editing using structured diffs instead of full file rewrites—dramatically improving accuracy for large codebases. The shell tool provides controlled command-line access for autonomous agents.
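The advantage of diff-based editing can be illustrated with Python's standard difflib module. This is only a stand-in: it produces a conventional unified diff, not the apply_patch tool's own patch format, which this article does not describe. The file name and contents are made up for the example.

```python
import difflib

# A hypothetical 200-line file in which a single line changes.
original = [f"value_{i} = {i}\n" for i in range(200)]
edited = list(original)
edited[120] = "value_120 = 'patched'\n"

# Unified diff with the default three lines of context around the change.
patch = list(difflib.unified_diff(
    original, edited, fromfile="a/config.py", tofile="b/config.py"
))

print(f"full rewrite: {len(edited)} lines, diff: {len(patch)} lines")
print("".join(patch))
```

A full rewrite forces the model to reproduce 199 unchanged lines correctly, whereas the diff carries only the changed line plus a little context, leaving far less room for accidental edits.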

The GPT-5.1 model family

OpenAI released seven variants designed for different use cases:

| Variant | API identifier | Purpose | Notes |
| --- | --- | --- | --- |
| GPT-5.1 Instant | gpt-5.1-chat-latest | Fast conversational responses | 2-3x faster than GPT-5 |
| GPT-5.1 Thinking | gpt-5.1 | Complex multi-step reasoning | Adaptive compute allocation |
| GPT-5.1 Auto | | Automatic mode routing | Selects Instant vs Thinking |
| GPT-5.1 Pro | | Research-grade intelligence | Pro subscribers only ($200/mo) |
| GPT-5.1-Codex | gpt-5.1-codex | Standard coding tasks | |
| GPT-5.1-Codex-Mini | gpt-5.1-codex-mini | Lightweight coding | |
| GPT-5.1-Codex-Max | gpt-5.1-codex-max | Long-running agentic coding | 24+ hour tasks, 77.9% SWE-bench |

GPT-5.1-Codex-Max deserves special attention. Released November 19, 2025, it’s OpenAI’s first model trained to operate coherently across millions of tokens in single tasks. It uses 30% fewer thinking tokens than standard Codex while achieving 77.9% on SWE-bench Verified. OpenAI claims it completed a 24-hour internal coding task autonomously.

Benchmark performance

GPT-5.1 shows incremental rather than dramatic gains over GPT-5. Independent testing from Vals.ai reveals smaller improvements than OpenAI’s marketing suggests.

Coding benchmarks

| Model | SWE-bench Verified | Notes |
| --- | --- | --- |
| Claude Opus 4.5 | 80.9% | Industry leader |
| Claude Sonnet 4.5 | 77.2% | Best value for coding |
| GPT-5.1-Codex-Max | 77.9% | OpenAI’s best |
| GPT-5.1 | 76.3% | +1.4pp over GPT-5 |
| GPT-5 | 74.9% | |
| Gemini 3 Pro Preview | 76.2% | |

SWE-bench Verified tests models on real GitHub issues and the pull requests that fixed them. A 76.3% score means GPT-5.1 resolves roughly 3 out of 4 of these benchmark tasks without human intervention, though real-world IDE performance typically runs 40-60% lower due to latency constraints.

Reasoning benchmarks

| Benchmark | GPT-5.1 | GPT-5 | Best in class |
| --- | --- | --- | --- |
| GPQA Diamond | ~87% | 87.3% | Claude Opus 4.5 (87%) |
| AIME 2025 | 94.0% | 94.6% | Virtually tied |
| ARC-AGI-2 | 17.6% | | Gemini 3 Pro (45.1%) |
| MMMU | 85.4% | 84.2% | GPT-5.1 leads |

The AIME 2025 result is notable: GPT-5.1 actually scores marginally lower than GPT-5 on this math competition benchmark. Abstract reasoning (ARC-AGI-2) significantly trails Gemini 3 Pro. GPT-5.1’s strength is visual reasoning (MMMU 85.4%).

Industry rankings

Artificial Analysis ranks GPT-5.1 as the second most intelligent LLM as of November 2025, with the GPT-5 family scoring 68 on their composite Intelligence Index at high reasoning effort.

Pricing breakdown

GPT-5.1 matches GPT-5’s pricing while introducing more aggressive caching discounts.

| Tier | Input (per MTok) | Output (per MTok) |
| --- | --- | --- |
| Standard | $1.25 | $10.00 |
| Cached input | $0.125 | |
| Batch API | $0.625 | $5.00 |
| Priority | $2.50 | $20.00 |
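The rates above can be turned into a small cost estimator. This is a minimal sketch: the `estimate_cost` helper and the token counts in the example are hypothetical, and because the table lists a cached rate only alongside standard pricing, the sketch refuses cached tokens on other tiers rather than guess a price.

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "standard": {"input": 1.25, "cached_input": 0.125, "output": 10.00},
    "batch": {"input": 0.625, "output": 5.00},
    "priority": {"input": 2.50, "output": 20.00},
}


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0, tier: str = "standard") -> float:
    """Estimate the USD cost of a workload; cached_tokens is a subset of input_tokens."""
    rates = PRICES[tier]
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    if cached_tokens and "cached_input" not in rates:
        raise ValueError(f"no cached rate listed for tier {tier!r}")
    uncached = input_tokens - cached_tokens
    total = (uncached * rates["input"]
             + cached_tokens * rates.get("cached_input", 0.0)
             + output_tokens * rates["output"])
    return total / 1_000_000


# Hypothetical workload: 2M input tokens (1.5M of them cached), 300K output tokens.
print(f"standard: ${estimate_cost(2_000_000, 300_000, cached_tokens=1_500_000):.4f}")
print(f"batch:    ${estimate_cost(2_000_000, 300_000, tier='batch'):.4f}")
# prints: standard: $3.8125
# prints: batch:    $2.7500
```

Note how output tokens dominate both totals: 300K output tokens cost more than 2M input tokens on either tier.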

Cost comparison with competitors

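| Model | Input (per MTok) | Output (per MTok) | Notes |
| --- | --- | --- | --- |
| GPT-5.1 | $1.25 | $10.00 | Best price-performance |
| Claude Opus 4.5 | $5.00 | $25.00 | 4x more expensive input |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Best coding value |
| Gemini 3 Pro | ~$1.25 | ~$5.00 | Competitive |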

One developer reported spending $2,220 on GPT-5.1 versus $6,020 on Claude Opus 4.5 for identical workloads—though Claude’s code quality rated higher. The trade-off is real: GPT-5.1 wins on cost, Claude wins on capability.
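The reported spend is consistent with the list prices. At identical token counts, the Claude-to-GPT-5.1 cost ratio must fall between 2.5x (output only) and 4x (input only), and $6,020 / $2,220 is about 2.7x. The sketch below checks this under the simplifying assumption that both models consume the same number of tokens, which real workloads will not match exactly; the token mixes are hypothetical.

```python
# USD per million tokens, list prices quoted in the comparison above.
GPT_51 = {"input": 1.25, "output": 10.00}
CLAUDE_OPUS_45 = {"input": 5.00, "output": 25.00}


def cost(prices: dict, input_tokens: float, output_tokens: float) -> float:
    """List-price cost in USD for the given token counts."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000


def cost_ratio(input_tokens: float, output_tokens: float) -> float:
    """Claude Opus 4.5 cost divided by GPT-5.1 cost, assuming identical token counts."""
    return (cost(CLAUDE_OPUS_45, input_tokens, output_tokens)
            / cost(GPT_51, input_tokens, output_tokens))


# Hypothetical input:output mixes, from input-only to output-only.
for inp, out in [(1_000_000, 0), (3_000_000, 1_000_000),
                 (1_000_000, 1_000_000), (0, 1_000_000)]:
    print(f"{inp:>9} in / {out:>9} out -> {cost_ratio(inp, out):.2f}x")

reported_ratio = 6020 / 2220  # the developer's reported spend, Claude / GPT-5.1
print(f"reported: {reported_ratio:.2f}x")
```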

Hidden cost: reasoning tokens

Be aware that hidden reasoning tokens can inflate costs 4-5x compared to GPT-4.1 for complex tasks. The model generates internal “thinking” tokens that count against your output quota but aren’t visible in responses.
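A quick way to gauge this overhead is to split billed output tokens into visible and reasoning portions. The sketch below assumes you have two numbers per request: the total billed completion tokens and the reasoning tokens within that total (in the Chat Completions response these are expected under `usage.completion_tokens` and `usage.completion_tokens_details.reasoning_tokens`). The helper name and the example figures are hypothetical.

```python
OUTPUT_PER_MTOK = 10.00  # USD, GPT-5.1 output price from this article


def output_cost_breakdown(completion_tokens: int, reasoning_tokens: int) -> dict:
    """Split billed output cost into visible and hidden-reasoning portions."""
    if reasoning_tokens > completion_tokens:
        raise ValueError("reasoning_tokens cannot exceed completion_tokens")
    visible_tokens = completion_tokens - reasoning_tokens
    return {
        "visible_usd": visible_tokens * OUTPUT_PER_MTOK / 1_000_000,
        "reasoning_usd": reasoning_tokens * OUTPUT_PER_MTOK / 1_000_000,
        # How many times more you pay than the visible text alone would cost.
        "multiplier": (completion_tokens / visible_tokens
                       if visible_tokens else float("inf")),
    }


# Hypothetical request: 1,000 visible tokens preceded by 4,000 reasoning tokens.
breakdown = output_cost_breakdown(completion_tokens=5_000, reasoning_tokens=4_000)
print(breakdown)
```

In this example the visible answer accounts for $0.01 of a $0.05 output bill, a 5x multiplier in line with the range quoted above.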

How to access GPT-5.1

Via API

GPT-5.1 is generally available with no waitlist. Basic usage:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Your prompt here"}],
    reasoning_effort="medium",  # one of: none, low, medium, high
)
print(response.choices[0].message.content)

Enable 24-hour prompt caching:

# Reuses the client created in the basic example.
response = client.chat.completions.create(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Your prompt"}],
    extra_body={"prompt_cache_retention": "24h"},  # sent as an extra request field
)

Rate limits scale with spend: 500K tokens/minute at Tier 1 ($5 spent), up to 40M tokens/minute at Tier 5 ($1,000+ spent).
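For capacity planning, the token-per-minute limits translate into a request ceiling once you know your typical request size. The sketch below covers only the two tiers quoted above and assumes every token in a request (prompt plus completion) counts toward the limit; separate per-request caps, if any apply to your account, are not modelled.

```python
# Tokens-per-minute limits quoted above; other tiers are not listed in this article.
TPM_LIMITS = {"tier_1": 500_000, "tier_5": 40_000_000}


def max_requests_per_minute(tier: str, tokens_per_request: int) -> int:
    """Upper bound on requests per minute implied by the token limit alone."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return TPM_LIMITS[tier] // tokens_per_request


# Hypothetical workload averaging 6,000 tokens per request.
for tier in TPM_LIMITS:
    print(tier, max_requests_per_minute(tier, 6_000))
# prints: tier_1 83
# prints: tier_5 6666
```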

Via ChatGPT

Access varies dramatically by subscription:

| Tier | Context | Rate limit | Cost |
| --- | --- | --- | --- |
| Free | 8K tokens | 10 msgs / 5 hours | $0 |
| Plus | 32K tokens | 160 msgs / 3 hours | $20/mo |
| Pro | 196K tokens | Unlimited | $200/mo |
| Enterprise | 196K tokens | Unlimited | Custom |

Free and Plus users fall back to GPT-5.1-mini when limits are reached. Pro subscribers get access to GPT-5.1 Pro, the research-grade variant.

How GPT-5.1 compares

vs Claude Opus 4.5

Claude leads on raw coding capability (80.9% vs 76.3% SWE-bench) and sustained engineering work. Developer Mckay Wrigley declared “Claude Code with Opus is still king and frankly it’s not close.” However, Claude costs 4x more for input tokens and 2.5x more for output.

Choose Claude for: complex refactoring, multi-file codebases, when code quality matters more than cost.

Choose GPT-5.1 for: rapid iteration, cost-sensitive production deployments, when “good enough” code ships faster.

vs Gemini 3 Pro

Gemini dominates abstract reasoning (45.1% vs 17.6% on ARC-AGI-2) and offers the largest context window at 1M tokens. Developers report reaching for Gemini “for UI/frontend tasks” where it excels.

Choose Gemini for: abstract reasoning tasks, UI/UX design, when you need massive context.

Choose GPT-5.1 for: most general-purpose work, better ecosystem integration.

The practical consensus

From community feedback: developers use GPT-5.1 for speed and daily iteration, Gemini for UI tasks, and reserve Claude or GPT-5.1 Pro “when they cannot afford to be wrong.”

Known limitations

Independent testing and community reports reveal several weaknesses:

Context degradation begins noticeably around 120K tokens, well short of the 272K theoretical input limit.

Hidden reasoning costs can inflate bills 4-5x. The model generates invisible “thinking” tokens that count against output quotas.

Creative writing feels “pre-packaged” according to professional writers. The model resists style customisation and tends toward generic responses.

Settings drift over extended conversations. Personal preferences like tone and punctuation fade as context grows.

Frontend/UX design remains “far worse than Gemini 3” according to multiple reviewers.

Safety routing silently redirects some users to stricter model variants without notification—a major source of community frustration.
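One practical response to the context degradation noted above is to check prompt size before sending. The sketch below flags inputs that exceed the roughly 120K-token point where degradation is reported, and rejects those over the 272K input limit. The four-characters-per-token estimate is a rough heuristic for English text, not a real tokenizer, and the function names and labels are hypothetical.

```python
import math

SOFT_LIMIT_TOKENS = 120_000   # where degradation is reported to begin
HARD_LIMIT_TOKENS = 272_000   # stated input limit
CHARS_PER_TOKEN = 4           # rough heuristic for English text (assumption)


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def classify_input(text: str) -> str:
    """Return 'ok', 'degradation_risk', or 'too_long' for a prompt."""
    tokens = estimate_tokens(text)
    if tokens > HARD_LIMIT_TOKENS:
        return "too_long"
    if tokens > SOFT_LIMIT_TOKENS:
        return "degradation_risk"
    return "ok"


print(classify_input("a" * 400_000))    # about 100K tokens
print(classify_input("a" * 600_000))    # about 150K tokens
print(classify_input("a" * 1_200_000))  # about 300K tokens
```

Prompts flagged as a degradation risk are candidates for summarising or splitting before they are sent.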

Community reception

The developer community responded more favourably to GPT-5.1 than GPT-5’s mixed August launch, though significant frustrations persist.

The positives

Enterprise partners report concrete wins:

AI influencer Matt Shumer called GPT-5.1 Pro “ridiculously smart… feels like a better reasoner than most humans.” JetBrains’ Denis Shiryaev described it as “genuinely agentic, the most naturally autonomous model I’ve ever tested.”

The negatives

OpenAI’s Reddit AMA about GPT-5.1 became a “karma massacre” with 1,300+ downvotes and 1,200+ comments, most of them highly critical.

The verdict

The consensus: Claude excels for engineers who need peak code quality; GPT-5.1 wins for builders focused on speed, cost, and iteration.

Version history

| Version | Released | Key changes |
| --- | --- | --- |
| GPT-5.1-Codex-Max | Nov 19, 2025 | Flagship agentic coding, 24+ hour tasks |
| GPT-5.1 | Nov 12-13, 2025 | Adaptive reasoning, 24h caching, 2-3x speed |
| GPT-5 | Aug 2025 | Initial GPT-5 release |
| GPT-4.1 | Apr 2025 | 1M context, agentic focus |
| GPT-4o | Nov 2024 | Multimodal, 128K context |

GPT-5 remains available in the ChatGPT legacy dropdown for approximately 90 days post-launch.

FAQ

Is GPT-5.1 better than Claude Opus 4.5?

Not for coding—Claude leads with 80.9% vs 76.3% on SWE-bench. GPT-5.1 wins on price (4x cheaper input) and speed. Choose based on whether you prioritise capability or cost.

How much does GPT-5.1 cost?

$1.25 per million input tokens, $10.00 per million output tokens. Cached inputs drop to $0.125/MTok (90% off). A typical 1,000-word conversation costs roughly $0.01-0.03.

Can I use GPT-5.1 for free?

Yes, but with tight limits: 10 messages per 5 hours on ChatGPT Free, then you fall back to GPT-5.1-mini. API access requires payment.

What’s the difference between GPT-5.1 and GPT-5.1-Codex-Max?

Codex-Max is optimised for long-running agentic coding tasks—it can work autonomously for 24+ hours across millions of tokens. Standard GPT-5.1 is for general-purpose use.

Is GPT-5.1 worth upgrading from GPT-5?

If you use the API regularly, yes—the 2-3x speed improvement and extended caching offer real value. If you’re on ChatGPT Plus, the upgrade is automatic.

When will fine-tuning be available?

OpenAI hasn’t announced fine-tuning for GPT-5.1. Currently only earlier models such as GPT-4.1 and GPT-4o support fine-tuning.