Specifications

Model ID claude-opus-4-5
Provider Anthropic
Architecture transformer
Context Window 200K tokens
Max Input 200K tokens
Max Output 64K tokens
Knowledge Cutoff 2025-05-31
License proprietary
Open Weights No

Capabilities

Modalities

Input: text, images, pdf
Output: text

Reasoning

Reasoning Model, Agentic
Effort Levels: low, medium, high

Features

function-calling, json-mode, structured-outputs, vision, streaming, parallel-tool-calling, computer-use, web-search, citations, prompt-caching, batch-processing, extended-thinking, interleaved-thinking, memory-tool

Variants

VARIANT | API ID | DESCRIPTION | SWE
Claude Opus 4.5 | claude-opus-4-5-20251101 | Flagship model with extended thinking capabilities | 80.9%
Claude Opus 4.5 (alias) | claude-opus-4-5 | Always points to latest Opus 4.5 version | 80.9%

API Pricing

Input $5 per 1M tokens
Output $25 per 1M tokens
Cached Input $0.50 per 1M tokens
Batch Input $2.50 per 1M tokens
Batch Output $12.50 per 1M tokens

67% price reduction from Opus 4.1 ($15/$75). Prompt caching offers up to 90% savings on repeated context. Batch API provides 50% discount with 24-hour turnaround.

Claude Access

TIER | PRICE | CONTEXT | RATE LIMIT
Free | Free | |
Pro | $20/mo | 200K | Available via model dropdown
Max | $100/mo | 200K | Same allocation as Sonnet previously had
Team Standard | $25/mo | 200K |
Team Premium | $150/mo | 200K |
Enterprise | Custom | 200K |

Benchmarks

Coding

SWE-bench Verified 80.9%
Terminal-bench 2.0 59.3%

Reasoning

GPQA Diamond 87%
MMLU 90.8%
ARC-AGI-2 37.6%

Math

AIME 2025 100%

Vision

MMMU 80.7%

Rankings

Artificial Analysis #2
AA Intelligence Index 70

First model to break 80% on SWE-bench Verified. ARC-AGI-2 score (37.6%) demonstrates genuine reasoning capability—more than 2x GPT-5.1. Terminal-bench Hard (44%) is highest ever recorded. OSWorld (66.3%) establishes new SOTA for computer-using agents.

Claude Opus 4.5 is Anthropic’s flagship AI model, released November 24, 2025—the first model to break 80% on SWE-bench Verified. At 80.9%, it leads all competitors on the industry’s most realistic coding benchmark, resolving 4 out of 5 real GitHub issues autonomously. The model arrives with a 67% price reduction from Opus 4.1 ($5/$25 per million tokens versus $15/$75), making flagship-tier capabilities economically viable for production workloads for the first time.

Opus 4.5 excels at long-horizon agentic tasks, achieving state-of-the-art results on OSWorld (66.3%) and Terminal-bench Hard (44%—the highest ever recorded). Its 37.6% on ARC-AGI-2—more than double GPT-5.1’s 17.6%—demonstrates genuine reasoning rather than pattern matching. Developers consistently describe it as “just getting it”—handling ambiguity without extensive prompting.

Quick specs

Provider: Anthropic
Released: November 24, 2025
Context window: 200K tokens
Maximum output: 64K tokens
Knowledge cutoff: May 2025 (reliable)
Input price: $5.00 / MTok
Output price: $25.00 / MTok
Cached input: $0.50 / MTok (90% discount)
SWE-bench Verified: 80.9% (industry-leading)
ARC-AGI-2: 37.6% (2x GPT-5.1)
OSWorld: 66.3% (best computer-using model)
Best for: Agentic coding, complex reasoning, computer use, multi-file refactoring
Limitations: Higher price than competitors, slower than Sonnet

What’s new in Opus 4.5

Opus 4.5 completes Anthropic’s 4.5 model family following Sonnet 4.5 (September) and Haiku 4.5 (October). The release focuses on three major improvements.

Token efficiency revolution

The most significant advancement is dramatic token efficiency. Opus 4.5 uses up to 76% fewer output tokens than Opus 4.1 while matching or exceeding quality. At medium effort, it matches Sonnet 4.5’s best SWE-bench score while using 76% fewer tokens. At high effort (default), it exceeds Sonnet 4.5 by 4.3 percentage points while using 48% fewer tokens.

Amp’s analysis found Opus 4.5 actually cheaper per task than Sonnet despite higher per-token pricing: $1.30/thread versus Sonnet’s $1.83/thread. JetBrains reported a 50-75% reduction in tool-calling and build errors during internal testing.

Extended thinking with effort control

A new effort parameter (low/medium/high) gives developers fine-grained control over reasoning depth:

  • Low: Minimises latency and cost; best for simple queries
  • Medium: Balances speed and intelligence; matches Sonnet 4.5’s best scores
  • High (default): Maximum reasoning; best for complex coding and research

Extended thinking mode preserves thinking blocks from previous turns by default—a change from earlier Claude models that discarded this information between responses.
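
The exact request shape for the effort parameter may differ by SDK version, and Anthropic's API reference is the authority here. As a hedged sketch, the Python SDK's generic extra_body escape hatch can carry the field; the name and placement of "effort" below are assumptions to verify before use.

import anthropic

client = anthropic.Anthropic()

# Hypothetical placement: the "effort" field name below is an assumption;
# confirm the exact parameter name and location in Anthropic's API docs.
response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=16000,
    messages=[{"role": "user", "content": "Refactor this module for clarity"}],
    extra_body={"effort": "medium"},  # one of: "low", "medium", "high"
)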

Agentic capabilities and computer use

Opus 4.5 introduces a new zoom tool for computer use, allowing the model to request magnified screen regions for better visual inspection during complex navigation tasks. This capability, combined with state-of-the-art results on OSWorld and WebArena, establishes Opus 4.5 as the most capable model for autonomous computer operation.

Rakuten’s testing found agents “autonomously refine their own capabilities—achieving peak performance in 4 iterations while other models couldn’t match that quality after 10.”

The Claude 4.5 model family

Claude Opus 4.5 is the flagship tier in Anthropic’s three-tier model lineup:

Model | API Identifier | Purpose | SWE-bench | Price (in/out)
Claude Opus 4.5 | claude-opus-4-5-20251101 | Maximum intelligence | 80.9% | $5/$25
Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | Best value for most tasks | 77.2% | $3/$15
Claude Haiku 4.5 | claude-haiku-4-5-20251001 | Speed and efficiency | 73.3% | $1/$5

The Opus tier has always designated Anthropic’s largest and most capable models. Opus 4.5 replaces Opus 4.1 (August 2025) as the flagship, though Opus 4.1 remains available via API for users who need it.

Notable: Opus 4.5 does not support the 1 million token context beta; only Sonnet 4 and Sonnet 4.5 offer it.

Benchmark performance

Opus 4.5 sets new industry standards on coding and agentic benchmarks while showing competitive but not dominant results on some reasoning tasks.

Coding benchmarks

Model | SWE-bench Verified | Notes
Claude Opus 4.5 | 80.9% | Industry-leading, first to break 80%
GPT-5.1-Codex-Max | 77.9% | OpenAI’s best
Claude Sonnet 4.5 | 77.2% | Best value for coding
GPT-5.1 | 76.3% |
Gemini 3 Pro Preview | 76.2% |

SWE-bench Verified tests models on real GitHub pull requests—actual bug reports from open-source projects. An 80.9% score means Opus 4.5 can resolve 4 out of 5 real-world issues without human intervention, a significant leap from the ~50% scores of models just a year ago.

Additional coding results reinforce the lead:

Benchmark | Opus 4.5 | Best competitor
SWE-bench Pro | 52.0% |
SWE-bench Multilingual | 76.2% | Leads 7/8 languages
Terminal-bench 2.0 | 59.3% | GPT-5.1-Codex-Max (58.1%)
Terminal-bench Hard | 44.0% | Highest ever recorded

Agentic benchmarks

Opus 4.5 establishes clear leadership on autonomous task completion:

Benchmark | Opus 4.5 | What it measures
OSWorld | 66.3% | Computer use (clicks, typing, navigation)
τ²-bench Retail | 88.9% | Multi-step retail agent tasks
τ²-bench Telecom | 98.2% | Multi-step telecom agent tasks
MCP Atlas | 62.3% | Model Context Protocol integration
WebArena | 65.3% | Single-agent web navigation

The OSWorld result (66.3%) is particularly significant—it measures real computer operation including clicking, typing, and navigating complex interfaces. This is the highest score ever recorded for a computer-using AI model.

Reasoning benchmarks

Benchmark | Opus 4.5 | GPT-5.1 | Gemini 3 Pro
GPQA Diamond | 87.0% | 88.1% | 91.9%
MMLU | 90.8% | 91.0% | 91.8%
MMMU (vision) | 80.7% | 85.4% | 81.0%
ARC-AGI-2 | 37.6% | 17.6% | 31.1%
AIME 2025 | 100% | 94.0% |

The ARC-AGI-2 result (37.6%) stands out—more than double GPT-5.1’s score. This benchmark tests fluid intelligence on novel problems that cannot be memorised, suggesting Opus 4.5 excels at genuine reasoning rather than pattern matching. The AIME 2025 perfect score (100%) was achieved with Python tools enabled.

Industry rankings

Artificial Analysis ranks Opus 4.5 at 70 on the Intelligence Index (reasoning mode), tying GPT-5.1 for second place behind Gemini 3 Pro’s 73. On LMArena/Chatbot Arena, Opus 4.5 holds the highest Expert Advantage Score (+85) of any non-thinking model—expert users strongly prefer it over alternatives when solving difficult problems.

Pricing breakdown

The 67% price reduction from Opus 4.1 represents Anthropic’s most significant pricing move, making flagship-tier capabilities accessible for production workloads.

Tier | Input (per MTok) | Output (per MTok)
Standard | $5.00 | $25.00
Cache write | $6.25 |
Cache read | $0.50 |
Batch API | $2.50 | $12.50
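
The cache read rate applies when a request reuses a prompt prefix previously marked as cacheable. A minimal sketch with the Python SDK, assuming a long, stable system document shared across many requests (the document text here is a placeholder):

import anthropic

client = anthropic.Anthropic()

# Placeholder for a large, stable block of context reused across requests.
REFERENCE_DOCUMENT = "...long product manual or codebase summary..."

response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": REFERENCE_DOCUMENT,
            # Marks this prefix as cacheable; later requests that reuse the same
            # prefix are billed at the cache read rate rather than full input price.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarise the key points."}],
)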

Price evolution

Model | Input | Output | Change
Claude Opus 4.5 | $5.00 | $25.00 | Current
Claude Opus 4.1 | $15.00 | $75.00 | -67%
Claude Opus 4 | $15.00 | $75.00 |

Cost comparison with competitors

Model | Input | Output | Notes
GPT-5.1 | $1.25 | $10.00 | 4x cheaper input
Gemini 3 Pro | $2.00 | $12.00 | 2.5x cheaper input
Claude Opus 4.5 | $5.00 | $25.00 | Premium pricing
Claude Sonnet 4.5 | $3.00 | $15.00 | Best coding value

Despite the premium per-token pricing, Opus 4.5 can be cheaper per task due to superior efficiency. Amp’s testing found $1.30/thread for Opus 4.5 versus $1.83/thread for Sonnet—the model solves problems with fewer tokens, offsetting the higher rate.

The efficiency advantage

For agentic workflows, token efficiency matters more than per-token price. Opus 4.5’s ability to solve tasks in fewer iterations means:

  • Fewer API calls
  • Less context accumulation
  • Faster completion times
  • Lower total cost despite higher rates
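
A rough way to see the effect is to compare per-task cost at the list prices above under assumed token counts. The figures below are illustrative assumptions, not measurements, but they mirror the direction of Amp’s per-thread numbers:

# Illustrative per-task cost comparison; token counts are assumptions, not measurements.
OPUS_IN, OPUS_OUT = 5.00, 25.00      # $ per million tokens
SONNET_IN, SONNET_OUT = 3.00, 15.00  # $ per million tokens

def task_cost(in_tokens, out_tokens, in_price, out_price):
    """Dollar cost of one task given token usage and per-MTok prices."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Assume Opus finishes in fewer, shorter iterations than Sonnet for the same task.
opus_cost = task_cost(150_000, 20_000, OPUS_IN, OPUS_OUT)        # ~$1.25
sonnet_cost = task_cost(400_000, 60_000, SONNET_IN, SONNET_OUT)  # ~$2.10

print(f"Opus 4.5 per task:   ${opus_cost:.2f}")
print(f"Sonnet 4.5 per task: ${sonnet_cost:.2f}")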

How to access Claude Opus 4.5

Via API

Opus 4.5 is generally available with no waitlist. Basic usage:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Your prompt here"}]
)

Enable extended thinking by allocating a token budget for the model’s reasoning:

response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # upper bound on tokens spent on internal reasoning
    },
    messages=[{"role": "user", "content": "Complex reasoning task"}]
)
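
Streaming is also supported; a minimal sketch that prints text as it arrives:

with client.messages.stream(
    model="claude-opus-4-5-20251101",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Explain this stack trace"}],
) as stream:
    # text_stream yields text deltas as the model generates them
    for text in stream.text_stream:
        print(text, end="", flush=True)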

Rate limits scale with usage tier. Tier 1 ($100 spent): 1,000 requests/minute, 20K tokens/minute. Tier 4 ($10,000+ spent): 4,000 requests/minute, 400K tokens/minute.
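
For workloads that can wait for the 24-hour turnaround, the Batch API applies the 50% discount noted in the pricing section. A minimal sketch; the custom_id and prompt are placeholders:

# Submit asynchronous requests at batch pricing; poll or fetch results later.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "task-001",  # placeholder identifier for matching results
            "params": {
                "model": "claude-opus-4-5-20251101",
                "max_tokens": 4096,
                "messages": [{"role": "user", "content": "Summarise this report..."}],
            },
        }
    ]
)
print(batch.id, batch.processing_status)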

Via Claude.ai

Access varies by subscription tier:

Tier | Access | Price | Notes
Free | | $0 | Opus not available
Pro | ✅ Dropdown | $20/mo | Shared token allocation
Max | ✅ Default | $100+/mo | No Opus-specific caps
Team Standard | ✅ Dropdown | $25/user/mo |
Team Premium | ✅ Default | $150/user/mo | Priority access
Enterprise | | Custom |

Critically, Anthropic removed Opus-specific usage caps—Max users now receive roughly the same token allocation for Opus 4.5 as they previously had for Sonnet, making the flagship model practical as a daily driver.

Via cloud providers

Opus 4.5 is available on all three major cloud platforms—the only frontier model with this breadth:

Platform | Status | Model ID
Amazon Bedrock | GA | anthropic.claude-opus-4-5-20251101-v1:0
Google Cloud Vertex AI | GA | claude-opus-4-5@20251101
Microsoft Azure Foundry | Preview |
GitHub Copilot | Preview | Paid tiers
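
The Anthropic Python SDK includes first-party clients for Bedrock and Vertex AI. A minimal Bedrock sketch using the model ID from the table above; the region and credential setup are assumptions that depend on your AWS account:

from anthropic import AnthropicBedrock

# Uses standard AWS credential resolution (environment variables, profile, or IAM role).
client = AnthropicBedrock(aws_region="us-east-1")  # assumed region; use wherever the model is enabled

response = client.messages.create(
    model="anthropic.claude-opus-4-5-20251101-v1:0",  # Bedrock model ID from the table above
    max_tokens=2048,
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(response.content[0].text)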

How Claude Opus 4.5 compares

vs GPT-5.1

Opus 4.5 leads on raw coding capability (80.9% vs 76.3% SWE-bench) and abstract reasoning (37.6% vs 17.6% ARC-AGI-2). McKay Wrigley declared “Claude Code + Opus 4.5 is the best AI coding tool in the world… and it’s not close.”

However, GPT-5.1 costs 4x less for input tokens and offers faster response times. Multiple developers report GPT-5.1 better for rapid iteration cycles.

Choose Opus 4.5 for: complex refactoring, multi-file codebases, agentic workflows, when code quality matters more than speed.

Choose GPT-5.1 for: rapid iteration, cost-sensitive deployments, when speed trumps peak capability.

vs Gemini 3 Pro

Gemini leads on some reasoning benchmarks (91.9% vs 87.0% GPQA Diamond) and offers 1M+ token context—5x larger than Opus 4.5’s 200K. Developers report Gemini excels at UI/frontend tasks.

However, Opus 4.5 dominates agentic and coding benchmarks. Amp’s team switched from Gemini 3 to Opus 4.5 within a week: “Gemini’s impressive highs came with lows… Opus 4.5 seems more polished.”

Choose Gemini for: massive context needs, UI/UX design, multimodal workflows.

Choose Opus 4.5 for: coding, agentic tasks, when reliability matters more than context size.

vs Claude Sonnet 4.5

Sonnet 4.5 costs 60% of Opus’s price ($3/$15 vs $5/$25) while delivering roughly 95% of the capability on many tasks. For straightforward coding and writing, Sonnet is often sufficient.

Choose Sonnet for: daily coding tasks, cost-sensitive production, when speed matters.

Choose Opus for: complex architecture decisions, difficult debugging, “when you cannot afford to be wrong.”

The practical consensus

From community feedback: developers reserve Opus 4.5 for hard problems—complex refactoring, architectural decisions, stubborn bugs. They use Sonnet for daily work and GPT-5.1 for rapid iteration. Simon Willison noted he “switched back to Sonnet 4.5 and kept on working at the same pace” for routine tasks.

Known limitations

Independent testing and community reports reveal several areas where Opus 4.5 falls short:

Price premium persists: Despite the 67% cut from Opus 4.1, Opus 4.5 costs 2-4x more per token than GPT-5.1 or Gemini 3 Pro. Zvi Mowshowitz noted: “Price is the biggest weakness. Even with a cut, $5/$25 is still on the high end.”

Speed tradeoffs: Multiple reviewers note “Opus is slower than Sonnet. You’ll notice this.” For rapid iteration cycles, the latency can be frustrating.

Over-engineering tendency: Several comparisons noted Opus 4.5 “sometimes introduces architecture or testing infrastructure beyond what the user requested,” producing “more infrastructure than needed” and “thinking like a platform architect rather than service engineer.”

Context window limitations: At 200K tokens, Opus 4.5 has the smallest context among flagships—GPT-5.1 offers 400K and Gemini offers 1M+. No 1M context beta is available for Opus (only Sonnet).

Hallucination rate: Artificial Analysis testing found a 58% hallucination rate (4th lowest among frontier models, but still notable).

Prompt injection risk: While achieving industry-leading 4.7% attack success rate (versus 21.9% for GPT-5.1), Simon Willison cautioned: “Single attempts at prompt injection still work 1/20 times.”

Community reception

The developer community has responded overwhelmingly positively to Opus 4.5, with several reviewers calling it the most significant model release since GPT-4.

The positives

Coding excellence: McKay Wrigley declared “This is the best model for both code and for agents, and it’s not close.” Simon Willison completed 20 commits across 39 files on sqlite-utils with Opus 4.5 handling most of the work during a weekend preview.

“Just gets it” factor: Internal testers and developers consistently note the model handles ambiguity without extensive prompting. Alex Albert, Anthropic’s head of developer relations: “Tasks that were near-impossible for Sonnet 4.5 just a few weeks ago are now within reach.”

Token efficiency: JetBrains reported a 50-75% reduction in tool-calling and build errors. Multiple teams found Opus 4.5 cheaper per task than Sonnet despite higher per-token pricing.

Agentic reliability: Rakuten found agents “autonomously refine their own capabilities—achieving peak performance in 4 iterations while other models couldn’t match that quality after 10.”

The criticisms

Price concerns: Despite the dramatic cut, the premium pricing remains the primary criticism. Many developers default to Sonnet and only “reach for Opus when stuck.”

Speed: “Opus is slower than Sonnet—you’ll notice this” appears in multiple reviews. The trade-off between intelligence and latency is real.

Occasionally over-engineers: Some users report Opus “produces more infrastructure than needed,” suggesting too much capability can be a liability for simple tasks.

Expert verdict

Zvi Mowshowitz’s analysis: “Claude Opus 4.5 Is The Best Model Available… This is clearly the best model. The benchmark lead is clear and consistent.”

The consensus: Opus 4.5 is the capability leader, but Sonnet remains the daily driver for most workflows.

Version history

Version | Released | Key changes
Claude Opus 4.5 | Nov 24, 2025 | 80.9% SWE-bench, 67% price cut, effort parameter
Claude Haiku 4.5 | Oct 2025 | Fast tier for 4.5 family
Claude Sonnet 4.5 | Sep 2025 | Best-value coding model
Claude Opus 4.1 | Aug 2025 | Previous flagship
Claude Opus 4 | May 2025 | Initial Claude 4 flagship
Claude Sonnet 4 | May 2025 | Tool use, computer use SOTA

Opus 4.1 remains available via API for users who need backward compatibility.

FAQ

Is Claude Opus 4.5 better than GPT-5.1?

For coding—yes. Opus 4.5 leads with 80.9% vs 76.3% on SWE-bench and 37.6% vs 17.6% on ARC-AGI-2. However, GPT-5.1 costs 4x less for input tokens and is faster. Choose based on whether you prioritise capability or cost.

How much does Claude Opus 4.5 cost?

$5.00 per million input tokens, $25.00 per million output tokens. Cached inputs drop to $0.50/MTok (90% off). Batch API offers 50% discount. This is 67% cheaper than Opus 4.1’s $15/$75 pricing.

Can I use Claude Opus 4.5 for free?

No. Opus 4.5 is not available on the free tier. You need Claude Pro ($20/month), Claude Max ($100+/month), or API access. Pro users access Opus via the model dropdown.

What’s the difference between Opus 4.5 and Sonnet 4.5?

Opus 4.5 (80.9% SWE-bench) is Anthropic’s most capable model for complex tasks. Sonnet 4.5 (77.2% SWE-bench) offers ~95% of the capability at 60% of the price ($3/$15 vs $5/$25). Most developers use Sonnet daily and reserve Opus for hard problems.

Is Opus 4.5 worth upgrading from Opus 4.1?

Yes—you get better performance at 67% lower cost. The token efficiency improvements mean most tasks actually cost less despite the capability increase.

Does Opus 4.5 support 1 million token context?

No. Opus 4.5 has a 200K token context window. Only Sonnet 4 and Sonnet 4.5 support the 1M context beta. This is a notable limitation versus GPT-5.1 (400K) and Gemini 3 Pro (1M+).

What is Opus 4.5 best at?

Complex coding tasks, multi-file refactoring, agentic workflows, computer use, and problems requiring genuine reasoning. It’s the model to use “when you cannot afford to be wrong.”

Where is Opus 4.5 available?

Claude.ai (Pro/Max/Team/Enterprise), Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Azure Foundry (preview), GitHub Copilot (preview), Cursor, and OpenRouter.
