Claude Opus 4.5
Specifications
| Model ID | claude-opus-4-5 |
| Provider | Anthropic |
| Architecture | transformer |
| Context Window | 200K tokens |
| Max Input | 200K tokens |
| Max Output | 64K tokens |
| Knowledge Cutoff | 2025-05-31 |
| License | proprietary |
| Open Weights | No |
Variants
| VARIANT | API ID |
|---|---|
| Claude Opus 4.5 | claude-opus-4-5-20251101 |
| Claude Opus 4.5 (alias) | claude-opus-4-5 |
API Pricing
Opus 4.5 is priced at $5/$25 per million input/output tokens, a 67% price reduction from Opus 4.1 ($15/$75). Prompt caching offers up to 90% savings on repeated context, and the Batch API provides a 50% discount with 24-hour turnaround.
Claude Access
| TIER | PRICE |
|---|---|
| Free | Free |
| Pro | $20/mo |
| Max | $100+/mo |
| Team Standard | $25/mo |
| Team Premium | $150/mo |
| Enterprise | Custom |
Benchmarks
Coding
| SWE-bench Verified | 80.9% |
| Terminal-bench 2.0 | 59.3% |
Reasoning
| GPQA Diamond | 87% |
| MMLU | 90.8% |
| ARC-AGI-2 | 37.6% |
Math
| AIME 2025 | 100% |
Vision
| MMMU | 80.7% |
Rankings
| Artificial Analysis | #2 |
| AA Intelligence Index | 70 |
First model to break 80% on SWE-bench Verified. ARC-AGI-2 score (37.6%) demonstrates genuine reasoning capability—more than 2x GPT-5.1. Terminal-bench Hard (44%) is highest ever recorded. OSWorld (66.3%) establishes new SOTA for computer-using agents.
Claude Opus 4.5 is Anthropic’s flagship AI model, released November 24, 2025—the first model to break 80% on SWE-bench Verified. At 80.9%, it leads all competitors on the industry’s most realistic coding benchmark, resolving 4 out of 5 real GitHub issues autonomously. The model arrives with a 67% price reduction from Opus 4.1 ($5/$25 per million tokens versus $15/$75), making flagship-tier capabilities economically viable for production workloads for the first time.
Opus 4.5 excels at long-horizon agentic tasks, achieving state-of-the-art results on OSWorld (66.3%) and Terminal-bench Hard (44%—the highest ever recorded). Its 37.6% on ARC-AGI-2—more than double GPT-5.1’s 17.6%—demonstrates genuine reasoning rather than pattern matching. Developers consistently describe it as “just getting it”—handling ambiguity without extensive prompting.
Quick specs
| Provider | Anthropic |
| Released | November 24, 2025 |
| Context window | 200K tokens |
| Maximum output | 64K tokens |
| Knowledge cutoff | May 2025 (reliable knowledge cutoff) |
| Input price | $5.00 / MTok |
| Output price | $25.00 / MTok |
| Cached input | $0.50 / MTok (90% discount) |
| SWE-bench Verified | 80.9% (industry-leading) |
| ARC-AGI-2 | 37.6% (2x GPT-5.1) |
| OSWorld | 66.3% (best computer-using model) |
| Best for | Agentic coding, complex reasoning, computer use, multi-file refactoring |
| Limitations | Higher price than competitors, slower than Sonnet |
What’s new in Opus 4.5
Opus 4.5 completes Anthropic’s 4.5 model family following Sonnet 4.5 (September) and Haiku 4.5 (October). The release focuses on three major improvements.
Token efficiency revolution
The most significant advancement is dramatic token efficiency. Opus 4.5 uses up to 76% fewer output tokens than Opus 4.1 while matching or exceeding quality. At medium effort, it matches Sonnet 4.5’s best SWE-bench score while using 76% fewer tokens. At high effort (default), it exceeds Sonnet 4.5 by 4.3 percentage points while using 48% fewer tokens.
Amp’s analysis found Opus 4.5 actually cheaper per task than Sonnet despite higher per-token pricing: $1.30/thread versus Sonnet’s $1.83/thread. JetBrains reported 50-75% reduction in tool calling and build errors during internal testing.
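To make those percentages concrete, here is a rough back-of-the-envelope calculation of what the quoted token reductions mean in dollars at the $25/MTok output price listed below. The 200K-token baseline is a hypothetical task size chosen for illustration, not a measured figure.

```python
OUTPUT_PRICE_PER_MTOK = 25.00  # Opus 4.5 output price, $ per million tokens

baseline_output_tokens = 200_000                       # hypothetical baseline task
medium_effort_tokens = baseline_output_tokens * 0.24   # "76% fewer tokens"
high_effort_tokens = baseline_output_tokens * 0.52     # "48% fewer tokens"

def output_cost(tokens: float) -> float:
    """Dollar cost of the given number of output tokens."""
    return tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK

for label, tokens in [("baseline", baseline_output_tokens),
                      ("medium effort", medium_effort_tokens),
                      ("high effort", high_effort_tokens)]:
    print(f"{label:>14}: {tokens:>9,.0f} tokens -> ${output_cost(tokens):.2f}")
```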
Extended thinking with effort control
A new effort parameter (low/medium/high) gives developers fine-grained control over reasoning depth:
- Low: Minimises latency and cost; best for simple queries
- Medium: Balances speed and intelligence; matches Sonnet 4.5’s best scores
- High (default): Maximum reasoning; best for complex coding and research
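The snippet below is a minimal sketch of selecting an effort level from the Python SDK. It uses the SDK's generic extra_body pass-through rather than a typed parameter, and the "effort" field name and its accepted values are assumptions to verify against Anthropic's API reference before use.

```python
import anthropic

client = anthropic.Anthropic()

# Sketch only: the "effort" field name and its placement are assumptions,
# passed via the SDK's generic extra_body escape hatch. Check Anthropic's
# docs for the exact request shape and any required beta header.
response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Summarise this design doc"}],
    extra_body={"effort": "medium"},  # assumed values: low | medium | high
)
print(response.content[0].text)
```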
Extended thinking mode preserves thinking blocks from previous turns by default—a change from earlier Claude models that discarded this information between responses.
Agentic capabilities and computer use
Opus 4.5 introduces a new zoom tool for computer use, allowing the model to request magnified screen regions for better visual inspection during complex navigation tasks. This capability, combined with state-of-the-art results on OSWorld and WebArena, establishes Opus 4.5 as the most capable model for autonomous computer operation.
Rakuten’s testing found agents “autonomously refine their own capabilities—achieving peak performance in 4 iterations while other models couldn’t match that quality after 10.”
The Claude 4.5 model family
Claude Opus 4.5 is the flagship tier in Anthropic’s three-tier model lineup:
| Model | API Identifier | Purpose | SWE-bench | Price (in/out) |
|---|---|---|---|---|
| Claude Opus 4.5 | claude-opus-4-5-20251101 | Maximum intelligence | 80.9% | $5/$25 |
| Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | Best value for most tasks | 77.2% | $3/$15 |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | Speed and efficiency | 73.3% | $1/$5 |
The Opus tier has always designated Anthropic’s largest and most capable models. Opus 4.5 replaces Opus 4.1 (August 2025) as the flagship, though Opus 4.1 remains available via API for users who need it.
Notable: Opus 4.5 does not support the 1 million token context beta; among Claude models, only Sonnet 4 and Sonnet 4.5 offer it.
Benchmark performance
Opus 4.5 sets new industry standards on coding and agentic benchmarks while showing competitive but not dominant results on some reasoning tasks.
Coding benchmarks
| Model | SWE-bench Verified | Notes |
|---|---|---|
| Claude Opus 4.5 | 80.9% | Industry-leading, first to break 80% |
| GPT-5.1-Codex-Max | 77.9% | OpenAI’s best |
| Claude Sonnet 4.5 | 77.2% | Best value for coding |
| GPT-5.1 | 76.3% | — |
| Gemini 3 Pro Preview | 76.2% | — |
SWE-bench Verified tests models on real GitHub pull requests—actual bug reports from open-source projects. An 80.9% score means Opus 4.5 can resolve 4 out of 5 real-world issues without human intervention, a significant leap from the ~50% scores of models just a year ago.
Additional coding results reinforce the lead:
| Benchmark | Opus 4.5 | Notes |
|---|---|---|
| SWE-bench Pro | 52.0% | — |
| SWE-bench Multilingual | 76.2% | Leads in 7 of 8 languages |
| Terminal-bench 2.0 | 59.3% | Ahead of GPT-5.1-Codex-Max (58.1%) |
| Terminal-bench Hard | 44.0% | Highest score ever recorded |
Agentic benchmarks
Opus 4.5 establishes clear leadership on autonomous task completion:
| Benchmark | Opus 4.5 | What it measures |
|---|---|---|
| OSWorld | 66.3% | Computer use (clicks, typing, navigation) |
| τ²-bench Retail | 88.9% | Multi-step retail agent tasks |
| τ²-bench Telecom | 98.2% | Multi-step telecom agent tasks |
| MCP Atlas | 62.3% | Model Context Protocol integration |
| WebArena | 65.3% | Single-agent web navigation |
The OSWorld result (66.3%) is particularly significant—it measures real computer operation including clicking, typing, and navigating complex interfaces. This is the highest score ever recorded for a computer-using AI model.
Reasoning benchmarks
| Benchmark | Opus 4.5 | GPT-5.1 | Gemini 3 Pro |
|---|---|---|---|
| GPQA Diamond | 87.0% | 88.1% | 91.9% |
| MMLU | 90.8% | 91.0% | 91.8% |
| MMMU (vision) | 80.7% | 85.4% | 81.0% |
| ARC-AGI-2 | 37.6% | 17.6% | 31.1% |
| AIME 2025 | 100% | 94.0% | — |
The ARC-AGI-2 result (37.6%) stands out—more than double GPT-5.1’s score. This benchmark tests fluid intelligence on novel problems that cannot be memorised, suggesting Opus 4.5 excels at genuine reasoning rather than pattern matching. The AIME 2025 perfect score (100%) was achieved with Python tools enabled.
Industry rankings
Artificial Analysis ranks Opus 4.5 at 70 on the Intelligence Index (reasoning mode), tying GPT-5.1 for second place behind Gemini 3 Pro’s 73. On LMArena/Chatbot Arena, Opus 4.5 holds the highest Expert Advantage Score (+85) of any non-thinking model—expert users strongly prefer it over alternatives when solving difficult problems.
Pricing breakdown
The 67% price reduction from Opus 4.1 represents Anthropic’s most significant pricing move, making flagship-tier capabilities accessible for production workloads.
| Tier | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Standard | $5.00 | $25.00 |
| Cache write | $6.25 | — |
| Cache read | $0.50 | — |
| Batch API | $2.50 | $12.50 |
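As a quick sanity check on how those rates combine, the sketch below prices a single hypothetical request (50K input tokens, 4K output tokens) under the standard, cached-input, and batch rates from the table. The request size is an arbitrary example.

```python
# Price one hypothetical request under the Opus 4.5 rates listed above.
# All prices are $ per million tokens.
RATES = {
    "standard":     {"input": 5.00, "output": 25.00},
    "cached input": {"input": 0.50, "output": 25.00},  # cache read; output billed normally
    "batch":        {"input": 2.50, "output": 12.50},
}

input_tokens, output_tokens = 50_000, 4_000

for name, rate in RATES.items():
    cost = (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
    print(f"{name:>12}: ${cost:.4f} per request")
```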
Price evolution
| Model | Input | Output | Change |
|---|---|---|---|
| Claude Opus 4.5 | $5.00 | $25.00 | −67% vs Opus 4.1 |
| Claude Opus 4.1 | $15.00 | $75.00 | Unchanged from Opus 4 |
| Claude Opus 4 | $15.00 | $75.00 | — |
Cost comparison with competitors
| Model | Input | Output | Notes |
|---|---|---|---|
| GPT-5.1 | $1.25 | $10.00 | 4x cheaper input |
| Gemini 3 Pro | $2.00 | $12.00 | 2.5x cheaper input |
| Claude Opus 4.5 | $5.00 | $25.00 | Premium pricing |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Best coding value |
Despite the premium per-token pricing, Opus 4.5 can be cheaper per task due to superior efficiency. Amp’s testing found $1.30/thread for Opus 4.5 versus $1.83/thread for Sonnet—the model solves problems with fewer tokens, offsetting the higher rate.
The efficiency advantage
For agentic workflows, token efficiency matters more than per-token price. Opus 4.5’s ability to solve tasks in fewer iterations means:
- Fewer API calls
- Less context accumulation
- Faster completion times
- Lower total cost despite higher rates
How to access Claude Opus 4.5
Via API
Opus 4.5 is generally available with no waitlist. Basic usage:
```python
import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Your prompt here"}],
)
```
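The generated text comes back as a list of content blocks on the response; for a plain text reply, the first block carries it, and the usage object reports billable token counts:

```python
print(response.content[0].text)  # text of the first content block
print(response.usage)            # input/output token counts for billing
```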
Enable extended thinking with an explicit thinking budget:
```python
response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=16000,            # must exceed the thinking budget
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # tokens reserved for internal reasoning
    },
    messages=[{"role": "user", "content": "Complex reasoning task"}],
)
# With thinking enabled, response.content may begin with thinking blocks
# before the final text block.
```
Rate limits scale with usage tier. Tier 1 ($100 spent): 1,000 requests/minute, 20K tokens/minute. Tier 4 ($10,000+ spent): 4,000 requests/minute, 400K tokens/minute.
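When a burst of requests hits those per-minute ceilings, the API responds with HTTP 429 and the Python SDK raises RateLimitError; a simple exponential backoff wrapper is usually enough. The retry count and wait times below are arbitrary example values.

```python
import time
import anthropic

client = anthropic.Anthropic()

def create_with_backoff(max_retries: int = 5, **kwargs):
    """Retry a messages.create call with exponential backoff on 429s."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError:
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ... before retrying
    raise RuntimeError("rate limited after repeated retries")

response = create_with_backoff(
    model="claude-opus-4-5-20251101",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Your prompt here"}],
)
```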
Via Claude.ai
Access varies by subscription tier:
| Tier | Access | Price | Notes |
|---|---|---|---|
| Free | ❌ | $0 | Opus not available |
| Pro | ✅ Dropdown | $20/mo | Shared token allocation |
| Max | ✅ Default | $100+/mo | No Opus-specific caps |
| Team Standard | ✅ Dropdown | $25/user/mo | — |
| Team Premium | ✅ Default | $150/user/mo | Priority access |
| Enterprise | ✅ | Custom | — |
Critically, Anthropic removed Opus-specific usage caps—Max users now receive roughly the same token allocation for Opus 4.5 as they previously had for Sonnet, making the flagship model practical as a daily driver.
Via cloud providers
Opus 4.5 is available on all three major cloud platforms—the only frontier model with this breadth:
| Platform | Status | Model ID |
|---|---|---|
| Amazon Bedrock | GA | anthropic.claude-opus-4-5-20251101-v1:0 |
| Google Cloud Vertex AI | GA | claude-opus-4-5@20251101 |
| Microsoft Azure Foundry | Preview | — |
| GitHub Copilot | Preview (paid tiers) | — |
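For the Bedrock route, a minimal boto3 sketch using the Converse API and the model ID from the table is shown below. The region and generation settings are illustrative, and some accounts may need to invoke the model through an inference profile rather than the bare model ID.

```python
import boto3

# Invoke Opus 4.5 through Amazon Bedrock's Converse API.
# Use a region where the model is enabled for your account.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-opus-4-5-20251101-v1:0",
    messages=[{"role": "user", "content": [{"text": "Your prompt here"}]}],
    inferenceConfig={"maxTokens": 1024},
)
print(response["output"]["message"]["content"][0]["text"])
```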
How Claude Opus 4.5 compares
vs GPT-5.1
Opus 4.5 leads on raw coding capability (80.9% vs 76.3% SWE-bench) and abstract reasoning (37.6% vs 17.6% ARC-AGI-2). McKay Wrigley declared “Claude Code + Opus 4.5 is the best AI coding tool in the world… and it’s not close.”
However, GPT-5.1 costs 4x less for input tokens and offers faster response times. Multiple developers report GPT-5.1 better for rapid iteration cycles.
Choose Opus 4.5 for: complex refactoring, multi-file codebases, agentic workflows, when code quality matters more than speed.
Choose GPT-5.1 for: rapid iteration, cost-sensitive deployments, when speed trumps peak capability.
vs Gemini 3 Pro
Gemini leads on some reasoning benchmarks (91.9% vs 87.0% GPQA Diamond) and offers 1M+ token context—5x larger than Opus 4.5’s 200K. Developers report Gemini excels at UI/frontend tasks.
However, Opus 4.5 dominates agentic and coding benchmarks. Amp’s team switched from Gemini 3 to Opus 4.5 within a week: “Gemini’s impressive highs came with lows… Opus 4.5 seems more polished.”
Choose Gemini for: massive context needs, UI/UX design, multimodal workflows.
Choose Opus 4.5 for: coding, agentic tasks, when reliability matters more than context size.
vs Claude Sonnet 4.5
Sonnet 4.5 costs 60% of Opus's price ($3/$15 vs $5/$25) while delivering roughly 95% of the capability on many tasks. For straightforward coding and writing, Sonnet is often sufficient.
Choose Sonnet for: daily coding tasks, cost-sensitive production, when speed matters.
Choose Opus for: complex architecture decisions, difficult debugging, “when you cannot afford to be wrong.”
The practical consensus
From community feedback: developers reserve Opus 4.5 for hard problems—complex refactoring, architectural decisions, stubborn bugs. They use Sonnet for daily work and GPT-5.1 for rapid iteration. Simon Willison noted he “switched back to Sonnet 4.5 and kept on working at the same pace” for routine tasks.
Known limitations
Independent testing and community reports reveal several areas where Opus 4.5 falls short:
Price premium persists: Despite the 67% cut from Opus 4.1, Opus 4.5 costs 2-4x more per token than GPT-5.1 or Gemini 3 Pro. Zvi Mowshowitz noted: “Price is the biggest weakness. Even with a cut, $5/$25 is still on the high end.”
Speed tradeoffs: Multiple reviewers note “Opus is slower than Sonnet. You’ll notice this.” For rapid iteration cycles, the latency can be frustrating.
Over-engineering tendency: Several comparisons noted Opus 4.5 “sometimes introduces architecture or testing infrastructure beyond what the user requested,” producing “more infrastructure than needed” and “thinking like a platform architect rather than service engineer.”
Context window limitations: At 200K tokens, Opus 4.5 has the smallest context among flagships—GPT-5.1 offers 400K and Gemini offers 1M+. No 1M context beta is available for Opus (only Sonnet).
Hallucination rate: Artificial Analysis testing found a 58% hallucination rate (4th lowest among frontier models, but still notable).
Prompt injection risk: While achieving industry-leading 4.7% attack success rate (versus 21.9% for GPT-5.1), Simon Willison cautioned: “Single attempts at prompt injection still work 1/20 times.”
Community reception
The developer community has responded overwhelmingly positively to Opus 4.5, with several reviewers calling it the most significant model release since GPT-4.
The positives
Coding excellence: McKay Wrigley declared “This is the best model for both code and for agents, and it’s not close.” Simon Willison completed 20 commits across 39 files on sqlite-utils with Opus 4.5 handling most of the work during a weekend preview.
“Just gets it” factor: Internal testers and developers consistently note the model handles ambiguity without extensive prompting. Alex Albert, Anthropic’s head of developer relations: “Tasks that were near-impossible for Sonnet 4.5 just a few weeks ago are now within reach.”
Token efficiency: JetBrains reported 50-75% reduction in tool calling and build errors. Multiple teams found Opus 4.5 cheaper per task than Sonnet despite higher per-token pricing.
Agentic reliability: Rakuten found agents “autonomously refine their own capabilities—achieving peak performance in 4 iterations while other models couldn’t match that quality after 10.”
The criticisms
Price concerns: Despite the dramatic cut, the premium pricing remains the primary criticism. Many developers default to Sonnet and only “reach for Opus when stuck.”
Speed: “Opus is slower than Sonnet—you’ll notice this” appears in multiple reviews. The trade-off between intelligence and latency is real.
Occasionally over-engineers: Some users report Opus “produces more infrastructure than needed,” suggesting too much capability can be a liability for simple tasks.
Expert verdict
Zvi Mowshowitz’s analysis: “Claude Opus 4.5 Is The Best Model Available… This is clearly the best model. The benchmark lead is clear and consistent.”
The consensus: Opus 4.5 is the capability leader, but Sonnet remains the daily driver for most workflows.
Version history
| Version | Released | Key changes |
|---|---|---|
| Claude Opus 4.5 | Nov 24, 2025 | 80.9% SWE-bench, 67% price cut, effort parameter |
| Claude Haiku 4.5 | Oct 2025 | Fast tier for 4.5 family |
| Claude Sonnet 4.5 | Sep 2025 | Best-value coding model |
| Claude Opus 4.1 | Aug 2025 | Previous flagship |
| Claude Opus 4 | May 2025 | Initial Claude 4 flagship |
| Claude Sonnet 4 | May 2025 | Tool use, computer use SOTA |
Opus 4.1 remains available via API for users who need backward compatibility.
FAQ
Is Claude Opus 4.5 better than GPT-5.1?
For coding—yes. Opus 4.5 leads with 80.9% vs 76.3% on SWE-bench and 37.6% vs 17.6% on ARC-AGI-2. However, GPT-5.1 costs 4x less for input tokens and is faster. Choose based on whether you prioritise capability or cost.
How much does Claude Opus 4.5 cost?
$5.00 per million input tokens, $25.00 per million output tokens. Cached inputs drop to $0.50/MTok (90% off). Batch API offers 50% discount. This is 67% cheaper than Opus 4.1’s $15/$75 pricing.
Can I use Claude Opus 4.5 for free?
No. Opus 4.5 is not available on the free tier. You need Claude Pro ($20/month), Claude Max ($100+/month), or API access. Pro users access Opus via the model dropdown.
What’s the difference between Opus 4.5 and Sonnet 4.5?
Opus 4.5 (80.9% SWE-bench) is Anthropic’s most capable model for complex tasks. Sonnet 4.5 (77.2% SWE-bench) offers ~95% of the capability at 60% of the price ($3/$15 vs $5/$25). Most developers use Sonnet daily and reserve Opus for hard problems.
Is Opus 4.5 worth upgrading from Opus 4.1?
Yes—you get better performance at 67% lower cost. The token efficiency improvements mean most tasks actually cost less despite the capability increase.
Does Opus 4.5 support 1 million token context?
No. Opus 4.5 has a 200K token context window. Only Sonnet 4 and Sonnet 4.5 support the 1M context beta. This is a notable limitation versus GPT-5.1 (400K) and Gemini 3 Pro (1M+).
What is Opus 4.5 best at?
Complex coding tasks, multi-file refactoring, agentic workflows, computer use, and problems requiring genuine reasoning. It’s the model to use “when you cannot afford to be wrong.”
Where is Opus 4.5 available?
Claude.ai (Pro/Max/Team/Enterprise), Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Azure Foundry (preview), GitHub Copilot (preview), Cursor, and OpenRouter.