Claude Opus 4.6
Specifications
| Model ID | claude-opus-4-6 |
| Provider | Anthropic |
| Architecture | transformer |
| Context Window | 1M tokens |
| Max Input | 1M tokens |
| Max Output | 128K tokens |
| Knowledge Cutoff | 2025-05-31 |
| License | proprietary |
| Open Weights | No |
Variants
| VARIANT | API ID |
|---|---|
| Claude Opus 4.6 | claude-opus-4-6 |
| Claude Opus 4.6 Fast Mode | claude-opus-4-6-fast |
API Pricing
Identical headline pricing to Opus 4.5 ($5/$25). 1M context comes at no premium up to 200K tokens; only inputs above 200K trigger long-context rates. Cache reads at 10% of input price. Batch API offers 50% discount with 24-hour turnaround. Fast Mode (research preview) trades 6× cost for 2.5× output speed.
Claude Access
| TIER | PRICE |
|---|---|
| Free | Free |
| Pro | $20/mo |
| Max 5× | $100/mo |
| Max 20× | $200/mo |
| Team Standard | $25/seat/mo |
| Team Premium | $125/seat/mo |
| Enterprise | Custom |
Benchmarks
Coding
| SWE-bench Verified | 80.8% |
| Terminal-Bench 2.0 | 65.4% |
Reasoning
| GPQA Diamond | 91.3% |
| MMLU | 91.1% |
| ARC-AGI-2 | 68.8% |
Rankings
| Artificial Analysis | #1 |
| AA Intelligence Index | 53 |
First model to break 65% on Terminal-Bench 2.0 and to top all three LMArena boards (text, code, search) simultaneously. ARC-AGI-2 score (68.8%) is the largest single-generation jump on the benchmark across any vendor. MRCR v2 1M (76.0%) demonstrates qualitatively better long-context retrieval than predecessors. Trails GPT-5.2 on GPQA Diamond and Gemini 3 Pro on MMMU Pro vision. Anthropic revised HLE-with-tools (53.1→53.0%) and BrowseComp multi-agent (86.81→86.57%) post-launch following improved cheating detection.
Claude Opus 4.6 was Anthropic’s flagship model from February through April 2026. Released on February 5, 2026, it was the first Opus-class model to ship with a 1 million token context window and the first model to simultaneously top all three LMArena leaderboards (text, code, and search) at launch. At 80.8% on SWE-bench Verified and a class-leading 65.4% on Terminal-Bench 2.0, it set new state-of-the-art marks on agentic coding, computer use (OSWorld 72.7%), and long-context retrieval (76.0% on MRCR v2 1M, more than 4× Sonnet 4.5’s score).
The model held headline pricing flat at $5/$25 per million tokens despite the 5× context expansion and substantial capability gains. Its 68.8% on ARC-AGI-2 represents the largest single-generation jump on the benchmark across any vendor — nearly double Opus 4.5’s 37.6% and well ahead of GPT-5.2’s 54.2%. Opus 4.6 has since been superseded by Claude Opus 4.7 (April 16, 2026), but remains generally available and continues to be widely deployed across enterprise stacks.
Quick specs
| Provider | Anthropic |
| Released | February 5, 2026 |
| Context window | 1M tokens (beta on Claude Platform; preview on Vertex AI) |
| Maximum output | 128K tokens (300K via Batch API beta) |
| Knowledge cutoff | May 2025 (reliable); August 2025 (training data) |
| Input price | $5.00 / MTok |
| Output price | $25.00 / MTok |
| Cached input | $0.50 / MTok (90% discount) |
| SWE-bench Verified | 80.8% |
| Terminal-Bench 2.0 | 65.4% (SOTA at launch) |
| OSWorld | 72.7% (best computer-use model) |
| ARC-AGI-2 | 68.8% (~1.8× Opus 4.5) |
| Best for | Agentic coding, long-context retrieval, computer use, knowledge work |
| Limitations | Slow output (~40 t/s), vision trails GPT-5.2/Gemini 3 Pro, mid-cycle quality regression reported |
What’s new in Opus 4.6
Opus 4.6 was Anthropic’s most ambitious mid-cycle release — extending the Opus context window 5×, introducing adaptive reasoning, and shipping more agentic infrastructure than any prior Claude release.
1 million token context window
Opus 4.6 became the first Opus-class model with a 1M context window, available in beta on the Claude Platform (initially Tier 4 API customers) and as a preview on Google Cloud Vertex AI. The capability isn’t just nominal: Opus 4.6 scores 76.0% on MRCR v2 8-needle retrieval at 1M tokens, more than 4× Sonnet 4.5’s 18.5% and roughly 3× Gemini 3 Pro’s reported 26%. Hacker News users found Opus 4.6 could locate 49 of 50 documented Harry Potter spells across the first four books (~733K tokens) in a single retrieval pass.
Long-context pricing doubles the input rate to $10 and raises the output rate 1.5× to $37.50 per MTok, but it applies only when input exceeds 200K tokens, so the vast majority of Claude API use sees the 1M window at standard rates.
Adaptive thinking with effort control
Opus 4.6 replaced the binary extended-thinking toggle with adaptive thinking: the model decides when to reason based on context, with developer-controllable effort via the new effort parameter (low, medium, high, max). The previous budget_tokens parameter is deprecated.
The trade-off: at default high effort, Opus 4.6 used roughly 2× the output tokens of Opus 4.5 on Artificial Analysis’s evaluation suite. Anthropic’s own launch post addressed this directly — “If you’re finding that the model is overthinking … dial effort down from its default setting (high) to medium.”
Context compaction (beta)
A new server-side feature automatically summarises older conversation history as the context window approaches a configurable threshold, enabling effectively unbounded session length without manual context management. This was particularly impactful for Claude Code workflows, where long agentic sessions previously required manual context resets.
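The compaction idea can be illustrated client-side. This is a minimal sketch of the general technique, not Anthropic's server-side implementation: the function names, the four-characters-per-token estimate, and the stub summariser are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def estimate_tokens(messages: List[Message]) -> int:
    """Crude size estimate: roughly four characters per token (an assumption)."""
    return sum(len(m["content"]) for m in messages) // 4


def compact(
    messages: List[Message],
    threshold_tokens: int,
    keep_recent: int,
    summarise: Callable[[List[Message]], str],
) -> List[Message]:
    """Fold older turns into one summary message once the threshold is crossed."""
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(messages) <= keep_recent or estimate_tokens(messages) <= threshold_tokens:
        return messages  # nothing to do yet
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "user", "content": "Summary of earlier turns: " + summarise(older)}
    return [summary] + recent


def stub_summary(turns: List[Message]) -> str:
    """Placeholder summariser; a real one would call a model."""
    return f"{len(turns)} earlier messages omitted"


log = [{"role": "user", "content": "x" * 400} for _ in range(10)]
compacted = compact(log, threshold_tokens=500, keep_recent=2, summarise=stub_summary)
print(len(compacted), compacted[0]["content"])
```

Keeping the most recent turns verbatim and summarising only the older ones is one common design choice; the threshold plays the role of the configurable trigger described above.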
Agent Teams and the multi-agent stack
Opus 4.6 introduced Agent Teams in Claude Code (research preview) — multiple Opus 4.6 instances coordinating peer-to-peer through a “Mailbox Protocol,” each maintaining its own context window. Anthropic researcher Nicholas Carlini publicly demonstrated 16 agents writing a Rust-based C compiler from scratch in two weeks: ~2,000 Claude Code sessions, ~2 billion input tokens, ~140 million output tokens, total cost under $20,000, producing a 100,000-line compiler that compiles Linux 6.9 on x86, ARM, and RISC-V (and runs Doom).
Doubled output, new betas
The synchronous Messages API max output doubled from 64K (Opus 4.5) to 128K tokens. The Batch API can produce up to 300K tokens with the output-300k-2026-03-24 beta header.
Fast Mode (research preview)
Same model weights, different inference configuration — up to 2.5× faster output throughput at 6× standard pricing ($30/$150 per MTok). Available only on the first-party Claude API and Claude Code on subscription plans; not on Bedrock, Vertex, Foundry, the Batch API, or Priority Tier. Anthropic gave Pro and Max subscribers $50 of free Fast Mode usage at launch to encourage testing.
Skills, Excel, and PowerPoint
Opus 4.6 launched alongside upgrades to Claude in Excel and the research preview of Claude in PowerPoint for Max, Team, and Enterprise plans. Anthropic Skills (the agent-skill system that ships built-in Excel, Word, PowerPoint, and PDF capabilities) became fully available with Opus 4.6 as the recommended backing model.
Breaking changes
Three behaviours from earlier Claude models broke compatibility:
- Assistant message prefilling is disabled — calls return a 400 error.
- budget_tokens for extended thinking is deprecated; migrate to effort.
- Synchronous max output increased from 64K to 128K tokens.
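Request builders written for earlier models may still append a prefilled assistant turn. The sketch below shows one possible guard, assuming messages are plain role/content dictionaries; `drop_trailing_prefill` is a hypothetical helper, not part of the Anthropic SDK.

```python
from typing import Dict, List

Message = Dict[str, str]


def drop_trailing_prefill(messages: List[Message]) -> List[Message]:
    """Return a copy of the conversation without trailing assistant turns.

    Hypothetical helper: a conversation that ends with an assistant message
    is treated here as a prefill, which the list above says now returns a 400.
    """
    cleaned = list(messages)
    while cleaned and cleaned[-1]["role"] == "assistant":
        cleaned.pop()
    return cleaned


history = [
    {"role": "user", "content": "Return the result as JSON."},
    {"role": "assistant", "content": "{"},  # prefill that older models accepted
]
print(drop_trailing_prefill(history))
```

The original list is left untouched, so callers can log what was removed and move any formatting hint into the user instruction instead.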
The Claude 4.6 model family
Claude Opus 4.6 sat at the top of Anthropic’s model lineup at launch:
| Model | Released | API Identifier | Pricing | Best for |
|---|---|---|---|---|
| Claude Opus 4.7 | Apr 16, 2026 | claude-opus-4-7 | $5/$25 | Successor flagship |
| Claude Opus 4.6 | Feb 5, 2026 | claude-opus-4-6 | $5/$25 | Coding, agents, long context |
| Claude Sonnet 4.6 | Feb 17, 2026 | claude-sonnet-4-6 | $3/$15 | Daily driver, production workloads |
| Claude Haiku 4.5 | Oct 2025 | claude-haiku-4-5 | $1/$5 | Speed, high-volume tasks |
Opus 4.6 was the recommended Opus model from February 5 through April 16, 2026. The opus alias on the first-party Claude API now resolves to Opus 4.7, but on AWS Bedrock, Google Vertex, and Azure Foundry the alias still resolves to Opus 4.6 as of late April 2026. Opus rate limits remain pooled across all Opus versions (4.7, 4.6, 4.5, 4.1, and 4).
Benchmark performance
Opus 4.6 set new state-of-the-art marks on coding, computer use, and long-context retrieval, while showing a more mixed picture on pure reasoning and vision.
Coding and agentic benchmarks
| Benchmark | Opus 4.6 | Opus 4.5 | GPT-5.2 (best) | Gemini 3 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 80.8% | 80.9% | 80.0% | 76.2% |
| Terminal-Bench 2.0 | 65.4% | 59.8% | 64.7% | 56.2% |
| OSWorld | 72.7% | 66.3% | ~75% (Pro) | 45–59% |
| MCP Atlas (high effort) | 62.7% | 62.3% | 60.6% | — |
| τ²-Bench Telecom | 99.3% | — | 98.7% | — |
The SWE-bench Verified score is essentially flat versus Opus 4.5 (80.8% vs 80.9% on the 25-trial average; Anthropic reported 81.42% with a prompt modification). The bigger wins are Terminal-Bench 2.0 — where Opus 4.6 became the first model to break 65% — and OSWorld, where the 6-point jump established Opus 4.6 as the best computer-using model at launch.
Reasoning and knowledge
| Benchmark | Opus 4.6 | Opus 4.5 | GPT-5.2 Pro | Gemini 3 Pro |
|---|---|---|---|---|
| GPQA Diamond | 91.3% | 87.0% | 93.2% | 91.9% |
| HLE (no tools) | 40.0% | 30.8% | 50.0% | 37.5% |
| HLE with tools | 53.0% | 43.4% | 50.0% | 45.8% |
| ARC-AGI-2 | 68.8% | 37.6% | 54.2% | 45.1% |
| MMMU Pro (vision) | 73.9% | — | 79.5% | 81.0% |
The ARC-AGI-2 jump from 37.6% to 68.8% is the largest single-generation gain on the benchmark from any vendor. ARC-AGI-2 tests fluid reasoning on novel problems that resist memorisation, and Anthropic’s score nearly doubled Opus 4.5’s result and beat GPT-5.2’s 54.2% by 14.6 points.
GPQA Diamond shows GPT-5.2 Pro retaining a narrow lead. Vision benchmarks are the clearest weakness: Opus 4.6 trails Gemini 3 Pro by 7 points on MMMU Pro.
Long context, search, and knowledge work
| Benchmark | Opus 4.6 | Notes |
|---|---|---|
| MRCR v2 8-needle, 1M | 76.0% | vs Sonnet 4.5 at 18.5%, Gemini 3 Pro ~26% |
| MRCR v2 8-needle, 256K | 93.0% | |
| BrowseComp | 84.0% | SOTA at launch |
| BrowseComp (multi-agent) | 86.57% | Revised from 86.81% |
| GDPval-AA Elo | 1,606 | +144 over GPT-5.2 (1,462) |
| Finance Agent | 60.7% | SOTA at launch |
| BigLaw Bench (Harvey) | 90.2% | 40% perfect, 84% above 0.8 |
| Vending-Bench 2 (final balance) | $8,017.59 | +$3,050 over Opus 4.5 |
The GDPval-AA result is particularly notable — Opus 4.6 beat GPT-5.2 by 144 Elo points, equivalent to roughly a 70% win rate on professional knowledge work. The gap was largest in finance, legal, and medical domains.
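The step from a 144-point Elo gap to a roughly 70% win rate can be checked with the standard Elo expected-score formula. This sketch assumes GDPval-AA uses the conventional logistic curve with a 400-point scale; the function name is illustrative.

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


gap = 1606 - 1462  # GDPval-AA Elo figures quoted above
print(f"{elo_win_probability(gap):.3f}")  # about 0.70
```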
Industry rankings
Artificial Analysis ranked Opus 4.6 (Adaptive, Max Effort) at 53 on the Intelligence Index v4.0 at launch — the new #1 overall, ahead of GPT-5.2 xhigh. Non-reasoning High Effort scored 46, the best among non-reasoning models. After GPT-5.5’s release in April 2026, Opus 4.6 dropped to roughly 4th–6th depending on the ranking date.
On LMArena, Opus 4.6 simultaneously held #1 in Text (1,503 Elo), #1 in Coding (1,549 Elo), #1 in WebDev, and #1 in Search Arena at peak — the first time a single model topped all major LMArena boards at once.
Speed metrics
Speed is one of Opus 4.6’s clear weaknesses. On Artificial Analysis’s measurements:
- Output throughput: ~40 tokens/sec (high effort) — bottom of its price tier
- Time to first token: 1.86s (non-reasoning); 19.56s (adaptive max-effort)
- Bedrock vs Vertex vs Foundry: Bedrock fastest at 47.3 t/s, Vertex 44.2 t/s, Foundry 41.7 t/s
For latency-sensitive applications, Fast Mode delivers ~2.5× higher output throughput at 6× the cost.
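A rough way to weigh that trade-off is to compare generation time and output cost for a single response. This is a back-of-the-envelope sketch using the figures quoted in this article; it assumes the full 2.5× speed-up is achieved (the source says "up to") and ignores time to first token.

```python
from typing import Tuple

STANDARD_TPS = 40.0            # approximate output tokens/sec quoted above
FAST_SPEEDUP = 2.5             # best-case Fast Mode multiplier
STANDARD_OUTPUT_PRICE = 25.0   # USD per million output tokens
FAST_OUTPUT_PRICE = 150.0      # USD per million output tokens (6x)


def generation_estimate(output_tokens: int, fast: bool) -> Tuple[float, float]:
    """Return (seconds, output cost in USD) for a response of the given length."""
    tps = STANDARD_TPS * FAST_SPEEDUP if fast else STANDARD_TPS
    price = FAST_OUTPUT_PRICE if fast else STANDARD_OUTPUT_PRICE
    return output_tokens / tps, output_tokens / 1_000_000 * price


for mode in (False, True):
    seconds, cost = generation_estimate(4_000, fast=mode)
    print(f"fast={mode}: {seconds:.0f}s, ${cost:.2f}")
```

For a 4,000-token response this works out to about 100 seconds at $0.10 in standard mode versus about 40 seconds at $0.60 in Fast Mode.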
METR autonomy horizon
Per METR’s autonomy benchmark (February 20, 2026), Opus 4.6 had the longest task-completion time horizon ever measured at the time:
- 50% time horizon: 14 hours 30 minutes
- 80% time horizon: 1 hour 3 minutes
This is the closest thing to an industry-standard quantitative measure of “how long can an agent run autonomously and still succeed.”
Pricing breakdown
Opus 4.6 maintained price parity with Opus 4.5 — a 5× context expansion and substantial capability gains came at zero per-token premium.
Standard API pricing
| Tier | Input | Output |
|---|---|---|
| Standard (≤200K input) | $5.00 / MTok | $25.00 / MTok |
| Long-context (>200K input) | $10.00 / MTok | $37.50 / MTok |
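The two tiers can be turned into a per-request estimate. The sketch below is illustrative only: it assumes that once input exceeds 200K tokens the long-context rates apply to every token in the request rather than only to the excess, which should be confirmed against the official pricing page.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens

# USD per million tokens, taken from the table above
STANDARD = {"input": 5.00, "output": 25.00}
LONG_CONTEXT = {"input": 10.00, "output": 37.50}


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: above the threshold, long-context rates apply to the
    whole request, not just the tokens past 200K.
    """
    rates = LONG_CONTEXT if input_tokens > LONG_CONTEXT_THRESHOLD else STANDARD
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


print(f"${request_cost(150_000, 4_000):.2f}")  # standard tier
print(f"${request_cost(300_000, 4_000):.2f}")  # long-context tier
```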
Cost optimisation options
| Option | Input | Output | Savings |
|---|---|---|---|
| Cache read (10% of input) | $0.50 / MTok | — | 90% |
| Cache write (5-min TTL) | $6.25 / MTok | — | -25% (investment) |
| Cache write (1-hour TTL) | $10.00 / MTok | — | -100% (investment) |
| Batch API | $2.50 / MTok | $12.50 / MTok | 50% |
| Fast Mode | $30.00 / MTok | $150.00 / MTok | -500% (speed premium) |
| US-only inference | 1.10× | 1.10× | -10% (compliance premium) |
Server-side web search is billed separately at $10 per 1,000 searches.
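Whether the cache-write surcharge pays off depends on how often the same prefix is reused. The following sketch uses the 5-minute TTL rates from the table above and assumes every call after the first hits the cache within the TTL; the function name is hypothetical.

```python
BASE_INPUT = 5.00      # USD per million input tokens
CACHE_WRITE_5M = 6.25  # USD per million tokens written (5-minute TTL)
CACHE_READ = 0.50      # USD per million tokens read from cache


def prefix_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    """Total input cost in USD of sending the same prefix on `calls` requests."""
    mtok = prefix_tokens / 1_000_000
    if not cached or calls == 0:
        return mtok * BASE_INPUT * calls
    # First call writes the cache; later calls are assumed to read it.
    return mtok * (CACHE_WRITE_5M + CACHE_READ * (calls - 1))


for n in (1, 2, 10):
    plain = prefix_cost(100_000, n, cached=False)
    with_cache = prefix_cost(100_000, n, cached=True)
    print(f"{n} calls: uncached ${plain:.3f}, cached ${with_cache:.3f}")
```

Under these assumptions caching costs slightly more for a single call and is already cheaper on the second call.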
Competitor pricing comparison
| Model | Input | Output | vs Opus 4.6 |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | — |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1.67× cheaper |
| GPT-5.2 Thinking | $1.75 | $14.00 | 2.86× cheaper input |
| Gemini 3 Pro | $2.00 | $12.00 | 2.5× cheaper input |
Opus 4.6 was the most expensive frontier model in its window on a per-token basis. Anthropic’s argument is token efficiency: Opus 4.6 emits roughly half as many output tokens as GPT-5.2 xhigh on the AAII suite, narrowing the effective cost gap. Running the full Artificial Analysis Intelligence Index v4.0 with Opus 4.6 (adaptive max-effort, no caching) cost approximately $2,486 versus GPT-5.2 xhigh’s $2,304 — a small premium for the #1 score.
How to access Claude Opus 4.6
Via API
Opus 4.6 is generally available with no waitlist. Basic usage:
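import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(response.content[0].text)  # first content block of the reply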
Control reasoning depth with the new effort parameter:
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    # effort is nested under output_config in the Messages API
    output_config={"effort": "medium"},  # low, medium, high (default), max
    messages=[{"role": "user", "content": "Complex reasoning task"}],
)
Enable the 1M context window (Tier 4+ initially):
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8192,
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},
    messages=[{"role": "user", "content": "Process this codebase..."}],
)
Rate limits are pooled across the entire Opus family (4.7, 4.6, 4.5, 4.1, and 4), so heavy use of Opus 4.7 reduces capacity available to Opus 4.6 calls on the same account.
Via Claude.ai
Access varies by subscription tier:
| Tier | Access | Price | Notes |
|---|---|---|---|
| Free | ❌ | $0 | Sonnet/Haiku only |
| Pro | ✅ | $20/mo | Web Claude Code, Projects, Skills |
| Max 5× | ✅ | $100/mo | 5× Pro limits, Cowork, Excel, Chrome agent |
| Max 20× | ✅ | $200/mo | 20× Pro limits, PowerPoint research preview |
| Team Standard | ✅ | $25/seat/mo | Admin, shared projects |
| Team Premium | ✅ | $125/seat/mo | Higher limits, priority |
| Enterprise | ✅ | Custom | SSO/SAML, audit logs, data residency |
Opus 4.6 was the default Opus model in the Claude.ai picker until April 23, 2026, when the default rolled to Opus 4.7.
Via cloud providers
Opus 4.6 was the first Claude model to launch GA simultaneously across all four of the following platforms:
| Platform | Status | Model ID |
|---|---|---|
| Amazon Bedrock | GA (Feb 5, 2026) | anthropic.claude-opus-4-6-v1 |
| Google Cloud Vertex AI | GA (Feb 5, 2026) | claude-opus-4-6 (1M context as preview) |
| Microsoft Foundry on Azure | GA (Feb 5, 2026) | claude-opus-4-6 |
| GitHub Copilot | GA (Feb 5, 2026) | Available on Pro+, Business, Enterprise |
Bedrock supports cross-region inference profiles (us., eu., apac., au., global.) at standard pricing; regional endpoints carry a 10% premium. Note: GitHub announced on April 20, 2026 that Opus models would be removed from Copilot Pro, with 4.5 and 4.6 also removed from Pro+ — only Opus 4.7 remains on Pro+ as of late April 2026.
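As a small illustration of the cross-region prefixes listed above, the sketch below builds a profile ID by joining a prefix to the Bedrock model ID from the table. The exact ID format (including any version suffix) is an assumption here and should be checked against the Bedrock console or documentation; `inference_profile_id` is a hypothetical helper.

```python
BEDROCK_MODEL_ID = "anthropic.claude-opus-4-6-v1"  # from the table above
PROFILE_PREFIXES = ("us", "eu", "apac", "au", "global")


def inference_profile_id(geo: str) -> str:
    """Join a cross-region prefix and the model ID (assumed format)."""
    if geo not in PROFILE_PREFIXES:
        raise ValueError(f"unknown profile prefix: {geo!r}")
    return f"{geo}.{BEDROCK_MODEL_ID}"


print(inference_profile_id("eu"))
```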
Via coding tools
Opus 4.6 integrated with virtually every major AI coding tool at launch:
- Cursor: “the new frontier on long-running tasks” — Michael Truell
- Windsurf: “noticeably better than Opus 4.5 on debugging” — Jeff Wang
- Claude Code: Native, with Agent Teams research preview
- Replit: “huge leap for agentic planning” — Michele Catasta
- Cline: Available with API key
- Cognition Devin: “increased our bug-catching rates” — Scott Wu
- OpenRouter: anthropic/claude-opus-4.6 at standard pricing
- Also: Lovable, v0, Bolt.new, Figma Make, Warp, Notion AI, Hex, Hebbia, Box, Asana
How Claude Opus 4.6 compares
vs Claude Opus 4.7
The successor model (released April 16, 2026) maintains identical sticker pricing ($5/$25) but introduces a new tokenizer that uses 1.0–1.35× as many tokens for the same text — meaning effective per-task costs rose despite unchanged headlines. Opus 4.7 specifically targets the agentic-coding regressions reported during Opus 4.6’s mid-cycle quality drop, with Anthropic positioning it as recovering and exceeding “launch-day Opus 4.6 quality.”
Choose Opus 4.6 for: stable production workloads, applications sensitive to token-cost predictability, environments still on the older tokenizer.
Choose Opus 4.7 for: new builds, agentic coding, applications where the documented quality-regression issues affected reliability.
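The tokenizer effect on budgets can be estimated with simple arithmetic. This sketch assumes the 1.0–1.35× token multiplier quoted above applies uniformly to a workload's input and output, which will not hold exactly for every mix of text; the function name is illustrative.

```python
from typing import Tuple


def cost_range_on_4_7(cost_on_4_6: float, low: float = 1.0, high: float = 1.35) -> Tuple[float, float]:
    """Project a 4.6 spend onto 4.7 at unchanged per-token prices."""
    return cost_on_4_6 * low, cost_on_4_6 * high


lo_cost, hi_cost = cost_range_on_4_7(100.0)
print(f"${lo_cost:.2f} to ${hi_cost:.2f}")
```

A workload costing $100 on Opus 4.6 would therefore land somewhere between $100 and $135 on Opus 4.7 under this assumption.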
vs GPT-5.2
GPT-5.2 holds narrow leads on pure reasoning (GPQA Diamond 93.2% vs 91.3%) and pricing (2.86× cheaper input). Opus 4.6 leads on agentic coding, computer use, long-context retrieval, and knowledge work — the GDPval-AA Elo gap of 144 points represents a roughly 70% win rate on professional tasks.
Choose GPT-5.2 for: pure reasoning, cost-sensitive deployments, applications needing 400K+ context but not 1M.
Choose Opus 4.6 for: agentic coding, computer use, knowledge work in finance/legal/medical, long-context document analysis.
vs Gemini 3 Pro
Gemini 3 Pro was released November 18, 2025 — roughly 11 weeks before Opus 4.6. Gemini holds clear advantages on vision (MMMU Pro 81% vs 73.9%) and matches Opus 4.6’s 1M context window (with up to 2M on some benchmark runs). Pricing is roughly 2.5× cheaper on input.
Opus 4.6 dominates long-context retrieval quality (76% MRCR v2 1M vs ~26% Gemini 3 Pro at 1M) and agentic benchmarks. The vision gap is the most consistent weakness for Opus 4.6 in head-to-heads.
Choose Gemini 3 Pro for: vision-heavy workloads, multimodal applications (audio/video), cost-sensitive deployments at scale.
Choose Opus 4.6 for: long-context retrieval where retrieval quality matters more than nominal window size, agentic coding, computer use.
vs Claude Sonnet 4.6
Sonnet 4.6 (released February 17, 2026) costs 40% less ($3/$15 vs $5/$25) and matches Opus 4.6 on many production workloads. The capability gap is real but smaller than the price gap suggests — most developers use Sonnet 4.6 daily and reserve Opus 4.6 for genuinely hard problems.
Choose Sonnet 4.6 for: high-volume production, daily coding tasks, cost-sensitive applications.
Choose Opus 4.6 for: complex architectural decisions, long-context retrieval, agentic workflows, “when you cannot afford to be wrong.”
Known limitations
Independent testing and community reports surfaced several material weaknesses:
Slow output throughput. ~40 tokens/sec on the first-party API places Opus 4.6 in the bottom of its price tier on speed. Time-to-first-token in adaptive max-effort mode is ~19.56 seconds, which Artificial Analysis flagged as “at the higher end” of comparable reasoning models. Fast Mode is the only mitigation, at 6× standard pricing.
Vision lags GPT-5.2 and Gemini 3 Pro. MMMU Pro of 73.9% trails GPT-5.2’s 79.5% and Gemini 3 Pro’s 81%. For vision-critical applications, Opus 4.6 is not the right choice.
Mid-cycle quality regression. Beginning in late February and continuing through March 2026, paying users on r/ClaudeAI, Hacker News, and the anthropics/claude-code GitHub repo reported a sustained quality drop — incomplete code outputs, ignored “NEVER” rules, skipped file reads, and behaviour described by some users as “Sonnet 3.5 level.” MindStudio reported that Anthropic quietly acknowledged a post-launch safety fine-tuning pass had unintended effects on instruction-following in long agentic chains. Opus 4.7’s launch positioning explicitly targets this.
Token verbosity / “overthinking.” Anthropic itself flagged this in the launch post, recommending users dial effort down from default high to medium for routine tasks. Adaptive max-effort runs used roughly 2× the output tokens of Opus 4.5 on equivalent benchmarks.
Context rot beyond ~40% utilisation. A widely-cited GitHub issue documented degraded behaviour at ~20% of the advertised 1M context (circular reasoning) and effective collapse around 48% utilisation. The usable high-quality window appears closer to 400K than the nominal 1M, despite the strong 76% MRCR v2 score at 1M on synthetic tests.
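One way to act on these reports is to track window utilisation and trigger compaction or task splitting before the reported degradation zone. This is a sketch only: the 40% soft limit comes from the community reports above, not from Anthropic guidance, and the function names are hypothetical.

```python
CONTEXT_WINDOW = 1_000_000  # nominal window in tokens
SOFT_LIMIT = 0.40           # utilisation where reports describe degradation


def utilisation(input_tokens: int) -> float:
    """Fraction of the nominal context window in use."""
    return input_tokens / CONTEXT_WINDOW


def over_soft_limit(input_tokens: int) -> bool:
    """True when the request exceeds the community-reported soft limit."""
    return utilisation(input_tokens) > SOFT_LIMIT


for tokens in (350_000, 450_000):
    print(tokens, f"{utilisation(tokens):.0%}", over_soft_limit(tokens))
```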
Service stability. Anthropic’s status page logged repeated Opus 4.6 incidents through late February, March, and early April 2026 (Feb 28, March 26–27, March 31, April 4, 6, 7, 10).
Prompt-injection vulnerabilities. The Opus 4.6 system card disclosed that enabling extended thinking increased prompt-injection success rates from 14.8% to 21.7% on the Gray Swan ART benchmark — Anthropic noted this didn’t replicate on other evals and is “under investigation.” GUI-based agent attacks: 17.8% success at 1 attempt without safeguards, 78.6% at 200 attempts without safeguards. VentureBeat called this the most detailed disclosure any major lab had provided.
Marginal regressions on tool orchestration. SWE-bench Verified (80.8% vs Opus 4.5’s 80.9%) and MCP Atlas at default effort (59.5% vs 62.3%) both showed slight regressions, suggesting some optimisation trade-off. MCP Atlas at high effort recovers to 62.7%.
Heavy quota burn on Max plans. Reddit and the anthropics/claude-code GitHub repo accumulated heavy complaints about Max-tier weekly limits being more restrictive than at any time since Opus 4.5 launch. Anthropic engineer Thariq Shihipar acknowledged on X that session limits were adjusted during peak hours starting March/April 2026, affecting ~7% of users.
Overly agentic behaviour. Anthropic’s system card flagged Opus 4.6 as “at times overly agentic in coding and computer-use settings, taking risky actions without first seeking user permission” — a real concern for autonomous deployments.
Community reception
Day-1 reception was overwhelmingly positive, with enthusiastic enterprise testimonials and broad recognition of the long-context and agentic gains. The reception soured during the mid-cycle regression but recovered partially with Opus 4.7’s release.
The positives
Enterprise validation was unusually strong. Notion’s Sarah Sachs called it “the strongest model Anthropic has shipped.” Cursor’s Michael Truell described it as “the new frontier on long-running tasks.” Replit’s Michele Catasta: “a huge leap for agentic planning.” Asana’s Amritansh Raghav: “the best model we’ve tested yet.” NBIM (Norway’s sovereign wealth fund AI team) reported Opus 4.6 produced the best results in 38 of 40 blind cybersecurity investigations ranked against Claude 4.5 models.
Real-world cybersecurity impact. Anthropic and the Frontier Red Team disclosed Opus 4.6 had found 500+ high-severity flaws across major open-source libraries — including bugs in Ghostscript, OpenSC, and CGIF — prior to launch.
Vertical workload wins. Harvey’s BigLaw Bench at 90.2% (highest Claude score ever). Box reported a 10-point lift in multi-source legal/financial analysis. Bolt.new’s Eric Simons: “one-shotted a fully functional physics engine.” Rakuten’s Yusuke Kaji documented Opus 4.6 autonomously closing 13 GitHub issues and assigning 12 in a single day across 6 repos.
The criticisms
Simon Willison maintained Anthropic’s published system-prompt archive and produced a detailed diff from Opus 4.6 to 4.7. His coverage focused more on Opus 4.7’s tokenizer change than direct Opus 4.6 criticism, but his post-mortem emphasised that Opus 4.6 was the last Anthropic flagship to use the original tokenizer — implicitly contrasting with 4.7’s effective per-task cost increase.
Zvi Mowshowitz wrote a four-part series. His read: Opus 4.6 was correctly released as ASL-3 but Anthropic’s RSP framework “is breaking down.” Apollo Research (the UK AISI partner) reported so much verbalised evaluation awareness from Opus 4.6 that they couldn’t reach a formal assessment in the time available — a notable safety-research finding.
Hacker News threads at launch (#46902223) highlighted long-context fidelity wins, but follow-up discussions collected the “Opus 4.6 is a step back from 4.5” complaints as the mid-cycle regression took hold.
GitHub Copilot Pro removal. Community discussion captured significant student and Pro-tier user frustration when GitHub announced Opus models would be removed from Copilot Pro on April 20, 2026, with Opus 4.5 and 4.6 also removed from Pro+ shortly after.
Expert verdict
The consensus that crystallised by April 2026: Opus 4.6 is the most capable Anthropic model on agentic coding, computer use, and long-context tasks of its generation, but the mid-cycle quality regression damaged trust and pushed serious users to wait for Opus 4.7. For enterprise deployments locked in at launch, Opus 4.6 delivered. For new builds in March/April 2026, most developers waited.
Version history
| Version | Released | Key changes |
|---|---|---|
| Claude Opus 4.7 | Apr 16, 2026 | Successor; new tokenizer, recovered quality, better agentic coding |
| Claude Sonnet 4.6 | Feb 17, 2026 | Sonnet tier of 4.6 family |
| Claude Opus 4.6 | Feb 5, 2026 | 1M context, adaptive thinking, Agent Teams, 128K output |
| Claude Opus 4.5 | Nov 24, 2025 | First Opus to break 80% SWE-bench, 67% price cut from 4.1 |
| Claude Haiku 4.5 | Oct 2025 | Speed tier of 4.5 family |
| Claude Sonnet 4.5 | Sep 29, 2025 | 30+ hour autonomous operation, OSWorld 61.4% |
| Claude Opus 4.1 | Aug 2025 | Pre-cut flagship at $15/$75 |
| Claude Opus 4 | May 2025 | Initial Claude 4 flagship |
Opus 4.6 remains generally available via API, Bedrock, Vertex, and Foundry. Vertex AI lists retirement as “not sooner than February 5, 2027.”
FAQ
Is Claude Opus 4.6 still available?
Yes. Opus 4.6 is generally available across the Claude API, AWS Bedrock, Google Cloud Vertex AI, Microsoft Foundry on Azure, and OpenRouter. It’s no longer Anthropic’s recommended flagship — that’s Opus 4.7 — but it has not been deprecated. Vertex AI lists retirement as no sooner than February 5, 2027.
How much does Claude Opus 4.6 cost?
$5.00 per million input tokens, $25.00 per million output tokens — identical to Opus 4.5. Cached input drops to $0.50/MTok (90% off). Batch API offers 50% discount. Long-context pricing ($10/$37.50) only kicks in for inputs above 200K tokens.
What’s the difference between Opus 4.6 and Opus 4.7?
Opus 4.7 (April 16, 2026) maintains identical sticker pricing but uses a new tokenizer that produces 1.0–1.35× as many tokens for the same text, so effective per-task costs rose. Opus 4.7 specifically targeted the mid-cycle quality regressions reported during Opus 4.6’s lifecycle. For new builds, Opus 4.7 is the recommended choice.
Does Opus 4.6 really have a 1 million token context window?
Yes, but with caveats. The 1M context is in beta on the Claude Platform (initially gated to Tier 4 API customers) and as a preview on Vertex AI. Opus 4.6 scores 76% on MRCR v2 8-needle retrieval at 1M tokens — qualitatively strong — but community reports document “context rot” beyond ~40% window utilisation, suggesting the practical usable window is closer to 400K than 1M for production work.
Can I use Claude Opus 4.6 for free?
No. Opus 4.6 is not available on the Claude.ai free tier. You need Claude Pro ($20/month minimum), Claude Max ($100+/month), Team, Enterprise, or paid API access.
What is Opus 4.6 best at?
Agentic coding, long-context retrieval, computer use, and knowledge work in finance/legal/medical domains. The METR autonomy horizon of 14.5 hours (50% time horizon) was the longest measured at launch — Opus 4.6 is genuinely capable of running unattended for extended periods.
What is Opus 4.6 worst at?
Vision (MMMU Pro 73.9% trails GPT-5.2 and Gemini 3 Pro), output speed (~40 tokens/sec is slow for the price tier), and graduate-level science reasoning (GPT-5.2 Pro retains a narrow edge on GPQA Diamond). The mid-cycle quality regression reported from late February through March 2026 also affected reliability for some users.
Where is Opus 4.6 available?
Claude.ai (Pro/Max/Team/Enterprise), Anthropic Claude API, Amazon Bedrock (GA), Google Cloud Vertex AI (GA), Microsoft Foundry on Azure (GA), GitHub Copilot (currently Business and Enterprise only after the April 2026 Pro/Pro+ removal), OpenRouter, Cursor, Windsurf, Cline, Replit, and most major coding tools.
Official links
| Resource | URL |
|---|---|
| Claude.ai | claude.ai |
| Anthropic Website | anthropic.com |
| Opus 4.6 Announcement | anthropic.com/news/claude-opus-4-6 |
| API Documentation | docs.anthropic.com |
| Pricing | anthropic.com/pricing |
| Model Overview | docs.anthropic.com/en/docs/about-claude/models/overview |
| AWS Bedrock | docs.aws.amazon.com/bedrock |
| Google Vertex AI | docs.cloud.google.com/vertex-ai |
| Status Page | status.anthropic.com |