Claude Sonnet 4.6

Provider: Anthropic
Status: available
Context: 1,000,000 tok
SWE-bench: 79.6%
Price: $3 / $15 /MTok
Knowledge: 2025-08

Claude Sonnet 4.6 is Anthropic’s mid-tier model and, for most people, the one they actually use day to day. Released on 17 February 2026, it’s the free-tier default on claude.ai and the workhorse that sits beneath the Opus flagships — priced at an unchanged $3 input / $15 output per million tokens, the same as Sonnet 4.5 before it.

The reason it matters is the price-to-performance ratio. On Anthropic’s own benchmarks, Sonnet 4.6 lands within a point or two of Opus 4.6 on coding and computer use — and independent testing from Artificial Analysis goes further, finding it beats Opus 4.6 on agentic knowledge work and terminal tasks. Anthropic says early-access developers preferred it to Sonnet 4.5 “by a wide margin” and often even preferred it to Opus 4.5, its smartest model from late 2025. The catch, which we’ll come back to, is that it gets there by spending a lot more tokens.

Quick specs


Provider	Anthropic
Released	17 February 2026 (API + Claude apps same day)
API model ID	`claude-sonnet-4-6`
Context window	1,000,000 tokens (beta); 200K standard
Max output	128,000 tokens (up to 300K via Batch API beta)
Knowledge cutoff	August 2025 (reliable); trained to May 2025
Input price	$3.00 / MTok
Output price	$15.00 / MTok
SWE-bench Verified	79.6%
OSWorld-Verified	72.5%
GPQA Diamond	89.9%
AA Intelligence Index	51 (2nd at launch)
Best for	High-volume agentic work, coding, computer use, customer-facing agents, document and finance analysis
Limitations	Higher token consumption than 4.5; trails Opus on hard scientific/abstract reasoning and 1M long-context retrieval


What’s new in Claude Sonnet 4.6

Sonnet 4.6 is a clean upgrade of Sonnet 4.5 at the same price, with gains concentrated in four areas (Anthropic, Simon Willison).

A genuine computer-use leap

The biggest single jump is in computer use — controlling a real desktop, browser and file system. Sonnet 4.6 scores 72.5% on OSWorld-Verified, up from 61.4% on Sonnet 4.5 (Anthropic), which is effectively level with Opus 4.6 (72.7%). That 11-point gain is what makes it viable as the engine behind browser agents and Claude in Chrome; the insurance-automation firm Pace reported it hitting 94% on their internal benchmark, the highest of any Claude they’d tested.

Adaptive thinking and effort control

Sonnet 4.6 replaces Sonnet 4.5’s fixed thinking-token budgets with two dials. Adaptive thinking lets the model decide how much to reason before answering, and effort control (low, medium, high, max) lets you cap that spend explicitly. This is the same control surface Anthropic later standardised on the Opus line. It’s the source of both the model’s strength and its main weakness: left at max effort, it reasons hard and scores well, but it burns far more tokens doing so (see Known limitations).
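Here is a minimal sketch of how the two dials might be set per request. It assumes the Messages API accepts adaptive thinking as `thinking={"type": "adaptive"}` and effort as `output_config={"effort": ...}`. Those field names are an assumption to check against Anthropic's current API reference. They are passed through the Python SDK's `extra_body` escape hatch, so the call still works if your SDK version has no typed parameter for them. `build_request` is a hypothetical helper, not part of the SDK.

```python
# Hypothetical helper: builds keyword arguments for client.messages.create().
# ASSUMPTION: the "thinking" and "output_config" field names mirror Anthropic's
# documented request shape at the time of writing; verify before relying on them.
EFFORT_LEVELS = ("low", "medium", "high", "max")


def build_request(prompt: str, effort: str = "medium", adaptive: bool = True,
                  max_tokens: int = 16_000) -> dict:
    """Return kwargs for client.messages.create() with the effort dials set."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")

    extra_body: dict = {"output_config": {"effort": effort}}
    if adaptive:
        # Let the model choose its own reasoning depth, bounded by the effort level.
        extra_body["thinking"] = {"type": "adaptive"}

    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        # extra_body merges these fields into the JSON request body.
        "extra_body": extra_body,
    }


# Usage (requires the anthropic package and an API key):
#   from anthropic import Anthropic
#   message = Anthropic().messages.create(**build_request("Summarise this log", effort="low"))
```

One way to keep the token bill under control is to start at `low` or `medium` and raise effort only for tasks that fail (see Known limitations).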

A 1M context window and bigger outputs

The context window grows from 200K to 1,000,000 tokens (in beta, via the context-1m-2025-08-07 header), and — unlike legacy Sonnet — the full window bills at the standard rate, with no long-context surcharge. Maximum output rises from 64K to 128K tokens on the standard API, with up to 300K available through the Batch API beta. Microsoft describes the 1M window as generally available on Foundry, while Anthropic’s first-party API still labels it beta — a minor source conflict worth noting if you’re depending on it.
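The window logic can be sketched as a small helper that opts into the 1M beta only when a request needs it. The sketch assumes the limits stated above: 200K standard, 1M with the `context-1m-2025-08-07` header, and 128K maximum output. It also assumes that input plus `max_tokens` must fit inside the window. `long_context_kwargs` is a hypothetical name, and the input token count is something you would measure yourself.

```python
STANDARD_WINDOW = 200_000
BETA_WINDOW = 1_000_000
MAX_OUTPUT = 128_000
LONG_CONTEXT_BETA = "context-1m-2025-08-07"


def long_context_kwargs(input_tokens: int, max_tokens: int) -> dict:
    """Return extra kwargs for client.beta.messages.create(), or raise if the request cannot fit."""
    if max_tokens > MAX_OUTPUT:
        raise ValueError(f"max_tokens {max_tokens:,} exceeds the {MAX_OUTPUT:,} output limit")
    needed = input_tokens + max_tokens
    if needed > BETA_WINDOW:
        raise ValueError(f"{needed:,} tokens will not fit in the 1M window; split the input")
    if needed > STANDARD_WINDOW:
        # Opt into the beta only when the standard 200K window is too small.
        return {"betas": [LONG_CONTEXT_BETA]}
    return {}
```

In the Python SDK, `betas=[...]` on `client.beta.messages.create` sends the `anthropic-beta` header. On Bedrock, Vertex AI or Foundry the opt-in may work differently, so check that platform's documentation.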

Stronger safety and injection resistance

Anthropic reports Sonnet 4.6’s resistance to prompt injection is “on par with Opus 4.6”. Its single-turn harmless-response rate on violative requests rose to 99.38%, up from 97.89% on Sonnet 4.5 (Anthropic System Card). It’s deployed under ASL-3 safeguards. One caveat: commenters on Hacker News noted that the system card’s own figures still show a meaningful adversarial-takeover rate under sustained attack, so “Opus-level” resistance is not the same as immunity for agentic deployments.

The Sonnet 4.6 model family

Like the Opus 4.8 line, Sonnet 4.6 is not split into separate model IDs. There is one model — claude-sonnet-4-6 — shaped by two dials.

Dial	Options	What it does
Effort level	`low`, `medium`, `high`, `max`	Caps how many tokens the model spends reasoning
Adaptive thinking	On / off	Lets the model decide its own reasoning depth

Within Anthropic’s lineup, Sonnet 4.6 sits below the Opus flagships and the Mythos-class Fable 5, and above the fast, cheap Haiku 4.5. It’s the “balanced” tier: most of the capability, a fraction of the flagship price.

Benchmark performance

Anthropic published Sonnet 4.6’s scores in its System Card, benchmarked mainly against Opus 4.6 and Sonnet 4.5. All figures in the tables below are Anthropic-reported at adaptive thinking / max effort unless noted; the independent verification section follows.

Coding

Benchmark	Sonnet 4.6	Opus 4.6	Sonnet 4.5
SWE-bench Verified	79.6%	80.8%	77.2%
SWE-bench Multilingual	75.9%	—	—
Terminal-Bench 2.0 (Terminus-2)	59.1%	65.4%	51.0%

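The story here is how close Sonnet 4.6 runs to Opus 4.6 on standard coding. On SWE-bench Verified, 79.6% vs 80.8% is a gap of just 1.2 points (Anthropic). Terminal-Bench is where the tiers separate. On Anthropic’s Terminal-Bench 2.0 run, agentic terminal work still favours Opus. Artificial Analysis’s independent testing (covered below) found the reverse, with Sonnet 4.6 ahead of Opus 4.6 on Terminal-Bench. For a sense of where this lands against the wider field, see our best AI for coding rankings.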

Reasoning and knowledge

Benchmark	Sonnet 4.6	Opus 4.6	Sonnet 4.5
GPQA Diamond	89.9%	91.3%	83.4%
Humanity’s Last Exam (with tools)	49.0%	53.0%	33.6%
Humanity’s Last Exam (no tools)	33.2%	40.0%	17.7%
ARC-AGI-2	58.3%	68.8%	13.6%

This is where the Opus gap is real. On the hardest scientific reasoning (GPQA Diamond), the margin is a modest 1.4 points. On Humanity’s Last Exam and abstract reasoning (ARC-AGI-2), Opus 4.6 pulls clearly ahead. The ARC-AGI-2 jump from 13.6% to 58.3% over Sonnet 4.5 is enormous, but Opus 4.6’s 68.8% shows the flagship still has real headroom on novel-pattern problems. If this kind of task dominates your workload, that’s the signal to escalate to Opus 4.8.

Agentic and computer use

Benchmark	Sonnet 4.6	Opus 4.6	Sonnet 4.5
OSWorld-Verified	72.5%	72.7%	61.4%
τ²-bench Retail	91.7%	91.9%	86.2%
τ²-bench Telecom	97.9%	99.3%	98.0%
MCP-Atlas	61.3%	59.5%	43.8%

This is Sonnet 4.6’s strongest showing. On computer use, tool-calling agents and MCP orchestration it’s at near-parity with Opus 4.6 — and it actually leads Opus 4.6 on MCP-Atlas. For the high-volume agentic work most teams are building, this table is the argument for using Sonnet rather than paying for Opus.

Professional knowledge work

Benchmark	Sonnet 4.6	Opus 4.6	Sonnet 4.5
GDPval-AA (Elo)	1,633	1,606	1,276
Finance Agent (Vals AI)	63.3%	60.1%	—

Sonnet 4.6 beats Opus 4.6 on both of these — GDPval-AA, which measures economically valuable knowledge work, and the Vals AI Finance Agent benchmark, where its 63.3% was a state-of-the-art result at launch. This is the clearest evidence that “Sonnet is the cheap one, Opus is the smart one” is too simple a framing for 2026.

Independent verification

Unlike many launches, Sonnet 4.6 has solid third-party data, and it mostly backs Anthropic up.

Artificial Analysis scored it 51 on its Intelligence Index (v4.0, adaptive thinking, max effort) — an 8-point jump over Sonnet 4.5, good enough for 2nd place behind Opus 4.6 (53) and tied with GPT-5.2. They noted it was the first time Anthropic held the top two spots, and confirmed Sonnet 4.6 “leads all models we have tested on GDPval-AA and Terminal-Bench, outperforming even Claude Opus 4.6” (Artificial Analysis, via OfficeChai).

Vals AI placed it 5th on its broader composite index (60.06%) and independently reproduced the 63.3% Finance Agent figure. On LMArena’s Code Arena it ranked around 3rd (≈1,525 Elo) shortly after launch, and the SRE-tooling firm Rootly measured a 4-plus-point gain over Sonnet 4.5 on its incident-response benchmark — a bigger generational jump than Opus saw.

The one consistent independent caveat is cost. Artificial Analysis found that on GDPval-AA, Sonnet 4.6 “used more than 4x the total tokens than its predecessor” — 280M tokens versus Sonnet 4.5’s 58M, and more even than Opus 4.6’s 160M. Running their full Index cost roughly $2,088 for Sonnet 4.6 versus about $733 for Sonnet 4.5. The intelligence is real; so is the token bill.
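The ratios behind those figures are easy to check. Note that the token counts cover GDPval-AA alone, while the dollar figures cover the full Index run. The 4.8x token ratio and the roughly 2.85x cost ratio are therefore not directly comparable.

```python
# Artificial Analysis figures quoted above.
gdpval_tokens_m = {"Sonnet 4.6": 280, "Sonnet 4.5": 58, "Opus 4.6": 160}  # millions of tokens
index_cost_usd = {"Sonnet 4.6": 2088, "Sonnet 4.5": 733}                  # full Index run

token_ratio_vs_45 = gdpval_tokens_m["Sonnet 4.6"] / gdpval_tokens_m["Sonnet 4.5"]
token_ratio_vs_opus = gdpval_tokens_m["Sonnet 4.6"] / gdpval_tokens_m["Opus 4.6"]
cost_ratio_vs_45 = index_cost_usd["Sonnet 4.6"] / index_cost_usd["Sonnet 4.5"]

print(f"GDPval-AA tokens vs Sonnet 4.5: {token_ratio_vs_45:.1f}x")    # 4.8x
print(f"GDPval-AA tokens vs Opus 4.6:   {token_ratio_vs_opus:.2f}x")  # 1.75x
print(f"Index cost vs Sonnet 4.5:       {cost_ratio_vs_45:.2f}x")     # 2.85x
```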

Pricing breakdown

Sonnet 4.6 holds the $3/$15 price that has defined the Sonnet tier since Claude 3 Sonnet (Anthropic pricing docs).

Mode	Input (per MTok)	Output (per MTok)	Notes
Standard	$3.00	$15.00	Unchanged from Sonnet 4 / 4.5; full 1M window, no surcharge
Batch API	$1.50	$7.50	Standard 50% discount
Cached input	$0.30	—	Cache reads at 0.1x base; min cacheable prompt 2,048 tokens

The pricing change that matters isn’t the headline rate — it’s that the 1M-token context now bills at the standard rate, removing the ~2x long-context premium that legacy Sonnet charged above 200K input. Cache writes are billed separately (1.25x for 5-minute, 2x for 1-hour), and caching stacks with the Batch API, which can bring effective cached batch input close to $0.15/MTok for high-reuse workloads (PE Collective).
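A back-of-the-envelope cost function makes the stacking concrete. This sketch uses only the multipliers quoted above: cache reads at 0.1x, cache writes at 1.25x (5-minute) or 2x (1-hour), and Batch at 50%. It also assumes the Batch discount applies on top of the cache multipliers, as the PE Collective figure implies. `request_cost` is a hypothetical helper, not an Anthropic API.

```python
BASE_INPUT_PER_MTOK = 3.00
BASE_OUTPUT_PER_MTOK = 15.00
CACHE_READ_MULT = 0.1
CACHE_WRITE_MULT = {"5m": 1.25, "1h": 2.0}
BATCH_MULT = 0.5


def request_cost(uncached_in: int = 0, cache_read_in: int = 0, cache_write_in: int = 0,
                 output: int = 0, write_ttl: str = "5m", batch: bool = False) -> float:
    """Estimated USD cost of one Sonnet 4.6 request, from token counts."""
    # Weight each kind of input token by its multiplier, then price at the base rate.
    input_cost = BASE_INPUT_PER_MTOK * (
        uncached_in
        + cache_read_in * CACHE_READ_MULT
        + cache_write_in * CACHE_WRITE_MULT[write_ttl]
    )
    total = (input_cost + BASE_OUTPUT_PER_MTOK * output) / 1_000_000
    # ASSUMPTION: the Batch discount applies to every component, including cache traffic.
    return total * BATCH_MULT if batch else total


# 1M cached input tokens read through the Batch API: 3.00 * 0.1 * 0.5 per MTok
print(f"${request_cost(cache_read_in=1_000_000, batch=True):.2f}")  # $0.15
```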

Cost comparison with contemporaries

Model	Input	Output	Notes
Claude Sonnet 4.6	$3.00	$15.00	Near-Opus performance at mid-tier price
Claude Opus 4.8	$5.00	$25.00	Flagship; clear lead on hardest reasoning
Claude Haiku 4.5	$1.00	$5.00	Fast, cheap tier below Sonnet
GPT-5.5	$5.00	$30.00	OpenAI flagship; pricier per token
Gemini 3.5 Pro	est. ~$15	est. ~$60	Newer Google flagship; not yet GA as of June 2026

The headline takeaway: Sonnet 4.6 is meaningfully cheaper per token than every frontier flagship, while landing within striking distance of them on a lot of practical work. Just remember the token-consumption caveat above when you model real per-task cost — cheaper per token is not automatically cheaper per job.
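The sketch below shows why cheaper per token isn't automatically cheaper per job. It compares one task on Sonnet 4.6 and Opus 4.8 at the list prices in the table. The token counts are hypothetical, chosen only to illustrate the break-even, so measure your own workloads before drawing conclusions.

```python
PRICES = {"Sonnet 4.6": (3.00, 15.00), "Opus 4.8": (5.00, 25.00)}  # USD per MTok (input, output)


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one job at list price (no caching or Batch discounts)."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical job: the same 50K-token prompt; Opus answers in 10K output tokens.
opus = job_cost("Opus 4.8", 50_000, 10_000)  # $0.50
for sonnet_out in (10_000, 20_000, 30_000):
    sonnet = job_cost("Sonnet 4.6", 50_000, sonnet_out)
    verdict = "cheaper" if sonnet < opus else "dearer"
    print(f"Sonnet at {sonnet_out:,} output tokens: ${sonnet:.2f} ({verdict} than Opus at ${opus:.2f})")
```

With these numbers, Sonnet stays cheaper until it spends a little over 23K output tokens on the job, about 2.3x what Opus used. Past that point the lower rate no longer pays for the extra tokens.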

How to access Claude Sonnet 4.6

Via API

Sonnet 4.6 is generally available with no waitlist as claude-sonnet-4-6 on the Claude API, and on Amazon Bedrock (anthropic.claude-sonnet-4-6), Google Cloud Vertex AI, Microsoft Foundry and GitHub Copilot. Effective context varies by platform — the Copilot harness, for instance, caps Claude models well below the 1M beta window — so check the limit on your specific platform before relying on a large context.

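from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Your prompt here"}],
)

# message.content is a list of content blocks; print only the text blocks
for block in message.content:
    if block.type == "text":
        print(block.text)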
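Cache reads bill at a tenth of the base input rate, so a long, stable system prompt is the obvious thing to cache. The sketch below marks the prompt with the Messages API's `cache_control` block. The `cached_system` helper and the `policy_manual` placeholder are hypothetical. The 2,048-token threshold is the minimum quoted in the pricing table, and the token estimate is assumed to come from your own counting. The `"ttl": "1h"` field for the longer cache is an assumption to verify for your SDK version.

```python
MIN_CACHEABLE_TOKENS = 2_048  # minimum cacheable prompt quoted in the pricing table


def cached_system(text: str, est_tokens: int, one_hour: bool = False) -> list[dict]:
    """Build a `system` parameter that marks a long, stable prompt as cacheable."""
    block: dict = {"type": "text", "text": text}
    if est_tokens >= MIN_CACHEABLE_TOKENS:
        # "ephemeral" defaults to the 5-minute cache; "ttl": "1h" requests the
        # 1-hour cache (2x write cost). ASSUMPTION: check ttl support in your SDK.
        block["cache_control"] = (
            {"type": "ephemeral", "ttl": "1h"} if one_hour else {"type": "ephemeral"}
        )
    return [block]


# Usage, reusing `client` from the example above (policy_manual is a placeholder string):
#   message = client.messages.create(
#       model="claude-sonnet-4-6",
#       max_tokens=4096,
#       system=cached_system(policy_manual, est_tokens=30_000),
#       messages=[{"role": "user", "content": "Does clause 4.2 apply here?"}],
#   )
```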

Via the Claude apps

Sonnet 4.6 is the model most people on claude.ai are using, because it’s the default on the free tier. Access by tier (Anthropic pricing):

Tier	Price	Sonnet 4.6	Notes
Free	$0	Yes	Default model
Pro	$20/mo	Yes	Default + limited Opus; $17/mo billed annually
Max	from $100/mo	Yes	Higher limits; full Opus access
Team	$30/user/mo	Yes	5-seat minimum
Enterprise	Custom	Yes	Custom pricing

It also powers Claude Code, Claude Cowork, and the Claude apps for Excel, PowerPoint and Word. Anthropic doesn’t publish exact message limits per tier and describes them as a conversation budget, so heavy users should expect to hit caps on Free and Pro.

How Claude Sonnet 4.6 compares

Sonnet 4.6’s job is to be the sensible default. The comparisons that matter are against the Claude tier above it and the mid-tier models from other labs.

vs Claude Opus 4.8

This is the most important comparison, because it’s the one you’ll make constantly: when do you pay up for Opus 4.8? Opus is clearly stronger on the hardest scientific reasoning (GPQA, Humanity’s Last Exam), abstract reasoning (ARC-AGI-2) and 1M long-context retrieval, and it’s Anthropic’s best coder. But on standard coding, computer use, tool-calling and knowledge work, Sonnet 4.6 is close enough that paying about 67% more per token is hard to justify for most volume work. That premium applies to both input ($5 vs $3) and output ($25 vs $15).

Choose Sonnet 4.6 as your default for coding agents, computer use, customer-facing agents and high-throughput pipelines. Choose Opus 4.8 for deep reasoning, large multi-file refactors, multi-agent coordination and anything where getting it exactly right beats getting it cheaply.

vs GPT-5.5 and Gemini 3.5 Pro

Against the other labs’ flagships, Sonnet 4.6 competes on value rather than raw ceiling. GPT-5.5 is the stronger model on agentic terminal coding and sits at a higher price ($5/$30). Gemini 3.5 Pro wasn’t generally available as of mid-June 2026, and its estimated pricing is far higher. No clean, same-harness head-to-head is available from primary sources, and we won’t invent one. What independent data does show is that Sonnet 4.6 was tied with GPT-5.2 on the Artificial Analysis Intelligence Index at launch while costing less to run. See the Google hub for the current Gemini lineup and our best AI models ranking for the full field.

The practical consensus

The honest summary, echoed by Artificial Analysis and independent reviewers: for roughly 90% of real coding, agentic and knowledge-work tasks, Sonnet 4.6 gives you near-flagship quality at a mid-tier price — provided you manage its appetite for tokens. It’s the model you reach for first, and escalate away from only when a task genuinely needs Opus.

Known limitations

Higher token consumption. This is the headline caveat. In max-effort adaptive-thinking mode, Sonnet 4.6 can use 3-4.8x more tokens than Sonnet 4.5 on the same task (Artificial Analysis). Cheaper per token does not mean cheaper per job — cap effort and use caching to control real cost.

Trails Opus on the hardest reasoning. The gap to Opus 4.6/4.8 is real on Humanity’s Last Exam and ARC-AGI-2 (10-17 points on abstract reasoning), and Opus also edges ahead on GPQA Diamond. For deep scientific or novel-pattern work, Sonnet is not the right tool.

Weaker 1M long-context retrieval. On long-context recall (MRCR), Opus pulls well ahead — so the 1M window is more useful for bulk ingestion than for precise needle-in-haystack retrieval at the top of the range.

Vendor-reported headline figures. Every System Card number is Anthropic-run. Independent verification exists and broadly agrees (Artificial Analysis, Vals AI), but the AIME 2025 score (95.6%) carries Anthropic’s own contamination flag, and MathVista isn’t reported at all.

Post-launch reliability wobbles. The March-April 2026 Claude Code quality complaints (covered below) were traced to harness and config changes, not the weights — but they’re a reminder to pin the model ID and run your own canary evals rather than trusting headline quality blindly.

Community reception

Reception was broadly positive but measured — the consensus phrase, from the AINews/Latent Space write-up, was “a clean upgrade of 4.5, mostly better with some caveats.”

Anthropic’s launch partners reported concrete wins (Anthropic):

Cursor (Michael Truell): “a notable improvement over Sonnet 4.5 across the board, including long-horizon tasks and more difficult problems.”
Cognition / Devin (Scott Wu): it “meaningfully closed the gap with Opus on bug detection.”
Replit (Michele Catasta): “The performance-to-cost ratio of Claude Sonnet 4.6 is extraordinary.”
Windsurf (Jeff Wang): “a viable alternative if you are a heavy Opus user.”

Independent coverage was more reserved. Simon Willison’s write-up was neutral, focused on pricing, the knowledge cutoff and his pelican-SVG test rather than a strong endorsement. The clearest real-world adoption signal: Perplexity’s Comet browser agent defaulted to Sonnet 4.6 for Pro users, and third-party trackers reported it quickly becoming one of the most-called models on Anthropic’s API.

The quality-regression episode

For completeness: between March and April 2026, a wave of complaints claimed Claude Code (running Sonnet 4.6 and Opus 4.6) had “got dumber.” Anthropic’s 23 April post-mortem confirmed the cause was a series of harness and configuration changes — the default effort level was quietly dropped from high to medium on 4 March and reverted on 7 April, alongside an idle-session bug and a verbosity prompt change that hurt coding output. Crucially, the model weights themselves didn’t degrade. It’s a useful case study in why pinning model IDs and running your own evals matters more than headline benchmarks.
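A canary eval doesn't need a framework. The sketch below pins the model ID and effort setting, then runs a fixed set of prompts through whatever client function you already have. It flags any drop below your recorded baseline. `Case`, `run_canary`, the example effort value and the 5-point tolerance are illustrative choices, not an Anthropic tool. `generate` stands in for your own API wrapper, and a stub takes its place in the demo.

```python
from dataclasses import dataclass
from typing import Callable

# Pin the settings you validated as well as the model ID: the March-April 2026
# regression traced partly to a changed effort default, not to new weights.
PINNED_MODEL = "claude-sonnet-4-6"
PINNED_EFFORT = "high"  # example value; record whatever you actually run


@dataclass
class Case:
    prompt: str
    check: Callable[[str], bool]  # True if the reply is acceptable


def run_canary(generate: Callable[[str, str, str], str], cases: list[Case],
               baseline: float, tolerance: float = 0.05) -> tuple[float, bool]:
    """Return (pass_rate, healthy); unhealthy means pass_rate < baseline - tolerance."""
    passed = sum(case.check(generate(PINNED_MODEL, PINNED_EFFORT, case.prompt)) for case in cases)
    rate = passed / len(cases)
    return rate, rate >= baseline - tolerance


# Demo with a stub in place of a real API wrapper.
def fake_generate(model: str, effort: str, prompt: str) -> str:
    return "4" if prompt == "What is 2+2?" else "I'm not sure."


cases = [
    Case("What is 2+2?", lambda reply: reply.strip() == "4"),
    Case("What is the capital of France?", lambda reply: "Paris" in reply),
]
rate, healthy = run_canary(fake_generate, cases, baseline=0.9)
print(f"pass rate {rate:.0%}, healthy={healthy}")  # pass rate 50%, healthy=False
```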

Version history

Version	Released	Key changes
Claude Sonnet 4.6	17 Feb 2026	Near-Opus performance, 1M context (beta), adaptive thinking + effort control, computer-use leap (OSWorld 61→72%)
Claude Sonnet 4.5	29 Sep 2025	“Best coding model in the world” at launch; 30+ hr autonomy; SWE-bench 77.2%
Claude Sonnet 4	22 May 2025	Kept the $3/$15 Sonnet price point; SWE-bench 72.7%
Claude 3.7 Sonnet	Feb 2025	First hybrid-reasoning Claude; extended thinking

No “Sonnet 5” exists as of June 2026 — Anthropic spent the first half of the year shipping Opus 4.6/4.7/4.8 and the Mythos-class Fable 5 instead, leaving Sonnet 4.6 as the current and widely-deployed mid-tier model. Sonnet 4.5 remains available via the API for teams that haven’t migrated.

FAQ

Is Claude Sonnet 4.6 better than Sonnet 4.5?

Yes, clearly. It leads Sonnet 4.5 on nearly every published benchmark, at the same $3/$15 price. The one exception is τ²-bench Telecom, which is effectively a tie at 97.9% vs 98.0%. The biggest jumps are in computer use (OSWorld 61.4% → 72.5%) and abstract reasoning (ARC-AGI-2 13.6% → 58.3%). The one trade-off is that it uses substantially more tokens to get there, so re-run your own cost evals before migrating.

How much does Claude Sonnet 4.6 cost?

$3 per million input tokens and $15 per million output tokens — unchanged from Sonnet 4.5. The full 1M context window bills at that standard rate with no long-context premium. On the apps it’s included on every tier, including Free.

Can I use Claude Sonnet 4.6 for free?

Yes. It’s the default model on the free tier of claude.ai, subject to usage limits. Pro ($20/mo) raises those limits and adds some Opus access.

How does Sonnet 4.6 compare to Opus 4.8?

Opus 4.8 is stronger on the hardest scientific and abstract reasoning, large refactors and 1M long-context retrieval. But on standard coding, computer use and knowledge work, Sonnet 4.6 is close enough that it’s the better default for most high-volume work. Use Opus when a task genuinely needs the extra ceiling.

What is the context window and knowledge cutoff?

A 1,000,000-token context window in beta (200K standard), up to 128,000 output tokens, and a reliable knowledge cutoff of August 2025 (Anthropic’s system card describes training data up to May 2025).

Was Sonnet 4.6 affected by the June 2026 export-control suspension?

No. The 12 June 2026 US export-control directive suspended only the Mythos-class Fable 5 and Mythos 5. Sonnet 4.6 and the rest of the Claude lineup are fully available, and Sonnet 4.6 is one of the recommended fallbacks for stranded Fable 5 workloads.

Does Sonnet 4.6 support computer use and MCP?

Yes — both are core strengths. It scores 72.5% on OSWorld-Verified (near Opus 4.6) and actually leads Opus 4.6 on MCP-Atlas, which is why it’s a common engine for browser and tool-calling agents.

Last verified 18 June 2026. Benchmark figures are Anthropic-reported via the Claude Sonnet 4.6 System Card unless otherwise noted; independent figures from Artificial Analysis, Vals AI and LMArena are labelled as such. Pricing and availability current as of the publication date and subject to change.