Claude Sonnet 4.5
Specifications
| Model ID | claude-sonnet-4-5 |
| Provider | Anthropic |
| Architecture | transformer |
| Context Window | 200K tokens |
| Max Input | 200K tokens |
| Max Output | 64K tokens |
| Knowledge Cutoff | 2025-01-31 |
| License | proprietary |
| Open Weights | No |
Variants
| VARIANT | API ID |
|---|---|
| Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 |
| Claude Sonnet 4.5 (alias) | claude-sonnet-4-5 |
API Pricing
Price parity with Sonnet 4—pure capability upgrade at same cost. Prompt caching offers up to 90% savings on repeated context. Batch API provides 50% discount with 24-hour turnaround. 1M context beta available at Tier 4+.
Claude Access
| TIER | PRICE |
|---|---|
| Free | Free |
| Pro | $20/mo |
| Max 5× | $100/mo |
| Max 20× | $200/mo |
| Team Standard | $25/user/mo |
| Team Premium | $150/user/mo |
| Enterprise | Custom |
Benchmarks
Coding
| SWE-bench Verified | 77.2% |
Reasoning
| GPQA Diamond | 83.4% |
| MMLU | 89.1% |
Math
| AIME 2025 | 87% |
Vision
| MMMU | 77.8% |
Rankings
| Artificial Analysis | #4 |
| AA Intelligence Index | 61 |
SWE-bench Verified leader at launch (77.2%). OSWorld (61.4%) sets the state of the art for computer use, a 45% relative improvement over Sonnet 4. Perfect AIME 2025 score with Python tools. TAU-bench Telecom (98%) demonstrates exceptional agent capabilities. Output speed (63 tok/s) is among the fastest of any frontier model.
Claude Sonnet 4.5 is Anthropic’s flagship coding model, released on September 29, 2025. At launch, it achieved the highest score on SWE-bench Verified at 77.2%—establishing itself as the leading model for real-world software engineering tasks. Anthropic positioned it as their recommended model for “basically every use case,” delivering significant improvements in autonomous operation: 30+ hours of continuous work compared to 7 hours for its predecessor.
The model dominates computer use benchmarks with a 61.4% OSWorld score (45% improvement over Sonnet 4) and scores 100% on AIME 2025 when using Python tools. Critically, it maintains price parity with Sonnet 4 at $3/$15 per million tokens—making the upgrade a pure capability gain at zero additional cost. For developers building AI-assisted coding workflows, Sonnet 4.5 represents the current sweet spot between capability and cost.
Quick specs
| Provider | Anthropic |
| Released | September 29, 2025 |
| Context window | 200K tokens (1M beta at Tier 4+) |
| Max output | 64K tokens |
| Knowledge cutoff | January 31, 2025 |
| Input price | $3.00 / MTok |
| Output price | $15.00 / MTok |
| Cached input | $0.30 / MTok (90% savings) |
| SWE-bench Verified | 77.2% (82.0% high compute) |
| OSWorld | 61.4% (best-in-class computer use) |
| Best for | Coding, agents, computer use, high-volume production |
| Limitations | Pure reasoning trails GPT-5.1; input costs 2.4× more than GPT-5.1 |
What’s new in Sonnet 4.5
Sonnet 4.5 represents the most significant Sonnet upgrade to date, focusing on agentic capabilities and coding performance while maintaining the speed and cost that made Sonnet the default choice for production workloads.
30+ hour autonomous operation
The headline improvement is sustained autonomous work. Anthropic reports Sonnet 4.5 can operate for 30+ hours continuously compared to 7 hours for Sonnet 4. This enables true overnight coding tasks, multi-day research projects, and complex multi-system integrations without human intervention.
State-of-the-art computer use
OSWorld score jumped from 42.2% to 61.4%—a 45% relative improvement. This makes Sonnet 4.5 the best model for computer use tasks: navigating GUIs, filling forms, interacting with web applications, and automated testing. The TAU-bench results reinforce this: 98% on Telecom (vs 71.5% for Opus 4.1) and 86.2% on Retail.
Zero code editing errors
Replit reported their code editing error rate dropped from 9% to 0% when switching from Sonnet 4 to Sonnet 4.5. Combined with the 77.2% SWE-bench score, this establishes Sonnet 4.5 as the most reliable model for automated code modification.
1M context window beta
API users at Tier 4+ ($400+ spend) can access the 1 million token context beta via the context-1m-2025-08-07 header. This enables processing entire codebases, lengthy documentation, or comprehensive research materials in a single context—though pricing doubles for inputs beyond 200K tokens.
Reduced sycophancy
Anthropic specifically trained Sonnet 4.5 to be less agreeable when users are wrong. The model pushes back more appropriately and avoids the excessive “you’re absolutely right!” responses that plagued earlier versions.
The Sonnet 4.5 model family
Sonnet 4.5 is the middle tier of the Claude 4.5 family, balancing capability and cost:
| Model | Released | API Identifier | Pricing | Best for |
|---|---|---|---|---|
| Claude Opus 4.5 | Nov 24, 2025 | claude-opus-4-5-20251101 | $5/$25 | Complex reasoning, “when you can’t afford to be wrong” |
| Claude Sonnet 4.5 | Sep 29, 2025 | claude-sonnet-4-5-20250929 | $3/$15 | Coding, agents, production workloads |
| Claude Haiku 4.5 | Oct 2025 | claude-haiku-4-5-20251001 | $1/$5 | High-volume, latency-sensitive tasks |
Sonnet 4.5 delivers approximately 95% of Opus 4.5’s coding capability at 60% of the cost, making it the recommended default for most use cases. Reserve Opus for complex architectural decisions and difficult debugging.
Benchmark performance
Sonnet 4.5 leads on coding and agentic benchmarks while showing competitive—but not leading—performance on pure reasoning tasks.
Coding benchmarks
| Benchmark | Sonnet 4.5 | Opus 4.5 | GPT-5.1 | Gemini 3 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 77.2% | 80.9% | 76.3% | 76.2% |
| SWE-bench (high compute) | 82.0% | — | — | — |
| Terminal-Bench | 50.0% | 59.3% | 43.8% | — |
| OSWorld | 61.4% | 66.3% | ~44% | — |
SWE-bench Verified tests models on real GitHub pull requests—the most realistic benchmark for software engineering capability. Sonnet 4.5’s 77.2% means it resolves roughly 4 out of 5 actual bug reports without human intervention. The high-compute configuration (parallel attempts + rejection sampling) pushes this to 82.0%.
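Anthropic hasn't published the exact scaffold behind the high-compute figure, but the idea is straightforward: sample several independent attempts and keep only those a verifier accepts. A minimal sketch, assuming a hypothetical passes_tests verifier (e.g. applying the candidate patch in a sandbox and running the repo's test suite) and treating each attempt as an independent API call:

```python
import concurrent.futures
import anthropic

client = anthropic.Anthropic()

def generate_patch(bug_report: str) -> str:
    """One independent candidate attempt at fixing the bug."""
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=8192,
        messages=[{"role": "user", "content": f"Write a patch for:\n{bug_report}"}],
    )
    return response.content[0].text

def passes_tests(patch: str) -> bool:
    """Hypothetical verifier: apply the patch in a sandbox, run the test suite."""
    ...  # e.g. git apply + pytest in an isolated checkout
    return False

def solve_high_compute(bug_report: str, attempts: int = 8) -> str | None:
    # Parallel attempts + rejection sampling: draw several candidates,
    # return the first one the verifier accepts.
    with concurrent.futures.ThreadPoolExecutor(max_workers=attempts) as pool:
        candidates = pool.map(generate_patch, [bug_report] * attempts)
    return next((p for p in candidates if passes_tests(p)), None)
```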
Agentic benchmarks
| Benchmark | Sonnet 4.5 | Opus 4.1 | GPT-5 | Gemini 2.5 Pro |
|---|---|---|---|---|
| TAU-bench Telecom | 98.0% | 71.5% | — | — |
| TAU-bench Retail | 86.2% | — | — | — |
| TAU-bench Airline | 70.0% | 63.0% | — | — |
| Finance Agent | 55-69% | — | 46.9% | 29.4% |
The TAU-bench results demonstrate Sonnet 4.5’s exceptional agent capabilities—the 98% Telecom score represents near-perfect task completion in complex multi-step scenarios.
Reasoning and knowledge
| Benchmark | Sonnet 4.5 | Opus 4.5 | GPT-5.1 | Gemini 3 Pro |
|---|---|---|---|---|
| GPQA Diamond | 83.4% | 87.0% | ~87.0% | 91.9% |
| MMLU | 89.1% | 90.8% | 91.5% | 91.8% |
| AIME 2025 | 87.0% | 100.0% | 94.0% | — |
| MMMU | 77.8% | 80.7% | 85.4% | — |
On pure reasoning, Sonnet 4.5 trails the flagship models. GPQA Diamond (83.4%) is notably behind Gemini 3 Pro (91.9%) and GPT-5.1 (~87%). For tasks requiring maximum reasoning capability, Opus 4.5 or GPT-5.1 may be better choices.
Speed and efficiency
Artificial Analysis testing ranks Sonnet 4.5 among the fastest frontier models:
- Output speed: 63 tokens/second
- Time to first token: 1.80 seconds
- Intelligence Index: 61 (thinking mode)—4th overall
This speed advantage makes Sonnet 4.5 practical for interactive coding assistants where latency matters.
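Both figures are easy to sanity-check against your own network and region using the SDK's streaming interface. A rough sketch (treat the results as indicative, since latency depends heavily on where you're calling from):

```python
import time
import anthropic

client = anthropic.Anthropic()

start = time.perf_counter()
first_token_at = None

with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain binary search."}],
) as stream:
    for _ in stream.text_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first text chunk arrived
    final = stream.get_final_message()

ttft = first_token_at - start
gen_time = time.perf_counter() - start - ttft
print(f"Time to first token: {ttft:.2f}s")
print(f"Output speed: {final.usage.output_tokens / gen_time:.1f} tok/s")
```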
Pricing breakdown
Sonnet 4.5 maintains price parity with Sonnet 4—the capability upgrade comes at zero additional cost.
Standard API pricing
| Tier | Input | Output |
|---|---|---|
| Standard (≤200K context) | $3.00 / MTok | $15.00 / MTok |
| Long context (>200K) | $6.00 / MTok | $22.50 / MTok |
Cost optimisation options
| Option | Input | Output | Savings |
|---|---|---|---|
| Prompt caching (read) | $0.30 / MTok | — | 90% |
| Prompt caching (write) | $3.75 / MTok | — | -25% (write premium) |
| Extended cache (1hr write) | $6.00 / MTok | — | -100% (write premium) |
| Batch API | $1.50 / MTok | $7.50 / MTok | 50% |
Prompt caching with 5-minute TTL delivers up to 90% savings on repeated context. For high-volume production with predictable prompts, this dramatically reduces costs.
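Caching is opt-in per content block via cache_control. A minimal sketch that caches a large system prompt; the file path is a placeholder, and the first call pays the 25% write premium while subsequent calls within the TTL read at $0.30/MTok:

```python
import anthropic

client = anthropic.Anthropic()
large_context = open("docs/codebase_summary.md").read()  # placeholder path

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": large_context,
            # Marks everything up to this block as cacheable (5-minute TTL)
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Where is authentication handled?"}],
)
# usage.cache_creation_input_tokens vs usage.cache_read_input_tokens
# show whether this call wrote to or read from the cache
print(response.usage)
```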
Competitor pricing comparison
| Model | Input | Output | vs Sonnet 4.5 |
|---|---|---|---|
| Claude Sonnet 4.5 | $3.00 | $15.00 | — |
| Claude Opus 4.5 | $5.00 | $25.00 | 1.67× more |
| GPT-5.1 | $1.25 | $10.00 | 2.4× cheaper input |
| Gemini 3 Pro | ~$2.00 | ~$12.00 | ~1.5× cheaper |
The cost differential with GPT-5.1 is significant. For cost-sensitive deployments where Sonnet 4.5’s coding advantages aren’t critical, GPT-5.1 offers compelling value.
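To make the differential concrete, a back-of-the-envelope helper; the monthly token volumes are illustrative assumptions, and the Gemini prices are the approximate figures from the table above:

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 in_price: float, out_price: float) -> float:
    """Dollars per month for a given traffic volume; prices in $/MTok."""
    return input_mtok * in_price + output_mtok * out_price

# Assume 500M input tokens and 50M output tokens per month
for name, prices in {
    "Sonnet 4.5": (3.00, 15.00),
    "GPT-5.1": (1.25, 10.00),
    "Gemini 3 Pro": (2.00, 12.00),  # approximate
}.items():
    print(f"{name}: ${monthly_cost(500, 50, *prices):,.0f}/mo")
# Sonnet 4.5: $2,250/mo · GPT-5.1: $1,125/mo · Gemini 3 Pro: $1,600/mo
```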
How to access Claude Sonnet 4.5
Via API
Sonnet 4.5 is generally available with no waitlist. Basic usage:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Your prompt here"}],
)
```
Enable extended thinking with budget control:
```python
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # minimum 1,024; must be less than max_tokens
    },
    messages=[{"role": "user", "content": "Complex reasoning task"}],
)
```
Enable 1M context (Tier 4+ only):
```python
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=8192,
    # Beta headers are passed via extra_headers in the Python SDK
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},
    messages=[{"role": "user", "content": "Process this large codebase..."}],
)
```
Via Claude.ai
Access varies by subscription tier:
| Tier | Access | Price | Notes |
|---|---|---|---|
| Free | ✅ | $0 | ~9 messages/5 hours, no extended thinking |
| Pro | ✅ | $20/mo | 40-80 hours Sonnet/week, extended thinking |
| Max 5× | ✅ | $100/mo | 140-280 hours Sonnet/week |
| Max 20× | ✅ | $200/mo | 240-480 hours Sonnet/week |
| Team | ✅ | $25-150/user/mo | Premium seats get priority |
| Enterprise | ✅ | Custom | 500K context, custom limits |
Sonnet 4.5 is the default model for all Claude.ai users, making it immediately accessible without configuration changes.
Via cloud providers
Sonnet 4.5 is available on all three major cloud platforms:
| Platform | Status | Model ID |
|---|---|---|
| Amazon Bedrock | GA | anthropic.claude-sonnet-4-5-20250929-v1:0 |
| Google Cloud Vertex AI | GA | claude-sonnet-4-5@20250929 |
| Microsoft Azure Foundry | Preview | — |
AWS Bedrock includes GovCloud availability with cross-region inference. Vertex AI supports batch predictions and the 1M context preview.
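On Bedrock the model is reachable through the standard Converse API. A minimal sketch with boto3, assuming credentials and region are already configured (cross-region inference typically uses a region-prefixed inference profile ID instead of the bare model ID):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": [{"text": "Your prompt here"}]}],
    inferenceConfig={"maxTokens": 1024},
)
print(response["output"]["message"]["content"][0]["text"])
```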
Via coding tools
Sonnet 4.5 is fully integrated into major AI coding assistants:
- Cursor: Regular and thinking modes available
- GitHub Copilot: Public preview, powering agentic experiences
- Windsurf: Full integration
- Cline: Available with API key
- OpenRouter: anthropic/claude-sonnet-4.5 at the same pricing (example below)
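OpenRouter exposes the model through an OpenAI-compatible endpoint, so the standard OpenAI SDK works with a base-URL swap. A minimal sketch (the API key is a placeholder):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder
)

completion = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(completion.choices[0].message.content)
```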
How Claude Sonnet 4.5 compares
vs Claude Opus 4.5
Opus 4.5 costs 67% more ($5/$25 vs $3/$15) but delivers measurably better results on complex tasks:
- +3.7pp on SWE-bench (80.9% vs 77.2%)
- +9.3pp on Terminal-Bench (59.3% vs 50.0%)
- +4.9pp on OSWorld (66.3% vs 61.4%)
- +13pp on AIME 2025 (100% vs 87%)
However, Opus achieves this with 76% fewer output tokens at medium effort, making it more cost-effective for tasks requiring deep reasoning. Simon Willison noted he “switched back to Sonnet 4.5 and kept on working at the same pace” for routine tasks.
Choose Sonnet 4.5 for: daily coding tasks, high-volume production, cost-sensitive applications, standard agent workflows.
Choose Opus 4.5 for: complex architectural decisions, difficult debugging, long-horizon autonomous agents, “when you cannot afford to be wrong.”
vs GPT-5.1
Independent testing shows near parity on SWE-bench under standardised conditions: Sonnet 4.5 at 69.8% vs GPT-5-Codex at 69.4%. However, GPT-5.1 is 2.4× cheaper on input tokens ($1.25 vs $3.00).
Sonnet 4.5 advantages:
- Superior computer use (OSWorld 61.4% vs ~44%)
- Better agentic consistency
- 30+ hour autonomous operation
- Lower hallucination rate on code tasks
GPT-5.1 advantages:
- 2.4× cheaper input pricing
- Better pure reasoning (GPQA 87% vs 83.4%)
- Larger context window (400K vs 200K standard)
- Broader multimodal support (audio/video)
Zvi Mowshowitz’s assessment: “If I had to pick one ‘best coding model in the world’ right now it would be Sonnet 4.5. If I had to pick one coding strategy to build with, I’d use Sonnet 4.5 and Claude Code.”
vs Gemini 3 Pro
Gemini 3 Pro (released November 2025) narrows the coding gap to 76.2% SWE-bench while offering:
- Larger context: 1M+ tokens native vs 200K (1M beta)
- Better reasoning: GPQA 91.9% vs 83.4%
- Lower cost: ~$2/$12 vs $3/$15
- Broader multimodal: Native audio, video, more languages
Sonnet 4.5 maintains advantages in computer use and agentic reliability. Choose Gemini for massive context needs or when reasoning benchmarks matter more than coding benchmarks.
The practical consensus
From community feedback: developers use Sonnet 4.5 as the daily driver for coding work, reserve Opus 4.5 for hard problems, and reach for GPT-5.1 when cost matters or for “particular wicked problems and difficult bugs.”
Known limitations
Independent testing and community reports reveal several areas where Sonnet 4.5 falls short:
Cost premium over GPT-5.1: At $3/$15, Sonnet 4.5 costs 2.4× more than GPT-5.1 ($1.25/$10) on input tokens. For cost-sensitive deployments, this premium requires justification through superior coding performance.
Reasoning gap: GPQA Diamond (83.4%) trails GPT-5.1 (~87%) and Gemini 3 Pro (91.9%). For tasks requiring maximum reasoning capability—mathematical research, complex scientific analysis—other models may be better choices.
Legacy codebase struggles: Hacker News feedback suggests Sonnet 4.5 is “insanely impressive in greenfield projects and collapses in legacy codebases.” The model excels at building new systems but can struggle with complex existing architectures.
Context window constraints: The standard 200K context is smaller than GPT-5.1’s 400K and Gemini’s 1M+. The 1M beta requires Tier 4+ API access and doubles input pricing.
Superficial implementations on long tasks: Some users report that while Sonnet 4.5 is fast, it can produce “broken and superficial” implementations on longer, more complex tasks, compared to GPT-5, which “took 5× longer but understood the objective better.”
Mathematical weakness: Multiple reviewers note Sonnet 4.5 is “still leagues worse than GPT-5-high at mathematical stuff.” For heavy mathematical workloads, consider alternatives.
Community reception
The developer community has responded positively to Sonnet 4.5, with particular praise for its coding capabilities and speed.
The positives
Coding excellence: Simon Willison called it “probably the ‘best coding model in the world’” at launch. The code interpreter mode successfully cloned his repository and ran 466 tests autonomously.
Real-world validation: Ethan Mollick demonstrated Sonnet 4.5 autonomously replicating published economics research—reading papers, processing data archives, converting STATA code to Python, and reproducing findings.
Speed improvements: Tasks that took 20+ minutes with competitors complete in ~3 minutes with Sonnet 4.5. Replit’s code editing error rate dropped from 9% to 0%.
Agent reliability: Devin reported 18% improvement in planning tasks. The 30+ hour autonomous operation enables overnight coding workflows that weren’t previously practical.
The criticisms
Cost concerns: The 2.4× input cost premium over GPT-5.1 is frequently cited. For high-volume deployments, this adds up significantly.
Context issues: Some users report the model “forgets rules” and doesn’t reuse existing code patterns in longer conversations.
Not universally superior: Zvi Mowshowitz notes GPT-5 remains better for “particular wicked problems and difficult bugs”—Sonnet 4.5’s speed advantage doesn’t always translate to better outcomes.
Expert verdict
The consensus: Sonnet 4.5 is the best default choice for AI-assisted coding, but the “best coding model” claim applies to specific contexts—greenfield projects, architecture planning, and rapid iteration. For legacy codebases, mathematical research, and cost-sensitive deployments, alternatives remain competitive or superior.
Version history
| Version | Released | Key changes |
|---|---|---|
| Claude Opus 4.5 | Nov 24, 2025 | Flagship, 80.9% SWE-bench, 67% price cut from Opus 4.1 |
| Claude Haiku 4.5 | Oct 2025 | Fast tier, 73.3% SWE-bench at $1/$5 |
| Claude Sonnet 4.5 | Sep 29, 2025 | 77.2% SWE-bench, 30+ hour operation, 61.4% OSWorld |
| Claude Sonnet 4 | May 2025 | 72.7% SWE-bench, computer use improvements |
| Claude 3.7 Sonnet | Early 2025 | 62.3% SWE-bench, extended thinking preview |
| Claude 3.5 Sonnet | 2024 | Previous generation |
Sonnet 4 remains available via API for users who need backward compatibility. The upgrade path from Sonnet 4 to 4.5 requires no code changes—simply update the model identifier.
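Assuming you pin dated snapshot IDs, the upgrade really is a one-line diff (Sonnet 4's identifier shown for reference):

```python
# Before: pinned to Sonnet 4
model = "claude-sonnet-4-20250514"

# After: same request shape, new identifier
model = "claude-sonnet-4-5-20250929"
```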
FAQ
Is Claude Sonnet 4.5 better than GPT-5.1 for coding?
For most coding tasks—yes. Sonnet 4.5 leads on SWE-bench (77.2% vs 76.3%) and significantly outperforms on computer use (OSWorld 61.4% vs ~44%). However, GPT-5.1 costs 2.4× less on input tokens and may be better for “wicked problems and difficult bugs.” Choose based on whether capability or cost matters more.
How much does Claude Sonnet 4.5 cost?
$3.00 per million input tokens, $15.00 per million output tokens. Cached inputs drop to $0.30/MTok (90% savings). Batch API offers 50% discount. This is identical pricing to Sonnet 4—the upgrade is a pure capability gain.
Can I use Claude Sonnet 4.5 for free?
Yes. Free tier users get approximately 9 messages per 5 hours with Sonnet 4.5, though extended thinking is not available. For serious use, Pro ($20/month) or API access is recommended.
What’s the difference between Sonnet 4.5 and Opus 4.5?
Opus 4.5 (80.9% SWE-bench) is Anthropic’s most capable model for complex tasks. Sonnet 4.5 (77.2% SWE-bench) offers ~95% of the capability at 60% of the price ($3/$15 vs $5/$25). Most developers use Sonnet daily and reserve Opus for hard problems.
Does Sonnet 4.5 support 1 million token context?
Yes, but only via API at Tier 4+ ($400+ spend). Enable with the context-1m-2025-08-07 beta header. Input pricing doubles for content beyond 200K tokens.
Is Sonnet 4.5 good for agents?
Excellent. The 30+ hour autonomous operation, 61.4% OSWorld score, and 98% TAU-bench Telecom demonstrate best-in-class agent capabilities. It’s the recommended model for building autonomous coding workflows.
What is Sonnet 4.5 best at?
Coding tasks, computer use, agentic workflows, and high-volume production workloads. It excels at greenfield projects, rapid iteration, and tasks requiring sustained autonomous operation.
Where is Sonnet 4.5 available?
Claude.ai (all tiers), Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Azure Foundry (preview), Cursor, GitHub Copilot, Windsurf, Cline, and OpenRouter.
Official links
| Resource | URL |
|---|---|
| Claude.ai | claude.ai |
| Anthropic Website | anthropic.com |
| Sonnet 4.5 Announcement | anthropic.com/news/claude-sonnet-4-5 |
| Sonnet Product Page | anthropic.com/claude/sonnet |
| API Documentation | docs.anthropic.com |
| Pricing | anthropic.com/pricing |
| Model Overview | docs.anthropic.com/en/docs/about-claude/models/overview |
| Migration Guide | docs.anthropic.com/en/docs/about-claude/models/migrating-to-claude-4 |
| Status Page | status.anthropic.com |