Claude Sonnet 4.5
Specifications
| Model ID | claude-sonnet-4-5 |
| Provider | Anthropic |
| Architecture | transformer |
| Context Window | 200K tokens |
| Max Input | 200K tokens |
| Max Output | 64K tokens |
| Knowledge Cutoff | 2025-01-31 |
| License | proprietary |
| Open Weights | No |
Variants
| VARIANT | API ID |
|---|---|
| Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 |
| Claude Sonnet 4.5 (alias) | claude-sonnet-4-5 |
API Pricing
Price parity with Sonnet 4—pure capability upgrade at same cost. Prompt caching offers up to 90% savings on repeated context. Batch API provides 50% discount with 24-hour turnaround. 1M context beta available at Tier 4+.
Claude Access
| TIER | PRICE |
|---|---|
| Free | Free |
| Pro | $20/mo |
| Max 5× | $100/mo |
| Max 20× | $200/mo |
| Team Standard | $25/user/mo |
| Team Premium | $150/user/mo |
| Enterprise | Custom |
Benchmarks
Coding
| SWE-bench Verified | 77.2% |
Reasoning
| GPQA Diamond | 83.4% |
| MMLU | 89.1% |
Math
| AIME 2025 | 87% |
Vision
| MMMU | 77.8% |
Rankings
| Artificial Analysis | #4 |
| AA Intelligence Index | 61 |
SWE-bench Verified leader at launch (77.2%). OSWorld (61.4%) sets the state of the art for computer use, a 45% relative improvement over Sonnet 4. Perfect AIME 2025 score with Python tools. TAU-bench Telecom (98%) demonstrates exceptional agent capabilities. Output speed (63 tok/s) is among the fastest of any frontier model.
Claude Sonnet 4.5 is Anthropic’s flagship coding model, released on September 29, 2025. At launch, it achieved the highest score on SWE-bench Verified at 77.2%—establishing itself as the leading model for real-world software engineering tasks. Anthropic positioned it as their recommended model for “basically every use case,” delivering significant improvements in autonomous operation: 30+ hours of continuous work compared to 7 hours for its predecessor.
The model dominates computer use benchmarks with a 61.4% OSWorld score (45% improvement over Sonnet 4) and scores 100% on AIME 2025 when using Python tools. Critically, it maintains price parity with Sonnet 4 at $3/$15 per million tokens—making the upgrade a pure capability gain at zero additional cost. For developers building AI-assisted coding workflows, Sonnet 4.5 represents the current sweet spot between capability and cost.
Quick specs
| Provider | Anthropic |
| Released | September 29, 2025 |
| Context window | 200K tokens (1M beta at Tier 4+) |
| Max output | 64K tokens |
| Knowledge cutoff | January 31, 2025 |
| Input price | $3.00 / MTok |
| Output price | $15.00 / MTok |
| Cached input | $0.30 / MTok (90% savings) |
| SWE-bench Verified | 77.2% (82.0% high compute) |
| OSWorld | 61.4% (best-in-class computer use) |
| Best for | Coding, agents, computer use, high-volume production |
| Limitations | Pure reasoning trails GPT-5.1; input costs 2.4× more than GPT-5.1 |
What’s new in Sonnet 4.5
Sonnet 4.5 represents the most significant Sonnet upgrade to date, focusing on agentic capabilities and coding performance while maintaining the speed and cost that made Sonnet the default choice for production workloads.
30+ hour autonomous operation
The headline improvement is sustained autonomous work. Anthropic reports Sonnet 4.5 can operate for 30+ hours continuously compared to 7 hours for Sonnet 4. This enables true overnight coding tasks, multi-day research projects, and complex multi-system integrations without human intervention.
State-of-the-art computer use
OSWorld score jumped from 42.2% to 61.4%—a 45% relative improvement. This makes Sonnet 4.5 the best model for computer use tasks: navigating GUIs, filling forms, interacting with web applications, and automated testing. The TAU-bench results reinforce this: 98% on Telecom (vs 71.5% for Opus 4.1) and 86.2% on Retail.
Zero code editing errors
Replit reported their code editing error rate dropped from 9% to 0% when switching from Sonnet 4 to Sonnet 4.5. Combined with the 77.2% SWE-bench score, this establishes Sonnet 4.5 as the most reliable model for automated code modification.
1M context window beta
API users at Tier 4+ ($400+ spend) can access the 1 million token context beta via the context-1m-2025-08-07 header. This enables processing entire codebases, lengthy documentation, or comprehensive research materials in a single context—though pricing doubles for inputs beyond 200K tokens.
Reduced sycophancy
Anthropic specifically trained Sonnet 4.5 to be less agreeable when users are wrong. The model pushes back more appropriately and avoids the excessive “you’re absolutely right!” responses that plagued earlier versions.
The Sonnet 4.5 model family
Sonnet 4.5 is the middle tier of the Claude 4.5 family, balancing capability and cost:
| Model | Released | API Identifier | Pricing | Best for |
|---|---|---|---|---|
| Claude Opus 4.5 | Nov 24, 2025 | claude-opus-4-5-20251101 | $5/$25 | Complex reasoning, “when you can’t afford to be wrong” |
| Claude Sonnet 4.5 | Sep 29, 2025 | claude-sonnet-4-5-20250929 | $3/$15 | Coding, agents, production workloads |
| Claude Haiku 4.5 | Oct 2025 | claude-haiku-4-5-20251001 | $1/$5 | High-volume, latency-sensitive tasks |
Sonnet 4.5 delivers approximately 95% of Opus 4.5’s coding capability at 60% of the cost, making it the recommended default for most use cases. Reserve Opus for complex architectural decisions and difficult debugging.
Benchmark performance
Sonnet 4.5 leads on coding and agentic benchmarks while showing competitive—but not leading—performance on pure reasoning tasks.
Coding benchmarks
| Benchmark | Sonnet 4.5 | Opus 4.5 | GPT-5.1 | Gemini 3 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 77.2% | 80.9% | 76.3% | 76.2% |
| SWE-bench (high compute) | 82.0% | — | — | — |
| Terminal-Bench | 50.0% | 59.3% | 43.8% | — |
| OSWorld | 61.4% | 66.3% | ~44% | — |
SWE-bench Verified tests models on real GitHub pull requests—the most realistic benchmark for software engineering capability. Sonnet 4.5’s 77.2% means it resolves roughly 4 out of 5 actual bug reports without human intervention. The high-compute configuration (parallel attempts + rejection sampling) pushes this to 82.0%.
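Anthropic hasn't published the exact scaffold behind the high-compute figure, but the idea is straightforward: sample several independent attempts and keep only those a verifier accepts. A minimal sketch, assuming a hypothetical passes_tests verifier (e.g. applying the candidate patch in a sandbox and running the repo's test suite) and treating each attempt as an independent API call:

```python
import concurrent.futures
import anthropic

client = anthropic.Anthropic()

def generate_patch(bug_report: str) -> str:
    """One independent candidate attempt at fixing the bug."""
    response = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=8192,
        messages=[{"role": "user", "content": f"Write a patch for:\n{bug_report}"}],
    )
    return response.content[0].text

def passes_tests(patch: str) -> bool:
    """Hypothetical verifier: apply the patch in a sandbox, run the test suite."""
    ...  # e.g. git apply + pytest in an isolated checkout
    return False

def solve_high_compute(bug_report: str, attempts: int = 8) -> str | None:
    # Parallel attempts + rejection sampling: draw several candidates,
    # return the first one the verifier accepts.
    with concurrent.futures.ThreadPoolExecutor(max_workers=attempts) as pool:
        candidates = pool.map(generate_patch, [bug_report] * attempts)
    return next((p for p in candidates if passes_tests(p)), None)
```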
Agentic benchmarks
| Benchmark | Sonnet 4.5 | Opus 4.1 | GPT-5 | Gemini 2.5 Pro |
|---|---|---|---|---|
| TAU-bench Telecom | 98.0% | 71.5% | — | — |
| TAU-bench Retail | 86.2% | — | — | — |
| TAU-bench Airline | 70.0% | 63.0% | — | — |
| Finance Agent | 55-69% | — | 46.9% | 29.4% |
The TAU-bench results demonstrate Sonnet 4.5’s exceptional agent capabilities—the 98% Telecom score represents near-perfect task completion in complex multi-step scenarios.
Reasoning and knowledge
| Benchmark | Sonnet 4.5 | Opus 4.5 | GPT-5.1 | Gemini 3 Pro |
|---|---|---|---|---|
| GPQA Diamond | 83.4% | 87.0% | ~87.0% | 91.9% |
| MMLU | 89.1% | 90.8% | 91.5% | 91.8% |
| AIME 2025 | 87.0% | 100.0% | 94.0% | — |
| MMMU | 77.8% | 80.7% | 85.4% | — |
On pure reasoning, Sonnet 4.5 trails the flagship models. GPQA Diamond (83.4%) is notably behind Gemini 3 Pro (91.9%) and GPT-5.1 (~87%). For tasks requiring maximum reasoning capability, Opus 4.5 or GPT-5.1 may be better choices.
Speed and efficiency
Artificial Analysis testing ranks Sonnet 4.5 among the fastest frontier models:
- Output speed: 63 tokens/second
- Time to first token: 1.80 seconds
- Intelligence Index: 61 (thinking mode)—4th overall
This speed advantage makes Sonnet 4.5 practical for interactive coding assistants where latency matters.
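Both figures are easy to sanity-check against your own network and region using the SDK's streaming interface. A rough sketch (treat the results as indicative, since latency depends heavily on where you're calling from):

```python
import time
import anthropic

client = anthropic.Anthropic()

start = time.perf_counter()
first_token_at = None

with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain binary search."}],
) as stream:
    for _ in stream.text_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first text chunk arrived
    final = stream.get_final_message()

ttft = first_token_at - start
gen_time = time.perf_counter() - start - ttft
print(f"Time to first token: {ttft:.2f}s")
print(f"Output speed: {final.usage.output_tokens / gen_time:.1f} tok/s")
```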
Pricing breakdown
Sonnet 4.5 maintains price parity with Sonnet 4—the capability upgrade comes at zero additional cost.
Standard API pricing
| Tier | Input | Output |
|---|---|---|
| Standard (≤200K context) | $3.00 / MTok | $15.00 / MTok |
| Long context (>200K) | $6.00 / MTok | $22.50 / MTok |
Cost optimisation options
| Option | Input | Output | Savings |
|---|---|---|---|
| Prompt caching (read) | $0.30 / MTok | — | 90% |
| Prompt caching (write) | $3.75 / MTok | — | -25% (write premium) |
| Extended cache (1hr write) | $6.00 / MTok | — | -100% (write premium) |
| Batch API | $1.50 / MTok | $7.50 / MTok | 50% |
Prompt caching with 5-minute TTL delivers up to 90% savings on repeated context. For high-volume production with predictable prompts, this dramatically reduces costs.
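Caching is opt-in per content block via cache_control. A minimal sketch that caches a large system prompt; the file path is a placeholder, and the first call pays the 25% write premium while subsequent calls within the TTL read at $0.30/MTok:

```python
import anthropic

client = anthropic.Anthropic()
large_context = open("docs/codebase_summary.md").read()  # placeholder path

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": large_context,
            # Marks everything up to this block as cacheable (5-minute TTL)
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Where is authentication handled?"}],
)
# usage.cache_creation_input_tokens vs usage.cache_read_input_tokens
# show whether this call wrote to or read from the cache
print(response.usage)
```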
Competitor pricing comparison
| Model | Input | Output | vs Sonnet 4.5 |
|---|---|---|---|
| Claude Sonnet 4.5 | $3.00 | $15.00 | — |
| Claude Opus 4.5 | $5.00 | $25.00 | 1.67× more |
| GPT-5.1 | $1.25 | $10.00 | 2.4× cheaper input |
| Gemini 3 Pro | ~$2.00 | ~$12.00 | ~1.5× cheaper |
The cost differential with GPT-5.1 is significant. For cost-sensitive deployments where Sonnet 4.5’s coding advantages aren’t critical, GPT-5.1 offers compelling value.
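To make the differential concrete, a back-of-the-envelope helper; the monthly token volumes are illustrative assumptions, and the Gemini prices are the approximate figures from the table above:

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 in_price: float, out_price: float) -> float:
    """Dollars per month for a given traffic volume; prices in $/MTok."""
    return input_mtok * in_price + output_mtok * out_price

# Assume 500M input tokens and 50M output tokens per month
for name, prices in {
    "Sonnet 4.5": (3.00, 15.00),
    "GPT-5.1": (1.25, 10.00),
    "Gemini 3 Pro": (2.00, 12.00),  # approximate
}.items():
    print(f"{name}: ${monthly_cost(500, 50, *prices):,.0f}/mo")
# Sonnet 4.5: $2,250/mo · GPT-5.1: $1,125/mo · Gemini 3 Pro: $1,600/mo
```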
How to access Claude Sonnet 4.5
Via API
Sonnet 4.5 is generally available with no waitlist. Basic usage:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=8192,
    messages=[{"role": "user", "content": "Your prompt here"}],
)
```
Enable extended thinking with budget control:
```python
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # minimum 1,024; must be less than max_tokens
    },
    messages=[{"role": "user", "content": "Complex reasoning task"}],
)
```
Enable 1M context (Tier 4+ only):
```python
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=8192,
    # Beta headers are passed via extra_headers in the Python SDK
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},
    messages=[{"role": "user", "content": "Process this large codebase..."}],
)
```
Via Claude.ai
Access varies by subscription tier:
| Tier | Access | Price | Notes |
|---|---|---|---|
| Free | ✅ | $0 | ~9 messages/5 hours, no extended thinking |
| Pro | ✅ | $20/mo | 40-80 hours Sonnet/week, extended thinking |
| Max 5× | ✅ | $100/mo | 140-280 hours Sonnet/week |
| Max 20× | ✅ | $200/mo | 240-480 hours Sonnet/week |
| Team | ✅ | $25-150/user/mo | Premium seats get priority |
| Enterprise | ✅ | Custom | 500K context, custom limits |
Sonnet 4.5 is the default model for all Claude.ai users, making it immediately accessible without configuration changes.
Via cloud providers
Sonnet 4.5 is available on all three major cloud platforms:
| Platform | Status | Model ID |
|---|---|---|
| Amazon Bedrock | GA | anthropic.claude-sonnet-4-5-20250929-v1:0 |
| Google Cloud Vertex AI | GA | claude-sonnet-4-5@20250929 |
| Microsoft Azure Foundry | Preview | — |
AWS Bedrock includes GovCloud availability with cross-region inference. Vertex AI supports batch predictions and the 1M context preview.
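On Bedrock the model is reachable through the standard Converse API. A minimal sketch with boto3, assuming credentials and region are already configured (cross-region inference typically uses a region-prefixed inference profile ID instead of the bare model ID):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": [{"text": "Your prompt here"}]}],
    inferenceConfig={"maxTokens": 1024},
)
print(response["output"]["message"]["content"][0]["text"])
```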
Via coding tools
Sonnet 4.5 is fully integrated into major AI coding assistants:
- Cursor: Regular and thinking modes available
- GitHub Copilot: Public preview, powering agentic experiences
- Windsurf: Full integration
- Cline: Available with API key
- OpenRouter: anthropic/claude-sonnet-4.5 at the same pricing (example below)
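OpenRouter exposes the model through an OpenAI-compatible endpoint, so the standard OpenAI SDK works with a base-URL swap. A minimal sketch (the API key is a placeholder):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # placeholder
)

completion = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(completion.choices[0].message.content)
```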
How Claude Sonnet 4.5 compares
vs Claude Opus 4.5
Opus 4.5 costs 67% more ($5/$25 vs $3/$15) but delivers measurably better results on complex tasks:
- +3.7pp on SWE-bench (80.9% vs 77.2%)
- +9.3pp on Terminal-Bench (59.3% vs 50.0%)
- +4.9pp on OSWorld (66.3% vs 61.4%)
- +13pp on AIME 2025 (100% vs 87%)
However, Opus achieves this with 76% fewer output tokens at medium effort, making it more cost-effective for tasks requiring deep reasoning. Simon Willison noted he “switched back to Sonnet 4.5 and kept on working at the same pace” for routine tasks.
Choose Sonnet 4.5 for: daily coding tasks, high-volume production, cost-sensitive applications, standard agent workflows.
Choose Opus 4.5 for: complex architectural decisions, difficult debugging, long-horizon autonomous agents, “when you cannot afford to be wrong.”
vs GPT-5.1
Independent testing shows near parity on SWE-bench under standardised conditions: Sonnet 4.5 at 69.8% vs GPT-5-Codex at 69.4%. However, GPT-5.1 is 2.4× cheaper on input tokens ($1.25 vs $3.00).
Sonnet 4.5 advantages:
- Superior computer use (OSWorld 61.4% vs ~44%)
- Better agentic consistency
- 30+ hour autonomous operation
- Lower hallucination rate on code tasks
GPT-5.1 advantages:
- 2.4× cheaper input pricing
- Better pure reasoning (GPQA 87% vs 83.4%)
- Larger context window (400K vs 200K standard)
- Broader multimodal support (audio/video)
Zvi Mowshowitz’s assessment: “If I had to pick one ‘best coding model in the world’ right now it would be Sonnet 4.5. If I had to pick one coding strategy to build with, I’d use Sonnet 4.5 and Claude Code.”
vs Gemini 3 Pro
Gemini 3 Pro (released November 2025) narrows the coding gap to 76.2% SWE-bench while offering:
- Larger context: 1M+ tokens native vs 200K (1M beta)
- Better reasoning: GPQA 91.9% vs 83.4%
- Lower cost: ~$2/$12 vs $3/$15
- Broader multimodal: Native audio, video, more languages
Sonnet 4.5 maintains advantages in computer use and agentic reliability. Choose Gemini for massive context needs or when reasoning benchmarks matter more than coding benchmarks.
The practical consensus
From community feedback: developers use Sonnet 4.5 as the daily driver for coding work, reserve Opus 4.5 for hard problems, and reach for GPT-5.1 when cost matters or for “particular wicked problems and difficult bugs.”
Known limitations
Independent testing and community reports reveal several areas where Sonnet 4.5 falls short:
Cost premium over GPT-5.1: At $3/$15, Sonnet 4.5 costs 2.4× more than GPT-5.1 ($1.25/$10) on input tokens. For cost-sensitive deployments, this premium requires justification through superior coding performance.
Reasoning gap: GPQA Diamond (83.4%) trails GPT-5.1 (~87%) and Gemini 3 Pro (91.9%). For tasks requiring maximum reasoning capability—mathematical research, complex scientific analysis—other models may be better choices.
Legacy codebase struggles: Hacker News feedback suggests Sonnet 4.5 is “insanely impressive in greenfield projects and collapses in legacy codebases.” The model excels at building new systems but can struggle with complex existing architectures.
Context window constraints: The standard 200K context is smaller than GPT-5.1’s 400K and Gemini’s 1M+. The 1M beta requires Tier 4+ API access and doubles input pricing.
Superficial implementations on long tasks: Some users report that while Sonnet 4.5 is fast, it can produce “broken and superficial” implementations on longer, more complex tasks, compared to GPT-5, which “took 5× longer but understood the objective better.”
Mathematical weakness: Multiple reviewers note Sonnet 4.5 is “still leagues worse than GPT-5-high at mathematical stuff.” For heavy mathematical workloads, consider alternatives.
Community reception
The developer community has responded positively to Sonnet 4.5, with particular praise for its coding capabilities and speed.
The positives
Coding excellence: Simon Willison called it “probably the ‘best coding model in the world’” at launch. The code interpreter mode successfully cloned his repository and ran 466 tests autonomously.
Real-world validation: Ethan Mollick demonstrated Sonnet 4.5 autonomously replicating published economics research—reading papers, processing data archives, converting STATA code to Python, and reproducing findings.
Speed improvements: Tasks that took 20+ minutes with competitors complete in ~3 minutes with Sonnet 4.5. Replit’s code editing error rate dropped from 9% to 0%.
Agent reliability: Devin reported 18% improvement in planning tasks. The 30+ hour autonomous operation enables overnight coding workflows that weren’t previously practical.
The criticisms
Cost concerns: The 2.4× input cost premium over GPT-5.1 is frequently cited. For high-volume deployments, this adds up significantly.
Context issues: Some users report the model “forgets rules” and doesn’t reuse existing code patterns in longer conversations.
Not universally superior: Zvi Mowshowitz notes GPT-5 remains better for “particular wicked problems and difficult bugs”—Sonnet 4.5’s speed advantage doesn’t always translate to better outcomes.
Expert verdict
The consensus: Sonnet 4.5 is the best default choice for AI-assisted coding, but the “best coding model” claim applies to specific contexts—greenfield projects, architecture planning, and rapid iteration. For legacy codebases, mathematical research, and cost-sensitive deployments, alternatives remain competitive or superior.
Version history
| Version | Released | Key changes |
|---|---|---|
| Claude Opus 4.5 | Nov 24, 2025 | Flagship, 80.9% SWE-bench, 67% price cut from Opus 4.1 |
| Claude Haiku 4.5 | Oct 2025 | Fast tier, 73.3% SWE-bench at $1/$5 |
| Claude Sonnet 4.5 | Sep 29, 2025 | 77.2% SWE-bench, 30+ hour operation, 61.4% OSWorld |
| Claude Sonnet 4 | May 2025 | 72.7% SWE-bench, computer use improvements |
| Claude 3.7 Sonnet | Early 2025 | 62.3% SWE-bench, extended thinking preview |
| Claude 3.5 Sonnet | 2024 | Previous generation |
Sonnet 4 remains available via API for users who need backward compatibility. The upgrade path from Sonnet 4 to 4.5 requires no code changes—simply update the model identifier.
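Assuming you pin dated snapshot IDs, the upgrade really is a one-line diff (Sonnet 4's identifier shown for reference):

```python
# Before: pinned to Sonnet 4
model = "claude-sonnet-4-20250514"

# After: same request shape, new identifier
model = "claude-sonnet-4-5-20250929"
```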
FAQ
Is Claude Sonnet 4.5 better than GPT-5.1 for coding?
For most coding tasks—yes. Sonnet 4.5 leads on SWE-bench (77.2% vs 76.3%) and significantly outperforms on computer use (OSWorld 61.4% vs ~44%). However, GPT-5.1 costs 2.4× less on input tokens and may be better for “wicked problems and difficult bugs.” Choose based on whether capability or cost matters more.
How much does Claude Sonnet 4.5 cost?
$3.00 per million input tokens, $15.00 per million output tokens. Cached inputs drop to $0.30/MTok (90% savings). Batch API offers 50% discount. This is identical pricing to Sonnet 4—the upgrade is a pure capability gain.
Can I use Claude Sonnet 4.5 for free?
Yes. Free tier users get approximately 9 messages per 5 hours with Sonnet 4.5, though extended thinking is not available. For serious use, Pro ($20/month) or API access is recommended.
What’s the difference between Sonnet 4.5 and Opus 4.5?
Opus 4.5 (80.9% SWE-bench) is Anthropic’s most capable model for complex tasks. Sonnet 4.5 (77.2% SWE-bench) offers ~95% of the capability at 60% of the price ($3/$15 vs $5/$25). Most developers use Sonnet daily and reserve Opus for hard problems.
Does Sonnet 4.5 support 1 million token context?
Yes, but only via API at Tier 4+ ($400+ spend). Enable with the context-1m-2025-08-07 beta header. Input pricing doubles for content beyond 200K tokens.
Is Sonnet 4.5 good for agents?
Excellent. The 30+ hour autonomous operation, 61.4% OSWorld score, and 98% TAU-bench Telecom demonstrate best-in-class agent capabilities. It’s the recommended model for building autonomous coding workflows.
What is Sonnet 4.5 best at?
Coding tasks, computer use, agentic workflows, and high-volume production workloads. It excels at greenfield projects, rapid iteration, and tasks requiring sustained autonomous operation.
Where is Sonnet 4.5 available?
Claude.ai (all tiers), Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Azure Foundry (preview), Cursor, GitHub Copilot, Windsurf, Cline, and OpenRouter.
Official links
| Resource | URL |
|---|---|
| Claude.ai | claude.ai |
| Anthropic Website | anthropic.com |
| Sonnet 4.5 Announcement | anthropic.com/news/claude-sonnet-4-5 |
| Sonnet Product Page | anthropic.com/claude/sonnet |
| API Documentation | docs.anthropic.com |
| Pricing | anthropic.com/pricing |
| Model Overview | docs.anthropic.com/en/docs/about-claude/models/overview |
| Migration Guide | docs.anthropic.com/en/docs/about-claude/models/migrating-to-claude-4 |
| Status Page | status.anthropic.com |