Anthropic released Claude Sonnet 4.5, which it calls its most capable model to date. The company says it improves coding and real-world computer use. Claude 4.5 is available through the API at the same price as Sonnet 4.
Stronger results and long task focus
Anthropic says Sonnet 4.5 worked on the same project for more than 30 hours on complex, multi-step tasks. The company did not share task details. It frames this as better long-run focus than past agent setups.
On benchmarks, Sonnet 4.5 scored 77.2 percent on SWE-bench Verified and 61.4 percent on OSWorld. Those figures beat OpenAI’s GPT-5 Codex at 74.5 percent and Google’s Gemini 2.5 Pro at 67.2 percent. It also showed gains on AIME 2024 and MMMLU.
On finance tasks in Vals AI’s Finance Agent benchmark, Sonnet 4.5 scored 92 percent. Anthropic says it is the best model for building complex agents and for using computers.
Computer use and benchmark context
Anthropic reports better computer use than Sonnet 4. Four months ago, Sonnet 4 scored 42.2 percent on OSWorld. The new score is 61.4 percent. The company uses these skills in its Claude for Chrome extension, which can navigate sites and fill spreadsheets.
Benchmarks can be gamed or suffer from dataset contamination. So the numbers need independent checks. Even so, the gains suggest a clear step up from 4.0.
Tools, pricing, and early reactions
Claude 4.5 keeps pricing at $3 per million input tokens and $15 per million output tokens. Developers can call the model with the identifier „claude-sonnet-4-5“ through the Claude API.
Anthropic also released Claude Code 2.0, a command-line agent for developers, and the Claude Agent SDK for building coding agents. Claude Code adds checkpoints, a refreshed terminal, and a native VS Code extension. The API gains context editing and a memory tool for longer-running agent tasks.
Users of Claude’s web and app interfaces can now run code and create files in chats. They can also generate spreadsheets, slides, and documents without leaving the chat. A five-day research preview called Imagine with Claude for Max subscribers shows the model generating software in real time.
Anthropic says Sonnet 4.5 reduces sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking. According to Ars Technica, developer Simon Willison said it felt better for code than GPT-5-Codex in his early tests.