Anthropic says Claude Sonnet 4.5 beats OpenAI and Google on benchmarks

Anthropic has released Claude Sonnet 4.5, which it calls its most capable model to date. The company says the model is better at coding and at real-world computer use. Sonnet 4.5 is available through the API at the same price as Sonnet 4.

Stronger results and long task focus

Anthropic says Sonnet 4.5 stayed focused on complex, multi-step tasks for more than 30 hours. The company did not share details of those tasks. It frames the result as better long-run focus than earlier agent setups.

On benchmarks, Sonnet 4.5 scored 77.2 percent on SWE-bench Verified and 61.4 percent on OSWorld. The SWE-bench Verified result beats OpenAI’s GPT-5 Codex at 74.5 percent and Google’s Gemini 2.5 Pro at 67.2 percent. Sonnet 4.5 also showed gains on AIME 2024 and MMMLU.

On finance tasks in Vals AI’s Finance Agent benchmark, Sonnet 4.5 scored 92 percent. Anthropic says it is the best model for building complex agents and for using computers.

Computer use and benchmark context

Anthropic reports better computer use than Sonnet 4. Four months ago, Sonnet 4 scored 42.2 percent on OSWorld. Sonnet 4.5 scores 61.4 percent, a gain of 19.2 percentage points. The company uses these skills in its Claude for Chrome extension, which can navigate sites and fill in spreadsheets.

Benchmarks can be gamed or suffer from dataset contamination, so the numbers need independent checks. Even so, the gains suggest a clear step up from Sonnet 4.

Tools, pricing, and early reactions

Sonnet 4.5 keeps pricing at $3 per million input tokens and $15 per million output tokens. Developers can call the model with the identifier "claude-sonnet-4-5" through the Claude API.
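To put those prices in concrete terms, here is a minimal Python sketch that estimates what a single request would cost at the quoted rates. The function name `estimate_cost_usd` and the token counts are hypothetical, chosen only for illustration, and the calculation ignores any discounts or surcharges that may apply in practice.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (illustrative helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Hypothetical request: 200,000 input tokens and 20,000 output tokens.
cost = estimate_cost_usd(200_000, 20_000)
print(f"${cost:.2f}")  # prints $0.90
```

Because output tokens cost five times as much as input tokens, the 20,000 output tokens in this example account for a third of the total even though they make up less than a tenth of the tokens.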

Anthropic also released Claude Code 2.0, a command-line agent for developers, and the Claude Agent SDK for building coding agents. Claude Code adds checkpoints, a refreshed terminal, and a native VS Code extension. The API gains context editing and a memory tool for longer-running agent tasks.

Users of Claude’s web and app interfaces can now run code and create files in chats. They can also generate spreadsheets, slides, and documents without leaving the chat. Imagine with Claude, a five-day research preview for Max subscribers, shows the model generating software in real time.

Anthropic says Sonnet 4.5 reduces sycophancy, deception, power-seeking, and the tendency to encourage delusional thinking. According to Ars Technica, developer Simon Willison said it felt better for code than GPT-5-Codex in his early tests.
