
Anthropic Unveils Claude Opus 4.1: A Leap Forward in AI Coding and Reasoning

[Image: AI brain processing complex code structures]

Anthropic has unveiled Claude Opus 4.1, an enhanced version of its Claude Opus 4 model, promising significant improvements in agentic tasks, real-world coding, and complex reasoning. The upgrade is now available to paid Claude users and within Claude Code, as well as through major cloud platforms such as Amazon Bedrock and Google Cloud’s Vertex AI, at the same pricing as its predecessor.

Key Takeaways

  • Broad Availability: Offered via Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI, putting the model within reach of a wide range of developers and businesses.
  • Enhanced Coding Prowess: Claude Opus 4.1 achieves a 74.5% score on SWE-bench Verified, a notable leap in coding performance, particularly in multi-file code refactoring.
  • Improved Reasoning and Analysis: The model demonstrates advanced capabilities in in-depth research and data analysis, with a greater capacity for detail tracking and agentic search.
  • Industry Validation: Early adopters like GitHub and Rakuten Group report significant performance gains, with Rakuten highlighting Opus 4.1’s precision in debugging large codebases.

Advancements in AI Capabilities

Claude Opus 4.1 represents a significant step forward for Anthropic’s AI offerings. The model posts a 74.5% success rate on SWE-bench Verified, a benchmark that measures an AI model’s ability to resolve real-world software engineering issues. This is a substantial improvement over its predecessor, particularly in complex operations like multi-file code refactoring. GitHub has noted these gains, confirming that Opus 4.1 outperforms Opus 4 across most capabilities.

Beyond coding, Opus 4.1 shows marked improvements in its capacity for in-depth research and data analysis. Its ability to track intricate details and perform agentic searches has been refined, making it a more powerful tool for complex problem-solving and information synthesis. Rakuten Group has specifically praised the model’s precision, noting its efficiency in identifying and correcting errors within extensive code repositories without introducing new issues, a trait highly valued for everyday debugging.

Performance Benchmarks and Methodology

Anthropic’s Claude models use a hybrid reasoning approach: they can answer directly or spend additional “extended thinking” tokens working through a problem first. The reported SWE-bench Verified and Terminal-Bench results were achieved without extended thinking. Conversely, benchmarks such as TAU-bench, GPQA Diamond, MMMLU, MMMU, and AIME used extended thinking with a budget of up to 64,000 reasoning tokens. For TAU-bench, prompt addenda were used to encourage better reasoning and tool use, and the maximum number of steps was raised to accommodate the model’s more thorough problem-solving process.
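Anthropic has not published its exact evaluation harness, but extended thinking is a documented option on its Messages API. The sketch below shows how a developer might enable it with the Python SDK; the prompt and the specific token numbers are illustrative (Anthropic’s benchmark runs allowed up to 64,000 thinking tokens, while a single request’s thinking budget must fit under its max_tokens).

```python
# Minimal sketch: enabling extended thinking on a Messages API request
# with Anthropic's Python SDK. The budget below is illustrative; the
# thinking budget must be smaller than the request's max_tokens.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1-20250805",
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=[{"role": "user", "content": "Plan a refactor of this module: ..."}],
)

# Thinking blocks arrive alongside the final answer in response.content.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```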

For its SWE-bench evaluation, Anthropic continues to equip its Claude 4 models with a bash tool and a file-editing tool; the planning tool previously given to Claude 3.7 Sonnet has been dropped. Scores for Claude 4 models are reported against the full 500 problems in SWE-bench Verified, while OpenAI’s scores are based on a 477-problem subset.
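As a rough illustration of that setup, the request below attaches a bash tool and a text-editor tool and omits any planning tool. The tool type strings are assumptions drawn from Anthropic’s documented built-in tools, not its actual evaluation code, and the prompt is a placeholder.

```python
# Rough sketch of a request wired like the SWE-bench setup described
# above: a bash tool plus a file-editing tool, no planning tool.
# Assumption: these tool "type" strings match Anthropic's documented
# built-in tools; they are not taken from the evaluation harness itself.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-1-20250805",
    max_tokens=8000,
    tools=[
        {"type": "bash_20250124", "name": "bash"},
        {"type": "text_editor_20250429", "name": "str_replace_based_edit_tool"},
    ],
    messages=[
        {"role": "user", "content": "Fix the failing test in tests/test_parser.py."}
    ],
)

# The caller must execute each tool_use block the model emits (run the
# shell command, apply the edit) and send back tool_result messages,
# looping until the model stops requesting tools.
```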

Availability and Future Outlook

Claude Opus 4.1 is available now to paid Claude users and within Claude Code. Developers can access it through Anthropic’s API using the model identifier claude-opus-4-1-20250805. The model is also integrated into Amazon Bedrock and Google Cloud’s Vertex AI, broadening its reach. Anthropic encourages users to upgrade from Opus 4 to Opus 4.1 across all applications and emphasizes that user feedback remains crucial to ongoing model development and future releases.
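For instance, a minimal API call against the new model with the Python SDK might look like the following; only the model identifier comes from the announcement, and the prompt is a placeholder.

```python
# Minimal call to Claude Opus 4.1 via Anthropic's Messages API, using
# the model identifier from the announcement. The prompt is a placeholder.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

response = client.messages.create(
    model="claude-opus-4-1-20250805",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Refactor this function for clarity: ..."}],
)

print(response.content[0].text)
```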

Sources

Claude Opus 4.1, Anthropic.