When other AI companies release a model, they invariably tell you how impressive and powerful the new product is. Anthropic took a different line: "We have something stronger, but we can't give it to you yet." On April 17, 2026, Anthropic released Claude Opus 4.7. The release itself holds few surprises: the official blog walks through the benchmark scores, capability improvements, and use cases one by one. But read the whole announcement carefully and you'll notice something unusual.
Opus 4.7 comes on the heels of Anthropic's Project Glasswing and Mythos Preview. Just last week, the company announced that Mythos Preview is being held back from release for now because its cybersecurity capabilities are too strong.
As a result, Opus 4.7 is explicitly positioned as "the first public model used to test new cybersecurity guardrails."
Anthropic even says it experimentally reduced the model's cybersecurity capabilities during training.
So what exactly is Opus 4.7?
01 How does Opus 4.7 perform?
Let's start with the conventional part.
Opus 4.7 outperforms Opus 4.6 on multiple benchmarks, especially on advanced software engineering tasks.
In the official chart, Opus 4.7 scores 87.6% on SWE-Bench Verified against 80.8% for Opus 4.6; on the harder SWE-Bench Pro, 64.3% against 53.4%; on Terminal-Bench 2.0, 69.4% against 65.4%; and on Finance Agent v1.1, 64.4% against 60.1%.
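To make the gaps concrete, here is a small Python sketch that computes the percentage-point gain of Opus 4.7 over Opus 4.6 from the scores quoted above. The scores are copied from the chart as reported here; the dictionary layout and the `point_gains` helper are only illustrative.

```python
# Scores (%) as quoted from the official chart: (Opus 4.6, Opus 4.7)
SCORES = {
    "SWE-Bench Verified": (80.8, 87.6),
    "SWE-Bench Pro": (53.4, 64.3),
    "Terminal-Bench 2.0": (65.4, 69.4),
    "Finance Agent v1.1": (60.1, 64.4),
}


def point_gains(scores):
    """Return the percentage-point gain of Opus 4.7 over Opus 4.6 per benchmark."""
    # round() to one decimal hides float noise such as 6.799999999999997
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


if __name__ == "__main__":
    for name, gain in point_gains(SCORES).items():
        print(f"{name}: +{gain} points")
```

By this arithmetic the largest jump is on SWE-Bench Pro (about 10.9 points), the harder of the two SWE-Bench variants.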

Put in plain terms, this string of numbers means you can now hand Opus 4.7 more complex programming tasks: it handles long-running tasks more rigorously, follows instructions more precisely, and finds ways to verify its own output before reporting back.
Feedback from early testers of Opus 4.7 highlights several points worth noting.
The first is a major improvement in instruction following.
Opus 4.7 takes instructions literally, whereas earlier models tended to interpret them loosely or skip parts of them.
That sounds like a good thing, but it can cause trouble: Opus 4.7 is more "obedient", and that can break some older prompts.
Earlier Claude models were more "accommodating". Give them a vague instruction and they would fill in your real intent, or quietly ignore requirements that were minor, conflicting, or poorly worded. Many users' prompts were in fact tuned around this old behavior.
Anthropic, however, says Opus 4.7 prefers to follow instructions literally. Small details in an old prompt that the model used to ignore may now be carried out faithfully, and vague wording the model used to handle flexibly is now read in the most direct way.
The result: the model is clearly stronger, but its output may differ from what users expect.
The second is improved multimodal support.
Opus 4.7 accepts images up to 2,576 pixels on the long side, roughly 3.75 megapixels, more than three times what previous Claude models supported.
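The arithmetic behind these figures is easy to check. Below is a minimal Python sketch that assumes the only limit is the 2,576-pixel long side quoted above (the real API may apply further constraints, such as a total pixel cap, which this sketch ignores). It scales an image down proportionally and reports its megapixel count. `fit_long_side` and `megapixels` are hypothetical helpers, not part of any Anthropic SDK.

```python
MAX_LONG_SIDE = 2576  # long-side limit quoted for Opus 4.7, in pixels


def fit_long_side(width: int, height: int, max_long: int = MAX_LONG_SIDE) -> tuple[int, int]:
    """Downscale (width, height), keeping the aspect ratio, so the long side is at most max_long."""
    long_side = max(width, height)
    if long_side <= max_long:
        return width, height  # already small enough; never upscale
    scale = max_long / long_side
    return max(1, round(width * scale)), max(1, round(height * scale))


def megapixels(width: int, height: int) -> float:
    """Pixel count in millions."""
    return width * height / 1_000_000


if __name__ == "__main__":
    w, h = fit_long_side(5152, 2896)  # an oversized, roughly 16:9 screenshot
    print(w, h, f"{megapixels(w, h):.2f} MP")
```

For reference, a 2,576 × 1,456 frame works out to about 3.75 megapixels, which matches the figure above.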
This is not an ordinary image-recognition upgrade. The point is not to let users ask "What's in this picture?" but to let agents understand software interfaces, in service of Anthropic's Computer Use capability.
If an agent cannot make out dense forms, terminal output, the details of a design mockup, or code screenshots, its ability to operate software is useless no matter how strong: it knows how to act, but not where.
By raising the image resolution, Anthropic has essentially given Claude sharper eyes.
Going forward, many AI tasks in office work, testing, security, and front-end development will be screen tasks rather than pure text tasks.
The third is performance on real-world work.
Internal testing shows Opus 4.7 outperforming Opus 4.6 on financial analysis tasks, producing more rigorous analyses and models, more professional presentations, and tighter integration across tasks.
It also achieves the top score on the third-party GDPval-AA evaluation, which covers knowledge work in finance, law, and other fields.
The fourth is memory.
Opus 4.7 also makes use of file-system-based memory: it can retain important notes across long, multi-session work, so later tasks need less up-front context.
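The announcement does not spell out the on-disk format, so the following is only a minimal Python sketch of the general idea. It assumes memory is a plain JSON file of notes that one session appends to and the next session reads back before starting. The file name `project_memory.json` and the `MemoryStore` class are hypothetical and do not describe how Claude actually stores its notes.

```python
import json
from pathlib import Path


class MemoryStore:
    """Hypothetical file-backed note store that survives across sessions."""

    def __init__(self, path: Path):
        self.path = path

    def load(self) -> list[dict]:
        """Read all saved notes; a missing file means an empty memory."""
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def remember(self, topic: str, note: str) -> None:
        """Append a note, e.g. a project constraint or why a past attempt failed."""
        notes = self.load()
        notes.append({"topic": topic, "note": note})
        self.path.write_text(json.dumps(notes, indent=2), encoding="utf-8")

    def as_context(self) -> str:
        """Render notes as a short preamble to prepend to the next session's prompt."""
        return "\n".join(f"- [{n['topic']}] {n['note']}" for n in self.load())


if __name__ == "__main__":
    store = MemoryStore(Path("project_memory.json"))
    store.remember("constraint", "Target Python 3.10; no new dependencies.")
    store.remember("last_failure", "Migration script timed out on the large table.")
    print(store.as_context())
```

The design choice here is that notes live outside the model, in a file, so a fresh session starts by reading that file rather than needing the whole history re-explained.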
This point gets little emphasis in the official announcement, but I think it may prove the most important update over long-term use.
Only an agent that can remember project constraints, user preferences, architectural decisions, and why the last attempt failed, across sessions, can go from a "smart temp worker" to a "reliable colleague."
On safety and alignment, Opus 4.7 performs broadly on par with Opus 4.6.
It improves on honesty and on resistance to malicious prompt injection attacks, but regresses slightly on avoiding harmful advice, for example on how to make and use regulated knives.
The official alignment assessment concludes that the model is broadly well aligned and trustworthy, though its behavior is not yet entirely ideal.
Pricing is unchanged from Opus 4.6: $5 per million input tokens and $25 per million output tokens.
The migration guide does, however, flag two cost changes. First, the new tokenizer may turn the same input into 1.0 to 1.35 times as many tokens. Second, at higher thinking effort, and especially in multi-turn agentic sessions, the model thinks more and may produce more output tokens.
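A quick way to see what these two changes mean for a bill is to plug them into the list prices. Below is a minimal Python sketch. It assumes input token counts are measured with the old tokenizer and scaled by the 1.0 to 1.35 factor from the migration guide; `output_factor` is a hypothetical knob standing in for extra thinking, and `estimate_cost` is an illustrative helper, not an Anthropic API.

```python
INPUT_USD_PER_MTOK = 5.00    # list price quoted above, per million input tokens
OUTPUT_USD_PER_MTOK = 25.00  # list price quoted above, per million output tokens


def estimate_cost(input_tokens: int, output_tokens: int,
                  tokenizer_factor: float = 1.0,
                  output_factor: float = 1.0) -> float:
    """Estimated USD cost of one call.

    tokenizer_factor: how many more tokens the new tokenizer produces for the
        same input text (the migration guide quotes 1.0 to 1.35).
    output_factor: hypothetical multiplier for extra thinking/output tokens.
    """
    billed_in = input_tokens * tokenizer_factor
    billed_out = output_tokens * output_factor
    return (billed_in / 1e6 * INPUT_USD_PER_MTOK
            + billed_out / 1e6 * OUTPUT_USD_PER_MTOK)


if __name__ == "__main__":
    print(f"baseline:      ${estimate_cost(200_000, 20_000):.2f}")
    print(f"worst-case in: ${estimate_cost(200_000, 20_000, tokenizer_factor=1.35):.2f}")
```

With 200,000 input and 20,000 output tokens, the baseline is $1.50; at the top of the tokenizer range the same request costs $1.85, even though the per-token price has not changed.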
This is where Anthropic is being careful: the list price stays the same, but the more heavily you run the model, the more it will actually cost.
Model billing used to depend mainly on input and output length; now it also depends on the thinking effort level, the task budget, how many rounds the agent runs, and whether it keeps reasoning after a tool call fails.
Anthropic's newly added x-high effort level and task budgets show that high-end model usage is following the same logic cloud computing once did: what you pay for is not an answer, but a task process that involves thinking, trial and error, and verification.
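To see why the number of rounds matters as much as the per-token price, here is a toy Python simulation of an agent loop. Every round re-sends the growing conversation as input, and the loop stops once a token budget would be exceeded. The per-round token numbers and the `run_with_budget` helper are invented for illustration; this is not Anthropic's task-budget API, whose parameters are not described here.

```python
def run_with_budget(base_context: int, output_per_round: int,
                    max_rounds: int, token_budget: int) -> tuple[int, int]:
    """Simulate an agent loop and return (rounds_completed, tokens_used).

    Each round re-sends the whole conversation so far as input, then adds
    output_per_round new tokens to it. The loop stops early if the next
    round would push total usage past token_budget.
    """
    context = base_context
    used = 0
    rounds = 0
    while rounds < max_rounds:
        round_cost = context + output_per_round  # re-sent input + new output
        if used + round_cost > token_budget:
            break  # budget exhausted: stop instead of reasoning further
        used += round_cost
        context += output_per_round  # this round's output is next round's input
        rounds += 1
    return rounds, used


if __name__ == "__main__":
    print(run_with_budget(10_000, 2_000, max_rounds=10, token_budget=10**9))
    print(run_with_budget(10_000, 2_000, max_rounds=10, token_budget=100_000))
```

In this toy setup, ten rounds consume 210,000 tokens even though each round adds only 2,000 new ones, because usage grows roughly quadratically with the number of rounds. A 100,000-token budget cuts the run off after five rounds and 80,000 tokens.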
02 Why did Anthropic release a deliberately weakened model?
Then again, one of Opus 4.7's real selling points is precisely that it does not unleash its full capabilities.
That may sound counterintuitive, but it may become the norm for next-generation model companies.
The closer a model gets to real production environments, the less it can chase stronger results alone. It needs to know what it can and cannot do, which users can be granted more permissions, and which requests must be blocked.
Anthropic launched the Cyber Verification Program at the same time as it released Opus 4.7.
The program essentially tiers access to capabilities: ordinary users get Opus with guardrails, and only verified security professionals can apply for broader cybersecurity uses.
The model automatically detects and blocks requests that indicate prohibited or high-risk cybersecurity uses.
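Conceptually, the tiering described above reduces to a small decision table. Below is a minimal Python sketch that assumes each request has already been labeled with a risk level by some classifier. The labels, the `Risk` enum, and `gate_request` are hypothetical; Anthropic has not published how its detection actually works.

```python
from enum import Enum


class Risk(Enum):
    BENIGN = "benign"          # ordinary security questions, defensive work
    DUAL_USE = "dual_use"      # legitimate for professionals, risky in general
    PROHIBITED = "prohibited"  # blocked for everyone


def gate_request(risk: Risk, verified_security_pro: bool) -> str:
    """Return 'allow' or 'block' for a request under a two-tier access model."""
    if risk is Risk.PROHIBITED:
        return "block"
    if risk is Risk.DUAL_USE:
        # Wider cybersecurity use is unlocked only for verified professionals.
        return "allow" if verified_security_pro else "block"
    return "allow"
```

The hard part in practice is the classifier that assigns the label, not this table. The sketch only shows how verification changes the outcome for the same request.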
Anthropic says it will learn from real-world deployments of Opus 4.7 to prepare for the widespread release of Mythos-level models in the future.
I have to say Anthropic knows how to play this. Because Opus already has capability to spare, they have turned safety itself into a product feature.
For the past few years, AI companies have competed on "I'm better than you": higher benchmark scores, more parameters, more complex tasks. But once model capability crosses a certain threshold, this logic starts to break down.
A model that performs too well in cybersecurity tests may mean it can also be used maliciously. An agent with no restrictions at all may make dangerous decisions without the user's knowledge.
Anthropic's chosen path is to lock away its strongest model for now and use a weaker but good-enough model to test its safety mechanisms. It's not that this can't be done technically; it's an active choice not to do it. That restraint itself becomes part of the product's differentiation.
Whether this strategy can succeed depends on whether the market recognizes the concept of "caution."
If users only care about "can it be done", Anthropic's approach will look conservative. But if enterprise customers start asking "will something go wrong", this kind of tiered release, with certain capabilities deliberately weakened, may actually become a competitive advantage.
Alongside Opus 4.7, Anthropic also updated Claude Code, adding auto mode and the /ultrareview command.
Auto mode is not automatic model selection but a permissions option. It lets Claude make some permission decisions on the user's behalf, so long tasks are interrupted less often, while carrying less risk than skipping permission prompts entirely.
This design addresses the core tension in agent products: ask too many questions and the agent acts like an intern; ask too few and the risk becomes too great.
The most difficult button to design in the agent era is not "Start", but "Allow".
In the past, AI only answered questions and had very few permissions.
Now it edits code, reads files, runs commands, opens web pages, and submits PRs. Every step carries risk.
If every action requires user confirmation, the agent's autonomy becomes meaningless. But with no checks at all, users will worry that the AI will make irreversible mistakes.
The essence of auto mode is to find a balance between "don't bother me" and "don't mess around".
Depending on the risk level of the operation, it decides whether to execute it automatically, prompt the user, or require explicit authorization.
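The three-way decision just described can be sketched as a lookup from action risk to outcome. Below is a minimal Python illustration that assumes a fixed risk level for each kind of action. The action names, their risk assignments, and `decide` are hypothetical and are not Claude Code's actual auto-mode rules.

```python
from enum import IntEnum


class ActionRisk(IntEnum):
    LOW = 1     # read-only or easily undone
    MEDIUM = 2  # changes state but is reversible
    HIGH = 3    # hard or impossible to undo


# Hypothetical risk assignments for common agent actions.
ACTION_RISK = {
    "read_file": ActionRisk.LOW,
    "run_tests": ActionRisk.LOW,
    "edit_file": ActionRisk.MEDIUM,
    "open_pr": ActionRisk.MEDIUM,
    "delete_branch": ActionRisk.HIGH,
    "push_to_main": ActionRisk.HIGH,
}


def decide(action: str) -> str:
    """Map an action to 'auto', 'prompt', or 'require_authorization'."""
    # Unknown actions default to the most cautious tier.
    risk = ACTION_RISK.get(action, ActionRisk.HIGH)
    if risk is ActionRisk.LOW:
        return "auto"
    if risk is ActionRisk.MEDIUM:
        return "prompt"
    return "require_authorization"
```

Treating unrecognized actions as high risk is the key design choice here: the agent runs uninterrupted on routine work, but anything unfamiliar falls back to asking.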
This is also the crucial step from "what an agent can do" to "whether it can actually be used."
/ultrareview is a dedicated code review session that reads through changes and flags bugs and design issues.
This feature is more interesting than code writing, because it shows that AI programming has officially entered a second stage: AI reviewing the code that AI itself generated.
It is no longer unusual for AI to write code. What is really rare is AI that can review its own code.
/ultrareview is like a second pair of eyes that Anthropic has given Claude Code.
One agent writes, and another, more cautious session reviews.
I'd guess, without even looking at the data, that both features will be heavily used, because they take over what every programmer using Claude Code was already doing by hand.
Generating code is only part of the development process. Reviewing, testing, refactoring, and documentation are equally important. If AI can only do the first step, it will always be just an auxiliary tool. If it can participate in the entire process, it may truly change the way software is developed.
There is another detail worth noting about this release: in the migration guide, Anthropic specifically warns that Opus 4.7 may use more tokens, but says overall efficiency improved in its actual coding evaluations.
This shows they are optimizing not the cost of a single call but the total cost of completing the task. If an agent gets things right the first time, the total cost can be lower than that of repeated trial and error, even if each call costs more.
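The trade-off in the previous sentence can be made concrete with a simple expected-value calculation. Below is a minimal Python sketch that assumes each attempt succeeds independently with a fixed probability and failures are simply retried, so the number of attempts is geometric with mean 1/p. The prices and success rates in the example are made up for illustration, not measured figures for any model.

```python
def expected_task_cost(cost_per_attempt: float, success_rate: float) -> float:
    """Expected cost to finish a task when each attempt independently succeeds
    with probability success_rate and failed attempts are retried.

    The number of attempts is geometric with mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


if __name__ == "__main__":
    cheap = expected_task_cost(1.00, 0.5)    # hypothetical cheaper, less reliable call
    pricier = expected_task_cost(1.30, 0.9)  # hypothetical pricier, more reliable call
    print(f"cheap: ${cheap:.2f}  pricier: ${pricier:.2f}")
```

Under these made-up numbers, a call that costs 30% more but succeeds 90% of the time finishes the task for about $1.44 on average, versus $2.00 for the cheaper call that succeeds only half the time.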
This is a more mature product philosophy. Early AI products chased "cheap" and "fast"; now they are chasing "reliable".
Opus 4.7 is not the strongest model, and Anthropic does not package it as the strongest model.
It strikes a balance among capability, safety, and cost. Whether that balance is truly right, I don't know; the market will have to decide.
At the very least, Anthropic's release strategy offers a new idea: sometimes what you choose not to do matters more than what you do.