Anthropic has just released Claude Opus 4.7, and the headline is not the benchmark numbers. It is the behavior. If you walk in expecting a polished version of Opus 4.6, you will be confused inside of five minutes. The model responds differently, it waits differently, and the entire operating system around it has shifted toward agentic workflows.
That shift is the real story. Opus 4.7 is less a chatbot and more a conductor, and Anthropic is openly telling us to stop using it like the previous generation. What follows is a full breakdown of what has changed, what to actually turn on, and which of the new commands are worth your time.
A Different Model, Not a Polished One
The first thing you will notice is literal adherence to instructions. What you write, the model does. What you do not write, the model will not do. Those old shortcut prompts like “give me a pitch deck to sell my product,” which used to trigger a full PowerPoint, now get you a model that stops and waits for you to specify. This is not a regression. It is alignment.
Anthropic has been working on two problems the entire AI industry is stuck on: can a model actually think, and can it follow instructions over long horizons without drifting? The answer they have landed on for 4.7 is to make the model much more obedient to explicit instruction sets, and to push the work of framing onto the user. If you give it a vague brief, you will get a vague response or a request for clarification. If you give it a coded decision tree, it will execute with real consistency.
Opus 4.7 rewards structured prompting and punishes casual prompting. The users who win with this release are the ones who learn to write agentic instructions, not longer sentences.
This also means long rambling prompts that worked in 4.6 do not work as well here. The recommended approach is short, sequenced, specific. Tell it what to retrieve, who to work with, and how to choose its actions. That is the agentic loop, and it is now the core interaction pattern.
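To make that concrete, here is what a short, sequenced, specific brief looks like over the API. A minimal sketch using the Anthropic Python SDK; the model ID is hypothetical and the brief is invented for illustration.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

pricing_pages = "...paste the source material here..."

# Casual: "compare these pricing pages" would invite a vague response.
# Sequenced and specific: each step names what to retrieve and how to decide.
brief = (
    "Role: pricing-comparison agent.\n"
    "Steps:\n"
    "1. Read the three pricing pages below.\n"
    "2. Extract plan name, monthly price, and seat limit for each.\n"
    "3. If a value is missing, write 'unknown'; do not guess.\n"
    "Output: a markdown table only, no commentary.\n\n"
) + pricing_pages

response = client.messages.create(
    model="claude-opus-4-7",  # hypothetical ID; check your account's model list
    max_tokens=1024,
    messages=[{"role": "user", "content": brief}],
)
print(response.content[0].text)
```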
The Advisor Function: Two Models Working in Tandem
Released 48 hours before the model itself, the Advisor function is the architectural preview of where everything is heading. Type “Advisor” in your terminal, press enter, and you can now link one Claude model to another. When your main model gets stuck, it pings the advisor, which in this case is Opus 4.7, and gets back a very short sequence of 400 to 800 tokens telling it how to unblock itself.
Short is the key word. The advisor is not rewriting the response. It is nudging the stuck model with a compact instruction to keep moving. That is it. And yet this small mechanism changes the stability profile of long-running agent tasks significantly, because it means one model can restart another when reasoning breaks down.
Single-agent AI is quietly being retired. Going forward, any serious workflow will pair at least two models, one executing and one advising. The Advisor function is how Anthropic is introducing that pattern to its user base.
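The Advisor function itself lives in the terminal, but the pattern is easy to approximate yourself. A rough sketch over the public API, assuming the Anthropic Python SDK, hypothetical model IDs, and a deliberately naive “stuck” check:

```python
import anthropic

client = anthropic.Anthropic()
EXECUTOR = "claude-sonnet-4-6"  # hypothetical IDs for illustration
ADVISOR = "claude-opus-4-7"

def ask(model: str, prompt: str, max_tokens: int) -> str:
    msg = client.messages.create(
        model=model, max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def run_with_advisor(task: str, max_rounds: int = 5) -> str:
    transcript = task + "\n\nWrite DONE on the last line when finished."
    draft = ""
    for _ in range(max_rounds):
        draft = ask(EXECUTOR, transcript, max_tokens=2000)
        if draft.rstrip().endswith("DONE"):
            return draft
        # The advisor sends a short nudge, not a rewrite.
        nudge = ask(
            ADVISOR,
            "The agent below stalled on its task. Reply with one compact "
            "instruction that unblocks it. Stay under 800 tokens.\n\n" + draft,
            max_tokens=800,
        )
        transcript += "\n\nPartial attempt:\n" + draft + "\n\nAdvisor hint:\n" + nudge
    return draft
```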
Orchestrator and Sub-Agents: The New Mental Model
Underneath the advisor pattern sits a bigger concept: the orchestrator. You have a head agent that coordinates, and sub-agents that each have their own tools, their own contexts, and their own instructions. Think of them as employees. The orchestrator dispatches work, the sub-agents execute in their own sandboxes, and they report back. The orchestrator compiles the result.
Here is the honest part. None of this happens automatically. The marketing promise that you can say “do my job” and Claude will build its own agent network is not true. What the model does extremely well is follow an ASCII decision tree written by you. If you draw the decision tree clearly, specify what each agent does, what variables it handles, and how it formats its output, Opus 4.7 executes it with more consistency than any previous Claude model.
If you skip that step, the model will keep generating tokens without ever producing a usable solution. That is the failure mode to watch for. More tokens, no convergence. The fix is not a better prompt. It is a coded instruction set.
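Here is what a coded instruction set can look like in practice. The decision tree lives in your code, each sub-agent gets its own isolated instructions, and the orchestrator is plain control flow. A sketch with invented agent roles and a hypothetical model ID:

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-opus-4-7"  # hypothetical ID

# Each sub-agent has its own instructions and a fresh context per call.
SUB_AGENTS = {
    "researcher": "Gather facts on the question. Output bullet points with a "
                  "source for each. If you find nothing, output exactly: NO SOURCES.",
    "writer": "Turn the bullet points into a 200-word summary. Add no new claims.",
}

def run_sub_agent(name: str, payload: str) -> str:
    msg = client.messages.create(
        model=MODEL, max_tokens=1500,
        system=SUB_AGENTS[name],  # isolated instruction set per agent
        messages=[{"role": "user", "content": payload}],
    )
    return msg.content[0].text

# The orchestrator is your decision tree, written out as control flow.
def orchestrate(question: str) -> str:
    facts = run_sub_agent("researcher", question)
    if "NO SOURCES" in facts:  # explicit branch: escalate instead of improvising
        return "Escalate to a human: research came back empty."
    return run_sub_agent("writer", facts)
```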
Benchmarks: Where 4.7 Actually Improves
On paper, 4.7 is a ten-point jump over Sonnet 4.6 and Opus 4.6 in the High configuration. The gains concentrate in three places: coding performance, graduate-level reasoning on academic knowledge (around 94.2, close to the ceiling of human expert reasoning), and visual reasoning. That last one is the quiet upgrade most people will miss.
Send a complex graph to Opus 4.6 and you would get sloppy interpretations. Send it to 4.7 and it now sits among the best multimodal models at linking visual technical elements to text. This is the area where Gemini 3.1 had a clear lead, and Anthropic has closed most of that gap.
There is one regression worth flagging: agentic search. It has dropped slightly. Not dramatically, not enough to panic over, but enough that if your workflow depends on autonomous web research, you should run a few side-by-side tests before migrating.
| Metric | Opus 4.6 | Opus 4.7 |
|---|---|---|
| Long-term coherence (autonomous business task) | ~$8,000 generated | ~$10,000 generated |
| Misalignment score (lower is better) | 2.48 | 2.75 |
| Visual reasoning | Weak on dense graphs | Among the best multimodal models |
| Agentic search | Stronger | Slight decline |
| Coding and reasoning (High mode) | Baseline | +10 points |
A note on misalignment. The score went up, not down, and that is worth pausing on. It is still a small delta, but misalignment is exactly when the model decides to do something other than what you asked. In an agentic system running for hours, a drifting agent can cause damage before you notice. This is another reason the instruction layer matters more than the raw capability score.
Reasoning Modes: What to Actually Turn On
Anthropic has introduced a new reasoning tier called Extra High, sitting between High and Max. The official recommendation is to run 4.7 in High almost all the time, and to reach for Extra High when the task involves code or runs longer than 30 minutes.
The Max setting is the one to avoid. It doubles your token usage from about 100,000 to 210,000 reasoning tokens for a marginal gain of 3 to 4 percent. That is not a trade worth making for most work. Extra High gives you performance equivalent to the old Max ceiling of 4.6 at a lower effective cost, which is where the real value sits.
On the lower end, Medium is barely at 55 percent reliability with this model. Do not use it. If you are on the web chat interface, all of this is handled automatically by the “adaptive thinking” system, which decides the reasoning depth for you. If you are on the CLI, you set it yourself with /effort.
High for 99 percent of work. Extra High for code and long-running analysis. Max is a trap. Medium is broken.
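If you script around the CLI, that guidance collapses to a one-line rule. A trivial sketch, with tier names mirroring the labels above:

```python
def pick_effort(task_is_code: bool, expected_minutes: int) -> str:
    """The article's recommendation, encoded as a rule of thumb."""
    if task_is_code or expected_minutes > 30:
        return "extra high"
    return "high"  # skip Max (~2x tokens for 3-4%) and Medium (unreliable)
```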
Why Reasoning Depth Even Matters
The technical reason ties back to neural network architecture. Two variables matter: the number of attention blocks and the depth of the neural layers. Picture a river. The attention blocks are the height of the walls, the layers are the depth of the water. Feed a model more information than its walls can hold and it overflows. Give it shallow layers and it cannot hold the relationships between concepts across a long context.
Opus 4.7 widens both. That is why it can handle longer contexts and more complex cross-references without collapsing. It is also why structured input still matters. A wider river does not rescue you from poorly organized data. Send a million tokens of junk and the model still loses focus.
The New CLI Commands
If you are on Claude Code CLI, a handful of commands are worth memorizing on day one. The web interface abstracts all of this away, but the CLI is where the real power sits.
/model and /effort
Type /model to switch between Opus 4.7, Sonnet 4.6, and any other available model. Type /config, navigate to the models section, and hit space to see the full list. The /effort command is where you set the reasoning tier: automatic, max, or extra high.
Output Styles
Opus 4.7 has a noticeably different writing tone out of the box: more direct, less verbose, with its own rhythm. If you were relying on custom instructions tuned for 4.6, you will probably need to rewrite them. Head to /config, find the output styles section, and you will see the built-in options: explanatory, learning, and code_style. You can create your own by dropping a file in the output style directory with a name, a description, and instructions that shape how the response is generated.
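As a sketch of what a custom style file might look like, assuming styles live as markdown files with a small frontmatter header under ~/.claude/output-styles. Both the location and the schema are assumptions; check your own install:

```python
from pathlib import Path

# Assumed location and file schema; verify against your own install.
style_dir = Path.home() / ".claude" / "output-styles"
style_dir.mkdir(parents=True, exist_ok=True)

(style_dir / "terse-reviewer.md").write_text(
    "---\n"
    "name: terse-reviewer\n"
    "description: Short, blunt review comments\n"
    "---\n"
    "Respond as numbered findings, one sentence each.\n"
    "Tag severity as [high], [mid], or [low]. No preamble.\n"
)
```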
/ultra_review
This one replaces the older review function and it is serious. Run /ultra_review on a working directory and three independent AI agents sweep through your code: stability tests, security checks, bug fixes, full analysis. It burns close to 200,000 tokens per run, costs between five and twenty dollars, and takes around 40 minutes. Pro and Max account holders get three free runs during the launch window, which is a straightforward invitation to try it on a real codebase before committing.
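A quick sanity check on those numbers, using only the figures above: 200,000 tokens at $5 to $20 per run works out to an effective rate of $25 to $100 per million tokens.

```python
tokens_per_run = 200_000
for dollars in (5, 20):
    per_million = dollars / tokens_per_run * 1_000_000
    print(f"${dollars} per run -> ${per_million:.0f} per million tokens")
```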
/proactive
Proactive is Anthropic’s take on scheduled loops. Type /proactive in the menu, give it an interval and a prompt, and it creates a cron task that runs your instruction on a schedule. The default cap is three days, which you can extend by specifying a longer window in the instruction. A typical use case looks like: “/proactive 5 minutes. Check if the deployment of my app on GitHub has been completed.” Useful for monitoring, polling, and any workflow where you want Claude to check on something without being asked.
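Under the hood, this automates an ordinary polling loop. The plain-Python equivalent of the example above, with the actual check left as a stub:

```python
import time

def deployment_finished() -> bool:
    # Stub: replace with a real check (API call, CLI invocation, etc.)
    return False

interval = 5 * 60                     # run every 5 minutes
cutoff = time.time() + 3 * 24 * 3600  # default cap: three days
while time.time() < cutoff:
    if deployment_finished():
        print("Deployment complete.")
        break
    time.sleep(interval)
```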
/rewind
The old /undo has been renamed /rewind, and it does what you would expect. Roll back to a previous point in the conversation when the model has wandered into a bad branch. You press enter, scroll through prior states, and pick where to restart. It saves context window space and saves you from watching a confused thread spiral further.
GDPVAL Skills and the Cowork Shift
One layer most coverage will miss: Opus 4.7 ships with GDPVAL skills, which are modeled behaviors for specific tasks. How to build a PowerPoint. How to write a Word document. How to handle science and biology reasoning. How to hold long-term consistency across complex decision-making. These skills are part of why Claude Cowork, the new interface with its redesigned menu, works as well as it does on document-heavy work.
The broader point: Claude is no longer a chatbot you prompt. It is an agentic system you instruct. If you are still using it like a chatbot, you are leaving about 98 percent of the capability on the table.
What This Means for Teams Building on Claude
The practical takeaway is blunt. Stop writing prose prompts. Start writing decision trees. Pair your main model with the advisor. Reach for Extra High when the task justifies it and stay on High the rest of the time. Write output styles for any workflow where tone matters. Use /ultra_review on anything heading to production. Use /rewind to keep your context clean.
The companies automating whole teams with Claude are not doing it with magic prompts. They are doing it with coded instructions, clear sub-agent definitions, and disciplined reasoning-mode selection. That is the boring truth of what Opus 4.7 rewards.
Frequently Asked Questions
Is Claude Opus 4.7 worth upgrading from Opus 4.6?
For code, visual reasoning, and any long-running agent task, yes. The ten-point jump in High mode alone justifies the switch. For agentic search workflows specifically, run a side-by-side test first because 4.7 has regressed slightly there.
Should I use Max reasoning mode?
No. You pay roughly double the tokens for a 3 to 4 percent gain. Use High for general work and Extra High for code and tasks over 30 minutes. That covers 99 percent of real use cases.
Do I need a Max plan to use Extra High reasoning?
Yes. Extra High consumes around 110,000 reasoning tokens per run, so you need a plan with at least a one-million-token context window. The Pro tier will run into limits fast on this setting.
What is the Advisor function actually doing?
Linking your active model to a second model that steps in with a short 400 to 800 token nudge when reasoning gets stuck. It does not rewrite the response. It unblocks the main model so it can continue executing. Think of it as a senior engineer pairing with a junior one.
Can I run /ultra_review on any codebase?
Yes, as long as you define the working directory. Budget roughly $5 to $20 per run and 40 minutes of processing. Pro and Max account holders get three free runs at launch, which is the cheapest way to test it on something real.
Why did the misalignment score go up in 4.7?
It is a small delta, but it is the right question to ask. Agentic models making more autonomous decisions have more room to drift. The answer is not to avoid 4.7. It is to tighten your instruction layer so the model has less room to improvise in the first place.
What happened to the /undo command?
Renamed to /rewind. Same function: roll back to a previous point in the conversation when the thread has gone sideways. Press enter, scroll through prior states, pick a restart point.
The Real Story
Opus 4.7 is not a bigger model pretending to be smarter. It is a more obedient model that expects more work from you. That sounds like a downgrade until you actually use it. The payoff is consistency over long tasks, stable behavior inside agent swarms, and output that matches the specification instead of drifting from it. For anyone building real workflows on top of Claude, that trade is the right one. For anyone still prompting it like a search bar, this release will feel harder than the last one. Both things can be true.