How to really use Claude Opus 4.8: five things worth changing

Anthropic has released Claude Opus 4.8. The benchmarks (the official comparative model tests) look great, as usual. But the more important questions are different ones: is this a better model for you, and do you have to change how you work with it? The answer to the second is yes. Below are five practical changes that make the difference.

What actually changed

Opus 4.8 builds on its predecessor, Opus 4.7. In short: sharper judgment, more honesty about its own progress, and the ability to work autonomously on a single task for longer. Input and output prices (for the text you send the model and the text it returns) stay the same as for 4.7. What did go up are the API request limits for Claude Code, to make room for the higher token use at the stronger effort levels. This affects only the technical limits for API users; the five-hour window and the weekly session limits are unchanged.

The 1M-token context window stays too. A token is a fragment of text the model operates on — roughly a piece of a word. The context window is the amount of text the model "sees" at once: instructions, files, the earlier conversation. A million tokens is a great deal — you can fit an extensive project into it all at once.
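To get a feel for what "a million tokens" means for your own project, here is a rough sketch. It relies on a common rule of thumb of about four characters per token for English text; the real count depends on the tokenizer, the language and the kind of content (code often tokenizes differently), so treat the result as an order-of-magnitude estimate. The function names and the file-suffix filter are illustrative assumptions, not an official tool.

```python
from pathlib import Path

CONTEXT_WINDOW = 1_000_000  # tokens, as stated in the article
CHARS_PER_TOKEN = 4         # rough heuristic for English text (assumption)


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: character count divided by a fixed ratio, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def estimate_project_tokens(root: str, suffixes=(".py", ".md", ".txt")) -> int:
    """Sum estimated tokens over all files under root whose suffix is in `suffixes`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total
```

For example, `estimate_project_tokens("path/to/project") / CONTEXT_WINDOW` gives a rough share of the window the project's text files would take up.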

It's worth flagging a few terms that come up below. Claude Code is the tool in which Claude works on coding and operational tasks — it runs in the terminal or as an extension, reads files, carries out steps. The effort level is a setting for how much "effort" the model puts into a task. We'll come back to this in a moment, because it's the most important thing today.

Anthropic devoted a separate section of the announcement to the model's honesty. It addresses a familiar problem: the model would declare "done, I sent 50 files" when it had sent 15, or promise four hours of work and finish in twenty minutes. If you noticed this with 4.7, you weren't alone. Opus 4.8 is meant to overstate its progress like that less often.

And one honest note up front: not every problem with 4.7 was the model's. Sometimes it's a matter of how it's used. In my view it's worth taking some of the responsibility on yourself before you decide the tool is at fault.

1. Effort is now the most important lever

In Opus 4.8 you can control how much effort the model puts into a task. In Claude Code you type effort and a slider appears. It's set to high by default. The available levels are: low, medium, high, x-high, max and ultracode (which is x-high combined with the workflows feature). The higher the level, the "smarter" the model, but the more tokens it consumes; the lower the level, the faster the responses.


Some of the old 4.7 gripes — the feeling of "laziness" or overeagerness — may in fact have been a matter of the wrong level. If a task takes a lot of work and the model is sitting on low or medium, that's simply too little effort. The reverse holds too: on a trivial task set to x-high the model can overdo it — deliberating over and complicating something that's simple. It's a balance between the model's intelligence, the token cost and speed.

The most important takeaway: if you're one of those people who open Claude Code, start typing and never touch this setting — start. The difference between Opus 4.8 on low and on x-high is big enough to feel like an entirely different version of the model. It's worth pulling that lever.
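One way to make the trade-off concrete is to write your own rule for picking a level before you start a task. The sketch below is purely hypothetical: it does not talk to Claude Code, and the `suggest_effort` thresholds are made-up starting points you would tune to your own work. It lists the five levels named above in order and leaves out ultracode, since the article describes it as x-high plus the workflows feature rather than a separate effort level.

```python
from enum import IntEnum


class Effort(IntEnum):
    """Effort levels named in the article, ordered from cheapest/fastest to most thorough."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3    # the Claude Code default, per the article
    X_HIGH = 4
    MAX = 5


def suggest_effort(steps: int, needs_deep_reasoning: bool) -> Effort:
    """Hypothetical heuristic: scale effort with the size and difficulty of the task.

    steps: your own rough estimate of how many distinct actions the task needs.
    needs_deep_reasoning: True for debugging, design or open-ended analysis.
    """
    if steps <= 1 and not needs_deep_reasoning:
        return Effort.LOW       # trivial edit or lookup: avoid overthinking it
    if steps <= 5 and not needs_deep_reasoning:
        return Effort.MEDIUM
    if steps <= 20:
        return Effort.X_HIGH if needs_deep_reasoning else Effort.HIGH
    return Effort.MAX           # long, multi-step work
```

Using an `IntEnum` keeps the levels comparable (`Effort.LOW < Effort.MAX`), which is handy if you later want to cap the level for cost reasons.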

2. Say what to do, not what not to do

One thing stands out in the official documentation on good instruction-writing practice: good examples rarely tell the model what not to do. They almost always describe outright what to do. The model handles a positive instruction better than a list of prohibitions.


It's a simple change of habit. Instead of building a long list of "don't do this, don't do that," describe the result you care about. The model then hits your intent more accurately, rather than weaving between prohibitions.

3. Give the "why" behind an instruction

This builds on the previous point. The model acts as if it's curious about context — if you tell it to avoid something, it more or less asks "but why?" The more of that context you supply, the better it sticks to the instruction.

An example. Instead of a dry "don't use em dashes," write: "I want the text to sound as if I wrote it myself; it's my style and I never use em dashes, so stick to that style." The same request, but with a rationale — and the model clearly respects it more readily. Fewer negative instructions, more "why" context — that translates into better adherence to your rules.
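If you assemble your own system prompt in code, you can make both habits structural: store each rule as a positive instruction plus its reason, so the "why" always travels with the "what". This is a minimal sketch; `Rule` and `render_rules` are hypothetical helpers, not part of any Anthropic SDK, and the example rule simply restates the em-dash example above in positive form.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One style rule: what to do, plus why it matters."""
    do: str   # positive instruction describing the result you want
    why: str  # rationale the model can reason from


def render_rules(rules: list[Rule]) -> str:
    """Render rules as a prompt section pairing each instruction with its reason."""
    lines = ["Style guidelines:"]
    for rule in rules:
        lines.append(f"- {rule.do} Reason: {rule.why}")
    return "\n".join(lines)


rules = [
    Rule(
        do="Write in a plain, personal voice and use commas or parentheses for asides.",
        why="The text should sound as if I wrote it myself, and I never use em dashes.",
    ),
]
print(render_rules(rules))
```

Keeping the reason in a required field means a rule without a "why" is an obvious gap rather than something that slips into the prompt unnoticed.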

4. By default it reasons before reaching for tools

Opus 4.8 reasons first and only then reaches for tools. Before it launches a helper agent or, say, queries a database, it tries to work out on its own what questions to ask and how to approach the problem, using what it already has.

This is often very good — sometimes you do want the model to think the matter through first. But sometimes you want the opposite: for it to first pull in extra context and only then start reasoning. So as you move your workflows from 4.7 to 4.8, don't switch the model over "blind" on the assumption that everything will work the same. Watch it for a while and get a feel for how it behaves.

5. It calibrates response length on its own

Opus 4.8 matches the length and detail of its answers to the complexity of the task on its own — rather than holding to one fixed level of verbosity. In practice: shorter answers for a simple fact check, longer ones for an open-ended analysis that takes more reasoning.

Benchmarks always look great

Finally, something that's easy to forget. A new model's benchmarks always look excellent; that's the nature of launch announcements. Opus 4.8 may well be better at one use while another tool still wins at something else, whatever the official tests say. Someone else's use case isn't your use case.

So don't start from the benchmarks, start from your own pain points. Look at what frustrated you most in working with Opus 4.7: where you still repeat the same prompt to the model, how often you have to correct it, how fast you approach your session limit. Maybe 4.8 solves those problems, and maybe it solves other ones instead. A better model doesn't automatically mean better for this particular problem.

So check the concrete things: whether the collaboration is more pleasant, whether you correct it less often, what token use looks like. According to the documentation the new model is more economical in this respect — but that's yet to be confirmed in practice. When you choose the model, the context strategy and the effort level, aim straight at the constraints and pain points you have now. Those, not a table of benchmarks, will tell you whether the change was worth the trouble.
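To turn "check the concrete things" into numbers, you can log a few figures per working session yourself and compare averages between the two models. This is a minimal sketch under stated assumptions: the `Session` format, its field names and the model labels are hypothetical, and nothing here reads Claude Code's own usage data; you fill the records in by hand or from your own tooling.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One working session, logged by you (hypothetical format)."""
    model: str             # your own label, e.g. "opus-4.7"
    corrections: int       # times you had to correct the model
    repeated_prompts: int  # times you re-sent the same prompt
    tokens_used: int


def summarize(sessions: list[Session], model: str) -> dict[str, float]:
    """Average each pain-point metric over all sessions logged for one model."""
    subset = [s for s in sessions if s.model == model]
    if not subset:
        raise ValueError(f"no sessions logged for {model!r}")
    return {
        "corrections": mean(s.corrections for s in subset),
        "repeated_prompts": mean(s.repeated_prompts for s in subset),
        "tokens_used": mean(s.tokens_used for s in subset),
    }


def compare(sessions: list[Session], old: str, new: str) -> dict[str, float]:
    """Change per metric from old to new; negative means less of it on the new model."""
    before, after = summarize(sessions, old), summarize(sessions, new)
    return {key: after[key] - before[key] for key in before}
```

For example, `compare(my_sessions, "opus-4.7", "opus-4.8")` returns the change in average corrections, repeated prompts and tokens per session. With only a handful of sessions per model, read the result as a hint rather than a verdict.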