The most expensive model isn't the answer to every question. That's the first thing worth saying out loud, because the instinct runs exactly the other way: since I have access to the most powerful model, I'll use it for everything. And that's precisely how you end up paying several times over for work a cheaper model would have done just as well.
I'll show you a different logic — one where a cheap, fast model does most of the work, and you call in the expensive one only when it's genuinely needed. First I'll explain why model prices diverge so sharply, then how to tell which task belongs to the "executor" and which to the "advisor," and finally I'll show you a concrete way to set this up for yourself.
Why this matters for costs at all
Language models — the "brains" that read your prompt and write the answer — are billed by the token. A token is roughly a fragment of a word; the longer the prompt and the longer the answer, the more tokens, the higher the bill. And here's the first thing few people remember: the text the model generates (output) costs considerably more than the text it reads (input). That's the rule across almost every model.
Let's look at the Claude family, because the differences are telling. The most powerful model, Opus, costs $5 per million input tokens and $25 per million output. The mid-tier one, Sonnet — $3 for input and $15 for output. The lightest, Haiku — $1 for input and $5 for output. (These are the maker's prices for the versions current in early 2026; the figures themselves change, but the proportion between the models has held for a long time.)
Put another way: for the same million output tokens, Opus charges $25 and Haiku charges $5. That's a fivefold difference, and it holds on input too ($5 against $1). If you run everything through the most expensive model, including simple answers, summaries, and database lookups, you pay that fivefold premium on every query a cheaper model would have handled just as well.
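To make the arithmetic concrete, here is a minimal sketch that prices one workload on each model. The per-million-token prices are the ones quoted above; the workload itself (10,000 queries, 2,000 tokens in and 500 tokens out per query) is an assumption I picked for illustration, so swap in your own numbers.

```python
# Prices in USD per million tokens, as quoted in the text above.
PRICES = {
    "opus": {"input": 5.00, "output": 25.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "haiku": {"input": 1.00, "output": 5.00},
}


def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single query on the given model."""
    price = PRICES[model]
    cost_per_million = (
        input_tokens * price["input"] + output_tokens * price["output"]
    )
    return cost_per_million / 1_000_000


# Hypothetical workload (an assumption, not a measurement):
# 10,000 queries, each reading 2,000 tokens and writing 500 tokens.
QUERIES = 10_000
for model in PRICES:
    total = QUERIES * query_cost(model, input_tokens=2_000, output_tokens=500)
    print(f"{model:>6}: ${total:,.2f}")
```

Under those assumed numbers the same workload comes to $225 on Opus, $135 on Sonnet, and $45 on Haiku. Note that the 500 output tokens cost more than the 2,000 input tokens on every model, which is the input/output asymmetry at work.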
Executor and advisor — two different roles
Here's the observation this whole strategy rests on. Picture a task made of three steps: A, B, C. Only step A is hard enough that you need a powerful reasoning model. Steps B and C are simple. Why waste money having the most expensive model do B and C, when a cheap model will do them just as effectively for a fraction of the price?
Hence the split into two roles:
- The executor — a cheap, fast model (say Sonnet or Haiku) that runs most of the work: reading, searching, answering simple questions, carrying out routine steps.
- The advisor — an expensive, powerful model (Opus) that the executor reaches for only when it recognizes a task is beyond it. The advisor helps plan a hard part, make a difficult decision, or check what the executor has produced.
The key is in the words "only when." It isn't that the advisor runs in the background the whole time. The executor leads the conversation and decides for itself whether a given question calls for higher intelligence. If it doesn't, the executor answers on its own, cheaply. If it does, the executor escalates to the advisor. This is escalation driven by difficulty, not maximum power applied "just in case."
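The control flow described above can be sketched in a few lines. This is only an illustration of the escalation logic, not the provider's implementation: the names `Draft`, `answer`, `stub_executor`, and `stub_advisor` are hypothetical, and the stubs stand in for real model calls so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    text: str
    needs_advisor: bool  # the executor's own judgement of difficulty


def answer(
    question: str,
    executor: Callable[[str], Draft],
    advisor: Callable[[str], str],
) -> tuple[str, str]:
    """Return (answer_text, who_answered) for one question."""
    draft = executor(question)
    if not draft.needs_advisor:
        # The cheap path: the executor handles it alone.
        return draft.text, "executor"
    # The expensive path: consult the advisor for this question only,
    # then let the executor finish the job with that guidance.
    guidance = advisor(question)
    final = executor(f"{question}\n\nAdvisor guidance:\n{guidance}")
    return final.text, "executor+advisor"


# Hypothetical stand-ins for real model calls (assumptions for the demo).
def stub_executor(prompt: str) -> Draft:
    if "Advisor guidance:" in prompt:
        return Draft("answer following the advisor's plan", needs_advisor=False)
    # Placeholder rule standing in for the model's own difficulty judgement.
    looks_hard = "design" in prompt.lower()
    return Draft("simple answer", needs_advisor=looks_hard)


def stub_advisor(prompt: str) -> str:
    return "plan: step 1, then step 2"


print(answer("What are your opening hours?", stub_executor, stub_advisor))
print(answer("Design a migration plan for our database", stub_executor, stub_advisor))
```

The design point is that the decision to escalate sits with the executor, per question. The advisor is never called unless the executor flags the question as beyond it.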
What the maker's numbers say
The provider published its own tests of this strategy — and one important note here: these are the maker's benchmarks on engineering tasks, not yours. Treat them as a pointer to direction, not a promise of your own result.
In one of those tests, Sonnet as the executor with Opus as the advisor scored 2.7 percentage points higher on a standard software-problem-solving benchmark than Sonnet alone — and at the same time cut the cost of a single task by almost 12 percent. So both better and cheaper at once.
In a second, tougher test, Haiku (the cheapest model) as the executor with Opus as the advisor scored more than twice as high as Haiku achieves on its own. This setup costs more than Haiku alone — obviously, since the expensive consultation is added in. But it still comes out cheaper than if Opus had run the whole task by itself. That's the entire idea in one sentence: you get close to the quality of the most powerful model while paying a fraction of its price.
How to tell executor work from advisor work
Before you set anything up, it's worth looking at your own tasks through this lens. From what I've seen, the line runs like this:
Executor work (a cheap model is entirely enough):
- answering simple, repetitive questions,
- searching and summarizing what you already have in documents,
- routine steps where it's known what should happen,
- a first pass through the material, before you even know whether it's hard.
Advisor work (worth reaching for a powerful model):
- planning something complex before the work begins,
- a decision carrying risk or one that's ambiguous,
- checking and evaluating what the executor produced,
- the moment a cheap model "stalls" and you can see it isn't coping with the problem.
And an honest note I won't skip: the fact that Sonnet called the advisor on a question Haiku didn't deem hard doesn't automatically mean either one is "better." It means they judge difficulty a little differently — and that your job is to check which setup gives you the answers you'd actually want to send.
How to set this up for yourself — plan mode in Claude Code
The simplest, everyday way to apply this strategy requires building nothing. If you work in Claude Code (the assistant that runs in the terminal and can reach your files), you already have it.
The mechanism builds on plan mode, and with the right model setting it works like this: the powerful model plans, the cheap one executes.
- Type the command /model. You'll see a list of available models (the default, Sonnet, Haiku and others).
- Choose the "Opus Plan" option. It sets the powerful Opus model in plan mode only, and uses the cheaper Sonnet for all the rest of the work.
- While you're framing what should happen, you're in plan mode, so the powerful model is doing the thinking with you. The two of you are meant to agree on a plan before anything is executed.
- When you approve the plan and move to execution, the tool switches by itself to the cheaper Sonnet, which carries out the agreed plan.
The practical rule it comes down to: use the most powerful model only when you need it, and otherwise stick with the cheaper one. The hardest, most thought-intensive moment is usually settling the plan — that's where the powerful model earns its keep. Carrying out a finished plan is executor work.
Test first, trust only afterward
There's one condition without which the whole saving is an illusion. A cheaper bill makes sense only if you don't lose on quality. And you won't verify that on three queries.
Before you let this strategy into something you care about — customer service, generating proposals, analyzing documents — run dozens, ideally hundreds of real queries through each setup and see which one consistently gives the answers you'd want to send. A few tries are too few to claim anything. Only a repeatable result on your own cases is proof. Your use case isn't someone else's use case — and you're the one who has to verify it.
This, by the way, is a principle broader than one mechanism in one tool. You match the model's power and cost to the task, not to your ambition. A cheap, fast model for most of the work; an expensive, powerful one called in deliberately, at the handful of moments where it genuinely changes the result. Next time you catch yourself reaching for "I'll go with the most powerful one for peace of mind," stop and ask one question: which part of this work is actually hard, and can the rest be handled more cheaply?