How to choose a language model: the right one for the task, not the "best"
There's no single best model — only the one that fits the task. What matters is cost, context window, privacy, hosting, and latency — not a leaderboard ranking.
- Choosing a model is about fit to the task, not a place on a leaderboard.
- Five criteria: cost, context window, privacy, hosting, latency.
- Decide on your own test set, not on someone else's benchmarks.
Why "the best model" is the wrong question
Every month some model tops a leaderboard. That's a weak basis for a decision: leaderboards report averages over generic tasks, while your tasks are specific. Choose a language model the way you choose a tool, by the task, the budget, and the constraints, not by its position on a list.
The right question is: which model meets my quality requirements at an acceptable cost and latency, on data I can't hand off to a third party? The answer is rarely "the one at the top of the leaderboard."
Five criteria that actually carry weight
Cost. Billing is usually per token — both input and output. At high volume, the gap between models multiplies by the number of calls and becomes the main line on the bill.
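To see how a per-token price gap turns into a monthly bill, here is a minimal sketch of the arithmetic. The prices, call volume, and token counts are made-up assumptions for illustration, not any provider's real numbers; check your provider's price list before plugging in values.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Price in USD per million tokens. The numbers used below are made up.
    input_per_mtok: float
    output_per_mtok: float


def monthly_cost(p: Pricing, calls_per_day: int, in_tokens: int,
                 out_tokens: int, days: int = 30) -> float:
    """Estimated bill: cost of one call times the number of calls over `days`."""
    per_call = (in_tokens * p.input_per_mtok
                + out_tokens * p.output_per_mtok) / 1_000_000
    return per_call * calls_per_day * days


# Hypothetical price points, not any real provider's list.
small = Pricing(input_per_mtok=0.5, output_per_mtok=1.5)
large = Pricing(input_per_mtok=5.0, output_per_mtok=15.0)

for name, p in [("small", small), ("large", large)]:
    cost = monthly_cost(p, calls_per_day=10_000, in_tokens=2_000, out_tokens=300)
    print(f"{name}: ${cost:,.2f} per month")
```

With these illustrative numbers the same workload costs $435.00 versus $4,350.00 a month. Note that `in_tokens` includes any context you send with each call, so stuffing more documents into the prompt raises the input term proportionally.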
Context window. The context window is the maximum amount of text, measured in tokens, that the model can process in one request. A large window simplifies work with long documents, but every token you actually send is billed and adds latency. A bigger window is not free to fill.
Privacy. Can the data leave your infrastructure? Does the provider use it for training? This is often a hard requirement that rules out some candidates before you even look at quality.
Hosting. A model behind the provider's API starts working right away and needs no maintenance. A model on your own infrastructure gives you control and predictable cost at scale, at the price of a team and hardware.
Latency. In a live conversation, milliseconds matter; in overnight batch processing, they don't. Larger models and longer inputs usually mean slower responses.
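Latency is easy to measure yourself rather than take from a spec sheet. The sketch below times repeated calls and reports the median and the 95th percentile, since occasional slow responses matter in live use. `fake_model` is a hypothetical stand-in for a real API client; in practice, pass a function that calls each candidate with prompts of realistic length.

```python
import statistics
import time
from typing import Callable


def measure_latency(call: Callable[[str], str], prompt: str,
                    runs: int = 20) -> dict[str, float]:
    """Call the model `runs` times (at least 2) and return median and p95 in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call(prompt)
        samples.append((time.perf_counter() - start) * 1000)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # "inclusive" keeps the estimate within the observed range for small samples.
    p95 = statistics.quantiles(samples, n=20, method="inclusive")[-1]
    return {"median_ms": statistics.median(samples), "p95_ms": p95}


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real API client: waits about 10 ms.
    time.sleep(0.01)
    return "ok"


print(measure_latency(fake_model, "Hello", runs=20))
```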
The criteria in one table
| Criterion | What it measures | When it's decisive |
|---|---|---|
| Cost per token | Price of input and output tokens | High call volume |
| Context window | How much text the model takes in at once | Long documents, RAG |
| Privacy | Where the data ends up | Sensitive data, regulations |
| Hosting | Where the model runs: provider's cloud or your own infrastructure | Scale, control, team capacity |
| Latency | Response time | Live interaction |
Operator's rule: don't pick a model off a leaderboard. Pick two or three candidates that meet your hard requirements (privacy, cost) and settle it on your own test set.
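The operator's rule can be read as a filter followed by a short list. This minimal sketch treats privacy and price as hard requirements and keeps the cheapest survivors for testing. The field names, the single "blended" price per model, and all candidate names and numbers are hypothetical; for a self-hosted model the price would stand for amortized hardware and team cost, which is an assumption you would have to estimate yourself. Keeping the cheapest `k` is just one reasonable tie-break before quality testing.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    data_leaves_infra: bool      # is the data sent to a third party?
    trains_on_your_data: bool    # may the provider use it for training?
    usd_per_mtok: float          # rough blended input/output price (assumption)


def shortlist(cands: list[Candidate], allow_external: bool,
              max_price: float, k: int = 3) -> list[Candidate]:
    """Drop candidates that fail a hard requirement, then keep the k cheapest."""
    ok = [
        c for c in cands
        if (allow_external or not c.data_leaves_infra)
        and not c.trains_on_your_data
        and c.usd_per_mtok <= max_price
    ]
    return sorted(ok, key=lambda c: c.usd_per_mtok)[:k]


# Hypothetical candidates and prices, for illustration only.
candidates = [
    Candidate("hosted-large", True, False, 10.0),
    Candidate("hosted-small", True, False, 1.0),
    Candidate("free-tier", True, True, 0.0),
    Candidate("self-hosted-open", False, False, 0.8),
]

print([c.name for c in shortlist(candidates, allow_external=True, max_price=5.0)])
```

Here "free-tier" drops out on the training rule and "hosted-large" on price, leaving `['self-hosted-open', 'hosted-small']` to be settled on your own test set.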
Size and parameters — without the myths
The number of parameters is sometimes treated as a measure of quality. That's an oversimplification. Larger models tend to handle hard, multi-step reasoning better, but for classification, data extraction, or short summaries a smaller model is often faster, cheaper, and good enough. Match the model's size to the difficulty of the task instead of oversizing "just in case."
How to settle it: your own evaluation
The decision shouldn't rest on someone else's tests. Build a small set of your own cases, 30 to 50 real queries with expected answers, and run every candidate through it. An evaluation on your data says more than any public leaderboard, because it measures exactly the task you need done.
The result is often surprising: a cheaper, smaller model beats the pricier favorite on the specific task. That is the whole point of choosing for fit rather than for rank, and the reason to make the decision on numbers, not on a name.
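A bare-bones version of such an evaluation fits in a few lines. This sketch scores each candidate by the share of cases where its answer matches the expected one after normalizing case and whitespace. The cases, `model_a`, and `model_b` are hypothetical stand-ins; in practice each model function would call a real API, and the case list would hold your 30 to 50 real queries.

```python
from typing import Callable

Case = tuple[str, str]  # (query, expected answer)


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting doesn't count as a miss.
    return " ".join(text.lower().split())


def evaluate(model: Callable[[str], str], cases: list[Case]) -> float:
    """Share of cases where the model's answer matches the expected one."""
    hits = sum(normalize(model(q)) == normalize(expected) for q, expected in cases)
    return hits / len(cases)


# Hypothetical cases; a real set would come from your own traffic.
cases: list[Case] = [
    ("Classify: 'The invoice is overdue'", "billing"),
    ("Classify: 'I can't log in'", "account"),
    ("Classify: 'Refund my order'", "billing"),
]


# Stand-ins for real API clients; replace with calls to your candidates.
def model_a(q: str) -> str:
    return "billing" if "invoice" in q or "Refund" in q else "account"


def model_b(q: str) -> str:
    return "Billing"


for name, model in {"model-a": model_a, "model-b": model_b}.items():
    print(name, evaluate(model, cases))
```

Exact matching suits classification and extraction. For free-text answers such as summaries, you would need a different scorer, for example a rubric applied by a human reviewer.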
Frequently asked questions
- Is a bigger model always better?
- No. More parameters usually mean higher cost and higher latency. For simple tasks such as classification or short summaries, a smaller model is often faster and cheaper at comparable quality.
- Is a leaderboard enough to choose from?
- No. Public leaderboards average over generic tasks, not yours. Build a small set of your own cases and test the candidates on it; the result often differs from the general ranking.
- Cloud model or self-hosted?
- The cloud is faster to start with and needs no maintenance. Self-hosting gives you control over the data and predictable cost at high volume, but it requires a team and infrastructure.