AI Glossary
Reasoning model
reasoning model, inference model
A reasoning model is a language model that, before answering, devotes extra compute to internal, step-by-step reasoning. It excels at tasks requiring multi-step logic, such as mathematics or programming.
- Extends the inference phase to break a problem into intermediate steps before answering.
- Improves accuracy on tasks in mathematics, programming and multi-step analysis.
- Greater accuracy comes with longer response times and a higher cost per query.
A reasoning model is a variant of a large language model that, before formulating its final answer, devotes extra compute to an internal line of reasoning. Instead of generating the result right away, the model breaks the problem into intermediate steps; in practice, this is an automated, reinforced form of chain-of-thought. The mechanism is sometimes called extended thinking and is one of the most significant changes in models released in 2024–2026.
Unlike a standard model, which treats every query with a similar amount of compute, a reasoning model dynamically increases its effort for harder problems. This extra effort happens in the inference phase, an approach known as inference-time compute scaling, which is an alternative to simply growing the parameter count. Some systems let you control the depth of this process, a capability described under extended thinking.
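The idea of scaling effort with difficulty can be illustrated with a minimal sketch. It assumes a difficulty score between 0 and 1 is already available from some upstream estimator; the function name `thinking_budget`, the linear policy and the token limits are hypothetical and do not correspond to any particular vendor's API.

```python
def thinking_budget(difficulty: float, min_tokens: int = 0, max_tokens: int = 8000) -> int:
    """Map a difficulty score in [0, 1] to a reasoning-token budget.

    Hypothetical policy: the budget grows linearly with difficulty.
    Real systems may decide this internally or expose a different control.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be between 0 and 1")
    return round(min_tokens + difficulty * (max_tokens - min_tokens))


# A standard model would correspond to a fixed budget regardless of difficulty;
# here the budget increases for harder problems.
for score in (0.0, 0.5, 1.0):
    print(score, thinking_budget(score))
```

A linear mapping is only the simplest possible choice; the point is that the amount of inference-time compute becomes a variable rather than a constant.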
For a business deployment, the choice is a trade-off between quality and cost. Reasoning models deliver clearly better results in data analysis, code review or planning, but every answer takes longer and consumes more tokens. That is why, in production, it is best to switch reasoning on selectively, where correctness matters more than response time, and route simpler, high-volume queries to cheaper models.
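Selective routing of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the model names, the prices, the average token counts and the task categories are invented, and the cost estimate counts output tokens only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    price_per_1k_output_tokens: float  # illustrative figure, not a real price
    avg_output_tokens: int             # illustrative figure, includes reasoning tokens


# Hypothetical profiles: the reasoning model emits more tokens at a higher price.
FAST = ModelProfile("fast-model", 0.5, 300)
REASONING = ModelProfile("reasoning-model", 2.0, 3000)

# Task types where correctness is assumed to matter more than response time.
REASONING_TASKS = {"data_analysis", "code_review", "planning"}


def route(task_type: str, latency_sensitive: bool) -> ModelProfile:
    """Pick the reasoning model only for demanding, latency-tolerant tasks."""
    if task_type in REASONING_TASKS and not latency_sensitive:
        return REASONING
    return FAST


def estimated_cost(model: ModelProfile, n_queries: int) -> float:
    """Rough output-token cost for n_queries, under the assumed averages."""
    return n_queries * model.avg_output_tokens / 1000 * model.price_per_1k_output_tokens


print(route("code_review", latency_sensitive=False).name)
print(route("faq", latency_sensitive=False).name)
print(estimated_cost(FAST, 1000), estimated_cost(REASONING, 1000))
```

With these made-up numbers, sending every query to the reasoning model would cost many times more than sending it to the fast one, which is why the router reserves it for the task types where the quality gain is expected to justify the expense.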