What is Extended thinking (reasoning effort)?


Extended thinking (reasoning effort)

extended thinking, reasoning effort, thinking mode, thinking budget

Extended thinking is a mode in which a model generates internal reasoning before giving its final answer. It trades higher latency and token usage for greater accuracy on hard tasks.

The model produces internal reasoning steps before delivering the final answer.
It raises accuracy on complex tasks at the cost of latency and token count.
The amount of thinking can be tuned — from brief to deep — depending on task difficulty.

Extended thinking is a mode of operation in which the model first generates a stream of internal reasoning (laying out the steps, weighing alternatives, catching its own errors) and only then formulates its final answer. It is a deliberate trade-off: response time and token usage both rise, but accuracy on hard tasks such as math, multi-step analysis, and code typically improves noticeably.

The mechanism is related to the chain-of-thought technique, in which the prompt asks the model to "think step by step." The difference is that extended thinking can be built into the model itself or controlled by a reasoning-effort parameter, rather than elicited solely by the wording of the prompt. Models designed around this mechanism are called reasoning models.
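The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the request is a plain dictionary, and the field name `reasoning_effort` and the levels `low`, `medium`, `high` are assumptions made for the example, not the interface of any particular provider.

```python
from typing import Any

# Hypothetical effort levels, chosen only for this illustration.
EFFORT_LEVELS = ("low", "medium", "high")


def chain_of_thought_request(question: str) -> dict[str, Any]:
    """Prompt-level approach: the reasoning instruction is part of the prompt text."""
    return {"prompt": f"{question}\n\nThink step by step before answering."}


def reasoning_effort_request(question: str, effort: str = "high") -> dict[str, Any]:
    """Parameter-level approach: the prompt is unchanged and a separate
    (hypothetical) setting asks the model for more or less reasoning."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    return {"prompt": question, "reasoning_effort": effort}
```

In the first function the amount of reasoning depends on how the prompt is worded. In the second it is a separate setting that can be changed without touching the prompt.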

In a deployment, what matters in practice is the ability to tune the effort. Simple, high-volume queries are handled in a fast, cheap mode, while extended thinking is switched on where an error is costly and a longer, more expensive inference pays off. This setting is a direct lever over cost and quality in production use of models.
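A minimal sketch of such routing follows, assuming each query arrives with a rough estimate of how costly a wrong answer would be and whether it needs multi-step work. The `Query` fields, the threshold, and the effort labels are all hypothetical, and a real system would derive these signals in its own way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    text: str
    error_cost: float  # hypothetical estimate in [0, 1] of how costly a wrong answer is
    multi_step: bool   # hypothetical flag: does the task need multi-step reasoning?


def choose_effort(query: Query, cost_threshold: float = 0.5) -> str:
    """Pick a reasoning-effort level for one query.

    Costly, multi-step queries get deep thinking; queries with only one of
    the two signals get a middle setting; everything else takes the fast,
    cheap path.
    """
    costly = query.error_cost >= cost_threshold
    if costly and query.multi_step:
        return "high"
    if costly or query.multi_step:
        return "medium"
    return "low"
```

For example, under these assumptions a routine question about opening hours would be routed to `low`, while a multi-step contract analysis with a high error cost would be routed to `high`.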

Related terms