AI Glossary
Context window
context window, model context
The context window is the maximum number of tokens a model can process in a single request, counting the instruction, attached data, and the response together. Anything beyond this limit is unavailable to the model.
- Measured in tokens, not words or characters.
- Covers the input and the generated response together.
- Once the limit is exceeded, older content falls out of the model's reach.
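Because input and response share one limit, it can help to check the arithmetic before sending a request. The sketch below is illustrative only: the function names and the 8,000-token window are hypothetical, and real limits vary by model.

```python
def fits_in_window(prompt_tokens: int, max_output_tokens: int, window_size: int) -> bool:
    """Return True if the prompt plus the reserved response fit in the window."""
    return prompt_tokens + max_output_tokens <= window_size


def max_prompt_tokens(window_size: int, max_output_tokens: int) -> int:
    """Tokens left for the prompt after reserving room for the response."""
    return max(0, window_size - max_output_tokens)


# Hypothetical window of 8,000 tokens with 1,000 reserved for the response.
WINDOW_SIZE = 8000
RESERVED_FOR_RESPONSE = 1000

print(fits_in_window(7000, RESERVED_FOR_RESPONSE, WINDOW_SIZE))  # exactly at the limit
print(fits_in_window(7500, RESERVED_FOR_RESPONSE, WINDOW_SIZE))  # over the limit
print(max_prompt_tokens(WINDOW_SIZE, RESERVED_FOR_RESPONSE))
```

Reserving space for the response up front is a design choice: it avoids a prompt that fits on its own but leaves no room for the model to answer.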
The context window defines how many tokens a model sees during a single request. It holds the entire instruction, any attached documents, the conversation history, and the response being generated. When the content exceeds the limit, the oldest fragments are no longer available to the model.
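One way an application might keep a conversation within the limit is to drop the oldest messages first. This is a minimal sketch of that idea, not the behavior of any particular model or API. `count_tokens` is a crude whitespace-based stand-in for a real tokenizer, and both function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token.
    # Real tokenizers split text differently, so treat this as a rough proxy.
    return len(text.split())


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest, stopping at the first message that no longer fits.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["one two three", "four five", "six seven eight nine"]
print(trim_history(history, budget=6))
```

With a budget of 6, the two newest messages (2 and 4 tokens by this rough count) are kept and the oldest one is dropped.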
The window size directly limits how much material you can supply at once. With a large body of company knowledge, instead of pasting everything in, you use document retrieval, which selects only the most relevant fragments to place in the window. Filling a longer window also usually means a higher cost per request, since usage is typically billed per token.
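The retrieval idea can be sketched as ranking fragments by relevance and adding them until a token budget is used up. Everything here is a simplifying assumption: relevance is plain word overlap (production systems typically use embeddings or a search index), `count_tokens` is a whitespace-based approximation, and all function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token.
    return len(text.split())


def relevance(query: str, fragment: str) -> int:
    # Toy score: number of distinct words shared by the query and the fragment.
    query_words = set(query.lower().split())
    fragment_words = set(fragment.lower().split())
    return len(query_words & fragment_words)


def select_fragments(query: str, fragments: list[str], budget: int) -> list[str]:
    """Pick the most relevant fragments that together fit the token budget."""
    ranked = sorted(fragments, key=lambda f: relevance(query, f), reverse=True)
    selected: list[str] = []
    used = 0
    for fragment in ranked:
        if relevance(query, fragment) == 0:
            break  # everything after this point shares no words with the query
        cost = count_tokens(fragment)
        if used + cost <= budget:
            selected.append(fragment)
            used += cost
    return selected


knowledge_base = [
    "refund policy allows returns within 30 days",
    "office opens at nine",
    "refund requests need a receipt",
]
print(select_fragments("refund policy", knowledge_base, budget=12))
```

Only the selected fragments are placed in the window. The unrelated fragment about office hours is never sent, which is what keeps the request within the limit.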