AI Glossary
Token
tokens, text token
A token is the smallest unit of text a language model works on: usually a fragment of a word, a whole word, or a single character. The model reads and generates text as a sequence of tokens.
- Text is split into tokens before it enters the model.
- A token is often a piece of a word rather than a whole word.
- The context limit and the cost of a query are both measured in tokens.
A language model doesn't read text as letters or whole sentences. First it splits the text into tokens (short fragments) and assigns each one a number. In English and Polish, a token is often a piece of a word rather than a whole one, so a longer or rarer word may break into several tokens.
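The splitting step can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the vocabulary `VOCAB`, its IDs, and the greedy longest-match strategy are made up, whereas real tokenizers learn vocabularies of many thousands of fragments from data and often use other algorithms such as byte-pair encoding.

```python
# Hypothetical toy vocabulary: fragment -> ID. Not taken from any real model.
VOCAB = {"token": 0, "iz": 1, "ation": 2, "the": 3, " ": 4, "cat": 5, "s": 6}
UNKNOWN_ID = -1  # placeholder ID for characters missing from the toy vocabulary


def tokenize(text: str) -> list[tuple[str, int]]:
    """Split text into (fragment, id) pairs by greedy longest match."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one is in VOCAB.
        for end in range(len(text), i, -1):
            fragment = text[i:end]
            if fragment in VOCAB:
                pieces.append((fragment, VOCAB[fragment]))
                i = end
                break
        else:
            # Nothing matched: emit the single character as unknown.
            pieces.append((text[i], UNKNOWN_ID))
            i += 1
    return pieces


print(tokenize("the cats"))
print(tokenize("tokenization"))
```

With this vocabulary, the longer word "tokenization" breaks into three fragments ("token", "iz", "ation"), while "the" stays a single token.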
Tokens matter in practice. The context window, meaning how much text the model sees at once, is measured in tokens. So is the cost of using the model: providers typically bill queries by the number of tokens in the input and the output.
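As a rough sketch of that arithmetic, the helpers below estimate the cost of one query and check whether it fits in a context window. The function names, the per-million-token prices, and the 8,000-token window are hypothetical values chosen for illustration, and the fit check assumes that input and generated tokens share one window, which may not hold for every provider.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated cost of one query, in the currency of the prices."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Assume input and output tokens count against the same window."""
    return input_tokens + max_output_tokens <= context_window


# Hypothetical prices: 2.0 per million input tokens, 10.0 per million output tokens.
cost = estimate_cost(1_000, 500, 2.0, 10.0)
print(f"estimated cost: {cost:.4f}")
print(fits_in_context(7_000, 1_000, 8_000))
```

Input and output are priced separately here because providers often charge different rates for each.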