AI Glossary

Transformer

transformer architecture, transformer model

A transformer is a neural network architecture built on the attention mechanism, which lets the model weigh the relationship between every pair of tokens in a sequence. It is the foundation of today's large language models.

The transformer was introduced in 2017 in the paper "Attention Is All You Need".
The attention mechanism weighs dependencies between all tokens at once.
It is the basis for LLMs and many multimodal models.

A transformer is a type of neural network first described in 2017. Its key component is the attention mechanism, which lets the model assess how strongly each token relates to the others in the same sequence. As a result, the model captures context and long-range dependencies more effectively than earlier sequential architectures such as recurrent networks.
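
The idea of each token weighing every other token can be illustrated with a minimal sketch of scaled dot-product attention, the form of attention described in the 2017 paper. This is an illustration under simplifying assumptions, not a full transformer layer: the function names `softmax` and `self_attention` and the three toy token vectors are hypothetical, and the token vectors are used directly as queries, keys, and values, whereas a real model first passes them through learned projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over one sequence.

    Each argument is a list of vectors, one per token.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    d_k = len(keys[0])        # dimension of the key vectors
    d_v = len(values[0])      # dimension of the value vectors
    outputs, weights = [], []
    for q in queries:
        # Similarity of this token's query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        row = softmax(scores)
        weights.append(row)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(row, values)) for i in range(d_v)])
    return outputs, weights

# Hypothetical 2-dimensional vectors for a three-token sequence.
# Assumption: no learned projections, so the same vectors serve as Q, K, and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)

for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 3) for w in row]}")
```

Each row of `weights` sums to 1 and shows how one token distributes its attention across the whole sequence, including tokens far away from it. In this toy input the first token gives the second token the smallest weight, because their vectors have a dot product of zero.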

Unlike older recurrent architectures, which read a sequence one token at a time, a transformer processes all tokens of a sequence in parallel. This makes good use of modern hardware such as GPUs and makes very large models practical to train. It is this property that made the transformer the foundation of today's large language models.

Related terms