What is RAG (Retrieval-Augmented Generation)?


RAG (Retrieval-Augmented Generation)


RAG is a technique in which a system retrieves relevant document passages before the language model answers, and the model grounds its generation in them, so it responds based on specific sources rather than memory alone.

Combines retrieval with generation.
Reduces hallucinations because the answer is grounded in sources.
Lets you work with company knowledge without retraining or fine-tuning the model.

RAG addresses the problem that a language model on its own doesn't know current or company-specific information. Instead of retraining the model, the system retrieves the most relevant passages from a document store and adds them to the prompt alongside the query.
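The "add them to the query" step can be sketched as simple prompt assembly. This is a minimal illustration, not a prescribed format: the function name build_prompt, the instruction wording, and the sample passages are all hypothetical, and the passages are assumed to have been retrieved already.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Prepend numbered source passages to the user's question."""
    # Number each passage so the model can refer back to its sources.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical passages, assumed to come from a retrieval step.
prompt = build_prompt(
    "How many vacation days do new employees get?",
    [
        "New employees receive 25 vacation days per year.",
        "Vacation requests go through the HR portal.",
    ],
)
print(prompt)
```

The resulting string is what would be sent to the language model in place of the bare question.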

In practice, the documents are split into passages ahead of time, converted into embeddings, and stored in a vector database. When a question comes in, the system embeds it, finds the closest passages, and passes them to the model as context; the model then formulates its answer based on them.
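The index-then-search flow described above might look roughly like the sketch below. It rests on loud assumptions: embed is a toy word-count stand-in for a real embedding model, VectorStore is a hypothetical in-memory list rather than an actual vector database, and the sample passages are invented for illustration. Only the overall shape (embed ahead of time, rank by similarity at query time) reflects the text.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store; a real system would use a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, passage: str) -> None:
        # Embeddings are computed once, ahead of time, when a passage is indexed.
        self.items.append((embed(passage), passage))

    def search(self, query: str, k: int = 2) -> list[str]:
        # At query time, embed the question and rank passages by similarity.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [passage for _, passage in ranked[:k]]


store = VectorStore()
for passage in [
    "Invoices are paid within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Expense reports must be submitted by the 5th of each month.",
]:
    store.add(passage)

top = store.search("When are invoices paid?", k=1)
print(top)
```

Word counts only match exact terms, whereas learned embeddings capture meaning; the toy version is used here so the example runs without external dependencies.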
