AI Glossary

Reranking

reranking, re-ranking, result reranking, cross-encoder

Reranking is a second retrieval stage in which a separate model reorders the initial results by their relevance to the question before the best ones reach the language model. It improves answer quality in retrieval-augmented generation (RAG).

It's a second stage after fast initial retrieval — it reviews the candidates and orders them by relevance.
A reranking model scores the question–passage pair together, which makes it more accurate than comparing vectors alone.
It lowers the risk that irrelevant context reaches the model, so it improves RAG answer quality.

Reranking is a stage that sits between retrieval and answer generation. First, a fast mechanism — usually based on embeddings and vector comparison — selects a broad pool of candidates, say fifty passages. Then the reranking model scores each candidate against the question and orders them from most to least relevant, after which only the top few are passed to the language model.
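The two stages described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `embed`, `cosine`, `retrieve`, `pair_score`, and `rerank` are hypothetical names, the bag-of-words vector stands in for a real embedding model, and the word-order check stands in for a real reranking model.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passage_vectors, pool_size):
    """Stage 1: fast vector comparison selects a broad candidate pool."""
    q_vec = embed(question)
    order = sorted(
        range(len(passage_vectors)),
        key=lambda i: cosine(q_vec, passage_vectors[i]),
        reverse=True,
    )
    return order[:pool_size]


def pair_score(question, passage):
    """Toy stand-in for a reranking model: it reads both texts together
    and rewards question word pairs that appear in the same order in the
    passage (an assumption made purely for illustration)."""
    q, p = tokenize(question), tokenize(passage)
    q_bigrams = list(zip(q, q[1:]))
    p_bigrams = set(zip(p, p[1:]))
    if not q_bigrams:
        return 0.0
    return sum(bg in p_bigrams for bg in q_bigrams) / len(q_bigrams)


def rerank(question, candidates, passages, top_k):
    """Stage 2: score each candidate against the question, keep the best."""
    order = sorted(
        candidates,
        key=lambda i: pair_score(question, passages[i]),
        reverse=True,
    )
    return order[:top_k]


passages = [
    "man bites dog in the park",
    "dog bites man near the station",
    "weather report for the weekend",
]
passage_vectors = [embed(p) for p in passages]  # computed once, ahead of time

question = "dog bites man"
pool = retrieve(question, passage_vectors, pool_size=2)
best = rerank(question, pool, passages, top_k=1)
print(pool, best)
```

In this toy data the first two passages contain the same words as the question, so the vector stage cannot tell them apart. The pair scorer, which sees the question and the passage side by side, puts the passage with the matching word order first.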

The difference from semantic search alone is significant. Vector search encodes the question and the passage separately (passage vectors are usually computed ahead of time, the question vector at query time) and then compares the two vectors, which is fast but approximate. The reranking model, typically a cross-encoder, analyzes the question and the passage together, so it catches nuances of relevance that vectors alone miss. The price is speed: it has to run once for every question–passage pair, which is why it's applied to a small pool rather than the whole database.
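The cost trade-off can be made concrete by counting model calls. The sketch below is an illustration under stated assumptions: `ToyBiEncoder` and `ToyCrossEncoder` are hypothetical classes that only mimic the interfaces of the two kinds of model (one text in, one vector out, versus one pair in, one score out), and their word-overlap logic is a placeholder for real inference.

```python
class ToyBiEncoder:
    """Hypothetical stand-in: encodes ONE text at a time into a 'vector'."""

    def __init__(self):
        self.calls = 0

    def encode(self, text):
        self.calls += 1
        return frozenset(text.lower().split())


class ToyCrossEncoder:
    """Hypothetical stand-in: scores a (question, passage) PAIR in one pass."""

    def __init__(self):
        self.calls = 0

    def score(self, question, passage):
        self.calls += 1
        q, p = question.lower().split(), passage.lower().split()
        return sum(1 for w in q if w in p) / len(q)


def overlap(a, b):
    """Cheap vector-style comparison; no model call is involved."""
    return len(a & b)


passages = [f"passage number {i} about topic {i % 7}" for i in range(1000)]
question = "topic 3"

# Offline: every passage is encoded once, independently of any question.
bi = ToyBiEncoder()
index = [bi.encode(p) for p in passages]
offline_calls = bi.calls  # 1000, paid once at indexing time

# Query time, stage 1: one model call for the question, then cheap comparisons.
bi.calls = 0
q_vec = bi.encode(question)
order = sorted(range(len(passages)), key=lambda i: overlap(q_vec, index[i]), reverse=True)
pool = order[:50]
bi_query_calls = bi.calls  # 1

# Query time, stage 2: one model call per candidate in the pool.
cross = ToyCrossEncoder()
reranked = sorted(pool, key=lambda i: cross.score(question, passages[i]), reverse=True)
cross_pool_calls = cross.calls  # 50

print(offline_calls, bi_query_calls, cross_pool_calls)
```

Running the pair scorer over the whole collection would take 1000 calls per question in this setup instead of 50, which is the reason reranking is restricted to the candidate pool.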

In a RAG deployment, reranking is one of the cheapest ways to improve quality without rebuilding the entire system. When the model gives a plausible answer that nonetheless misses the intent of the question, the problem is often not the generator itself but the poorly matched passages that reached its context, and that is exactly what reranking sorts out.
