Article Issue #5184

Retrieval-Augmented Generation (RAG)

What to know

  • Retrieval-Augmented Generation (RAG) is a system design pattern in which a retrieval step fetches relevant documents or passages from an external store, and those documents are concatenated with the user query before being passed to a generative language model.
  • At query time, the user's input is converted into an embedding and compared against pre-indexed document embeddings in a vector database using approximate nearest-neighbor search.
  • RAG is the standard approach for building knowledge-base assistants, internal search tools, and document Q&A systems.

Retrieval-Augmented Generation (RAG) is a system design pattern in which a retrieval step fetches relevant documents or passages from an external store, and those documents are concatenated with the user query before being passed to a generative language model. The model can then produce answers grounded in retrieved evidence rather than relying solely on its parametric training knowledge.

How it works

At query time, the user’s input is converted into an embedding and compared against pre-indexed document embeddings in a vector database using approximate nearest-neighbor search. The top-k most semantically similar chunks are retrieved and inserted into the prompt as context, and the model generates a response conditioned on that evidence. Because the documents live outside the model’s weights, the knowledge base can be updated independently of the model, without retraining.
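The query-time flow described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a production pipeline: `embed` is a toy bag-of-words stand-in for a real embedding model, the search is exact brute-force cosine similarity rather than approximate nearest-neighbor search in a vector database, and the sample chunks, function names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Exact top-k search over the index (a real system would use an ANN index)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Insert the retrieved chunks into the prompt as context ahead of the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Index time: embed each chunk once and store the vector next to the text.
CHUNKS = [
    "The refund window is 30 days from the delivery date.",
    "Support is available by email on weekdays.",
    "Refund requests need the original order number.",
]
INDEX = [(chunk, embed(chunk)) for chunk in CHUNKS]

# Query time: embed the query, take the top-k chunks, assemble the prompt.
QUERY = "How long is the refund window?"
TOP_CHUNKS = retrieve(QUERY, INDEX, k=2)
PROMPT = build_prompt(QUERY, TOP_CHUNKS)
```

`PROMPT` is what would be sent to the generative model. Swapping `embed` for a learned embedding model and `retrieve` for a vector database query changes the quality of the matches, not the shape of the flow.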

Key facts

  • Components: Chunking, embedding, vector indexing, retrieval, reranking, and generation are the core pipeline stages.
  • Context window constraint: The retrieved chunks, together with the query and instructions, must fit within the model’s context window, so chunk size and the number of chunks retrieved (k) both need tuning.
  • Hybrid search: Combining dense vector search with BM25 keyword search often outperforms either approach alone.
  • Reranking: A cross-encoder reranker rescores retrieved candidates to improve relevance before insertion into the prompt.
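Three of the points above lend themselves to a short sketch: chunking with overlap, merging dense and keyword rankings, and trimming the result to a context budget. The helper names are hypothetical, word counts are used as a rough proxy for tokens, and reciprocal rank fusion is only one common way to combine two rankings; the text above does not prescribe a particular fusion method.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def reciprocal_rank_fusion(rankings: list[list[str]], c: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list adds 1 / (c + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def fit_to_budget(chunks: list[str], max_words: int) -> list[str]:
    """Keep chunks in ranked order until the word budget is spent."""
    kept, used = [], 0
    for chunk in chunks:
        n = len(chunk.split())
        if used + n > max_words:
            break
        kept.append(chunk)
        used += n
    return kept


# Hypothetical result lists from a dense vector search and a BM25 keyword search.
DENSE_HITS = ["a", "b", "c"]
BM25_HITS = ["b", "d", "a"]
FUSED = reciprocal_rank_fusion([DENSE_HITS, BM25_HITS])
```

Documents that rank well in both lists rise to the top of `FUSED`. A cross-encoder reranker, if used, would rescore that fused list before `fit_to_budget` decides how many chunks reach the prompt.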

For builders

RAG is the standard approach for building knowledge-base assistants, internal search tools, and document Q&A systems. It avoids the cost and complexity of fine-tuning, and answers stay current because the knowledge base can be refreshed without retraining. For most retrieval tasks, investing in chunking strategy, metadata filtering, and reranking tends to yield larger quality improvements per engineering hour than upgrading the base model.
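Metadata filtering is the least self-explanatory of those three investments, so here is a minimal sketch of the idea: narrow the candidate set by exact-match metadata before any similarity scoring runs. The `Chunk` type, the `team` and `year` keys, and the sample corpus are hypothetical; real vector databases expose their own filter syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict[str, str] = field(default_factory=dict)


def filter_by_metadata(chunks: list[Chunk], **required: str) -> list[Chunk]:
    """Keep only chunks whose metadata matches every required key/value pair."""
    return [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in required.items())
    ]


CORPUS = [
    Chunk("VPN setup steps for laptops.", {"team": "it", "year": "2024"}),
    Chunk("Holiday policy overview.", {"team": "hr", "year": "2024"}),
    Chunk("Legacy VPN instructions.", {"team": "it", "year": "2019"}),
]

# Narrow the candidates first; similarity search then runs only on these chunks.
CANDIDATES = filter_by_metadata(CORPUS, team="it", year="2024")
```

Filtering first keeps stale or out-of-scope chunks, such as the 2019 instructions here, from competing for the limited top-k slots.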

Administrator · 41 published guides · Joined 2016
