Last updated: April 5, 2026 · Model Architecture · by Daniel Ashford

What is RAG (Retrieval-Augmented Generation)?

QUICK ANSWER

A technique that gives LLMs access to external documents to improve accuracy and reduce hallucination.

Definition

Retrieval-Augmented Generation (RAG) is an architecture that combines an LLM with an external knowledge-retrieval system. Instead of relying solely on knowledge baked into its training data, a RAG system retrieves relevant documents from a knowledge base at query time and includes them in the prompt, allowing the model to ground its responses in specific, up-to-date sources.

How It Works

A typical RAG pipeline has two phases. Offline, documents are split into chunks, embedded, and indexed in a vector database. At query time: (1) the user query is converted into a vector embedding, (2) the most similar document chunks are retrieved from the vector database, and (3) the retrieved chunks are inserted into the LLM prompt as context. Grounding answers in retrieved text substantially reduces hallucination, though it does not eliminate it: the model can still misread or ignore the provided context, and it only cites sources if the prompt instructs it to.
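The query-time steps above can be sketched in a few lines. This is a toy illustration, not a production implementation: the bag-of-words "embedding" stands in for a learned embedding model, and a plain Python list stands in for a vector database.

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use a learned model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Steps 1-2: embed the query, rank documents by similarity, keep top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Step 3: insert the retrieved documents into the LLM prompt as context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy: refunds are issued within 30 days of purchase.",
    "Our office is open 9 to 5, Monday through Friday.",
]
print(build_prompt("What is the refund policy?", docs))
```

The prompt that results would then be sent to the LLM; a real pipeline swaps in an embedding model and a vector index but keeps this same embed-retrieve-assemble shape.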

Example

A support chatbot using RAG: when a customer asks "What is your refund policy?", the system retrieves the actual refund policy document and includes it in the prompt, so the answer is based on the real policy.
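A hypothetical prompt template for such a chatbot might look like the following; the policy text and the instruction to admit ignorance are illustrative assumptions, but telling the model to refuse when the context is insufficient is a common way to further limit hallucinated answers.

```python
def support_prompt(question, retrieved_policy):
    # Hypothetical template: constrains the model to the retrieved context
    # and gives it an explicit out when the context lacks the answer.
    return (
        "You are a support assistant. Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{retrieved_policy}\n\n"
        f"Customer question: {question}"
    )

policy = "Refunds are available within 30 days of purchase with a receipt."
print(support_prompt("What is your refund policy?", policy))
```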

Related Terms

Embeddings
Numerical representations of text that capture semantic meaning — used in search and RAG systems.
Vector Database
A specialized database for storing and searching embeddings — the backbone of RAG systems.
Context Window
The maximum amount of text an LLM can process in a single request.
Hallucination
When an LLM generates plausible-sounding but factually incorrect information.
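To make the embeddings idea concrete: retrieval works because texts with similar meaning get nearby vectors, usually compared with cosine similarity. The three-dimensional vectors below are made up for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Hypothetical toy embeddings: "cat" and "kitten" point in similar
# directions, "car" does not.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))
```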

See How Models Compare

Understanding RAG (Retrieval-Augmented Generation) is important when choosing the right AI model. See how 12 models compare on our leaderboard.
