MCP Glossary

Retrieval-Augmented Generation (RAG)

TL;DR

Retrieval-Augmented Generation (RAG) is a technique where an LLM fetches relevant information from an external knowledge source at query time and injects it into the prompt before generating an answer. RAG keeps LLMs up to date with proprietary or fresh data without retraining.

In depth

RAG addresses two LLM weaknesses at once: training cutoffs (the model doesn't know recent events) and private data (the model wasn't trained on your internal docs). By retrieving relevant snippets at query time and adding them to the prompt, you get answers grounded in current, proprietary knowledge.

The typical RAG pipeline: (1) chunk documents, (2) embed each chunk into a vector, (3) store vectors in a vector database, (4) at query time, embed the query and find the top-k most similar chunks, (5) inject those chunks into the LLM's prompt, (6) the LLM answers using them as context.
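The six steps above can be sketched end to end. This is a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, and an in-memory list stands in for the vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call an
    # embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# (1) chunk documents -- here, three pre-chunked strings
chunks = [
    "RAG injects retrieved snippets into the prompt.",
    "Vector databases store embeddings for similarity search.",
    "Fine-tuning changes model weights.",
]
# (2)-(3) embed each chunk and "store" it
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query: str, k: int = 2) -> list:
    # (4) embed the query and rank chunks by cosine similarity
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# (5) inject the top-k chunks into the prompt; (6) the LLM then
# answers using them as context
context = "\n".join(retrieve("How does RAG ground answers?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
```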

MCP and RAG fit together naturally. A knowledge-base MCP server (e.g. Notion, Confluence, Meilisearch, Pinecone) exposes `search` or `query` tools. Instead of hardcoding retrieval logic, the LLM agent calls these tools when it needs context. The LLM decides WHEN to retrieve and WHAT to ask — making RAG more flexible than a hardcoded pipeline.
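That division of labor can be sketched with a thin dispatcher. All names here (`ToolCall`, `handle_tool_call`, `search_kb`) are hypothetical stand-ins, not actual MCP SDK APIs: the model emits a tool call, and the client routes it to whichever server exposes that tool.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str       # which exposed tool the model wants, e.g. "search"
    arguments: dict # the arguments the model chose, e.g. the query text

def handle_tool_call(call: ToolCall, tools: dict) -> str:
    # Route the model's tool call to whichever server exposes that tool.
    return tools[call.name](**call.arguments)

def search_kb(query: str, top_k: int = 3) -> str:
    # Stand-in for a knowledge-base MCP server's `search` tool.
    docs = {"pricing": "Plan A costs $10/month.",
            "sla": "The uptime target is 99.9%."}
    hits = [text for key, text in docs.items() if key in query.lower()]
    return "\n".join(hits[:top_k]) or "no results"

tools = {"search": search_kb}
# The LLM decides WHEN to retrieve (by emitting a tool call)
# and WHAT to ask (the query argument):
result = handle_tool_call(ToolCall("search", {"query": "What is our SLA?"}), tools)
```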

Agentic RAG (where the agent iterates: search → evaluate → search again) often outperforms classical single-shot RAG on complex queries. MCP's tool interface makes it a natural substrate for agentic RAG.

Examples

  1. An internal Q&A bot searching Notion before answering employee questions
  2. A coding assistant fetching up-to-date docs via Context7 MCP
  3. A support bot retrieving similar past tickets before replying
  4. Perplexity-style answers: web search → synthesize → cite
  5. A legal-research agent querying a case-law vector DB

What it's NOT

  • ✗ RAG is NOT a model — it's a pattern applied on top of any LLM.
  • ✗ RAG is NOT only vector search — hybrid search (BM25 + vector) often outperforms vectors alone.
  • ✗ RAG is NOT a replacement for fine-tuning — they solve different problems (fresh knowledge vs style/format).
  • ✗ RAG does NOT eliminate hallucination — but it substantially reduces it for factual queries.

Related terms

  • Vector Database
  • Embedding
  • Large Language Model (LLM)
  • Prompt Engineering

See also

  • Original RAG Paper
  • Anthropic on Contextual Retrieval

Frequently asked questions

Do I need a vector database for RAG?

Commonly yes, but not always — full-text search (Elasticsearch, Meilisearch) works for many cases, and hybrid is best.
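One common way to combine the two is reciprocal rank fusion (RRF), which merges ranked result lists without needing to normalize their scores. A minimal sketch with illustrative document IDs and the conventional `k = 60` constant:

```python
def rrf(rankings, k=60):
    # Reciprocal rank fusion: each list contributes 1/(k + rank)
    # per document; documents ranked well by both lists win.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from BM25/full-text search
vector_hits = ["doc_b", "doc_d", "doc_a"]   # e.g. from a vector DB
fused = rrf([keyword_hits, vector_hits])    # doc_b first: strong in both lists
```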

How does MCP fit into RAG?

MCP servers expose retrieval as tools (e.g. `search_notion`). The LLM calls them when needed — making RAG fully agent-driven.

What's agentic RAG?

The agent iterates: retrieve → evaluate → retrieve again if needed. This typically yields better answers than single-shot RAG on complex queries.
