In depth
RAG addresses two LLM weaknesses at once: training cutoffs (the model doesn't know recent events) and private data (the model wasn't trained on your internal docs). By retrieving relevant snippets at query time and adding them to the prompt, you get answers grounded in current, proprietary knowledge.
The typical RAG pipeline: (1) chunk documents, (2) embed each chunk into a vector, (3) store vectors in a vector database, (4) at query time, embed the query and find the top-k most similar chunks, (5) inject those chunks into the LLM's prompt, (6) the LLM answers using them as context.
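The six steps above can be sketched end to end in a few lines. This is a minimal toy: the "embedding" is just a bag-of-words term-frequency vector and the "vector database" is an in-memory list, standing in for a real embedding model and vector store.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term frequencies.
    # A real pipeline would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(doc, size=8):
    # Step 1: split a document into fixed-size word chunks.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Steps 2-3: embed each chunk and keep it in an in-memory "vector store".
docs = ["the refund policy allows returns within 30 days of purchase",
        "shipping is free for orders over fifty dollars"]
store = [(c, embed(c)) for d in docs for c in chunk(d)]

def retrieve(query, k=2):
    # Step 4: embed the query and rank chunks by cosine similarity.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

# Step 5: inject the top-k chunks into the prompt; step 6 is the LLM call.
chunks = retrieve("refund policy returns")
prompt = "Answer using this context:\n" + "\n".join(chunks)
```

Swapping `embed` for a real model and `store` for a vector database changes nothing structurally; the chunk → embed → store → retrieve → inject flow stays the same.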
MCP and RAG fit together naturally. A knowledge-base MCP (e.g. Notion, Confluence, Meilisearch, Pinecone) exposes `search` or `query` tools. Instead of hardcoding retrieval logic, the LLM agent calls these tools when it needs context. The LLM decides WHEN to retrieve and WHAT to ask — making RAG more flexible than hardcoded pipelines.
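To make the division of labor concrete, here is a sketch of the agent-side wiring. The tool name `search`, its schema, and the local handler are illustrative assumptions, not any specific MCP server's actual API; in a real setup the MCP client would forward the call to the server rather than dispatch to a local function.

```python
# Hypothetical tool schema an MCP knowledge-base server might advertise.
TOOLS = {
    "search": {
        "description": "Search the knowledge base and return matching snippets.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
}

def handle_tool_call(name, args, knowledge_base):
    # Dispatch a tool call the model emitted. With MCP, the client
    # would route this to the server; here a local stub stands in.
    if name == "search":
        words = args["query"].lower().split()
        return [s for s in knowledge_base if any(w in s.lower() for w in words)]
    raise ValueError(f"unknown tool: {name}")

kb = ["Q3 revenue grew 12% year over year.",
      "The on-call rotation changes every Monday."]

# Given TOOLS in its prompt, the model might emit:
#   {"name": "search", "arguments": {"query": "on-call rotation"}}
result = handle_tool_call("search", {"query": "on-call rotation"}, kb)
```

The key point is that retrieval is now a tool the model invokes, not a pipeline stage that runs unconditionally before every prompt.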
Agentic RAG (where the agent iterates: search → evaluate → search again) tends to outperform classical single-shot RAG on complex, multi-hop queries, since the agent can notice gaps in the retrieved evidence and re-query. MCP's uniform tool interface makes it a natural substrate for agentic RAG.
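The search → evaluate → search-again loop can be sketched as below. The `search`, `is_sufficient`, and `refine` callables are placeholders for LLM-driven steps (the model judging whether the evidence answers the question, and rewriting the query if not); the toy stand-ins here exist only to exercise the loop.

```python
def agentic_retrieve(question, search, is_sufficient, refine, max_rounds=3):
    # Iterative retrieval: search, judge the evidence, and re-query
    # with a refined question until it suffices or we hit the limit.
    query, evidence = question, []
    for _ in range(max_rounds):
        evidence += search(query)
        if is_sufficient(question, evidence):
            break
        query = refine(question, evidence)
    return evidence

# Toy stand-ins; a real agent would use the LLM for judging and rewriting.
corpus = {"deploy": ["Deploys run via CI on merge to main."],
          "rollback": ["Rollbacks use the previous image tag."]}

hits = agentic_retrieve(
    "how do we roll back a bad deploy?",
    search=lambda q: corpus.get("rollback" if "rollback" in q else "deploy", []),
    is_sufficient=lambda q, ev: any("Rollback" in e for e in ev),
    refine=lambda q, ev: q + " rollback procedure",
)
```

The first round retrieves deploy docs, the sufficiency check fails, the query is refined, and the second round finds the rollback snippet: exactly the behavior single-shot RAG cannot express.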