
Context Window

TL;DR

A context window is the maximum number of tokens (roughly, words plus punctuation chunks) an LLM can process in a single inference. It includes the system prompt, conversation history, tool schemas, retrieved docs, and the new user message. Modern models range from 128K tokens (GPT-4o) to 1M+ (Claude Opus 4 with the 1M-token option, Gemini 1.5 Pro).

In depth

The context window is the LLM's short-term memory. Everything the model needs for the current response — system prompt, chat history, tool definitions, tool call results, retrieved documents — must fit inside this window. Exceed it and you hit a hard error or silently lose the earliest messages.
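That budgeting step can be sketched in a few lines. This is a minimal illustration, not a real API client: the ~4-characters-per-token heuristic is a crude English-prose average (exact counts require the model's own tokenizer), and the 128K limit and 4K output reserve are example values.

```python
# Rough pre-flight check that everything destined for the context
# window fits. Assumes ~4 characters per token, a crude English
# average; real tokenizers give exact counts.

CONTEXT_WINDOW = 128_000      # example limit, e.g. GPT-4o
RESERVE_FOR_OUTPUT = 4_000    # leave room for the model's reply

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_window(system_prompt, history, tool_schemas, new_message):
    # Everything the model sees counts: prompt, history, schemas, message.
    parts = [system_prompt, *history, *tool_schemas, new_message]
    used = sum(estimate_tokens(p) for p in parts)
    return used + RESERVE_FOR_OUTPUT <= CONTEXT_WINDOW, used
```

A real agent would run a check like this before every request and trigger truncation or summarization when it fails, rather than waiting for the API to reject the call.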

Context window size has exploded in 2024-2025. GPT-3.5 shipped with 4K tokens; GPT-4o has 128K; Claude Opus 4 supports 1M tokens for long-context; Gemini 1.5 Pro and 2.5 Pro go up to 1M-2M. Larger windows enable whole-codebase reasoning, long research reports, and multi-document synthesis.

But larger isn't free. Cost scales linearly with tokens (input and output priced separately). Latency grows with context size. And attention degrades at the edges — models often have 'lost in the middle' issues where content in the center of a long context gets ignored.
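The linear cost scaling is easy to make concrete with a back-of-the-envelope calculator. The per-million-token prices below are hypothetical placeholders, not any provider's actual price sheet:

```python
# Back-of-the-envelope request cost. Prices are HYPOTHETICAL
# placeholders (USD per million tokens); check your provider's
# current pricing. Input and output are priced separately.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000
```

Because the input term is linear, doubling the context you send roughly doubles the input-side cost of every request, which is why stuffing a window just because it is large gets expensive fast.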

Managing context is a core skill for agent builders. Techniques include: summarize old turns, use RAG to inject only relevant docs, paginate tool results, and checkpoint state externally. MCP helps by letting servers expose large data incrementally rather than dumping it all into context.
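The simplest of those techniques, evicting the oldest turns, can be sketched as follows. The `trim_history` helper and the precomputed per-message token counts are assumptions for illustration, not part of any SDK:

```python
# Sketch of oldest-first eviction: drop turns until the conversation
# fits a token budget, always keeping the system prompt. Each turn is
# a (message, token_count) pair with counts assumed precomputed.

def trim_history(system_tokens, turns, budget):
    """turns: list of (message, token_count), oldest first."""
    kept = list(turns)
    total = system_tokens + sum(count for _, count in kept)
    while kept and total > budget:
        _, dropped = kept.pop(0)  # evict the oldest turn first
        total -= dropped
    return kept
```

Production agents usually go one step further and summarize evicted turns into a compact note instead of discarding them outright, which is the idea behind auto-compaction in tools like Claude Code.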

Examples

  1. Claude Opus 4 with a 1M-token context analyzing an entire monorepo
  2. Gemini 2.5 Pro with 2M tokens summarizing a full book series
  3. A simple chatbot using GPT-4o Mini with 128K tokens
  4. A RAG system injecting the top-5 relevant docs (~2K tokens) into context
  5. Claude Code hitting its context limit and auto-compacting old messages

What it's NOT

  • ✗Context window is NOT the same as memory — it's ephemeral, lost after each inference.
  • ✗Bigger context is NOT always better — costs grow, latency grows, attention degrades.
  • ✗Context window is NOT measured in characters or words — it's in tokens (roughly 0.75 words).
  • ✗Long context does NOT replace RAG — even 1M tokens isn't enough for large knowledge bases.

Related terms

  • Large Language Model (LLM)
  • Prompt Engineering
  • Retrieval-Augmented Generation (RAG)
  • Embedding

See also

  • Claude Long Context

Frequently asked questions

Which model has the biggest context window?

As of 2026: Gemini 2.5 Pro (up to 2M tokens), Claude Opus 4 (1M), Claude Sonnet 4 (200K-1M), GPT-4o (128K).

What happens when I exceed the context window?

Most APIs return an error. Some hosts auto-truncate the oldest messages to fit. Claude Code and Cursor auto-compact.

How do tokens relate to words?

Roughly 1 token ≈ 0.75 words for English prose. Code and non-English text tokenize less predictably: whitespace, symbols, and rarer scripts often cost more tokens per character, so the ratio varies by tokenizer and content.
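The 0.75 words-per-token rule of thumb converts both ways. These helpers are only a planning estimate; exact counts always require the target model's own tokenizer:

```python
# Rough conversions from the ~0.75 words-per-token rule of thumb for
# English prose. Use the model's real tokenizer for exact counts.

WORDS_PER_TOKEN = 0.75

def words_to_tokens(word_count: int) -> int:
    return round(word_count / WORDS_PER_TOKEN)

def tokens_to_words(token_count: int) -> int:
    return round(token_count * WORDS_PER_TOKEN)
```

For example, a 128K-token window corresponds to roughly 96,000 English words, on the order of a long novel.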

Build with MCP

Browse 300+ MCP servers, explore recipes, or continue learning the MCP vocabulary.
