In depth
An LLM is a statistical model that learns patterns in language by predicting the next token over massive training datasets (typically trillions of tokens scraped from the web, books, code, and more). Once trained, the model can generate text, answer questions, write code, and — with the right prompting — reason through complex problems.
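The next-token objective can be illustrated with a deliberately tiny stand-in: a bigram model that counts which token follows which in a toy corpus, then predicts the most frequent successor. Real LLMs learn these statistics with neural networks over trillions of tokens, but the prediction target is the same. The corpus and function names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: count bigram frequencies in a tiny corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()

follows: defaultdict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1  # how often `nxt` followed `prev`

def predict_next(token: str) -> str:
    """Return the most frequently observed next token."""
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

Swapping the greedy `most_common` pick for sampling proportional to the counts is the bigram analogue of temperature sampling in a real model.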
Modern LLMs fall into two categories: **base models** (raw next-token predictors, e.g. Llama 3 base) and **instruction-tuned chat models** (post-trained with supervised fine-tuning + RLHF to follow instructions and be helpful, e.g. Claude, GPT-4). Most user-facing apps use chat models.
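The practical difference shows up in the input format. A base model sees one continuous token stream, while chat models are post-trained on role-tagged turns. The sketch below uses the widely adopted "messages" shape (role/content pairs); the `<|role|>` delimiters are illustrative placeholders, not any provider's actual special tokens, which vary by model.

```python
# Role-tagged conversation in the common "messages" shape.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain RLHF in one sentence."},
]

def to_prompt(msgs: list[dict]) -> str:
    """Flatten role-tagged turns into one prompt string, roughly what a
    chat template does before tokens reach the underlying model.
    The <|role|> markers are illustrative, not real provider tokens."""
    turns = [f"<|{m['role']}|>\n{m['content']}" for m in msgs]
    # Trailing assistant tag cues the model to generate the reply.
    return "\n".join(turns) + "\n<|assistant|>\n"

print(to_prompt(messages))
```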
Leading models in 2026: Anthropic's Claude (Opus, Sonnet, Haiku), OpenAI's GPT line (4o, 4.1, and the o1/o3 reasoning models), Google's Gemini (2.5 Pro, Flash), Meta's Llama (3.3, 4), Mistral, and DeepSeek. Each has distinct strengths: Claude excels at writing and tool use, GPT at coding, Gemini at long context, and Llama at on-device deployment.
MCP is model-agnostic. Any LLM that supports tool use (function calling) can consume MCP-provided tools via a compatible host. The server doesn't know or care which model is on the other end.
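Model-agnosticism works because a tool is advertised as plain data: a name, a description, and a JSON Schema for its inputs, which the host translates into whatever function-calling format its model expects. Below is a minimal sketch of such a definition as a server might return from a tools listing; `get_weather` and its parameters are hypothetical examples, not part of any real server.

```python
import json

# Hypothetical tool definition in the name/description/inputSchema shape
# MCP servers use to advertise tools. The host forwards this, reformatted
# as needed, to whichever model it is running.
weather_tool = {
    "name": "get_weather",  # hypothetical example tool
    "description": "Get the current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

print(json.dumps(weather_tool, indent=2))
```

Because the contract is just JSON Schema, the same definition can back a Claude tool-use block, an OpenAI function call, or any other model's tool format without the server changing anything.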