In depth
Sampling is one of MCP's most powerful features. Normally the flow runs in one direction: the client calls the server's tools on its LLM's behalf. Sampling inverts this — the server sends a `sampling/createMessage` request back up to the client, which forwards it to its LLM and returns the completion. The server then uses that completion to shape its next response.
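On the wire, the exchange is a plain JSON-RPC request and response. A minimal sketch of both halves, with field names following the MCP specification (the `id`, prompt text, and model name are illustrative):

```python
import json

# The server -> client request. `messages` and `maxTokens` are required
# by the spec; the prompt here is an example.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {
                "role": "user",
                "content": {
                    "type": "text",
                    "text": "Classify this query: 'show me last month's invoices'",
                },
            }
        ],
        "maxTokens": 50,
    },
}

# The client forwards the prompt to its LLM, then replies with a result
# shaped like a message plus metadata about the model and stop reason.
response = {
    "jsonrpc": "2.0",
    "id": 7,  # must match the request id
    "result": {
        "role": "assistant",
        "content": {"type": "text", "text": "billing"},
        "model": "example-model",
        "stopReason": "endTurn",
    },
}

print(json.dumps(request, indent=2))
```

The server correlates the reply to its request by `id`, exactly as with any other JSON-RPC call — the only novelty is the direction.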
Typical use cases:
- A server receives a user query and wants to classify its intent before choosing which data source to query.
- A server wants to summarize a long document before returning it.
- A server wants to validate user input conversationally.
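The first use case can be sketched as follows. `ask_client_llm` is a hypothetical stand-in for whatever SDK call issues `sampling/createMessage`; it is stubbed here so the example runs standalone:

```python
def ask_client_llm(prompt: str) -> str:
    # Stub for the round-trip to the client's LLM. A real server would
    # send sampling/createMessage and await the completion text.
    return "billing" if "invoice" in prompt.lower() else "general"


def handle_query(query: str) -> str:
    # Ask the client's LLM to classify intent, then route the query
    # to the matching data source.
    intent = ask_client_llm(
        f"Classify the intent of this query as 'billing' or 'general': {query}"
    )
    source = {"billing": "invoices_db", "general": "docs_index"}[intent]
    return f"querying {source}"


print(handle_query("show me last month's invoices"))  # querying invoices_db
```

The key point is that the routing decision happens server-side, but the reasoning behind it is delegated to the client's model.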
Sampling requires the client to declare the `sampling` capability during initialization. The host typically prompts the user for approval before forwarding a sampling request — this preserves user control over LLM invocation and cost.
Sampling makes MCP servers composable mini-agents. Instead of being limited to plain I/O, a server can reason using the client's LLM, enabling sophisticated multi-hop workflows.