- Is this a runtime system or a wire protocol spec? → It's a protocol specification (like HTTP or LSP) with reference SDKs. We're designing the protocol architecture, transport mechanisms, and the system architecture of hosts/clients/servers that implement it.
- Local servers only, or remote too? → Both. Local servers (filesystem, git) use STDIO transport. Remote servers (Slack, Sentry, Asana) use Streamable HTTP. The protocol is transport-agnostic.
- What can servers expose? → Three primitives: Tools (model-invoked functions), Resources (data the app reads), and Prompts (reusable templates). Plus Sampling (the server asks the host's LLM to generate text).
- Auth model? → OAuth 2.1 for remote servers. Local servers inherit host process permissions. This is critical: a server shouldn't access data the user hasn't authorized.
- Stateful or stateless? → Stateful. Sessions are initialized with capability negotiation, maintained throughout, and terminated by closing the transport. This differs from REST (stateless) and is closer to LSP (Language Server Protocol).
- Who are the participants? → Three roles: Host (the AI application, e.g., Claude Desktop), Client (a protocol handler per connection, embedded in the host), Server (a program exposing tools/resources).
| In Scope | Out of Scope |
|---|---|
| Protocol spec: JSON-RPC 2.0 message format | The LLM inference engine itself |
| Three primitives: Tools, Resources, Prompts | Agent orchestration frameworks (LangChain, CrewAI) |
| Capability negotiation & session lifecycle | Training data or fine-tuning pipelines |
| Two transports: STDIO (local) + Streamable HTTP (remote) | Model routing / gateway (not MCP's job) |
| Auth: OAuth 2.1 for remote, process-level for local | Billing, rate limiting, API key management |
| Sampling: server-initiated LLM requests | Multi-agent coordination protocols |
- UC1: Tool discovery & invocation → AI host connects to an MCP server (e.g., Sentry), discovers available tools (tools/list), then the model decides to call tools/call with arguments to create a Sentry issue. The result flows back to the model.
- UC2: Resource access → An IDE (Cursor) connects to a filesystem MCP server, lists available resources (resources/list), then reads a file's content (resources/read) to inject into the LLM's context window.
- UC3: Prompt templates → A code review server exposes a "review-pull-request" prompt via prompts/get. The host renders this template with user arguments and sends it to the LLM as a structured interaction.
- UC4: Sampling (server → model) → An MCP server needs the LLM to summarize data it has fetched. It sends sampling/createMessage to the client, which forwards it to the host's LLM. The response flows back to the server.
- UC5: Multi-server host → Claude Desktop connects to filesystem, GitHub, and Slack MCP servers simultaneously. Each connection is a separate client instance. The LLM can use tools from all servers in a single conversation turn.
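
As a rough illustration of UC1, the sketch below builds the two JSON-RPC 2.0 requests a client would send for discovery and invocation. The helper `make_request`, the tool name `create_issue`, and its `title` argument are hypothetical; only the method names and the `name`/`arguments` parameter shape come from the protocol as described here.

```python
import json
from itertools import count

# Request ids must be unique within a session; a simple counter is assumed here.
_ids = count(1)

def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request as a plain dict (hypothetical helper)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# Discovery: ask the server which tools it offers.
list_req = make_request("tools/list")

# Invocation: the model chose a tool and supplied arguments.
call_req = make_request(
    "tools/call",
    {"name": "create_issue", "arguments": {"title": "Checkout page returns 500"}},
)

print(json.dumps(list_req))
print(json.dumps(call_req))
```

The response to each request carries the same `id`, which is how the client pairs results with the calls that produced them.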
- Transport-agnostic: The same protocol messages work over STDIO (local pipe), Streamable HTTP (remote), or future transports (WebSocket, gRPC). The data layer is decoupled from the transport layer.
- Capability negotiation: Not every server supports every feature. Clients and servers declare capabilities at initialization. A client never calls tools/call on a server that didn't declare the tools capability.
- Human-in-the-loop: The host MUST control what the model can do. Tool calls require host approval (explicit or policy-based). The model never talks to a server directly; the host mediates every interaction.
- Backward compatible: As the spec evolves, older servers must interoperate with newer clients. Capability negotiation enables graceful degradation: unknown capabilities are ignored rather than treated as errors.
- Low latency for local servers: STDIO transport adds near-zero overhead. A local filesystem read should complete in single-digit milliseconds, not be bottlenecked by the protocol.
- Secure by default for remote servers: OAuth 2.1 mandatory. No ambient authority: a remote server can only access what the user explicitly granted via OAuth scopes.
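
The capability-negotiation rule above can be sketched as a guard that a client runs before sending any request. The `Session` class and `CapabilityError` are hypothetical names; the sketch assumes the capability key is the method prefix (`tools`, `resources`, `prompts`), which matches the primitives described in this document.

```python
class CapabilityError(Exception):
    """Raised when a method is called on a server that never declared it."""

class Session:
    def __init__(self, server_capabilities):
        # Capabilities as declared by the server during initialization.
        # Unknown keys are simply stored and never consulted, so they are ignored.
        self.server_capabilities = dict(server_capabilities)

    def supports(self, feature):
        return feature in self.server_capabilities

    def require(self, method):
        """Check a method such as 'tools/call' against declared capabilities."""
        feature = method.split("/", 1)[0]
        if not self.supports(feature):
            raise CapabilityError(f"server did not declare '{feature}'")
        return method

session = Session({"tools": {}, "futureFeature": {}})
session.require("tools/call")          # allowed: 'tools' was declared
try:
    session.require("resources/read")  # rejected: 'resources' was not declared
except CapabilityError as exc:
    print(exc)
```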
| Requirement | Decision | Why (and what was rejected) | Consistency |
|---|---|---|---|
| Structured RPC with bidirectional messaging | JSON-RPC 2.0 (not REST, not gRPC) | Supports requests, responses, AND notifications (fire-and-forget). REST is request-response only and can't do server-initiated messages. gRPC requires protobuf compilation, which is too heavy for a pluggable ecosystem. | N/A |
| Local servers: zero-setup, sub-ms latency | STDIO transport (stdin/stdout pipes) | No network stack. Host spawns server as child process, communicates via pipes. No ports, no TLS, no auth needed. Perfect for local tools (filesystem, git). HTTP would add unnecessary overhead and port conflicts. | N/A |
| Remote servers: internet-scale, multi-client | Streamable HTTP (not SSE, not WebSocket) | Supports both streaming (long-running tool calls) and request-response (simple queries). SSE is server→client only. WebSocket requires a persistent connection (proxy/firewall issues). Streamable HTTP works through any HTTP infrastructure. | N/A |
| Servers differ in capabilities | Capability negotiation at init | Not all servers support all features. Client and server exchange supported capabilities during initialize. No assumptions: a client never calls tools/call if the server didn't declare the tools capability. This enables graceful evolution. | N/A |
| Model must not have unchecked access | Host-mediated architecture | The LLM never talks to servers directly. The host intercepts every tool call and can approve/deny/modify. This is the security boundary: the host enforces policy, not the protocol. Without this, a prompt injection could invoke arbitrary tools. | CP |
| Remote access to user data (Slack, GitHub) | OAuth 2.1 (not API keys, not custom auth) | Standard, auditable, revocable. User grants specific scopes to specific servers. API keys are ambient authority: they can't be scoped or revoked per server. OAuth enables consent screens showing exactly what the server will access. | CP |
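
The first row's point about JSON-RPC 2.0 carrying requests, responses, and notifications comes down to which fields a message has. A minimal sketch, with `classify` as a hypothetical helper name:

```python
def classify(msg):
    """Classify a JSON-RPC 2.0 message by its fields.

    - request:      has 'method' and 'id' (a response is expected)
    - notification: has 'method' but no 'id' (fire-and-forget)
    - response:     has 'id' plus either 'result' or 'error'
    """
    if "method" in msg:
        return "request" if "id" in msg else "notification"
    if "id" in msg and ("result" in msg or "error" in msg):
        return "response"
    raise ValueError("not a valid JSON-RPC 2.0 message")

print(classify({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}))
print(classify({"jsonrpc": "2.0", "method": "notifications/initialized"}))
print(classify({"jsonrpc": "2.0", "id": 1, "result": {"tools": []}}))
```

Because either side may send a request or a notification, the same classification logic applies to messages arriving at the client and at the server.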
Host (application)
- Contains the LLM and the user interface
- Creates and manages multiple Client instances
- Mediates ALL interactions: LLM ↔ Server
- Enforces security policy (approve/deny tool calls)
- Assembles context from multiple servers into LLM prompt
Client (protocol)
- 1:1 connection to a single MCP server
- Maintains session state (capabilities, subscriptions)
- Handles JSON-RPC serialization/deserialization
- Manages transport lifecycle (connect, reconnect, close)
- Multiple clients per host (one per server connection)
Server (tool provider)
- Exposes primitives: tools, resources, prompts
- Local (STDIO): spawned as child process by host
- Remote (HTTP): runs on provider's infrastructure
- Stateful per-session (knows who's connected)
- Can request sampling (ask the LLM to generate text)
Transport layer (wire)
- STDIO: stdin/stdout pipes for local servers
- Streamable HTTP: POST for requests, GET+SSE for streaming
- Both carry identical JSON-RPC messages
- Transport is pluggable; possible future transports: WebSocket, gRPC
- Handles framing, reconnection, session binding
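
The division of labor above (one client per server, with the host mediating every call) might look like this in outline. Everything here is a hypothetical sketch: `StubClient` stands in for a real protocol client, and the `server.tool` naming scheme is an assumption made for illustration, not something the protocol prescribes.

```python
class StubClient:
    """Stand-in for a protocol client bound 1:1 to a single server."""
    def __init__(self, server_name, tools):
        self.server_name = server_name
        self._tools = list(tools)

    def list_tools(self):
        return list(self._tools)

    def call_tool(self, name, arguments):
        return f"{self.server_name} ran {name}"

class Host:
    def __init__(self, approve):
        self.clients = {}       # one client instance per connected server
        self.approve = approve  # policy callback: (server, tool, arguments) -> bool

    def connect(self, client):
        self.clients[client.server_name] = client

    def catalog(self):
        """All tools from all servers, qualified by server name."""
        return sorted(
            f"{name}.{tool}"
            for name, client in self.clients.items()
            for tool in client.list_tools()
        )

    def call(self, qualified_name, arguments):
        server, tool = qualified_name.split(".", 1)
        if not self.approve(server, tool, arguments):
            raise PermissionError(f"host policy denied {qualified_name}")
        return self.clients[server].call_tool(tool, arguments)

host = Host(approve=lambda server, tool, args: tool != "delete_repo")
host.connect(StubClient("github", ["create_pr", "delete_repo"]))
host.connect(StubClient("slack", ["send_message"]))
print(host.catalog())
print(host.call("slack.send_message", {"text": "hi"}))
```

The model only ever sees the catalog and asks the host to call something; the host decides whether the call goes through.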
| Phase | Messages | What Happens |
|---|---|---|
| Initialize | initialize request + response | Client sends protocol version + its capabilities. Server responds with its capabilities + server info. Both sides now know what the other supports. |
| Initialized | notifications/initialized | Client confirms initialization complete. Server can now start sending notifications (resource changes, etc.). |
| Active | Requests, responses, notifications | Bidirectional: client calls tools/list, tools/call, resources/read. Server sends notifications (resource updated). Server can request sampling. |
| Shutdown | Transport close | Client closes transport connection. Server cleans up session state. No explicit "shutdown" message; transport closure IS the signal. |
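
The phases in the table can be sketched as a small client-side state machine. The class name `Lifecycle`, the state labels, the `clientInfo` values, and the specific `protocolVersion` string are assumptions for illustration; the message names follow the table above.

```python
PROTOCOL_VERSION = "2025-03-26"  # assumption: an example protocol revision string

class Lifecycle:
    def __init__(self):
        self.state = "new"
        self.server_capabilities = {}

    def initialize_request(self, request_id=1):
        """Initialize phase: send protocol version and client capabilities."""
        self.state = "initializing"
        return {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {"sampling": {}},
                "clientInfo": {"name": "example-host", "version": "0.1.0"},
            },
        }

    def on_initialize_result(self, result):
        """Record server capabilities, then confirm with a notification."""
        if self.state != "initializing":
            raise RuntimeError("initialize result received in wrong state")
        self.server_capabilities = result.get("capabilities", {})
        self.state = "active"
        # Notifications carry no id because no response is expected.
        return {"jsonrpc": "2.0", "method": "notifications/initialized"}

    def on_transport_closed(self):
        """Shutdown phase: closing the transport ends the session."""
        self.state = "closed"

lc = Lifecycle()
request = lc.initialize_request()
confirm = lc.on_initialize_result({"capabilities": {"tools": {}}, "serverInfo": {"name": "demo"}})
lc.on_transport_closed()
```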
| Primitive | Control | Discovery | Execution | Example |
|---|---|---|---|---|
| Tools | Model-controlled | tools/list | tools/call | create_issue, send_message, run_query. The LLM decides based on the user's intent. |
| Resources | Application-controlled | resources/list | resources/read | file:///src/main.ts, db://users/schema. The host fetches these for context, not the model. |
| Prompts | User-controlled | prompts/list | prompts/get | "review-pull-request" template. User selects from a menu, host renders with arguments. |
| Sampling | Server-initiated | N/A (declared as capability) | sampling/createMessage | Server asks the host's LLM to summarize fetched data. Reverse direction: server → client → LLM. |
Tool annotations let a server describe each tool's behavior: readOnlyHint: true (no side effects), destructiveHint: true (deletes data), idempotentHint: true (safe to retry), openWorldHint: true (interacts with external entities). Hosts use these hints to calibrate approval: a tool annotated as read-only can be auto-approved, while a tool marked destructive gets a confirmation dialog.

| Property | STDIO | Streamable HTTP |
|---|---|---|
| Use case | Local servers (filesystem, git, database) | Remote servers (Slack, Sentry, Notion) |
| Connection | Host spawns server as child process. stdin/stdout pipes. | Client sends HTTP requests to server URL. Server can stream back via SSE. |
| Latency | Sub-millisecond (pipe IPC) | 50-2000ms (network round-trip) |
| Session | Implicit: one process = one session | Explicit: Mcp-Session-Id header binds requests to session state |
| Multi-client | No: single client per server process | Yes: one server serves many clients (each with its own session ID) |
| Auth | Inherits host process permissions (user's filesystem access) | OAuth 2.1: authorization code flow with PKCE |
| Framing | Newline-delimited JSON on stdout | HTTP request/response + SSE event stream |
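
The STDIO framing row can be illustrated with a few lines of code: each message is one line of JSON terminated by a newline. This is a minimal sketch using an in-memory buffer in place of real pipes; `write_message` and `read_messages` are hypothetical helper names.

```python
import io
import json

def write_message(stream, msg):
    """Serialize one message as a single line of JSON followed by a newline."""
    # json.dumps escapes newlines inside strings, so the frame stays on one line.
    stream.write(json.dumps(msg, separators=(",", ":")).encode("utf-8") + b"\n")

def read_messages(stream):
    """Yield one decoded message per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

pipe = io.BytesIO()  # stands in for the child process's stdin or stdout
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {"jsonrpc": "2.0", "method": "notifications/initialized"})
pipe.seek(0)
for message in read_messages(pipe):
    print(message)
```

Over Streamable HTTP the same JSON-RPC payloads travel in HTTP bodies or SSE events instead, with the Mcp-Session-Id header tying requests to a session.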
- Standardized server discovery & registry: A public registry where MCP servers publish metadata (capabilities, auth requirements, pricing). Hosts search for "email" and find verified MCP servers for Gmail, Outlook, etc. Like npm for AI tools. Includes trust scores and security audits.
- Multi-agent orchestration: Multiple LLM agents (planner, coder, reviewer) sharing a pool of MCP servers. Agents coordinate via a shared MCP session or pass tool results between each other. The protocol currently assumes a single LLM; multi-agent use needs coordination primitives.
- Streaming tool results: Some tools produce large outputs over time (monitoring dashboards, log tails). Instead of waiting for the complete result, stream partial results to the LLM as they arrive. The LLM can begin reasoning before the tool finishes, which reduces perceived latency.
- Composable server pipelines: Chain servers: output of one tool feeds as input to another. "Read a file (filesystem server) → analyze it (code intelligence server) → create a PR (GitHub server)." Currently, the host orchestrates this manually; a pipeline primitive could formalize it.
- Client-side caching: Cache resource responses and tool results at the client level. If the LLM asks for the same file twice in a session, serve it from cache. Resource subscriptions already notify on changes, so the cache can invalidate on notification.
- Elicitation & interactive prompting: Servers can request additional input from the user mid-tool-execution: "Which Slack workspace should I search?" Currently the server must return, the LLM asks the user, and the user re-invokes. A formalized elicitation primitive would make this a single round-trip.
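
The client-side caching idea in the list above could be sketched as follows. `ResourceCache` and its `fetch` callback are hypothetical; the sketch assumes the change notification is named `notifications/resources/updated` and carries the changed resource's `uri` in its params.

```python
class ResourceCache:
    """Cache resource reads and drop entries when the server reports a change."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable that performs the real resources/read
        self._entries = {}

    def read(self, uri):
        if uri not in self._entries:
            self._entries[uri] = self._fetch(uri)
        return self._entries[uri]

    def on_notification(self, msg):
        # Assumed notification shape: method plus params.uri of the changed resource.
        if msg.get("method") == "notifications/resources/updated":
            self._entries.pop(msg["params"]["uri"], None)

fetch_count = {"n": 0}

def fake_fetch(uri):
    fetch_count["n"] += 1
    return f"contents of {uri} (fetch #{fetch_count['n']})"

cache = ResourceCache(fake_fetch)
cache.read("file:///src/main.ts")
cache.read("file:///src/main.ts")   # served from cache, no second fetch
cache.on_notification({
    "jsonrpc": "2.0",
    "method": "notifications/resources/updated",
    "params": {"uri": "file:///src/main.ts"},
})
cache.read("file:///src/main.ts")   # refetched after invalidation
print(fetch_count["n"])
```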
How does MCP differ from OpenAI's function calling? Why is a protocol needed on top of it?
Function calling is a model-level feature: the LLM outputs structured JSON saying "I want to call function X with arguments Y." But it doesn't define how to discover those functions, how to execute them, how to authenticate, or how to manage the connection to the system that implements them. The developer must write all that glue code, custom for each tool, each model, each application. MCP standardizes everything around the function call: discovery (tools/list), execution (tools/call), session management (initialize/shutdown), transport (STDIO/HTTP), and auth (OAuth 2.1). Think of it this way: function calling is the LLM saying "I want to call create_issue." MCP is the protocol that makes that call actually reach the Sentry API, authenticated, with the right arguments, and returns the result. Without MCP, every developer writes their own version of this plumbing. With MCP, they implement it once.
Why is MCP stateful? Couldn't you make it stateless like REST for simplicity?
Statefulness is required for three features: (1) Capability negotiation: the client learns the server's capabilities once at initialization. With REST, you'd need to send capabilities with every request or make a separate discovery call before every interaction, which is wasteful. (2) Subscriptions: a client can subscribe to resource changes and receive push notifications. This requires a persistent session. Stateless REST can't do server-initiated messages. (3) Server-side session state: a database MCP server might maintain a transaction across multiple tool calls. "Begin transaction → INSERT → SELECT → COMMIT" is inherently stateful. The tradeoff: stateful servers are harder to scale horizontally (session affinity required) and harder to recover from crashes (session state is lost). MCP accepts this tradeoff because the interaction model (an AI agent using tools in a conversation) is inherently session-scoped.
What prevents a malicious MCP server from tricking the LLM into harmful actions?
This is the biggest security challenge in MCP and it's addressed through defense in depth, not a single mechanism. Layer 1: Host mediation. The LLM never communicates directly with servers. Every tool call goes through the host controller, which can approve, deny, or modify it. A policy like "require human confirmation for any tool that sends data externally" catches most attacks. Layer 2: Tool annotations. Servers declare whether tools are read-only, destructive, or external-facing. The host uses these to calibrate approval requirements. Layer 3: Tool result handling. Results from servers are injected as tool-result content, not system instructions. Well-trained LLMs treat tool results as data, not instructions. Layer 4: Server isolation. Tools from one server can't directly access another server's resources. The honest answer: prompt injection via tool results is an unsolved problem in the field. MCP's architecture minimizes the attack surface (host mediation is the key), but a sufficiently clever injection in a tool result could still influence the LLM. This is an active area of research.
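
Layers 1 and 2 combine naturally into a host-side approval policy. The sketch below is one possible policy, not a prescribed one: the function name `approval_for`, the three outcome labels, and the `server_trusted` flag are all hypothetical. It assumes annotations are only hints supplied by the server, so hints from an untrusted server never lead to auto-approval, and a missing hint is read conservatively.

```python
def approval_for(annotations, server_trusted):
    """Decide how much user confirmation a tool call needs (illustrative policy)."""
    read_only = annotations.get("readOnlyHint", False)       # missing: assume side effects
    destructive = annotations.get("destructiveHint", True)   # missing: assume destructive

    if server_trusted and read_only:
        return "auto_approve"
    if destructive and not read_only:
        return "confirm_with_warning"
    return "confirm"

print(approval_for({"readOnlyHint": True}, server_trusted=True))
print(approval_for({"readOnlyHint": True}, server_trusted=False))
print(approval_for({"destructiveHint": True}, server_trusted=True))
print(approval_for({"destructiveHint": False}, server_trusted=True))
```

Keeping the decision in the host means a malicious server that mislabels a destructive tool as read-only gains nothing unless the user has already chosen to trust that server.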
How does MCP relate to LSP? What lessons were borrowed?
MCP is architecturally inspired by LSP (Language Server Protocol), which solved the same M×N problem for IDE tooling. Before LSP: every IDE (VS Code, IntelliJ, Vim) needed a custom plugin for every language (Python, Rust, Go). LSP created a universal protocol: one language server per language, one LSP client per IDE. M+N instead of M×N. MCP borrows: (1) JSON-RPC 2.0 as the wire format, which is proven, simple, and language-agnostic. (2) Capability negotiation at initialization, the same pattern where client and server exchange what they support. (3) Stateful sessions with an explicit lifecycle: initialize, active, shutdown. (4) The client-server-host separation of concerns. What MCP adds beyond LSP: OAuth authentication (LSP servers typically run locally and define no auth layer), Streamable HTTP transport (LSP is usually run over STDIO or local sockets), tool annotations (LSP doesn't have action safety metadata), and sampling (servers requesting LLM inference, which has no LSP equivalent). The success of LSP (adopted by most major editors and languages) is the strongest evidence that this architectural pattern works for standardizing M×N ecosystems.
How would a popular remote MCP server (like Notion) scale to millions of concurrent sessions?
MCP sessions are stateful (capabilities, auth context, subscriptions), which means the server must maintain per-session state. At millions of sessions, this is a stateful server scaling challenge. Two approaches: (1) Session affinity: the load balancer hashes the Mcp-Session-Id header to route all requests for a session to the same server instance. Session state lives in the server's memory. Simple, but a server failure loses all its sessions (clients must re-initialize). (2) Externalized state: store session state in Redis or a similar shared store. Any server instance can handle any request, giving truly stateless servers with shared session state. More complex but more resilient. In practice, MCP sessions are lightweight: a capabilities object, an auth token, and a list of subscriptions. This fits easily in a few KB per session. A single Redis cluster holding 10M sessions × 5KB = 50GB is very feasible. The heavy part is the actual tool execution (calling Notion's API), which is the server operator's existing scaling problem; MCP doesn't make it worse.
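
Both the session-affinity approach and the sizing estimate are easy to check in a few lines. This is a sketch under stated assumptions: `pick_instance` and the instance names are hypothetical, and simple modulo hashing is used for brevity (a production load balancer would more likely use consistent hashing so that adding an instance does not remap most sessions).

```python
import hashlib

def pick_instance(session_id, instances):
    """Route a session to a stable instance by hashing its Mcp-Session-Id value."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]

instances = ["mcp-0", "mcp-1", "mcp-2"]
first = pick_instance("session-abc123", instances)
second = pick_instance("session-abc123", instances)
print(first == second)  # the same session id always maps to the same instance

# Sizing estimate from the text: 10M sessions at roughly 5 KB each.
sessions = 10_000_000
bytes_per_session = 5_000
total_gb = sessions * bytes_per_session / 1e9
print(total_gb)
```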