- What's the relationship to MCP? β Complementary. MCP connects an agent to its tools (databases, APIs). A2A connects agents to each other. An agent might use MCP to query a database, then use A2A to delegate a subtask to a specialist agent. They live at different layers of the stack.
- Are agents opaque to each other? β Yes β this is a core design principle. Agent A doesn't know Agent B's internal framework, LLM, memory, or tools. They interact only through the A2A protocol. This preserves IP and allows heterogeneous systems to collaborate.
- Synchronous or async interactions? β Both. Quick tasks complete synchronously in one request-response. Long-running tasks use a lifecycle (submitted β working β completed) with streaming updates via SSE or push notifications via webhooks.
- How do agents find each other? β Agent Cards: a JSON file at
/.well-known/agent.jsondescribing the agent's name, skills, endpoint, and auth requirements. Like a business card for AI agents. - Protocol bindings? β Three bindings: JSON-RPC over HTTP, gRPC, and plain HTTP/REST. The data model is defined in protobuf β bindings are generated from it. Protocol-agnostic data layer + concrete bindings.
- Enterprise-grade? β Yes. OAuth 2.0, OpenID Connect, signed agent cards, task-scoped tokens, audit trails. Backed by 150+ partners including Google, SAP, Salesforce, ServiceNow.
| In Scope | Out of Scope |
|---|---|
| Agent Card: discovery, capabilities, skills | Internal agent implementation (LLM, memory, tools) |
| Task lifecycle: submit β working β completed/failed | Agent orchestration frameworks (ADK, LangGraph internals) |
| Message exchange: text, files, structured data (Parts) | Shared agent memory or state synchronization |
| Streaming (SSE) + push notifications (webhooks) | Agent training or fine-tuning |
| Auth: OAuth 2.0, OpenID, API keys, signed cards | Billing, marketplace, agent monetization |
| Multi-binding: JSON-RPC, gRPC, HTTP/REST | In-process agent communication (same runtime) |
- UC1: Task delegation β A purchasing concierge agent discovers a pizza seller agent via its Agent Card, sends a message ("Order a large pepperoni"), and receives task updates as the seller agent processes the order. The agents may be built on entirely different frameworks (ADK vs CrewAI).
- UC2: Multi-turn collaboration β A travel agent delegates hotel booking to a hotel agent. The hotel agent asks a clarifying question ("Which dates?"). The travel agent relays the answer. Back-and-forth until the task completes. This requires stateful task tracking with the
input-requiredstate. - UC3: Long-running async task β A research agent submits a deep analysis request to a data agent. The data agent takes 10 minutes. The research agent subscribes via SSE for streaming progress updates, or registers a webhook for push notification on completion.
- UC4: Agent discovery β An enterprise orchestrator fetches Agent Cards from
/.well-known/agent.jsonfor all registered agents. It matches the user's request to the agent with the best-matching skill. No hardcoded agent registry β just standard HTTP discovery. - UC5: Multimodal exchange β A design agent sends an image artifact (PNG) to a QA agent for visual regression testing. The QA agent responds with a structured JSON report. A2A's Part system supports text, files, images, and structured data in a single message.
- Opacity: Agents are black boxes to each other. No sharing of internal prompts, memory, tools, or LLM choice. The protocol only exposes inputs (messages) and outputs (artifacts). This protects intellectual property and enables heterogeneous agent ecosystems.
- Framework-agnostic: An agent built with Google ADK can talk to one built with LangGraph, CrewAI, Semantic Kernel, or bare Python. The protocol is the contract, not the implementation.
- Task-oriented: Every interaction is structured as a Task with a lifecycle. Not free-form messaging β tasks have IDs, states, history, and artifacts. This enables tracking, auditing, and resumption.
- Multi-modal content negotiation: Agents declare what content types they accept and produce (text/plain, application/json, image/png). The client agent checks the server's capabilities before sending. No sending an image to a text-only agent.
- Enterprise security: OAuth 2.0, OpenID Connect, signed Agent Cards (cryptographic proof of identity), task-scoped short-lived tokens. Zero ambient authority.
- Backward compatible: Agent Cards include a
protocolVersion. Older clients gracefully degrade when encountering newer servers. Unknown fields are ignored, not errors.
/.well-known/agent.json provide a pull-based discovery mechanism. But in an enterprise with 50 internal agents, some form of registry or catalog becomes necessary β the protocol defines the card format but not the registry.| Requirement | Decision | Why (and what was rejected) | Consistency |
|---|---|---|---|
| Agent-to-agent communication (not agent-to-tool) | Task-oriented protocol (not tool-call protocol) | Agents are opaque peers, not transparent function providers. A "task" abstraction captures multi-turn, long-running, multi-modal collaboration. MCP's tool-call model treats the server as a function to invoke β A2A treats it as an autonomous collaborator. | β |
| Agents on different frameworks must interoperate | Protobuf-first data model + multiple bindings | Single authoritative protobuf definition generates JSON-RPC, gRPC, and REST bindings. Protobuf ensures type safety across languages. A JSON-only spec would lack the rigor for gRPC and typed SDK generation. | β |
| Discover agent capabilities before delegating | Agent Cards at /.well-known/agent.json | Standard HTTP discovery (like .well-known/openid-configuration). No central registry needed β each agent self-describes. A central directory would be a single point of failure and a governance bottleneck. | AP |
| Tasks range from instant to 30+ minutes | Sync (blocking) + SSE stream + webhook push | One-shot: message/send blocks until done. Streaming: message/stream returns SSE events. Async push: webhook URL in task config. Three patterns cover the full duration spectrum. Polling alone would waste bandwidth; SSE alone doesn't work for fire-and-forget. | Eventual |
| Rich data: text, JSON, images, files | Part-based content model with MIME types | Each message contains Parts (like MIME multipart). Each Part has a content type: text/plain, application/json, image/png, etc. Agents negotiate supported types via Agent Card. Text-only would exclude visual and structured data critical for enterprise workflows. | β |
| Cross-vendor auth in enterprise environments | OpenAPI-aligned security schemes | Agent Card declares supported auth: OAuth 2.0, OpenID Connect, API keys. Aligns with OpenAPI spec β enterprise infra already handles these. Custom auth would require new infrastructure at every org. | CP |
/.well-known/agent.json instead of a central registry? Decentralized discovery: each agent self-publishes its card at a well-known URL (same pattern as /.well-known/openid-configuration). No single registry to operate, no governance bottleneck, no single point of failure. The tradeoff: discovery requires knowing the agent's base URL first. In practice, enterprises maintain an internal catalog or directory that lists known agent URLs. The A2A protocol standardizes the card format and location β the "how do you find the URL in the first place" is left to the deployment environment (DNS, service mesh, marketplace).| State | Meaning | Transitions To |
|---|---|---|
| submitted | Task received, queued for processing | working, rejected |
| working | Agent actively processing the task | input-required, completed, failed, canceled |
| input-required | Agent needs more info from the client | working (after client sends more messages) |
| completed | Task finished successfully. Artifacts available. | (terminal) |
| failed | Task failed. Error details in messages. | (terminal) |
| canceled | Task canceled by client via tasks/cancel. | (terminal) |
tasks/get with the task ID to retrieve the current state. Free-form messaging loses this on disconnect. (3) Orchestration: a client agent managing 5 concurrent tasks to 5 different server agents needs structured tracking. Task IDs + states enable this. (4) Billing: task completion events are natural billing anchors. The tradeoff: more structured than simple chat. But A2A agents aren't chatting β they're collaborating on work.| Pattern | Method | When | How |
|---|---|---|---|
| Blocking (sync) | message/send | Quick tasks (<5s). Simple Q&A, lookups. | HTTP POST. Response body contains the final task state + artifacts. Client blocks until done. |
| Streaming (SSE) | message/stream | Long tasks with progress. Research, analysis. | HTTP POST returns SSE event stream. Server sends task_status and task_artifact events as they occur. Stream closes on completion. |
| Push (webhook) | message/send + pushNotification config | Very long tasks. Client can't hold connection. | Client provides a webhook URL in the task config. Server POSTs status updates to the webhook. Fully async β client's HTTP connection closes immediately. |
| Auth Mechanism | When | How It Works |
|---|---|---|
| OAuth 2.0 | Cross-org agent communication | Client obtains access token via authorization code flow. Token scoped to specific skills. Short-lived (minutes). Standard enterprise SSO integration. |
| OpenID Connect | Identity verification | Server verifies the client agent's identity claim. "Is this really the TravelCorp orchestrator?" ID tokens prove identity; access tokens prove authorization. |
| API Keys | Internal/trusted agents | Simple bearer token for intra-org communication. Less secure but lower friction for trusted internal agents. |
| Signed Agent Cards | Agent identity integrity | Agent Card is cryptographically signed. Client verifies the signature to ensure the card hasn't been tampered with. Prevents "agent impersonation" β a malicious server pretending to be a legitimate one. |
tasks/get with the task ID to check status. If the server is unreachable, client retries with exponential backoff. If server comes back, it may resume from its last state (if it persisted task state). If the task is lost, client receives an error and can re-submit. The task ID enables idempotent resumption.tasks/get to retrieve current state + history. The task's stateTransitionHistory capability ensures no updates are lost. Client can also re-subscribe via tasks/resubscribe to resume streaming from the current point. The task ID is the reconnection anchor.protocolVersion field enables compatibility checks.defaultInputModes (e.g., ["text/plain"]). Client checks before sending. If a mismatch occurs, server returns a JSON-RPC error: "Unsupported content type." This is content negotiation failure β caught at protocol level, not at the application level.| Dimension | MCP | A2A |
|---|---|---|
| What it connects | Agent β Tools and data sources | Agent β Other agents |
| Server opacity | Transparent: server exposes tool schemas | Opaque: server's internals are hidden |
| Interaction model | Tool call: invoke function, get result | Task collaboration: multi-turn, stateful |
| State | Stateful session (capability negotiation) | Stateful tasks (lifecycle, history, artifacts) |
| Discovery | Manual config (no standard discovery) | Agent Cards at /.well-known/agent.json |
| Transport | STDIO (local) + Streamable HTTP (remote) | JSON-RPC + gRPC + HTTP/REST (remote only) |
| Auth | OAuth 2.1 for remote | OAuth 2.0, OpenID, API keys, signed cards |
| Content | Tool result (text, images) | Multimodal Parts (text, files, structured data, images) |
| Typical flow | Host β Client β Server β Tool β Response | Client Agent β Server Agent β Task β Artifacts |
- Agent marketplace & registry: Google Cloud already launched an AI Agent Marketplace. Standardized A2A Agent Cards + signed identity + skill metadata enable a searchable catalog. Enterprises browse, evaluate, and connect agents like npm packages.
- Dynamic UX negotiation: An agent adds audio or video capability mid-conversation. "Let me show you a screen recording of the bug" β switches from text to video Part. Requires runtime capability upgrade negotiation within an active task.
- Hierarchical multi-agent orchestration: Agent A delegates to Agent B, which sub-delegates to C and D. Task dependency graphs with parallel execution and fan-out/fan-in. Requires formalized sub-task linking and progress aggregation.
- QuerySkill(): Dynamically check if an agent can handle an unanticipated skill. "Can you translate this to Japanese?" without that skill being in the Agent Card. The agent evaluates at runtime and responds with confidence score.
- Latency-aware agent selection: Twilio's extension: agents broadcast latency metrics. The orchestrator routes to the most responsive agent. Enables SLA-driven agent selection and graceful degradation (play a filler prompt if all agents are slow).
- Agent-to-agent trust chains: Transitive trust: "I trust Agent A. Agent A vouches for Agent B. Therefore I conditionally trust Agent B." Formalized trust delegation with signed attestations. Critical for enterprise-scale agent meshes.
Why do we need A2A when we already have MCP? Can't agents just expose themselves as MCP servers?
You could expose an agent as an MCP tool, but you'd lose three critical capabilities. (1) Opacity: MCP tools expose their full schema β input parameters, output format. An A2A agent is a black box: you send it a natural language request and it figures out how to fulfill it. You don't need to know its internal tool signatures. (2) Multi-turn collaboration: MCP is request-response (call a function, get a result). A2A supports stateful tasks where the server agent can ask for clarification, provide progress updates, and deliver artifacts incrementally. A hotel booking agent that asks "Which room type?" can't do that with a single MCP tool call. (3) Long-running tasks: MCP tool calls are expected to complete quickly. A2A tasks can run for minutes with SSE streaming. The fundamental mental model differs: MCP treats the server as a transparent tool. A2A treats it as an autonomous collaborator. Both are needed β use MCP for your own tools, A2A for other agents.
How does A2A handle an agent needing clarification mid-task?
This is the input-required task state β one of A2A's most important design features. When a server agent needs more information, it transitions the task to input-required and includes a message explaining what it needs ("I found 3 hotels. Which one do you prefer?"). The client agent receives this state, processes it (maybe relays the question to the human user or makes a decision itself), and sends another message to the same task. The server receives the clarification and transitions back to working. This loop can repeat multiple times. The task ID is the anchor β all messages and state transitions are tracked against it. This is fundamentally different from MCP, where a tool call either succeeds or fails β there's no mechanism for the tool to ask a follow-up question. A2A's task lifecycle was explicitly designed for this kind of collaborative, conversational workflow.
What stops a malicious agent from impersonating a trusted one?
Three layers of defense: (1) HTTPS: the Agent Card is fetched over TLS from the agent's domain. You trust the card because you trust the domain's TLS certificate. This prevents network-level MitM. (2) Signed Agent Cards (v0.3): the card itself is cryptographically signed. Even if someone copies the card to a different URL, the signature verification fails because it's bound to the original domain/key. This prevents card-level impersonation. (3) OAuth 2.0: even if an attacker tricks a client into connecting to the wrong server, the OAuth flow redirects through the legitimate authorization server β the attacker can't obtain valid tokens. In practice, enterprise deployments add a fourth layer: an internal agent registry (trusted catalog) that only lists verified agents. The client only communicates with agents in the registry. Unknown agents are rejected regardless of their card contents.
Why protobuf as the normative data model instead of JSON Schema?
Protobuf gives three advantages over JSON Schema: (1) Multi-binding generation: a single .proto file generates JSON-RPC types, gRPC stubs, and REST types. JSON Schema can't generate gRPC bindings. (2) Type safety: protobuf has strict typing with required fields, enums, and oneof unions. JSON Schema validation is looser and more error-prone. (3) Canonical serialization: signed Agent Cards require deterministic serialization for signature verification. Protobuf has well-defined canonical encoding; JSON serialization is non-deterministic (key ordering varies). The tradeoff: protobuf is less human-readable than raw JSON. But developers interact with SDKs (Python, TypeScript), not raw proto. The proto file is for machines (code generation, validation), the JSON representation is for humans (debugging, Agent Cards). A2A publishes both β proto is normative, JSON is derived.
How would you design a system where 50 enterprise agents need to discover and communicate with each other?
Layer 1: Each agent publishes its Agent Card at its own /.well-known/agent.json endpoint. This is the A2A standard. Layer 2: An internal Agent Registry (like a service catalog) crawls all agent endpoints, fetches their cards, and indexes them by skill, domain, and trust level. This isn't part of the A2A spec but is necessary for enterprise scale. Layer 3: An Orchestrator Agent queries the registry: "Which agent can handle hotel bookings?" The registry returns matching Agent Cards ranked by capability match and past performance. Layer 4: The orchestrator authenticates via the card's declared OAuth scheme and delegates the task. For auth at scale: a centralized identity provider (Okta, Azure AD) issues tokens. All 50 agents trust the same IdP. Agent-to-agent auth becomes "present a valid token from our IdP." For observability: distributed tracing (OpenTelemetry) across all A2A task delegations. Every task carries a trace ID. You can visualize the full delegation chain: user β orchestrator β hotel agent β payment agent.