
MCP/A2A/Skill/Agent Architecture

Understanding the components of AI-driven development infrastructure and organizing their roles and relationships.

Positioning of This Document

Where the Vision (01-vision) defines "why" and Authoritative Reference Sources (02-reference-sources) defines "what," this document defines "how to structure it." It bridges the gap from philosophical foundations to actionable system design — structuring without destroying the underlying philosophy.

Meta Information
| Item | Description |
| --- | --- |
| What this chapter establishes | The three-layer separation of Agent / Skills / MCP and the responsibility boundaries of each layer |
| What this chapter does NOT cover | Pattern selection criteria (→04), constraint mitigation (→05), Doctrine Layer details (→07) |
| Dependencies | 01-vision (design philosophy), 02-reference-sources (reference source framework) |
| Common misuse | Confusing the three layers with physical deployment topology. The three layers represent separation of responsibilities, not deployment configuration |

Layer Structure Overview

The architecture is organized into three distinct layers (Agent, Skills, and MCP), each with specific responsibilities, as shown in the following diagram:

Doctrine Layer — "Constitutional Judgment Criteria"

These three layers define what AI knows and what AI can do. The basis on which AI judges and decides is addressed by the Doctrine Layer. The Doctrine Layer is not merely a system prompt: it is a set of constitutional judgment criteria that governs all three layers through shared objectives, constraints, and judgment criteria.

The Responsibility Shift Model defined in the Vision (design-time: human / execution-time: agent / structural constraints: system) and the two-layer verification structure (guardrails + evaluation pipelines) are applied to each layer through this Doctrine Layer.

Layer Responsibilities

Each layer has distinct ownership and responsibility areas:

| Layer | Responsibility | Owns | Examples |
| --- | --- | --- | --- |
| Agent | Orchestration, decision-making | Task flow | Claude Code, Cursor |
| Skills | Domain knowledge, guidelines | Best practices | SOLID principles, translation guidelines |
| MCP | External connectivity | Tool definitions | deepl-mcp, rfcxml-mcp |

About This Document

AI-driven development involves multiple components, and correctly understanding their roles and relationships is key to efficient development. This document organizes four main concepts: MCP (tool connectivity), A2A (agent-to-agent communication), Skill (static knowledge), and Custom Sub-agents (role specialization).

When you are unsure about "What should be implemented as MCP?", "When is a Skill sufficient?", or "When should I use a sub-agent?", refer to this document to make the appropriate choice.

Overall Architecture

The following diagram shows how all components interact within the complete system architecture:

MCP Three-Layer Structure

Host / Client / Server

MCP is built on a three-layer architecture, where communication flows through the client layer:

| Layer | Role | Example | Developer Involvement |
| --- | --- | --- | --- |
| Host | UI, session management | Claude Code, Cursor, VS Code | Consumer |
| Client | Protocol processing, server management | Built into Host | Usually not concerned |
| Server | Tool/resource provision | rfcxml-mcp, deepl-mcp | Provider |

Why You Don't Need to Worry About the Client

For most developers, the client layer operates transparently as part of the host:

Typical development flow:
1. Create an MCP Server (e.g., rfcxml)
2. Add it to Claude Code configuration
3. Claude Code operates as a built-in Client
4. Tools become available

→ The Client is embedded in the Host and
   functions as a black box
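
Step 2 of the flow above (adding the server to the host configuration) can be sketched as follows. This is a minimal sketch, assuming the host reads a JSON configuration with an `mcpServers` map of `command` and `args` entries, as Claude Code's project-level `.mcp.json` does. The package name `example-rfcxml-mcp` and the `npx` launch command are placeholders for illustration, not the actual distribution of rfcxml-mcp.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of an mcpServers-style config with one server entry added."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    return {**config, "mcpServers": servers}


# Register the server in the host configuration.
# "example-rfcxml-mcp" is a hypothetical package name, used only for illustration.
config = add_mcp_server({}, "rfcxml", "npx", ["-y", "example-rfcxml-mcp"])
print(json.dumps(config, indent=2))
```

The developer only writes the server entry. Launching the process and speaking the protocol remain the job of the client built into the host.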

MCP and A2A: Separation of Concerns

Protocol Differences

While MCP and A2A serve different purposes, they are complementary and address different communication needs:

| Item | MCP | A2A |
| --- | --- | --- |
| Led by | Anthropic | Google → Linux Foundation |
| Purpose | Tool connectivity | Agent-to-agent communication |
| Connects to | MCP Server (tools) | Other agents (including third-party) |
| Context | Can share with parent agent | Completely isolated |
| Owner | Self | Self or others |

Official Recommendation

Build with ADK, equip with MCP (tools), communicate with A2A (agents)

MCP = Using hands (tools)
A2A = Collaborating with others (agents)

Custom Sub-agents

What is a Sub-agent?

An AI assistant specialized for specific tasks that can be defined within Claude Code.

Location:
├── Project: .claude/agents/xxx.md (Priority: High)
└── User:    ~/.claude/agents/xxx.md (Priority: Low)
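
The priority rule can be illustrated with a small lookup. This is a sketch under one assumption: that "Priority: High" means a project-level definition shadows a user-level definition with the same name. The function name `resolve_agent_file` is hypothetical and does not describe how Claude Code resolves definitions internally.

```python
from pathlib import Path
from typing import Optional


def resolve_agent_file(name: str, project_root: Path, home: Path) -> Optional[Path]:
    """Return the definition file that wins for `name`, or None if neither exists."""
    candidates = [
        project_root / ".claude" / "agents" / f"{name}.md",  # project: high priority
        home / ".claude" / "agents" / f"{name}.md",          # user: low priority
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Checking the project path first lets a repository pin its own version of a sub-agent without touching the user's personal definitions.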

Definition Format

Sub-agents are defined using a simple markdown format that specifies their role, capabilities, and instructions:

```markdown
---
name: rfc-specialist
description: Expert in RFC specification verification and validation
tools: rfcxml:get_rfc_structure, rfcxml:get_requirements
model: sonnet
---

You are an expert in RFC specifications.
Use only the rfcxml tools.
```
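
To make the structure of a definition concrete, the sketch below splits one into its metadata and its instructions. It assumes the metadata is a block of simple `key: value` lines delimited by `---` lines (YAML front matter) and that `tools` is a comma-separated list. The parser is illustrative only and handles none of YAML beyond that.

```python
def parse_agent_definition(text: str) -> tuple[dict, str]:
    """Split a definition into (metadata, instructions).

    Assumes the metadata is a block of `key: value` lines delimited by `---` lines.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("definition must start with a '---' metadata block")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("metadata block is not closed with '---'") from None
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so tool names such as "rfcxml:..." survive.
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    # The tools field is a comma-separated list of tool names.
    meta["tools"] = [t.strip() for t in meta.get("tools", "").split(",") if t.strip()]
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


EXAMPLE = """\
---
name: rfc-specialist
description: Expert in RFC specification verification and validation
tools: rfcxml:get_rfc_structure, rfcxml:get_requirements
model: sonnet
---

You are an expert in RFC specifications.
Use only the rfcxml tools.
"""

meta, instructions = parse_agent_definition(EXAMPLE)
print(meta["name"], meta["tools"])
```

The metadata answers "which role, with which tools and model", while the body carries the instructions the sub-agent follows.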

Additional Definition Example

Here is a more practical sub-agent that combines multiple MCPs and Skills:

```markdown
---
name: compliance-checker
description: Expert agent for legal and technical compliance checking
tools: hourei:find_law_article, rfcxml:get_requirements, rfcxml:validate_statement
model: sonnet
---

You are an expert in legal and technical specification compliance checking.

1. Retrieve legal requirements via hourei-mcp
2. Retrieve technical requirements via rfcxml-mcp
3. Map both and report compliance status

Always cite sources (legal article numbers, RFC section numbers).
```

Sub-agent Positioning

The following diagram illustrates where sub-agents sit within the Claude Code architecture:

Important: Sub-agents are not a "replacement" for the MCP Client, but rather a "higher layer"

  • Sub-agent = Defines "what to do" (role, procedures)
  • MCP Client = Implements "how to connect" (protocol processing)

Skill

What is a Skill?

Static knowledge and guidelines that can be referenced in Claude Code. Skills are stored in the following locations:

Location:
├── Project: .claude/skills/xxx/SKILL.md
└── User:    ~/.claude/skills/xxx/SKILL.md

Skill Characteristics

In the Four-Layer Reference Model, Level 3 (organization rules) and Level 4 (best practices) use Skills as their primary delivery method. Skills have these key characteristics that distinguish them from other approaches:

| Item | Description |
| --- | --- |
| Format | Markdown file |
| Content | Best practices, workflow definitions, guidelines |
| Execution | None (reference only) |
| Context consumption | Low (only when referenced) |
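
The "only when referenced" property can be pictured as lazy loading. The sketch below is an illustration of the idea, not a description of how Claude Code loads Skills. The class `SkillIndex` and its character count as a stand-in for context consumption are assumptions made for this example.

```python
from pathlib import Path


class SkillIndex:
    """Lazily reads SKILL.md files so unreferenced skills cost no context."""

    def __init__(self, skills_dir: Path):
        self.skills_dir = skills_dir
        self._loaded: dict[str, str] = {}

    def available(self) -> list[str]:
        """List skill names without reading any file contents."""
        return sorted(p.parent.name for p in self.skills_dir.glob("*/SKILL.md"))

    def reference(self, name: str) -> str:
        """Read a skill's text the first time it is referenced, then reuse it."""
        if name not in self._loaded:
            path = self.skills_dir / name / "SKILL.md"
            self._loaded[name] = path.read_text(encoding="utf-8")
        return self._loaded[name]

    def context_chars(self) -> int:
        """Rough proxy for context consumption: characters loaded so far."""
        return sum(len(text) for text in self._loaded.values())
```

Listing the available skills costs nothing in this model. The cost appears only for the skills a task actually references.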

MCP vs Skill vs Sub-agent

Decision Flow

Use this flowchart to determine whether to implement something as a Skill, MCP, or Sub-agent:

Comparison Table

| Aspect | Skill | MCP | Sub-agent |
| --- | --- | --- | --- |
| Context consumption | Low | High | Medium |
| Dynamic processing | Not possible | Possible | Possible |
| External API | Not possible | Possible | Via MCP |
| Maintenance | Markdown editing | npm publish, etc. | Markdown editing |
| Reusability | Within project | Global | Within project |
| Use case | Knowledge/guidelines | Tool/API integration | Role/expertise separation |

Principles for Choosing

Skill = "Knowledge", "Guidelines", "Workflow definitions"
MCP   = "Tools", "API integration", "Dynamic processing"
Sub-agent = "Roles", "Expertise", "Task delegation"

Use Skills to define "what should be done"
Use MCP to provide "how to execute it"
Use Sub-agents to separate "who does it"
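
The principles above can be written down as a small decision function. This is one possible reading of the comparison table, not the flowchart itself. The `Need` fields and the function name `choose_components` are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Need:
    external_api: bool = False        # must call an external service
    dynamic_processing: bool = False  # must compute or fetch at run time
    role_separation: bool = False     # wants a dedicated role or delegated task


def choose_components(need: Need) -> list[str]:
    """Map a requirement to the components suggested by the comparison table."""
    components = []
    if need.role_separation:
        components.append("Sub-agent")  # "who does it"
    if need.external_api or need.dynamic_processing:
        components.append("MCP")        # "how to execute it"
    if not components:
        components.append("Skill")      # knowledge and guidelines only
    return components


print(choose_components(Need(role_separation=True, external_api=True)))
```

A sub-agent that needs an external API still appears together with MCP, which matches the "Via MCP" entry in the table.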

A2A vs Sub-agent

Fundamental Differences

| Aspect | Custom Sub-agent | A2A Agent |
| --- | --- | --- |
| Location | Within same process | Over the network |
| Owner | Self | Self or others |
| Trust | Full trust | Authentication/authorization required |
| Context | Partially shared with parent | Completely isolated |
| Lifecycle | Session-limited | Persistent service |
| Internal implementation | Visible (Markdown) | Not visible (API contract only) |

Analogy

Custom Sub-agent = "Internal specialized department"
A2A Agent        = "Outsourcing partner / Partner company"

Even with internal specialized departments, outsourcing partners are needed
Even with outsourcing partners, internal specialized departments are needed

→ Both are necessary; they are not substitutes for each other

When to Use Which

| Scenario | What to Use |
| --- | --- |
| Want to use your own MCP expertly | Sub-agent |
| Want to reuse the same processing repeatedly | Sub-agent |
| Want to define a workflow | Sub-agent |
| Integrate with third-party agents | A2A |
| Expose your agent externally | A2A |
| Agent collaboration across multiple organizations | A2A |
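
Read together, the scenarios suggest a single test: does the agent cross a trust or ownership boundary? The sketch below encodes that reading. The function and parameter names are hypothetical, and the rule is a simplification of the table rather than an official criterion.

```python
def choose_agent_kind(third_party: bool = False,
                      exposed_externally: bool = False,
                      cross_organization: bool = False) -> str:
    """Return "A2A" when the agent crosses a trust boundary, else "Sub-agent"."""
    crosses_boundary = third_party or exposed_externally or cross_organization
    return "A2A" if crosses_boundary else "Sub-agent"


print(choose_agent_kind())                  # internal workflow or reuse
print(choose_agent_kind(third_party=True))  # integration with another party's agent
```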

Executor Selection

Beyond choosing MCP / Skill / Sub-agent, the perspective of "who makes the decision" becomes crucial.

Evolution of Execution

The way we integrate with external services has evolved with technology.

In this evolution, not everything needs to be MCP-ified. The appropriate layer is determined by "who makes the decision".

Layer Selection by Decision Maker

Choose the implementation layer based on who makes the decision:

| Decision Maker | Appropriate Layer | Characteristics | Examples |
| --- | --- | --- | --- |
| None (Deterministic) | Direct program | No judgment needed, fast, reliable | Batch processing, CI/CD, cron |
| Human | CLI | Human decides, AI doesn't execute | gh pr list, aws s3 ls |
| AI (One-shot) | MCP + Skill | AI decides and executes per request | Translation, RFC lookup, quality evaluation |
| AI (Continuous/Autonomous) | Sub-agent | Autonomous decisions with expertise | Review specialist, translation specialist |

Decision Flow

This comprehensive flowchart guides the selection process from initial request through final implementation:

CLI vs MCP: When AI Makes the Decision

Key Insight: When an official CLI exists, CLI + Skill is more efficient than building an MCP

— From r/ClaudeAI community discussion

When both a CLI and an MCP are possible options, use this comparison table to choose the more efficient approach:

| Aspect | CLI + Skill | MCP |
| --- | --- | --- |
| Token consumption | Low (command only) | High (loads all tool definitions) |
| Startup cost | None | Requires server process |
| Authentication | Local | Managed by MCP |
| Purpose-built | ◎ (Dedicated design) | △ (General purpose) |

Examples

These examples illustrate the decision for popular services:

| Service | CLI | Recommendation |
| --- | --- | --- |
| GitHub | gh | CLI + Skill |
| AWS | aws | CLI + Skill |
| Google Cloud | gcloud | CLI + Skill |
| PostgreSQL | psql | CLI + Skill |
| Linear | | MCP |
| Greptile | | MCP |
| DeepL | | MCP |

Key Insight

The fundamental principle for layer selection is:

Selection changes based on "who decides", not just "what to execute"

No decision needed  → Direct program
Human decides       → CLI
AI decides          → MCP or CLI + Skill
AI autonomous       → Sub-agent
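
The mapping above, combined with the CLI-versus-MCP comparison, can be sketched as a single function. The enum and function names are hypothetical. The `official_cli_exists` flag reflects the community insight quoted earlier and only affects the one-shot AI case.

```python
from enum import Enum


class DecisionMaker(Enum):
    NONE = "none"                    # deterministic, no judgment needed
    HUMAN = "human"                  # a person decides and runs the command
    AI_ONE_SHOT = "ai_one_shot"      # AI decides and executes per request
    AI_AUTONOMOUS = "ai_autonomous"  # AI decides continuously with expertise


def select_layer(decider: DecisionMaker, official_cli_exists: bool = False) -> str:
    """Pick an implementation layer from who makes the decision."""
    if decider is DecisionMaker.NONE:
        return "Direct program"
    if decider is DecisionMaker.HUMAN:
        return "CLI"
    if decider is DecisionMaker.AI_ONE_SHOT:
        # With an official CLI, CLI + Skill avoids loading MCP tool definitions.
        return "CLI + Skill" if official_cli_exists else "MCP + Skill"
    return "Sub-agent"


print(select_layer(DecisionMaker.AI_ONE_SHOT, official_cli_exists=True))
```

Note that the function never inspects what is being executed. Only the decision maker and the availability of a CLI change the result.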

With this perspective, you can avoid over-MCPization and implement at the appropriate layer.

Combination Patterns

The Most Powerful Combination

The most effective approach combines all three components working together:

Concrete Example: Translation Workflow

Here is a practical example showing how Skill, Sub-agent, and MCP work together:

```markdown
<!-- skills/translation-workflow/SKILL.md -->

# Technical Document Translation Workflow

## MCP Tools Used

- `deepl` - Translation execution
- `xcomet` - Quality evaluation

## Guardrails (Inviolable Constraints)

- Registered glossary terms must always be used
- Do not add content not present in the source text

## Workflow

1. Translate with deepl:translate-text (formality: "more")
2. Evaluate with xcomet:xcomet_evaluate (evaluation pipeline)
   - Score 0.85 or higher: OK
   - Score below 0.85: Re-translate or manual correction
3. Detect errors with xcomet:xcomet_detect_errors
```
This example is a concrete implementation pattern of the two-layer verification structure defined in the Vision. Glossary adherence corresponds to guardrails (inviolable constraints), while the xCOMET score corresponds to the evaluation pipeline (probabilistic quality gate).
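
The two layers can be sketched as two checks applied in order. This is a minimal sketch under stated assumptions: the `evaluate` callable stands in for a call to `xcomet:xcomet_evaluate`, the glossary check is a plain substring match, and all function names are hypothetical. Only the 0.85 threshold comes from the workflow above.

```python
from dataclasses import dataclass
from typing import Callable

SCORE_THRESHOLD = 0.85  # from the workflow: 0.85 or higher is OK


@dataclass
class Verdict:
    accepted: bool
    reason: str


def check_guardrails(source: str, translation: str, glossary: dict[str, str]) -> list[str]:
    """Return glossary violations: source terms whose registered rendering is missing."""
    return [
        f"{term} -> {required}"
        for term, required in glossary.items()
        if term in source and required not in translation
    ]


def verify(source: str, translation: str, glossary: dict[str, str],
           evaluate: Callable[[str, str], float]) -> Verdict:
    """Apply the guardrail first (hard constraint), then the score gate."""
    violations = check_guardrails(source, translation, glossary)
    if violations:
        return Verdict(False, "guardrail: missing " + ", ".join(violations))
    score = evaluate(source, translation)  # stand-in for the evaluation tool
    if score < SCORE_THRESHOLD:
        return Verdict(False, f"evaluation: score {score:.2f} below {SCORE_THRESHOLD}")
    return Verdict(True, f"score {score:.2f}")
```

The guardrail runs first and is binary, so a glossary violation rejects the translation regardless of its score. The score gate is probabilistic, and a failure there leads to re-translation or manual correction as described in the workflow.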

```markdown
<!-- agents/translation-specialist.md -->

---
name: translation-specialist
description: Specialized agent for technical document translation and quality evaluation
tools: deepl:translate-text, xcomet:xcomet_evaluate, xcomet:xcomet_detect_errors
model: sonnet
---

You are an expert in technical translation.
Please refer to the translation-workflow skill.
```

Sequence Diagrams: Visualizing Execution Flow

Sequence diagrams help visualize how components interact during task execution.

Code Review Task

Here is how a code review task flows through the system:

Translation Workflow

Here is the sequence for a translation task with quality evaluation:

What the Three-Layer Model Does Not Explicitly Cover — Memory

The three-layer model (Agent / Skills / MCP), together with the Doctrine Layer, defines what an agent knows, what it can do, and on what basis it judges. How an agent remembers past interactions and outcomes, and how it uses that history (that is, Memory), is outside the scope of this model.

Why Memory Is Not Included in the Three Layers

Memory is inherently dynamic, changing as conversations progress. In contrast, Skills are static domain knowledge and MCP is an explicit protocol interface — both are declaratively definable elements. Memory differs fundamentally in nature, and placing it alongside these layers would compromise the model's clarity.

Furthermore, Memory implementation varies significantly across LLMs and platforms:

  • Context window (short-term memory): Present in all LLMs, but size and lifecycle are model-dependent
  • Persistent memory: Platform-specific features (e.g., ChatGPT Memory) or application-layer implementations (e.g., LangChain ConversationBufferMemory)
  • CLAUDE.md: A concept close to "project-level memory" in Claude Code, but strictly an instruction file rather than Memory

Until unified protocols or standards are established, it is more appropriate to design Memory individually during the implementation phase rather than incorporating it into the three-layer model.

This Does Not Mean Memory Can Be Ignored

In actual agent design, Memory is an essential concern. Within the three-layer model, the Skills layer implicitly covers "long-term memory" of domain knowledge, and the MCP layer covers "reference memory" of external context. For conversation history and learning outcome retention, consider implementation strategies during the Development Phases.

Layer Structure Summary

The following layered structure shows how all components integrate. The Doctrine Layer (constraints, objectives, judgment criteria) governs all layers. In terms of information flow: doctrine constraints descend from above, resource facts ascend from below, and agent decision-making occurs at the center.

Released under the MIT License.