MCP/A2A/Skill/Agent Architecture
Understanding the components of AI-driven development infrastructure and organizing their roles and relationships.
Positioning of This Document
Where the Vision (01-vision) defines "why" and Authoritative Reference Sources (02-reference-sources) defines "what," this document defines "how to structure it." It bridges the gap from philosophical foundations to actionable system design — structuring without destroying the underlying philosophy.
Meta Information
| Item | Description |
|---|---|
| What this chapter establishes | The three-layer separation of Agent / Skills / MCP and the responsibility boundaries of each layer |
| What this chapter does NOT cover | Pattern selection criteria (→04), constraint mitigation (→05), Doctrine Layer details (→07) |
| Dependencies | 01-vision (design philosophy), 02-reference-sources (reference source framework) |
| Common misuse | Confusing the three layers with physical deployment topology. The three layers represent separation of responsibilities, not deployment configuration |
Layer Structure Overview
The architecture is organized into three layers (Agent, Skills, MCP), each with specific responsibilities, plus a Doctrine Layer that governs them, as shown in the following diagram:
Doctrine Layer — "Constitutional Judgment Criteria"
These three layers define what the AI knows and what it can do. The basis on which the AI judges and decides is addressed by the Doctrine Layer. The Doctrine Layer is not merely a system prompt: it acts as a constitution that governs all three layers through shared objectives, constraints, and judgment criteria.
The Responsibility Shift Model defined in the Vision (design-time: human / execution-time: agent / structural constraints: system) and the two-layer verification structure (guardrails + evaluation pipelines) are applied to each layer through this Doctrine Layer.
Layer Responsibilities
Each layer has distinct ownership and responsibility areas:
| Layer | Responsibility | Owns | Examples |
|---|---|---|---|
| Agent | Orchestration, decision-making | Task flow | Claude Code, Cursor |
| Skills | Domain knowledge, guidelines | Best practices | SOLID principles, translation guidelines |
| MCP | External connectivity | Tool definitions | deepl-mcp, rfcxml-mcp |
About This Document
AI-driven development involves multiple components, and correctly understanding their roles and relationships is key to efficient development. This document organizes four main concepts: MCP (tool connectivity), A2A (agent-to-agent communication), Skill (static knowledge), and Custom Sub-agents (role specialization).
When you are unsure about "What should be implemented as MCP?", "When is a Skill sufficient?", or "When should I use a sub-agent?", refer to this document to make the appropriate choice.
Overall Architecture
The following diagram shows how all components interact within the complete system architecture:
MCP Three-Layer Structure
Host / Client / Server
MCP is built on a three-layer architecture, where communication flows through the client layer:
| Layer | Role | Example | Developer Involvement |
|---|---|---|---|
| Host | UI, session management | Claude Code, Cursor, VS Code | Consumer |
| Client | Protocol processing, server management | Built into Host | Usually not concerned |
| Server | Tool/resource provision | rfcxml-mcp, deepl-mcp | Provider |
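From the developer's side, the Host/Server split in this table usually shows up as one configuration entry: the developer names a server and says how to launch it, and the host's built-in client handles the protocol. The sketch below builds such an entry in the `mcpServers` / `command` / `args` shape that Claude Code reads from a project-level `.mcp.json`. The helper name `build_mcp_config` and the package name `example-rfcxml-mcp` are hypothetical placeholders, not the real distribution of rfcxml-mcp.

```python
import json


def build_mcp_config(servers: dict[str, list[str]]) -> dict:
    """Map each server name to the command line the host should launch."""
    return {
        "mcpServers": {
            name: {"command": argv[0], "args": argv[1:]}
            for name, argv in servers.items()
        }
    }


# Hypothetical launch command; check the server's own README for the real one.
config = build_mcp_config({"rfcxml": ["npx", "-y", "example-rfcxml-mcp"]})
print(json.dumps(config, indent=2))
```

Nothing in this entry mentions the client. That is the practical meaning of "Usually not concerned" in the table.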
Why You Don't Need to Worry About the Client
For most developers, the client layer operates transparently as part of the host:
Typical development flow:
1. Create an MCP Server (e.g., rfcxml)
2. Add it to Claude Code configuration
3. Claude Code operates as a built-in Client
4. Tools become available
→ The Client is embedded in the Host and
functions as a black box
MCP and A2A: Separation of Concerns
Protocol Differences
While MCP and A2A serve different purposes, they are complementary and address different communication needs:
| Item | MCP | A2A |
|---|---|---|
| Led by | Anthropic | Google → Linux Foundation |
| Purpose | Tool connectivity | Agent-to-agent communication |
| Connects to | MCP Server (tools) | Other agents (including third-party) |
| Context | Can share with parent agent | Completely isolated |
| Owner | Self | Self or others |
Official Recommendation
Build with ADK, equip with MCP (tools), communicate with A2A (agents)
MCP = Using hands (tools)
A2A = Collaborating with others (agents)
Custom Sub-agents
What is a Sub-agent?
An AI assistant specialized for specific tasks that can be defined within Claude Code.
Location:
├── Project: .claude/agents/xxx.md (Priority: High)
└── User: ~/.claude/agents/xxx.md (Priority: Low)
Definition Format
Sub-agents are defined using a simple markdown format that specifies their role, capabilities, and instructions:
name: rfc-specialist
description: Expert in RFC specification verification and validation
tools: rfcxml:get_rfc_structure, rfcxml:get_requirements
model: sonnet
You are an expert in RFC specifications.
Use only the rfcxml tools.
Additional Definition Example
Here is a more practical sub-agent that combines multiple MCPs and Skills:
name: compliance-checker
description: Expert agent for legal and technical compliance checking
tools: hourei:find_law_article, rfcxml:get_requirements, rfcxml:validate_statement
model: sonnet
You are an expert in legal and technical specification compliance checking.
1. Retrieve legal requirements via hourei-mcp
2. Retrieve technical requirements via rfcxml-mcp
3. Map both and report compliance status
Always cite sources (legal article numbers, RFC section numbers).
Sub-agent Positioning
The following diagram illustrates where sub-agents sit within the Claude Code architecture:
Important: Sub-agents are not a "replacement" for the MCP Client, but rather a "higher layer"
- Sub-agent = Defines "what to do" (role, procedures)
- MCP Client = Implements "how to connect" (protocol processing)
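To make the "what to do" side concrete, here is a minimal sketch of how a sub-agent definition could be located and parsed. It assumes the definition file starts with `---`-delimited `key: value` frontmatter followed by the prompt body, and that a project-level file takes priority over a user-level one. `SubAgent`, `parse_agent`, and `resolve_agent` are illustrative names, not part of Claude Code.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SubAgent:
    name: str
    description: str
    tools: list[str]
    model: str
    prompt: str


def parse_agent(text: str) -> SubAgent:
    """Parse a definition, assuming `---`-delimited `key: value` frontmatter."""
    _, header, body = text.split("---", 2)
    fields = {}
    for line in header.strip().splitlines():
        # Split on the first colon only, so tool names like "a:b" stay intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    tools = [t.strip() for t in fields.get("tools", "").split(",") if t.strip()]
    return SubAgent(
        name=fields["name"],
        description=fields.get("description", ""),
        tools=tools,
        model=fields.get("model", ""),
        prompt=body.strip(),
    )


def resolve_agent(name: str, project_root: Path, home: Path) -> Optional[Path]:
    """Return the project-level file if it exists, else the user-level one."""
    for base in (project_root, home):  # project has the higher priority
        candidate = base / ".claude" / "agents" / f"{name}.md"
        if candidate.is_file():
            return candidate
    return None


EXAMPLE = """---
name: rfc-specialist
description: Expert in RFC specification verification and validation
tools: rfcxml:get_rfc_structure, rfcxml:get_requirements
model: sonnet
---

You are an expert in RFC specifications.
Use only the rfcxml tools.
"""

agent = parse_agent(EXAMPLE)
print(agent.name, agent.tools)
```

The parsed object holds only a role, a tool allow-list, and a prompt. How each tool is actually reached stays with the MCP Client.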
Skill
What is a Skill?
Static knowledge and guidelines that can be referenced in Claude Code. Skills are stored in the following locations:
Location:
├── Project: .claude/skills/xxx/SKILL.md
└── User: ~/.claude/skills/xxx/SKILL.md
Skill Characteristics
In the Four-Layer Reference Model, Level 3 (organization rules) and Level 4 (best practices) use Skills as their primary delivery method. Skills have these key characteristics that distinguish them from other approaches:
| Item | Description |
|---|---|
| Format | Markdown file |
| Content | Best practices, workflow definitions, guidelines |
| Execution | None (reference only) |
| Context consumption | Low (only when referenced) |
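The "reference only" and "only when referenced" rows can be pictured as lazy loading: knowing that a Skill exists is cheap, and its body is read only when something refers to it. The sketch below illustrates that idea under stated assumptions. `SkillIndex` is a hypothetical helper, and the rule that earlier roots win on a name clash is an assumption of this sketch, not documented behavior.

```python
import tempfile
from pathlib import Path


class SkillIndex:
    """Lists skills cheaply; reads a SKILL.md body only when referenced."""

    def __init__(self, *roots: Path):
        self._paths: dict[str, Path] = {}
        for root in roots:
            if not root.is_dir():
                continue
            for skill_file in sorted(root.glob("*/SKILL.md")):
                # Assumption: on a name clash, the earlier root wins.
                self._paths.setdefault(skill_file.parent.name, skill_file)

    def names(self) -> list[str]:
        """Cheap listing: no file contents are read."""
        return sorted(self._paths)

    def load(self, name: str) -> str:
        """Read the Markdown body only at the moment it is referenced."""
        return self._paths[name].read_text(encoding="utf-8")


# Demonstration with a throwaway directory standing in for .claude/skills.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "translation-workflow"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "# Technical Document Translation Workflow\n", encoding="utf-8"
    )
    index = SkillIndex(Path(tmp))
    print(index.names())
    print(index.load("translation-workflow"))
```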
MCP vs Skill vs Sub-agent
Decision Flow
Use this flowchart to determine whether to implement something as a Skill, MCP, or Sub-agent:
Comparison Table
| Aspect | Skill | MCP | Sub-agent |
|---|---|---|---|
| Context consumption | Low | High | Medium |
| Dynamic processing | Not possible | Possible | Possible |
| External API | Not possible | Possible | Via MCP |
| Maintenance | Markdown editing | npm publish, etc. | Markdown editing |
| Reusability | Within project | Global | Within project |
| Use case | Knowledge/guidelines | Tool/API integration | Role/expertise separation |
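One possible reading of the comparison table as code is sketched below. The function name and its three boolean parameters are illustrative assumptions. A real decision will usually weigh more factors, such as context consumption and maintenance cost.

```python
def choose_components(
    needs_external_api: bool,
    needs_dynamic_processing: bool,
    needs_role_separation: bool,
) -> list[str]:
    """Map requirements to components, following the comparison table."""
    chosen = []
    if needs_external_api or needs_dynamic_processing:
        chosen.append("MCP")        # tool/API integration, dynamic processing
    if needs_role_separation:
        chosen.append("Sub-agent")  # role/expertise separation
    if not chosen:
        chosen.append("Skill")      # static knowledge and guidelines suffice
    return chosen


print(choose_components(False, False, False))
print(choose_components(True, False, True))
```

The function returns a list rather than a single value because the components combine: a sub-agent typically reaches external APIs via MCP instead of replacing it.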
Principles for Choosing
Skill = "Knowledge", "Guidelines", "Workflow definitions"
MCP = "Tools", "API integration", "Dynamic processing"
Sub-agent = "Roles", "Expertise", "Task delegation"
Use Skills to define "what should be done"
Use MCP to provide "how to execute it"
Use Sub-agents to separate "who does it"
A2A vs Sub-agent
Fundamental Differences
| Aspect | Custom Sub-agent | A2A Agent |
|---|---|---|
| Location | Within same process | Over the network |
| Owner | Self | Self or others |
| Trust | Full trust | Authentication/authorization required |
| Context | Partially shared with parent | Completely isolated |
| Lifecycle | Session-limited | Persistent service |
| Internal implementation | Visible (Markdown) | Not visible (API contract only) |
Analogy
Custom Sub-agent = "Internal specialized department"
A2A Agent = "Outsourcing partner / Partner company"
Even with internal specialized departments, outsourcing partners are needed
Even with outsourcing partners, internal specialized departments are needed
→ Both are necessary; they are not substitutes for each other
When to Use Which
| Scenario | What to Use |
|---|---|
| Want a specialist that uses your own MCP servers | Sub-agent |
| Want to reuse the same processing repeatedly | Sub-agent |
| Want to define a workflow | Sub-agent |
| Integrate with third-party agents | A2A |
| Expose your agent externally | A2A |
| Agent collaboration across multiple organizations | A2A |
Executor Selection
Beyond choosing MCP / Skill / Sub-agent, the perspective of "who makes the decision" becomes crucial.
Evolution of Execution
The way we integrate with external services has evolved with technology.
In this evolution, not everything needs to be MCP-ified. The appropriate layer is determined by "who makes the decision".
Layer Selection by Decision Maker
Choose the implementation layer based on who makes the decision:
| Decision Maker | Appropriate Layer | Characteristics | Examples |
|---|---|---|---|
| None (Deterministic) | Direct program | No judgment needed, fast, reliable | Batch processing, CI/CD, cron |
| Human | CLI | Human decides, AI doesn't execute | gh pr list, aws s3 ls |
| AI (One-shot) | MCP + Skill | AI decides and executes per request | Translation, RFC lookup, quality evaluation |
| AI (Continuous/Autonomous) | Sub-agent | Autonomous decisions with expertise | Review specialist, translation specialist |
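The table above is essentially a lookup keyed by the decision maker. A minimal sketch of that lookup follows. The enum and function names are illustrative, and the returned strings simply repeat the "Appropriate Layer" column.

```python
from enum import Enum


class DecisionMaker(Enum):
    NONE = "deterministic"
    HUMAN = "human"
    AI_ONE_SHOT = "ai-one-shot"
    AI_AUTONOMOUS = "ai-autonomous"


# Mirrors the "Decision Maker" -> "Appropriate Layer" columns of the table.
LAYER_FOR = {
    DecisionMaker.NONE: "Direct program",
    DecisionMaker.HUMAN: "CLI",
    DecisionMaker.AI_ONE_SHOT: "MCP + Skill",
    DecisionMaker.AI_AUTONOMOUS: "Sub-agent",
}


def select_layer(decision_maker: DecisionMaker) -> str:
    """Pick the implementation layer from who makes the decision."""
    return LAYER_FOR[decision_maker]


for maker in DecisionMaker:
    print(f"{maker.name}: {select_layer(maker)}")
```

Keying the lookup on the decision maker rather than on the service keeps the question "who decides?" ahead of "what is executed?".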
Decision Flow
This comprehensive flowchart guides the selection process from initial request through final implementation:
CLI vs MCP: When AI Makes the Decision
Key Insight: When an official CLI exists, CLI + Skill is more efficient than building an MCP
— From r/ClaudeAI community discussion
When both a CLI and an MCP are possible options, use this comparison table to choose the more efficient approach:
| Aspect | CLI + Skill | MCP |
|---|---|---|
| Token consumption | Low (command only) | High (loads all tool definitions) |
| Startup cost | None | Requires server process |
| Authentication | Local | Managed by MCP |
| Purpose-built | ◎ (Dedicated design) | △ (General purpose) |
Examples
These examples illustrate the decision for popular services:
| Service | CLI | Recommendation |
|---|---|---|
| GitHub | gh | CLI + Skill |
| AWS | aws | CLI + Skill |
| Google Cloud | gcloud | CLI + Skill |
| PostgreSQL | psql | CLI + Skill |
| Linear | ❌ | MCP |
| Greptile | ❌ | MCP |
| DeepL | ❌ | MCP |
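The rule behind this table can be sketched in a few lines: recommend CLI + Skill when an official CLI exists, otherwise MCP. The service-to-CLI mapping below copies a few rows of the table. The function names are illustrative, and `cli_is_installed` is an extra convenience check added for this sketch, not something the table prescribes.

```python
import shutil
from typing import Optional


def recommend_integration(official_cli: Optional[str]) -> str:
    """Recommend CLI + Skill when an official CLI exists, otherwise MCP."""
    return "CLI + Skill" if official_cli else "MCP"


def cli_is_installed(command: str) -> bool:
    """Check that the CLI is actually on PATH before relying on it."""
    return shutil.which(command) is not None


# A few rows of the table; None means no official CLI is listed.
SERVICES = {"GitHub": "gh", "AWS": "aws", "Linear": None, "DeepL": None}

for service, cli in SERVICES.items():
    note = ""
    if cli and not cli_is_installed(cli):
        note = f" (install `{cli}` first)"
    print(f"{service}: {recommend_integration(cli)}{note}")
```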
Key Insight
The fundamental principle for layer selection is:
Selection changes based on "who decides", not just "what to execute"
No decision needed → Direct program
Human decides → CLI
AI decides → MCP or CLI + Skill
AI autonomous → Sub-agent
With this perspective, you can avoid over-MCPization and implement at the appropriate layer.
Combination Patterns
The Most Powerful Combination
The most effective approach combines all three components working together:
Concrete Example: Translation Workflow
Here is a practical example showing how Skill, Sub-agent, and MCP work together:
<!-- skills/translation-workflow/SKILL.md -->
# Technical Document Translation Workflow
## MCP Tools Used
- `deepl` - Translation execution
- `xcomet` - Quality evaluation
## Guardrails (Inviolable Constraints)
- Registered glossary terms must always be used
- Do not add content not present in the source text
## Workflow
1. Translate with deepl:translate-text (formality: "more")
2. Evaluate with xcomet:xcomet_evaluate (evaluation pipeline)
   - Score 0.85 or higher: OK
   - Score below 0.85: Re-translate or manual correction
3. Detect errors with xcomet:xcomet_detect_errors
This example is a concrete implementation pattern of the two-layer verification structure defined in the Vision. Glossary adherence corresponds to guardrails (inviolable constraints), while the xCOMET score corresponds to the evaluation pipeline (probabilistic quality gate).
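The control flow of this workflow can be sketched as follows. The `translate` and `evaluate` callables are hypothetical stand-ins for the `deepl:translate-text` and `xcomet:xcomet_evaluate` tool calls, and the function names, the retry limit, and the substring-based glossary check are simplifying assumptions of this sketch. Only the 0.85 threshold comes from the workflow itself.

```python
from typing import Callable

SCORE_THRESHOLD = 0.85  # quality gate from the workflow


def glossary_violations(
    source: str, translation: str, glossary: dict[str, str]
) -> list[str]:
    """Guardrail: each glossary term found in the source must be rendered
    with its registered translation (naive substring check)."""
    return [
        term
        for term, required in glossary.items()
        if term in source and required not in translation
    ]


def translate_with_checks(
    source: str,
    translate: Callable[[str], str],
    evaluate: Callable[[str, str], float],
    glossary: dict[str, str],
    max_attempts: int = 2,
) -> dict:
    for _ in range(max_attempts):
        translation = translate(source)
        if glossary_violations(source, translation, glossary):
            continue  # guardrail failed: reject regardless of score
        score = evaluate(source, translation)
        if score >= SCORE_THRESHOLD:
            return {"status": "ok", "translation": translation, "score": score}
    return {"status": "needs_manual_correction"}


# Stand-ins for the MCP tool calls, for demonstration only.
result = translate_with_checks(
    "Der Server MUSS die Anfrage ablehnen.",
    translate=lambda text: "The server MUST reject the request.",
    evaluate=lambda src, hyp: 0.91,
    glossary={"Server": "server", "MUSS": "MUST"},
)
print(result["status"])
```

The guardrail runs before the score is consulted, so a high score can never excuse a glossary violation. That ordering is what separates an inviolable constraint from a probabilistic quality gate.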
<!-- agents/translation-specialist.md -->
name: translation-specialist
description: Specialized agent for technical document translation and quality evaluation
tools: deepl:translate-text, xcomet:xcomet_evaluate, xcomet:xcomet_detect_errors
model: sonnet
You are an expert in technical translation.
Please refer to the translation-workflow skill.
Sequence Diagrams: Visualizing Execution Flow
Sequence diagrams help visualize how components interact during task execution.
Code Review Task
Here is how a code review task flows through the system:
Translation Workflow
Here is the sequence for a translation task with quality evaluation:
What the Three-Layer Model Does Not Explicitly Cover — Memory
The three-layer model (Agent / Skills / MCP), together with the Doctrine Layer, defines what an agent knows, what it can do, and on what basis it judges. How an agent remembers past interactions and outcomes, and how it uses that history (that is, Memory), is outside the scope of this model.
Why Memory Is Not Included in the Three Layers
Memory is inherently dynamic, changing as conversations progress. In contrast, Skills are static domain knowledge and MCP is an explicit protocol interface — both are declaratively definable elements. Memory differs fundamentally in nature, and placing it alongside these layers would compromise the model's clarity.
Furthermore, Memory implementation varies significantly across LLMs and platforms:
- Context window (short-term memory): Present in all LLMs, but size and lifecycle are model-dependent
- Persistent memory: Platform-specific features (e.g., ChatGPT Memory) or application-layer implementations (e.g., LangChain ConversationBufferMemory)
- CLAUDE.md: A concept close to "project-level memory" in Claude Code, but strictly an instruction file rather than Memory
Until unified protocols or standards are established, it is more appropriate to design Memory individually during the implementation phase rather than incorporating it into the three-layer model.
This Does Not Mean Memory Can Be Ignored
In actual agent design, Memory is an essential concern. Within the three-layer model, the Skills layer implicitly covers "long-term memory" of domain knowledge, and the MCP layer covers "reference memory" of external context. For conversation history and learning outcome retention, consider implementation strategies during the Development Phases.
Layer Structure Summary
The following layered structure shows how all components integrate. The Doctrine Layer (constraints, objectives, judgment criteria) governs all layers. In terms of information flow: doctrine constraints descend from above, resource facts ascend from below, and agent decision-making occurs at the center.