AI Design Patterns and the Role of MCP

As AI systems have advanced, how has the industry addressed the LLM's "knowledge limitations"? Where does MCP fit in, and what makes it different?

Positioning of This Page

01-vision (WHY — Why unshakeable references are needed)
02-reference-sources (WHAT — What qualifies as a reference source)
03-architecture (HOW — How to structure the system)
This page (WHICH — Which pattern to choose and when)

This page translates the abstract architecture from preceding chapters into concrete design decisions. Each pattern is described not as "something to adopt" but as a structure that works when preconditions are met — and breaks when they are not.

Meta Information
| Item | Content |
|---|---|
| What this chapter establishes | Positioning and selection criteria for RAG / MCP / Fine-tuning / Prompt Engineering, plus anti-patterns |
| What this chapter does NOT cover | Implementation procedures for each pattern, model training methods, specific framework usage |
| Dependencies | 03-architecture (three-layer model structure definition) |
| Common misuse | Treating any pattern as a "silver bullet." Each pattern has a Failure Risk (collapse conditions) |

About This Document

Implementing generative AI (LLM) in practical systems requires more than the capabilities of the model alone. AI knowledge has inherent limitations (see 01-vision), and various design patterns have emerged to overcome them.

This document provides an overview of major design patterns and carefully explains the differences between RAG (Retrieval-Augmented Generation) and MCP. The goal is to answer questions like "What's the difference between RAG and MCP?" and "Why does this project choose MCP?"

Architecture → Design Patterns Relationship

In software development, Architecture (structure) sits above Design Patterns (implementation techniques). The same relationship holds in this project.

Architecture defines "what goes where," while Design Patterns show "how to implement it concretely." The two reference each other in a mutually reinforcing relationship.

LLM's "Knowledge Limitations" — Why External Knowledge is Needed

LLMs are probabilistic generative models trained on massive amounts of text data. They possess remarkably broad knowledge, but have clear limitations (see Chapter 1 of 02-reference-sources for details).

LLM Knowledge
┌────────────────────────────────────────────────────────┐
│  ✅ Pre-trained knowledge (vast but fixed)             │
│     - General knowledge, programming, languages.       │
│     - Information up to training cutoff                │
├────────────────────────────────────────────────────────┤
│  ❌ Knowledge it lacks                                 │
│     - Information after training cutoff                │
│     - Internal documents and proprietary data          │
│     - Rare specialized knowledge (obscure RFC details) │
│     - Real-time information                            │
└────────────────────────────────────────────────────────┘

Various design patterns have been devised to address this "missing knowledge."

Major AI Design Patterns

2.1 Overview of Patterns

Here's a summary of the major design patterns for addressing LLM knowledge limitations.

2.2 Overview of Each Pattern

RAG (Retrieval-Augmented Generation)

A technique that searches external documents and injects relevant information into the LLM's prompt. Rather than modifying the LLM itself, it combines "retrieval + generation."

Key Point: The core of RAG is searching information using "vector similarity." Documents are split into small fragments (chunks), and fragments semantically closest to the question are retrieved and passed to the LLM.

| Characteristic | Description |
|---|---|
| Target | Unstructured text (documents, FAQs, internal wikis, etc.) |
| Search Method | Vector similarity search (semantic search) |
| Pre-processing | Document chunking → vectorization → DB storage |
| Strengths | Can find relevant information from large document collections |
| Weaknesses | Context is lost through chunking; doesn't understand document structure |

MCP (Model Context Protocol)

A standard protocol developed by Anthropic for connecting AI models with external tools and services. It enables AIs to access external data through structured APIs and execute actions.

Key Point: The core of MCP is accessing information through "structured APIs." Data is retrieved with understanding of domain structure (sections, requirement levels, cross-references, etc.).

| Characteristic | Description |
|---|---|
| Target | Structured data, APIs, external services |
| Access Method | Structured API calls (JSON-RPC) |
| Pre-processing | Not required (the MCP server understands the structure) |
| Strengths | Accurate data retrieval, verifiable, distributable |
| Weaknesses | Requires MCP server development |

MCP Details: See mcp/what-is-mcp.
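Under the hood, an MCP tool invocation is a JSON-RPC 2.0 message. Here is a sketch of the wire format, reusing the `get_requirements` tool that appears later on this page; the arguments and id are illustrative:

```python
import json

# The shape of an MCP tool invocation: a JSON-RPC 2.0 request with the
# "tools/call" method. The tool name and arguments mirror the rfcxml-mcp
# example used later in this document and are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_requirements",
        "arguments": {"rfc": 6455, "section": "7.4.1"},
    },
}

wire = json.dumps(request)        # what actually travels over stdio or HTTP
decoded = json.loads(wire)
print(decoded["params"]["name"])  # -> get_requirements
```

Because the request is structured rather than free text, the server can answer with equally structured data, which is the property the rest of this page builds on.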

Fine-tuning

A technique that performs additional training of LLM parameters on domain-specific data. It rewrites the model's "internal knowledge."

Base Model (GPT-4, Claude, etc.)
    ↓ + Additional training with domain-specific data
Customized Model (can answer with domain-specific knowledge)
| Characteristic | Description |
|---|---|
| Target | Domain-specific knowledge and style |
| Modified Component | Model parameters themselves |
| Cost | High (data preparation + computational resources) |
| Strengths | Deep knowledge embedding in the model |
| Weaknesses | Difficult to update; cannot completely eliminate hallucinations |
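In practice, fine-tuning is largely a data-preparation exercise. The sketch below assembles training data in a chat-style JSONL format; the exact schema varies by provider, so treat the field names as illustrative:

```python
import json

# One common shape for supervised fine-tuning data: one JSON object per line
# ("JSONL"), each holding a short chat transcript. Field names vary by
# provider; this schema is illustrative, not authoritative.
examples = [
    {"messages": [
        {"role": "user", "content": "What does Close code 1006 indicate?"},
        {"role": "assistant",
         "content": "Abnormal closure. It MUST NOT be set in a Close frame "
                    "(RFC 6455, Section 7.4.1)."},
    ]},
    {"messages": [
        {"role": "user", "content": "What does Close code 1000 indicate?"},
        {"role": "assistant",
         "content": "Normal closure (RFC 6455, Section 7.4.1)."},
    ]},
]

jsonl = "\n".join(json.dumps(e) for e in examples)
print(len(jsonl.splitlines()))  # -> 2 training examples
```

Note that facts baked in this way go stale: updating the embedded knowledge means rebuilding the dataset and retraining, which is exactly the update weakness listed in the table.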

Prompt Engineering

A technique that controls output quality through careful input prompt design without modifying model parameters.

Main techniques:
- Zero-shot:         Instruction only
- Few-shot:          Provide several examples
- Chain-of-Thought:  "Think step by step"
- System Prompt:     Pre-define role and constraints
| Characteristic | Description |
|---|---|
| Target | Any task |
| Modified Component | Input prompt only |
| Cost | Lowest |
| Strengths | Can try immediately; no model modification needed |
| Weaknesses | Cannot supplement knowledge the model doesn't have |
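These techniques are pure string construction, which is why the cost is the lowest of any pattern. A minimal sketch, with an invented task and examples:

```python
def few_shot_prompt(instruction: str, examples: list, question: str) -> str:
    """Assemble a few-shot prompt: instruction, worked examples, then the real question."""
    parts = [instruction]
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = few_shot_prompt(
    "Classify each WebSocket close code as normal or abnormal.",
    [("1000", "normal"), ("1006", "abnormal")],  # the few-shot examples
    "1002",
)

# Chain-of-Thought is just one more instruction appended to the prompt:
cot_prompt = prompt + "\nThink step by step before answering."
```

The weakness in the table is visible here too: nothing in the prompt can add knowledge the model lacks, it can only steer how existing knowledge is used.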

Agentic AI

A pattern where the LLM autonomously plans, calls tools, and solves problems through multiple steps. MCP is one of the foundational technologies supporting this pattern.

| Characteristic | Description |
|---|---|
| Target | Complex, multi-step tasks |
| Operation | Autonomous planning → execution → verification loop |
| Dependencies | MCP (tool connection), Skills (knowledge reference) |
| Strengths | Can automate complex tasks |
| Weaknesses | Hard to predict; difficult to control in some cases |

Agent Details: See 03-architecture.
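A minimal sketch of the plan, execute, verify loop follows. Plain functions stand in for MCP tool calls and a hard-coded plan stands in for LLM planning; all names are illustrative:

```python
# A minimal plan -> execute -> verify loop. In a real agent, the LLM chooses
# the plan, the tool, and the arguments at each step; here they are fixed
# so the control flow is visible.
def lookup_status_code(code: int) -> str:
    # Stand-in for an MCP tool call (e.g., querying an RFC server).
    table = {1000: "normal closure", 1006: "abnormal closure"}
    return table.get(code, "unknown")

def agent(task: str, max_steps: int = 3) -> str:
    plan = ["lookup", "verify"]            # a real agent would generate this
    result = ""
    for step in plan[:max_steps]:
        if step == "lookup":
            result = lookup_status_code(1006)
        elif step == "verify":
            if "closure" not in result:    # verification failed
                return "needs re-planning"
    return result

print(agent("What does close code 1006 mean?"))  # -> abnormal closure
```

Even in this toy form, the failure mode from the comparison table is apparent: once the loop decides its own steps, predicting and bounding its behavior becomes the hard part.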

GraphRAG

A technique combining knowledge graphs with standard RAG to leverage relationships between entities.

Standard RAG:  Documents → Chunks → Vector Search
GraphRAG:      Documents → Entity Extraction → Build Relationship Graph → Graph Search
| Characteristic | Description |
|---|---|
| Target | Data where relationships between entities are important |
| Strengths | Strong at "How does A relate to B?" questions |
| Weaknesses | High cost of graph construction |
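A toy sketch of the graph side, assuming entity triples have already been extracted from the documents; a breadth-first search then answers relation questions that plain chunk retrieval cannot:

```python
from collections import deque

# Toy knowledge graph: (entity, relation, entity) triples that an entity
# extraction step would produce. The triples are illustrative.
triples = [
    ("RFC 6455", "defines", "WebSocket"),
    ("Section 7.4.1", "part_of", "RFC 6455"),
    ("1006", "defined_in", "Section 7.4.1"),
]
graph = {}
for a, rel, b in triples:
    graph.setdefault(a, []).append((rel, b))
    graph.setdefault(b, []).append((f"inverse:{rel}", a))

def relate(src: str, dst: str):
    """BFS for a relation path: the 'How does A relate to B?' query."""
    queue, seen = deque([(src, [])]), {src}
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None

print(relate("1006", "WebSocket"))
```

The construction cost mentioned in the table lives in producing the triples reliably at scale, not in the traversal itself.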

2.3 Pattern Categories

Each pattern can be classified into four categories based on its primary concern.

| Category | Patterns | Primary Concern |
|---|---|---|
| Knowledge Injection | RAG, MCP, GraphRAG | Supplement external knowledge that LLMs lack |
| Reasoning Enhancement | Prompt Engineering, Chain-of-Thought | Control and improve the LLM's reasoning process |
| Autonomous Behavior | Agentic AI | Automate multi-step planning, execution, and verification |
| Model Enhancement | Fine-tuning | Modify the model's own parameters |

These categories are not exclusive. Agentic AI internally combines knowledge injection (MCP/RAG) and reasoning enhancement (Prompt Engineering).

2.4 Pattern Comparison Table

| Pattern | Category | Problem Solved | Modified Component | Cost | Real-time | Complexity | Failure Risk |
|---|---|---|---|---|---|---|---|
| RAG | Knowledge Injection | Knowledge completion | Prompt | Medium | △ (depends on index update frequency) | ★★☆ | Depends on chunk quality |
| MCP | Knowledge Injection | Tool connection, accurate data retrieval | Prompt | Medium-High | ◎ (real-time) | ★★☆ | Server downtime |
| Fine-tuning | Model Enhancement | Domain specialization | Model parameters | High | ✗ (retraining needed) | ★★★ | Data contamination |
| Prompt Engineering | Reasoning Enhancement | Output quality control | Prompt | Low | - | ★☆☆ | Low (safest) |
| Agentic AI | Autonomous Behavior | Complex task automation | Architecture | High | - | ★★★ | Loss of control |
| GraphRAG | Knowledge Injection | Relationship understanding | Prompt + Graph | High | - | ★★★ | Depends on graph quality |

Reading the "Failure Risk" Column

No pattern is a silver bullet. The Failure Risk column indicates the primary condition under which the pattern stops working. During design, consider fallback strategies for these risks.

2.5 Patterns Are Not Mutually Exclusive

These patterns are not mutually exclusive; they can and should be combined.

For example, a system where Agentic AI receives appropriate instructions through Prompt Engineering, searches internal documents through RAG as needed, and confirms standard specifications through MCP is entirely plausible.

2.6 Common Anti-patterns

Misapplying design patterns can produce results opposite to expectations.

| Anti-pattern | Description | When It Occurs |
|---|---|---|
| RAG-for-Everything | Trying to solve all knowledge needs with RAG | Applying chunk search to structured data, degrading precision |
| Over-Agentification | Applying Agentic AI to simple tasks | Adding overhead and unpredictability where autonomous judgment is unnecessary |
| Fine-tuning Dependency | Embedding frequently changing knowledge in model parameters | Requiring retraining after every regulation or spec update, causing operational cost blowout |
| Prompt Bloat | Trying to control everything through Prompt Engineering | Consuming the context window, leaving no room for actual knowledge injection |

Avoiding anti-patterns requires understanding each pattern's preconditions and failure risks (see the comparison table in Section 2.4) before making design choices.

Deep Dive into RAG

3.1 How RAG Works

RAG is a technique published in 2020 by Facebook AI Research (now Meta AI), and it remains the most popular pattern for providing external knowledge to LLMs.

Step 1: Index Building (Offline)

Original Document:
  "RFC 6455 defines the WebSocket protocol.
   Section 5.5.1 specifies the format of Close frames,
   Section 7.4.1 defines status codes.
   1006 indicates abnormal closure and
   MUST NOT be included in Close frames."

    ↓ Chunk Splitting

Chunk 1: "RFC 6455 defines the WebSocket protocol.
         Section 5.5.1 specifies the format of Close frames"
Chunk 2: "Section 7.4.1 defines status codes.
         1006 indicates abnormal closure"
Chunk 3: "MUST NOT be included in Close frames."

    ↓ Vectorization (Embedding)

Chunk 1 → [0.12, -0.34, 0.56, ...]  ← Numerical vector, hundreds to thousands of dimensions
Chunk 2 → [0.23, -0.11, 0.78, ...]
Chunk 3 → [0.45, -0.67, 0.12, ...]

    ↓ Store in Vector DB

Step 2: Search and Generation (Online)

User Question: "What does WebSocket status code 1006 mean?"

    ↓ Vectorize Question

Question Vector → [0.21, -0.15, 0.72, ...]

    ↓ Similarity Search (Cosine Similarity, etc.)

Closest Chunk → Chunk 2:
  "Section 7.4.1 defines status codes.
   1006 indicates abnormal closure"

    ↓ Inject into Prompt

"Please answer the question using the following information.
 ---
 Section 7.4.1 defines status codes.
 1006 indicates abnormal closure
 ---
 Question: What does WebSocket status code 1006 mean?"

    ↓ LLM Generates Answer
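The two steps above can be condensed into a runnable sketch. A real system would use an embedding model and a vector database; here a bag-of-words vector and cosine similarity stand in for both, which is enough to show retrieval and prompt injection:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 1 (offline): chunk and index the document.
chunks = [
    "RFC 6455 defines the WebSocket protocol. Section 5.5.1 specifies the format of Close frames",
    "Section 7.4.1 defines status codes. 1006 indicates abnormal closure",
    "MUST NOT be included in Close frames.",
]
index = [(c, embed(c)) for c in chunks]

# Step 2 (online): vectorize the question, retrieve the closest chunk,
# and inject it into the prompt.
question = "What does WebSocket status code 1006 mean?"
qv = embed(question)
best_chunk = max(index, key=lambda pair: cosine(qv, pair[1]))[0]

prompt = (
    "Please answer the question using the following information.\n"
    f"---\n{best_chunk}\n---\n"
    f"Question: {question}"
)
print(best_chunk)  # the Section 7.4.1 chunk wins on similarity
```

Notice that the retrieved chunk arrives as a bare string: the section number, requirement level, and surrounding context survive only if they happen to sit inside the chunk, which is precisely the limitation discussed in 3.3.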

3.2 RAG Strengths and Typical Use Cases

Here are the scenarios where RAG is particularly effective.

| Use Case | Description | Example |
|---|---|---|
| Internal Document Search | Retrieve information from large amounts of internal documentation | Internal wikis, manuals, FAQs |
| Customer Support | Generate answers from a product knowledge base | Help centers, chatbots |
| Academic Research | Extract relevant information from paper databases | Literature review support |
| Legal Support | Similarity search over contracts and case law | Finding similar clauses |

What RAG Excels At: Finding information that is "semantically similar" from large amounts of unstructured text.

3.3 RAG Limitations

However, RAG has structural limitations.

Loss of Context Through Chunk Splitting

Original Context:
  "1006 indicates abnormal closure and
   MUST NOT be included in Close frames."

After Chunking:
  Chunk A: "1006 indicates abnormal closure"          ← Retrieved
  Chunk B: "MUST NOT be included in Close frames"     ← May not be retrieved

→ Risk of losing the important MUST NOT requirement

Insufficient Structure Understanding

What RAG Returns:
  "1006 indicates abnormal closure" (text fragment)

What RAG Cannot Return:
  - That this is defined in Section 7.4.1
  - That it is a MUST NOT level requirement
  - The relationship with Close frame format in Section 5.5.1
  - Its position in RFC 6455 as a whole

Search Precision Limitations

Vector similarity search returns things that are "semantically similar" but not necessarily "exactly matching."

Question: "What is the meaning of RFC 6455 status code 1002?"

Chunks that Might Be Returned:
  ✅ "1002 indicates a protocol error" (correct)
  ❌ "1006 indicates abnormal closure" (semantically similar, but not what was asked)
  ❌ "1000 indicates normal closure" (same category but different code)

3.4 Advanced RAG — Mitigating the Limitations

Several techniques (collectively called Advanced RAG) have been researched and implemented to address the limitations above.

| Technique | Limitation Addressed | Overview |
|---|---|---|
| Hybrid Search | Search precision | Combines vector search with keyword search (BM25, etc.) for improved accuracy |
| Re-ranking | Search precision | Re-ranks initial results using a cross-encoder |
| Parent-Child Chunking | Context loss | Searches with small chunks, returns parent chunks (broader context) |
| HyDE | Search precision | The LLM generates a hypothetical answer, which is used as the search query |
These techniques mitigate RAG's weaknesses but do not surpass MCP's precision for structured data. The principle of choosing based on data characteristics remains unchanged.
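To illustrate why hybrid search helps with the "1002 vs 1006" confusion from 3.3, here is a minimal score-fusion sketch with invented scores; real systems often use Reciprocal Rank Fusion or a learned re-ranker instead of this naive sum:

```python
# Score fusion for hybrid search: normalize each ranker's scores to [0, 1]
# and add them. The scores below are invented to show the effect: vector
# similarity slightly prefers the wrong chunk, but the exact keyword match
# on "1002" corrects the ranking.
vector_scores = {"chunk_1002": 0.71, "chunk_1006": 0.74, "chunk_1000": 0.69}
keyword_scores = {"chunk_1002": 3.0, "chunk_1006": 0.0, "chunk_1000": 0.0}

def normalize(scores: dict) -> dict:
    lo, hi = min(scores.values()), max(scores.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0
            for k, v in scores.items()}

nv, nk = normalize(vector_scores), normalize(keyword_scores)
fused = {k: nv[k] + nk[k] for k in vector_scores}
best = max(fused, key=fused.get)
print(best)  # -> chunk_1002: the exact keyword match outweighs vector similarity
```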

3.5 Where RAG Fits in the Architecture — What Should Users Do?

After understanding how RAG works, a common question arises:

"I get that RAG is important. But what should I actually do?"

To answer this, let's position RAG within this project's architecture.

3.5.1 RAG Is a "Design Pattern," Not a "Standard"

First, an important premise: RAG is not a standardized protocol.

| Aspect | RAG | MCP | Skills |
|---|---|---|---|
| Type | Design pattern | Standard protocol (with spec) | Specification (Agent Skills Spec) |
| Standardization | None (no RFC or W3C spec exists) | modelcontextprotocol.io | agentskills.io |
| Implementation uniformity | Vendor-specific (LangChain, LlamaIndex, Bedrock, etc.) | Standard SDKs (TypeScript/Python) | Standard format (SKILL.md) |
| Interoperability | None | Yes (any agent can connect) | Yes (16+ agents supported) |

In other words, RAG only has conceptual consensus — "retrieve, inject into context, generate" — while the concrete implementation is left to each vendor.

3.5.2 RAG Is an Internal Processing Pipeline of the Agent

RAG processing is executed as an internal operation of the agent, invisible to the user.

The key point here:

  • Users define "what to search" and "what knowledge to use" → via Skills and MCP
  • The agent decides "how to search and how to inject" → via the RAG pipeline

Users don't operate RAG directly — they benefit from RAG indirectly through Skills and MCP.

3.5.3 What Should Users Do?

To maximize the benefits of RAG, here's what users can do:

| Action | Concrete Steps | Relationship to RAG |
|---|---|---|
| Define Skills | Write translation quality criteria, code review guidelines, and spec compliance checklists as SKILL.md | Becomes the "static knowledge base" the agent references |
| Connect MCP | Configure MCP servers for vector DBs, RFCs, legislation, DeepL, etc. | Becomes the "external data source" the agent searches |
| Write CLAUDE.md | Document project policies, terminology, and constraints | Constantly injected as the agent's "context" |
Takeaway: You don't need to build or control RAG yourself. Your role as a user is to tell the agent "what it should know" through standardized interfaces — Skills, MCP, and CLAUDE.md.

Essential Differences Between RAG and MCP

4.1 Fundamental Difference in Approach

Both RAG and MCP "provide external knowledge to LLMs," but their approaches differ fundamentally.

| Aspect | RAG | MCP |
|---|---|---|
| Search Principle | Semantic similarity of text | Precise queries based on domain structure |
| Prerequisite | Documents can be chunked | An API that understands the domain structure exists |
| Result Nature | "Probably relevant" text fragments | "Definitely applicable" structured data |
| Source Clarity | Ambiguous (hard to trace which chunk) | Clear (e.g., RFC 6455 Section 7.4.1) |

4.2 Comparison by the 5 Characteristics of "Unshakeable References"

Comparing by the five characteristics of "unshakeable references" defined in 02-reference-sources makes the differences even clearer.

| Characteristic | RAG | MCP (Reference MCP) |
|---|---|---|
| Authority | Chunk origins are ambiguous | Accesses the original source directly |
| Immutability / Version Management | Depends on index update timing | Reflects the original's version management |
| Structuring | Loses structure through chunking | Preserves sections and requirement levels |
| Verifiability | Hard to trace which chunk generated an answer | Shows exact sources |
| Accessibility | Programmatically accessible | Programmatically accessible via a standard protocol |

4.3 Comparison with Concrete Examples

Example: "What is the meaning of Close code 1006 in RFC 6455?"

With RAG:

1. Pre-process RFC 6455 full text into chunks and store in vector DB
2. Vectorize question and search similar chunks
3. Returned chunk:
   "1006 is a reserved value and MUST NOT be set as a status code
    in a Close control frame by an endpoint."

Problems:
- Unclear that this chunk comes from Section 7.4.1
- Requirement level MUST NOT not attached as metadata
- Surrounding context (why MUST NOT) may be missing
- May return a different chunk (explanation of 1002, etc.)

With rfcxml-mcp:

1. Call get_requirements(rfc=6455, section="7.4.1")
2. Returned result:
   {
     section: "7.4.1",
     requirement: "1006 is a reserved value and MUST NOT be set
                   as a status code in a Close control frame
                   by an endpoint.",
     level: "MUST NOT",
     context: "It is designated for use in applications
               expecting a status code to indicate that
               the connection was closed abnormally"
   }

Benefits:
- Section number is clear (Section 7.4.1)
- Requirement level is structured (MUST NOT)
- Surrounding context is preserved
- Can even verify implementation compliance with validate_statement()
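The practical payoff of the structured response is that downstream code can key off metadata instead of parsing prose. A small sketch using the response shape shown above:

```python
# With RAG you get a text blob; a structured MCP response can be filtered
# and checked programmatically. This dict mirrors the get_requirements
# example above (field values copied from it).
response = {
    "section": "7.4.1",
    "requirement": "1006 is a reserved value and MUST NOT be set as a "
                   "status code in a Close control frame by an endpoint.",
    "level": "MUST NOT",
}

# A compliance checker can now branch on metadata rather than regexing prose:
is_mandatory_prohibition = response["level"] == "MUST NOT"
citation = f"RFC 6455, Section {response['section']}"
print(is_mandatory_prohibition, citation)
```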

Example: "What are the requirements of Article 2 of the Electronic Signature Act?"

With RAG:

Chunk the entire law → Search chunks related to "Article 2"
→ Law structure (articles, subsections, items) is lost
→ Risk of mixing pre-amendment and post-amendment versions

With hourei-mcp (e-gov-law MCP):

Call find_law_article(law_name="Electronic Signature Act", article_number="2")
→ Can retrieve the law text structure (subsections, items) intact
→ Get latest law data (real-time from e-Gov API)

4.4 Distribution Capability Difference

MCP has a decisive advantage: "it can be distributed as a standard protocol."

Sharing RAG Pipeline:
  1. Document preparation
  2. Implement chunking logic
  3. Select Embedding model
  4. Build and operate vector DB
  5. Tune search parameters
  → Each organization must build independently

Sharing MCP Server:
  npx @shuji-bonji/rfcxml-mcp
  → Anyone can get structured RFC access with just this

| Aspect | RAG | MCP |
|---|---|---|
| Distribution Method | Independent construction required | Can be distributed as an npm package, etc. |
| Deployment Cost | Vector DB construction + indexing | A single configuration file |
| Quality Consistency | Depends on the builder's skills | Developer ensures quality |
| Maintenance | Each organization handles it separately | Developer updates centrally |
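That "single configuration file" is typically one JSON entry in the MCP client's configuration. The exact file name and schema depend on the client; the snippet below follows the common `claude_desktop_config.json` shape, and the server name key (`rfcxml`) is arbitrary:

```json
{
  "mcpServers": {
    "rfcxml": {
      "command": "npx",
      "args": ["@shuji-bonji/rfcxml-mcp"]
    }
  }
}
```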

MCP Servers in This Project vs RAG

5.1 The "Isn't This Just RAG in Disguise?" Question

Looking at this project's related MCP servers (rfcxml-mcp, w3c-mcp, pdf-spec-mcp, epsg-mcp, etc.), one might think "searching external specifications and passing them to LLM" is the same as RAG.

However, there is a fundamental difference.

5.2 "Text Search" vs "Domain Knowledge Codification"

Each MCP server provides an API that understands the target domain's structure and semantics. This is not merely text search; it is the codification of domain knowledge.

| MCP Server | Structure It Understands | What RAG Loses |
|---|---|---|
| rfcxml-mcp | Section hierarchy, MUST/SHOULD/MAY classification, RFC cross-references (obsoletes/updates) | Distinction of requirement levels, relationships between sections |
| w3c-mcp | WebIDL definitions, CSS specification structure, HTML element attributes and content models | Type information of interfaces, inheritance relationships of properties |
| pdf-spec-mcp | ISO 32000 chapter structure, requirement tables, term definitions | Table structure, differences between specification versions |
| epsg-mcp | Recommended uses of coordinate reference systems, transformation paths, accuracy characteristics | Spatial scope of applicability, transformation accuracy information |
| pdf-reader-mcp | Internal object structure of PDFs, tag hierarchy, font information | Binary structure interpretation, reference relationships between objects |

5.3 Practical Differences

When to Use RAG vs MCP

6.1 Decision Flow
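The decision flow can be stated as a small function; the questions and outcomes mirror the selection guide in 6.2 and the decision summary at the end of this page, and the function and argument names are illustrative:

```python
# The pattern-selection flow as code. Each argument is one of the key
# questions from this page's decision summary; the return value is the
# recommended pattern.
def choose_pattern(structured: bool, api_exists: bool, large_corpus: bool,
                   needs_model_change: bool, multi_step: bool) -> str:
    if needs_model_change:
        return "Fine-tuning"                       # accept cost and ops burden
    if multi_step:
        return "Agentic AI (+ MCP/Skills)"         # autonomous judgment needed
    if structured:
        return "MCP" if api_exists else "Build MCP"
    return "RAG" if large_corpus else "Prompt Engineering / Skills"

print(choose_pattern(structured=True, api_exists=True, large_corpus=False,
                     needs_model_change=False, multi_step=False))  # -> MCP
```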

6.2 Selection Guide

| Scenario | Recommended | Reason |
|---|---|---|
| Verify RFC/W3C compliance | MCP | Need structured requirement extraction |
| Search internal documents | RAG | Large-scale unstructured text search |
| Get law article text | MCP | Need to preserve article, subsection, and item structure |
| Customer support FAQs | RAG | Flexible handling of diverse questions |
| Translation quality evaluation | MCP | Structured scores and error detection |
| Summarize research papers | RAG | Process large volumes of unstructured text |
| PDF spec requirement check | MCP | Accurate retrieval of tables and requirement levels |
| Team knowledge sharing | RAG or Skills | Choose based on context |

6.3 Combined Patterns

RAG and MCP are not mutually exclusive. The following combined patterns are conceivable.

Pattern: Understand Overview with RAG → Verify Accuracy with MCP

Why This Project Chooses MCP

7.1 Alignment with Project Philosophy

The core philosophy of this project is "unshakeable references" (see 01-vision).

RAG returns "probably relevant information"    → Can be unreliable
MCP returns "definitely applicable information" → Reliable

To give AI output verifiable foundations, clear sources and structured data access are essential. This is the fundamental reason why this project centers on MCP.

7.2 The 3 Values of MCP

The value that MCP provides in this project's context can be summarized in three points.

  1. Accuracy: Retrieve accurate information through APIs that understand domain structure
  2. Verifiability: Source (RFC number, section number, etc.) is always explicit
  3. Democratization: Can be distributed as npm packages, providing everyone equal quality access

7.3 We Are Not Rejecting RAG

As an important caveat, this project is not rejecting RAG. RAG is a very powerful technique for the purpose of "finding relevant information from large amounts of unstructured text."

However, for the purpose of providing "unshakeable references" to AI judgment, MCP's approach is more appropriate. Each pattern has appropriate use cases.

RAG's appropriate use:  "Want to find likely related items from lots of documents"
MCP's appropriate use:  "Want to get accurate information from specific standards/regulations"

→ Different purposes, so it's about choosing the right tool, not ranking

Summary

Pattern Selection Decision Summary

Here are the key questions to ask during design, and the recommended patterns based on your answers.

| Key Question | Answer | Recommended Pattern |
|---|---|---|
| Does the target data have clear structure? | Yes → a structured API exists | MCP |
| | Yes → no API, but worth building one | Build MCP |
| | No → large-scale text search needed | RAG (+ consider Advanced RAG) |
| | No → a small amount of knowledge suffices | Prompt Engineering / Skills |
| Do you need to modify the model's own knowledge? | Yes | Fine-tuning (accept cost and operational burden) |
| Do you need autonomous multi-step judgment? | Yes | Agentic AI (+ MCP/Skills for knowledge supply) |

When in doubt, start with Prompt Engineering, then progressively adopt MCP → RAG → Agentic AI as needed. The principle is to begin with the lowest-risk pattern.

Core Messages

  1. Generative AI has various design patterns — RAG, MCP, Fine-tuning, Agentic AI, etc., each solves different problems
  2. RAG searches by "text similarity" — Strong at finding related information from large amounts of unstructured text
  3. MCP searches by "domain structure" — Retrieves accurate information through structured APIs
  4. MCP is better suited for "unshakeable references" — MCP has advantages in authority, structuring, verifiability, and distributability
  5. RAG and MCP are not mutually exclusive — Each has appropriate use cases, and combined use is possible
  6. Each MCP server codifies domain knowledge — Not merely text search, but provides understanding of structure

Emerging Patterns to Watch

In addition to the patterns covered in this page, the following techniques may become influential.

| Pattern | Overview | Status |
|---|---|---|
| Toolformer | LLM autonomously learns to invoke external tools | Research stage (Meta, 2023) |
| RETRO | Embeds retrieval mechanisms into the model architecture | Research stage (DeepMind) |
| Agentic Workflow | Multiple agents collaborate in coordinated workflows | Advancing toward production |

When these mature, they will be integrated into this page's pattern taxonomy.

Released under the MIT License.