Vision for AI-Driven Development

This document outlines the philosophy underlying AI agent architecture (MCP, Skills, and Agent integration) and the fundamental approach to AI-driven development.

Scope of this site

On this site, "AI agent" means an agent whose reasoning core is a foundation model (mainly an LLM). For where this sits within AI as a whole, including reinforcement learning agents and symbolic AI, see FAQ: Does "AI Agent" Mean Only LLMs?

Audience: Engineers interested in AI-driven development. Whether you're a practitioner evaluating MCP/Skills adoption or a decision-maker considering team-wide integration, this document provides a foundational perspective.

Position of This Page

👉 This page (WHY — Why we need authoritative references)

Meta Information
| Item | Content |
|---|---|
| What this chapter establishes | AI's four fundamental limitations (accuracy, currency, authority, accountability) and the need for "authoritative references" |
| What this chapter does NOT cover | Specific reference source selection (→02), architecture design (→03), implementation techniques (→04) |
| Dependencies | None (starting point of the Concepts section) |
| Common misuse | Concluding "AI is unusable." This chapter's argument is "recognize the constraints, then structurally compensate" |

Core Understanding

AI is "Not Omnipotent"

While AI capabilities are rapidly advancing, it is crucial to correctly recognize their limitations. To avoid over-reliance on AI and use it appropriately, we need to understand the following constraints.

AI generates outputs probabilistically from training data, so it cannot guarantee the four properties below. Note that the means of addressing each constraint differ: three can be addressed in the same way, and one cannot.

| AI Limitation | Description | Means of Resolution |
|---|---|---|
| Accuracy | Hallucination problem - may generate information that differs from facts | MCP (connection to authoritative sources) |
| Currency | Does not have information beyond the training data cutoff | MCP (dynamic retrieval) |
| Authority | Cannot guarantee official interpretation of specifications | MCP (direct citation of primary sources) |
| Accountability | Cannot provide grounds for legal or ethical judgments | Skills + Doctrine + human review (connection alone cannot solve) |

The first three can be structurally compensated by "connecting to trustworthy sources." In contrast, accountability cannot be solved by connection alone. Even though law can be referenced via the houki-series MCPs, ethical judgment and organization-specific value judgments are only ensured by combining domain knowledge systematized as Skills, judgment criteria made explicit by the Doctrine layer, and ultimately human review.

Problems solvable by connection vs. problems not solvable by connection

The core design principle of this document is to separate "problems solvable by connection" from "problems that require Skills + Doctrine + humans." The former is MCP's role; the latter corresponds to the responsibility that remains with humans, which this site emphasizes repeatedly.
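The separation described above can be written down as a small lookup. This is only an illustrative sketch: the `Limitation` enum, the `RESOLUTION` mapping, and both helper functions are hypothetical names introduced here, and the mapping simply restates the limitations table.

```python
from enum import Enum


class Limitation(Enum):
    ACCURACY = "accuracy"
    CURRENCY = "currency"
    AUTHORITY = "authority"
    ACCOUNTABILITY = "accountability"


# Hypothetical mapping that restates the limitations table:
# which means are expected to address each limitation.
RESOLUTION = {
    Limitation.ACCURACY: ("MCP",),
    Limitation.CURRENCY: ("MCP",),
    Limitation.AUTHORITY: ("MCP",),
    Limitation.ACCOUNTABILITY: ("Skills", "Doctrine", "human review"),
}


def solvable_by_connection(limitation: Limitation) -> bool:
    """True when connecting to a trustworthy source (MCP) is the whole answer."""
    return RESOLUTION[limitation] == ("MCP",)


def requires_human_review(limitation: Limitation) -> bool:
    """True when responsibility stays with a human reviewer."""
    return "human review" in RESOLUTION[limitation]


for item in Limitation:
    print(item.value, solvable_by_connection(item), requires_human_review(item))
```

Keeping the mapping as data rather than as branching logic makes the design decision itself reviewable: a change to who is responsible for a limitation shows up as a one-line diff.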

The Essence of AI-Driven Development

AI-driven development ≠ Having AI write code
AI-driven development = Utilizing AI throughout all processes while humans retain judgment and creativity

What "All Processes" Means

This is not limited to coding. It refers to the broad range of management domains involved in software development — including project management, product management, SRE, security, and data management — as outlined in Management of Software Systems and Services.

However, realizing this ideal requires a prerequisite. For AI to be truly useful "across all processes," it must be placed in an environment where it can make correct judgments.

The Reality During This Transitional Period

The era where AI autonomously completes every process has not yet arrived. AI excels at "generating plausible output," but it cannot judge on its own whether that output is correct.

At the same time, the code AI generates today depends on abstraction layers — frameworks and libraries — whose foundations are the standards and specifications that humanity has accumulated over time. However, AI does not directly reference these standards (dashed line in the diagram below); it relies on inference through the abstraction layer.

Although the code AI generates is built on top of these standards, AI's training data is dominated by the abstraction layer (library code examples), and the direct reference path to the primary sources of standards has been lost (dashed line). This is the core problem.

For AI to function correctly, it must be able to directly reference the same standards and specifications that its generated code ultimately depends on. This is why "unwavering reference sources" are necessary.

The Importance of "Unwavering Reference Sources"

Why Reference Sources Are Needed

| AI Challenge | What Reference Sources Solve |
|---|---|
| Fixed point-in-time training data | Access to authoritative up-to-date sources |
| Hallucination | Provision of verifiable evidence |
| Interpretation variance by context | Consistent decision criteria |
| Lack of latest information | Retrieval of current specifications |

Two Means to Achieve "Unwavering Reference Sources"

MCP and Skills serve as means to provide AI with "unwavering reference sources."

| Means | Role | Examples |
|---|---|---|
| MCP | Dynamic access to external authoritative sources | RFC, legislation, W3C standards |
| Skills | Systematization of domain knowledge and best practices | Design principles, workflows, coding standards |

A note on "Skills" terminology

In this document, "Skills" refers to Markdown-based systematization of domain knowledge, following the format defined by vercel-labs/skills. Unlike OpenAI's "Actions" or LangChain's "Tools," Skills are not executable code — they are structured knowledge and judgment criteria that AI references.

Essential Definition of "Unwavering Reference Sources"

An "unwavering reference source" is a fact retrieved from a verifiable information source, not an LLM's speculation.

Based on this definition, reference sources can be classified into two types:

| Type | Characteristics | Examples | Verification Method |
|---|---|---|---|
| Static Reference | Content is fixed and immutable | RFCs, legislation, W3C specs | Version / section number |
| Dynamic Reference | Values change, but are factual at the time of retrieval | Sensors, APIs, real-time data | Timestamp + data source ID |

Both share the property of "not being speculatively generated by an LLM." Dynamic references require separate verification of the data source's authenticity, but retrieving them via MCP ensures clear provenance.
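One way to picture the two reference types is as records that carry their own verification data. The sketch below is a minimal illustration under stated assumptions: the class names, field names, and `is_verifiable` helper are hypothetical and are not part of MCP or any library, and the example values ("Example Spec", "sensor-17") are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StaticReference:
    """Fixed content, verified by version and section number."""
    source: str
    version: str
    section: str

    def citation(self) -> str:
        return f"{self.source} ({self.version}), section {self.section}"


@dataclass(frozen=True)
class DynamicReference:
    """Changing value, verified by timestamp and data source ID."""
    source_id: str
    retrieved_at: datetime
    value: float

    def citation(self) -> str:
        return f"{self.source_id} @ {self.retrieved_at.isoformat()}"


def is_verifiable(ref: object) -> bool:
    """A claim counts as a reference only if its provenance fields are filled in."""
    if isinstance(ref, StaticReference):
        return bool(ref.source and ref.version and ref.section)
    if isinstance(ref, DynamicReference):
        # Require a timezone-aware timestamp so the retrieval time is unambiguous.
        return bool(ref.source_id) and ref.retrieved_at.tzinfo is not None
    return False  # plain text with no provenance, such as raw LLM output


spec = StaticReference("Example Spec", "1.0", "4.2")
reading = DynamicReference(
    "sensor-17", datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc), 21.5
)
print(spec.citation())
print(reading.citation())
print(is_verifiable("the spec probably says so"))
```

The check covers provenance only. Whether the data source itself is authentic still has to be verified separately, as noted above.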

Value of Reference MCP/Skills

  1. AI decisions become verifiable - Can demonstrate the basis for outputs
  2. Consistent quality is supported - Standards-aligned outputs
  3. Vendor lock-in is avoided - Based on open standards
  4. Access to knowledge is democratized - Reach accurate information without being an expert
  5. Domain knowledge becomes reusable - Formalize team know-how as Skills

Democratization of Knowledge

Problems with the Traditional Approach

  • High cost (consultants, specialists, publishing costs)
  • One-way (no feedback path from readers back to experts)
  • Language barriers (translator bottleneck, time lag, dependence on translation quality)
  • Knowledge degradation (accuracy drops in a "telephone game" fashion)

The World MCP/Skills Enables

Development based on primary sources becomes possible without excessive reliance on expensive consultants or specialists, or on abstraction layers (libraries and frameworks).

For how to distinguish between MCP and Skills, see skills/vs-mcp.md.

Three Axes of Knowledge Transformation

Knowledge transformation in AI-driven development is not one-directional. This architecture defines the following three transformation axes.

| Axis | Direction | Purpose | Example |
|---|---|---|---|
| ① Structuring | Human → AI | Transform authoritative sources into AI-accessible formats | RFC → rfcxml-mcp |
| ② Comprehension | AI → Human | Transform complex information into understandable formats | RFC 3161 → Checklist |
| ③ Verification | Spec → Test | Convert specifications into verifiable criteria | EPUB 3.3 requirements → JSON test suite |

Axis ③ "Verification" differs from ① and ② in that it forms a quality closed loop (the dashed feedback in the diagram). Simply passing specifications to AI does not make its output verifiable. Only by converting specifications into tests, passing AI output through a pass/fail gate, and feeding failures back into regeneration does "driving" actually take effect.
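The closed loop in axis ③ can be sketched as a short retry loop. This is an illustration only: `run_gate`, `drive`, and `stub_generate` are hypothetical names, the two checks are invented stand-ins for tests derived from a real specification, and the generator is a deterministic stub in place of an actual model call.

```python
from typing import Callable, List, Tuple

# A check is a named predicate derived from one specification requirement.
Check = Tuple[str, Callable[[str], bool]]

# Invented example checks, standing in for a spec-derived test suite.
CHECKS: List[Check] = [
    ("has_title", lambda out: "<title>" in out),
    ("has_body", lambda out: "<body>" in out),
]


def run_gate(output: str, checks: List[Check]) -> List[str]:
    """Pass/fail gate: return the names of the checks that failed."""
    return [name for name, check in checks if not check(output)]


def drive(
    generate: Callable[[List[str]], str],
    checks: List[Check],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Generate, gate, and feed failures back until the gate passes."""
    failures: List[str] = []
    for round_no in range(1, max_rounds + 1):
        output = generate(failures)          # failures from the previous round
        failures = run_gate(output, checks)
        if not failures:
            return output, round_no
    raise RuntimeError(f"gate still failing after {max_rounds} rounds: {failures}")


def stub_generate(failures: List[str]) -> str:
    """Deterministic stand-in for a model: fixes the title only when told to."""
    if "has_title" in failures:
        return "<html><title>Doc</title><body>hi</body></html>"
    return "<html><body>hi</body></html>"


output, rounds = drive(stub_generate, CHECKS)
print(rounds, output)
```

The bound on `max_rounds` is a design choice in this sketch: a loop that cannot converge should surface as an error for a human to look at rather than retry indefinitely.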

Human → AI (Structuring) Knowledge Transformation

Enable AI to access "unwavering reference sources."

Structuring External Information Sources via MCP

| Human Knowledge | Structured Format | AI-Usable Form |
|---|---|---|
| Legal text | e-Gov API | hourei-mcp |
| Technical specifications | RFC XML | rfcxml-mcp |
| Web standards | W3C/WHATWG | w3c-mcp |
| Translation rules | Glossary | DeepL Glossary |

Systematizing Domain Knowledge via Skills

| Team Knowledge | Format | AI-Usable Form |
|---|---|---|
| Design principles | Markdown | frontend-design skill |
| Coding standards | Markdown | coding-standards skill |
| Workflows | Markdown | doc-coauthoring skill |

AI → Human (Comprehension Support) Knowledge Transformation

Enable humans to access accurate knowledge even without being specialists.

| Complex Information Source | AI Processing | Human-Understandable Form |
|---|---|---|
| RFC 3161 (135 requirements) | Extraction/Classification | Checklist |
| Digital Signature Law + RFC | Mapping | Correspondence table |
| Technical specifications | Visualization | Mermaid diagrams |
| English RFCs | Translation | Explanations in local language |
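The first row of the table above (extraction and classification into a checklist) can be approximated with the requirement keywords that RFCs use under BCP 14 (RFC 2119 / RFC 8174). The sketch below is an assumption about one possible approach, not a description of how rfcxml-mcp works. The function names are hypothetical, the sentence splitting is deliberately naive, and the sample text is invented rather than quoted from RFC 3161.

```python
import re
from typing import List, Tuple

# BCP 14 keywords. Longer phrases come first so "MUST NOT" wins over "MUST".
KEYWORDS = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT", "NOT RECOMMENDED",
    "MUST", "REQUIRED", "SHALL", "SHOULD", "RECOMMENDED", "MAY", "OPTIONAL",
]
# Case-sensitive on purpose: only uppercase keywords carry normative meaning.
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def extract_requirements(text: str) -> List[Tuple[str, str]]:
    """Return (keyword, sentence) pairs for sentences with a normative keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    found = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            found.append((match.group(1), sentence))
    return found


def to_checklist(requirements: List[Tuple[str, str]]) -> str:
    """Render the extracted requirements as a plain-text checklist."""
    return "\n".join(f"[ ] ({level}) {sentence}" for level, sentence in requirements)


# Invented sample text for illustration only.
SAMPLE = (
    "The server MUST include a timestamp. "
    "Clients SHOULD verify the signature. "
    "This sentence is informative. "
    "The server MUST NOT reuse serial numbers."
)

print(to_checklist(extract_requirements(SAMPLE)))
```

A pattern match of this kind only finds candidate requirements. Deciding whether each item is satisfied remains a quality judgment, which is why the output is a checklist for a human rather than a verdict.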

Division of Roles Between Humans and AI

The Responsibility Shift Model

As abstraction rises, the responsibilities of accuracy, reliability, and judgment do not disappear; what changes is who holds them.

| Responsibility Phase | Owner | Scope | Verification Mechanism |
|---|---|---|---|
| Design-time | Human | Selection of reference sources, structural design, defining judgment criteria | Spec-to-Test conversion |
| Execution-time | Agent | Reasoning and task execution based on reference sources | Evaluation pipeline (probabilistic quality gate) |
| Structural constraints | System | Consistency of references, access control, audit trails | Guardrails (inviolable constraints) |

If these responsibility boundaries remain ambiguous as abstraction increases, a situation arises where "no one is accountable." This architecture aims to make these boundaries explicit at the design level, using two verification layers:

| Verification Layer | Nature | Example Criteria | Role |
|---|---|---|---|
| Guardrails | Inviolable (boundary-based) | ESLint errors = 0, type checks pass | Defines "lines that must not be crossed" |
| Evaluation Pipeline | Probabilistic (threshold-based) | xCOMET >= 0.85, test coverage >= 80% | Defines "acceptable ranges" |
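The difference between the two layers can be made concrete with a small decision function. This is a minimal sketch under stated assumptions: the `Report` structure, the function names, and the three verdict labels ("reject", "revise", "accept") are hypothetical, while the criteria and thresholds are the example values from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Report:
    """Hypothetical summary of one verification run."""
    lint_errors: int
    type_check_passed: bool
    metrics: Dict[str, float] = field(default_factory=dict)


# Example thresholds from the table above (coverage expressed as a fraction).
THRESHOLDS = {"xcomet": 0.85, "test_coverage": 0.80}


def guardrail_violations(report: Report) -> List[str]:
    """Boundary-based layer: any violation blocks the output outright."""
    violations = []
    if report.lint_errors != 0:
        violations.append(f"lint errors: {report.lint_errors}")
    if not report.type_check_passed:
        violations.append("type check failed")
    return violations


def evaluation_shortfalls(report: Report, thresholds: Dict[str, float]) -> List[str]:
    """Threshold-based layer: list metrics that are missing or below the minimum."""
    return [
        name
        for name, minimum in thresholds.items()
        if name not in report.metrics or report.metrics[name] < minimum
    ]


def verdict(report: Report) -> str:
    if guardrail_violations(report):
        return "reject"   # a line that must not be crossed was crossed
    if evaluation_shortfalls(report, THRESHOLDS):
        return "revise"   # outside the acceptable range, but not forbidden
    return "accept"


print(verdict(Report(0, True, {"xcomet": 0.91, "test_coverage": 0.84})))
print(verdict(Report(0, True, {"xcomet": 0.80, "test_coverage": 0.84})))
print(verdict(Report(2, True, {"xcomet": 0.99, "test_coverage": 0.99})))
```

Guardrails are evaluated first, so a high metric score never compensates for a crossed boundary. Treating a missing metric as a shortfall is a conservative choice made for this sketch.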

Test-first thinking remains valuable in AI-driven development

Traditional TDD (Red-Green-Refactor) does not apply directly, but test-first thinking — converting specifications into verifiable forms before implementation — remains fundamentally valuable in AI-driven development. Because AI outputs are probabilistic, defining a verifiable gate before implementation is the starting point of quality design.

Complementary Structure

The strengths of humans and AI are often not symmetric, even when described with similar terms. For example, "quality judgment" and "quality checking" are distinct: humans excel at the former, while AI wins on comprehensiveness in the latter. The table below makes this asymmetry concrete.

| Task | Human | AI | Note |
|---|---|---|---|
| Quality judgment (setting the criteria) | | | Value-laden decision about what "good" means |
| Quality checking (detecting against criteria) | | | Humans miss things; AI wins on comprehensiveness |
| Building trust with stakeholders | | | Reading expressions, context, tacit agreements |
| Comprehensive information gathering (specs, standards) | | | AI does not tire and leaves fewer gaps |
| Ethical / value-based judgment | | | Skills + Doctrine can assist but not replace |
| High-throughput repetitive work (refactor, naming) | | | AI wins on throughput |
| Creative hypothesis generation | | | AI is good at combinations; the originating question is human |
| Multilingual translation / summarization | | | Quality can also be ensured by metrics like xCOMET |

With this asymmetry in mind, the placement of "human roles" and "AI roles" in the Mermaid diagram below becomes easier to interpret.

Basic flow of MCP, Skills, and Agent

The fundamental flow runs from user input through the agent core and tool integrations to the final results. For the governance layer (judgment criteria, constraints, and objectives) that applies across all of these layers, see the Doctrine Layer.

Positioning of This Repository

This repository is a place to organize the design philosophy, architecture, and practical know-how of AI agent architecture (MCP, Skills, and Agent integration), and to document strategies for building "unwavering reference sources" as the foundation of AI-driven development.

What This Document Does Not Guarantee (Non-goals)

To clarify the scope of this document, we state the following explicitly.

| What this document claims | What this document does NOT guarantee | What it provides instead |
|---|---|---|
| Providing verifiable reference sources improves AI judgment quality | Factual correctness of all AI outputs | Spec-to-Test verification pipeline |
| MCP/Skills provide structural constraints | Elimination of human review | Two-layer structure: guardrails (inviolable constraints) + evaluation pipeline (probabilistic quality gate) |
| The design aims to clarify responsibility allocation | That the system assumes legal or ethical liability | Design-time responsibility boundaries + runtime audit trails |

Reader contract

This document presents a design philosophy for "how to place AI in a trustworthy environment." It does not promise specific quality levels or safety guarantees. However, rather than leaving outputs unverifiable, it adopts an approach of converting specifications into verifiable tests and ensuring quality through a two-layer system of guardrails and evaluation pipelines. Final judgment and accountability always remain with humans.

Core Messages

  1. AI-driven development is not just code generation - Utilize AI throughout all processes
  2. AI needs guidelines for decision-making - The importance of unwavering reference sources
  3. Systematize human engineering knowledge - Formalize as MCP/Skills
  4. Standards-based MCPs are the foundation - Democratize access to RFC, W3C, legislation, etc.
  5. Share domain knowledge via Skills - Make team know-how reusable
  6. Bidirectional knowledge transformation - Human→AI (structuring), AI→Human (comprehension support)
  7. Explicit judgment criteria - Define constraints, objectives, and judgment criteria via the Doctrine Layer to enable autonomous AI decision-making

Released under the MIT License.