Physical AI — Extending the Three-Layer Architecture to the Edge
The Agent / Skills / MCP three-layer model established in the cloud maintains structural integrity when deployed to edge devices and robotics.
INFO
Does the Agent / Skills / MCP three-layer model that works in cloud environments also hold up in edge devices operating in the physical world?
The answer is "yes" — and moreover, it holds up without changing the structure.
Target audience: Engineers who want to understand AI agent architecture beyond the boundaries of software. Also useful for teams designing interfaces with edge AI, IoT, and robotics.
Position of This Page
01-vision (WHY — Why we need authoritative references)
→ 02-reference-sources (WHAT — What to use as references)
→ 03-architecture (HOW — How to structure the system)
→ 04-ai-design-patterns (WHICH — Which pattern to choose and when)
→ 05-solving-ai-limitations (REALITY — How to face real-world constraints)
→ This page (EXTENSION — Extending the three-layer model to the physical world)
Meta Information
| What this chapter establishes | Structural consistency of the three-layer model's edge extension, cloud↔edge symmetry |
| What this chapter does NOT cover | Robotics control details (Motion Planner and below), specific hardware implementation guides |
| Dependencies | 03-architecture (three-layer model), 07-doctrine-and-intent (Doctrine Layer) |
| Common misuse | Treating BitNet as the only edge inference technology. This chapter's claim is "the structure doesn't change" — not dependence on a specific technology |
What Is Physical AI?
Physical AI refers to the technology domain where AI perceives the physical world, makes decisions, and directly acts upon it. Autonomous driving, industrial robots, drones, and humanoids are typical application areas.
Relationship with Embodied AI
In academic contexts, the term Embodied AI is widely used. Embodied AI focuses on "learning through physical embodiment and interaction with the environment," and Physical AI can be positioned as one of its implementation forms. This document uses the term "Physical AI" to align with the context of extending the three-layer architecture to the edge.
The Importance of World Models
What fundamentally distinguishes Physical AI from information-space AI is the need for a World Model — an internal representation of physical world laws. Without understanding gravity, friction, collision, and inertia, a robot cannot operate safely.
Information-space AI: Text/data processing → Physical laws are irrelevant
Physical AI: Real-world action → Understanding gravity, friction, collision is essential
The World Model functions as part of the domain knowledge embedded in the Skills layer, providing physical plausibility to Agent layer decisions.
Traditionally, Physical AI has been discussed as a separate world from software AI. However, the following technological advances are dissolving this boundary:
- BitNet (1.58-bit quantized LLM) — Enables large language model inference on edge devices
- MCP (Model Context Protocol) — Standardizes tool connectivity
- Edge computing evolution — Improves real-time processing capabilities on devices
BitNet b1.58 — Making Edge Inference a Reality
The key to establishing the Agent layer for Physical AI is BitNet b1.58, published by Microsoft Research.
Why "1.58-bit"?
Conventional LLMs store weights as 16-bit (FP16) or 32-bit (FP32) floating-point values. BitNet b1.58 compresses these to an extreme — only three values: {-1, 0, 1}. The number "1.58" derives from the information required to encode three equiprobable values: log₂(3) ≈ 1.58.
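The figure is easy to verify: encoding one of n equiprobable values takes log₂(n) bits, so three values cost log₂(3) bits per weight. A quick check in plain Python:

```python
import math

# Bits required to encode one of n equiprobable values: log2(n).
# BitNet b1.58's weights take one of three values: {-1, 0, 1}.
bits_per_weight = math.log2(3)
print(round(bits_per_weight, 2))  # → 1.58
```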
Conventional LLM: weights = arbitrary floating-point values (FP16: 65,536 possibilities)
BitNet b1.58: weights = only three values {-1, 0, 1}
How It Differs from Conventional Quantization
Existing methods such as GPTQ, AWQ, and QLoRA compress pre-trained models after the fact. A trade-off between precision and compression ratio is unavoidable. In contrast, BitNet b1.58 replaces the Transformer's linear layers with BitLinear and trains from scratch at 1.58-bit. This structurally avoids the quality degradation inherent in post-hoc compression.
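The ternarization step itself can be sketched in a few lines using absmean scaling, as described in the BitNet b1.58 paper. This is a simplified illustration only: the real BitLinear also quantizes activations and applies this inside the training loop, not as a one-shot conversion.

```python
def ternarize(weights, eps=1e-8):
    """Absmean quantization: scale each weight by the mean absolute
    value, then round and clip into the ternary set {-1, 0, 1}."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    return [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]

print(ternarize([0.4, -1.2, 0.05, 2.0, -0.3, 0.9]))  # → [0, -1, 0, 1, 0, 1]
```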
Existing quantization: Train (FP16) → Post-compress → Quality degradation is unavoidable
BitNet b1.58: Train at 1.58-bit from scratch → Structurally optimized
Concrete Performance
The 70B-parameter BitNet b1.58 model shows the following results compared to LLaMA (FP16) of equivalent scale:
| Metric | BitNet b1.58 vs LLaMA (FP16) |
|---|---|
| Inference speed | 4.1x faster |
| Batch capacity | 11x |
| Throughput | 8.9x |
| Matrix operation energy efficiency | 71.4x |
| ARM CPU speedup | 1.37–5.07x |
| x86 CPU speedup | 2.37–6.17x |
| x86 CPU energy reduction | 71.9–82.2% |
Notably, a 100B-parameter model runs on a single CPU, achieving processing speeds equivalent to human reading speed (5–7 tokens per second).
No Special Hardware Required
The greatest significance of BitNet b1.58 is that it requires no special hardware.
The inference framework BitNet.cpp is built on llama.cpp and has been verified on the following architectures:
| Architecture | Verified Hardware Examples | Use Case |
|---|---|---|
| x86-64 (AVX2) | Intel i7-13800H (laptop), AMD EPYC | Desktop / Server |
| ARM (NEON) | Apple M2, Cobalt 100 | Laptop / Tablet |
| ARM (DOTPROD) | ARM v8.2 and later | Mobile / Edge devices |
Running an LLM at practical speeds on a laptop CPU — this means "edge AI" is no longer confined to research labs. It can be started on the hardware you already have.
The latest parallel kernel optimizations (January 2026) introduced configurable tiling, achieving an additional 1.15–2.1x speedup. Embedding layer quantization (Q6_K format) is also supported, improving memory usage and inference speed while nearly maintaining accuracy.
GPU's Role Changes but Doesn't Disappear
BitNet b1.58 drastically reduces GPU dependency for inference, but a GPU is still required for model training. The accurate framing is not "GPUs become unnecessary" but "inference becomes practical on CPUs." Additionally, being built on llama.cpp makes integration with existing inference pipelines straightforward.
Current Limitations of BitNet b1.58
BitNet is a promising technology, but the following constraints should be recognized:
- Limited pre-trained model selection — Unlike FP16 models, there is no abundant ecosystem of pre-trained BitNet models
- Text generation quality — FP16 models remain superior for high-precision natural language generation tasks
- Ecosystem maturity — Toolchains and community support are still developing
- Fine-tuning methods not established — Domain adaptation techniques for 1.58-bit models are still in the research stage
While BitNet has sufficient precision for Physical AI control tasks (discrete decisions, binary classification), it is not a universal solution. Identifying the right application domain is critical.
Other Edge Inference Approaches Beyond BitNet
This document highlights BitNet b1.58 as a representative example, but other technologies also enable edge inference.
| Approach | Characteristics | Maturity |
|---|---|---|
| GGUF Quantization (llama.cpp) | Post-training quantization (Q4_K_M / Q5_K_M etc.). Largest model selection | High |
| Apple MLX | Inference framework optimized for Apple Silicon | Medium–High |
| TinyLlama / Phi-3-mini | Small-by-design models. Can run on edge without quantization | Medium |
| MediaPipe LLM Inference | Google's mobile / edge inference API | Medium |
The three-layer model's structural claim (separation of responsibilities holds at the edge) remains valid regardless of what powers the Agent layer's inference engine. BitNet stands out among these for being "structurally optimized from scratch rather than post-compressed," aligning well with this document's design philosophy.
Affinity with Physical AI
This efficiency holds particular significance in the context of Physical AI. Robots and drones are fundamentally battery-powered, and 71.4x energy efficiency means not just "it can run on the edge" but "it can operate autonomously on battery for extended periods."
Furthermore, physical world control tasks often don't require the same precision as language generation:
Text generation : Expressing subtle nuances → High precision required
Code generation : Syntactic accuracy → High precision required
──────────────────────────────────────────
Robot control : "Rotate 30 degrees right" → Discrete decisions suffice
Anomaly detection : "Normal / Abnormal" → Close to binary classification
Route selection : "A / B / or C" → Limited choices
1.58-bit weight precision may be insufficient for text generation, but it is fully practical for physical control. This is what makes the combination of quantized models and Physical AI viable.
Three-Layer Mapping: Cloud and Edge
Structural Symmetry
The three-layer model established in the cloud maps directly to the edge:
Layer Correspondence
| Layer | Cloud | Edge / Physical |
|---|---|---|
| Agent | LLMs such as Claude, GPT | Local inference via quantized LLM (BitNet etc.) |
| Skills | Markdown documents, guidelines | Embedded domain knowledge, safety standards, physical parameters |
| MCP | Web API, DB, external services | Sensor input, actuator control, physical device I/O |
Why It Holds Up "Without Changing the Structure"
The essence of the three-layer model is not technical implementation but separation of responsibilities:
Agent = "Decide what should be done"
Skills = "Hold the knowledge needed for decisions"
MCP = "Connect to the outside world and execute"
This separation of responsibilities doesn't change whether the connection target is a Web API or a sensor. What changes is each layer's implementation, not its structure.
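This separation can be made concrete with a small sketch. The class and function names below are illustrative (not from any MCP SDK); the point is that the Agent-layer logic is byte-for-byte identical whether the MCP layer wraps a Web API or a physical sensor.

```python
from typing import Protocol

class Tool(Protocol):
    """MCP layer: connect to the outside world and execute."""
    def read(self) -> dict: ...

class WeatherAPI:
    """Cloud: the tool wraps a Web API call (stubbed here)."""
    def read(self) -> dict:
        return {"source": "web_api", "temp_c": 21.5}

class TempSensor:
    """Edge: the same interface wraps a physical sensor read (stubbed here)."""
    def read(self) -> dict:
        return {"source": "gpio_sensor", "temp_c": 22.1}

def agent_decide(tool: Tool, threshold_c: float) -> str:
    """Agent layer: decide what should be done.
    The decision logic never knows which kind of tool it is talking to."""
    reading = tool.read()
    return "alert" if reading["temp_c"] > threshold_c else "ok"

print(agent_decide(WeatherAPI(), 25.0))  # → ok
print(agent_decide(TempSensor(), 20.0))  # → alert
```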
Implementation Differences
| Aspect | Cloud | Edge |
|---|---|---|
| Inference model | Full-size LLM (tens to hundreds of B parameters) | Quantized model (1.58-bit, several B or less) |
| Knowledge storage | File system / API | Embedded ROM / local storage |
| Tool connectivity | HTTP / JSON-RPC | GPIO / CAN / serial communication |
| Latency requirements | Seconds to minutes | Milliseconds to seconds |
| Connectivity | Always-connected assumed | Offline operation is essential |
| Energy constraints | Data center power | Battery-powered (efficiency is a survival condition) |
Edge AI-Specific Latency Design Considerations
Edge environments require consideration of latency characteristics that differ fundamentally from the cloud.
| Type | Target Range | Design Impact |
|---|---|---|
| Inference latency | 10–100ms | Depends on model size and hardware. Reducible with quantized models |
| Sensor fusion | 1–10ms | Synchronization timing across multiple sensors directly affects decision accuracy |
| Control loop | 1–10ms | Real-time requirements for PID control etc. May require an RTOS (Real-Time OS) |
| Network round-trip | 50–500ms | Round-trip to cloud. Unusable for emergency decisions, necessitating decision distribution design |
| Degraded mode transition | Immediate | Switching to fallback behavior upon communication loss or sensor failure |
These latency constraints are the rationale behind the "decision distribution" collaboration pattern (real-time decisions at edge, advanced analysis in cloud).
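The decision-distribution rationale reduces to a simple routing rule, sketched here with hypothetical numbers drawn from the latency ranges in the table (the function name and defaults are illustrative, not from any framework).

```python
def route_decision(latency_budget_ms: float, cloud_rtt_ms: float = 200.0) -> str:
    """Decision distribution: any decision that cannot wait for a
    cloud round-trip must be made locally on the edge device."""
    return "edge" if latency_budget_ms < cloud_rtt_ms else "cloud"

print(route_decision(latency_budget_ms=100))   # emergency stop → edge
print(route_decision(latency_budget_ms=5000))  # route optimization → cloud
```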
The Indispensability of the Doctrine Layer
For AI operating autonomously in the physical world, the importance of the Doctrine Layer is even higher than in the cloud.
Cloud: Decision error → Data misprocessing, degraded user experience
Edge: Decision error → Physical accidents, potential human casualties
Physical AI doctrine includes elements absent from software:
- Safety Constraints — Hard limits to prevent physical harm
- Fail-safe — Retreat behavior during communication loss or anomalies
- Ethical Constraints — Inviolable rules that prioritize human safety above all
- Real-time Constraints — Acceptable limits for decision latency
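A doctrine layer can be sketched as a hard filter between the Agent's proposed action and the actuators. Every specific below (limit values, action schema) is hypothetical; the structure is the point: inviolable constraints override the Agent's choice, and violations fall back to a safe behavior rather than failing open.

```python
# Hypothetical hard limits; in practice these come from safety standards.
SAFETY_LIMITS = {"max_speed_mps": 1.5, "min_human_distance_m": 0.5}

def enforce_doctrine(action: dict, sensed: dict) -> dict:
    """Doctrine layer: inviolable constraints checked before any
    agent-proposed action reaches the actuators."""
    if sensed["human_distance_m"] < SAFETY_LIMITS["min_human_distance_m"]:
        return {"type": "emergency_stop"}  # human safety overrides everything
    if action.get("speed_mps", 0) > SAFETY_LIMITS["max_speed_mps"]:
        # Clamp rather than reject: degrade the action to a safe version.
        action = {**action, "speed_mps": SAFETY_LIMITS["max_speed_mps"]}
    return action

print(enforce_doctrine({"type": "move", "speed_mps": 3.0},
                       {"human_distance_m": 2.0}))  # speed clamped to 1.5
```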
Irreversibility — Autonomous Decisions in the Physical World
In the software world, decision errors can be "undone," but in the physical world, they can produce irreversible consequences. The most important principle in Physical AI design is recognizing this irreversibility.
- Latency constraints: Emergency stop decisions for robots must complete within 100ms. Round-trips to the cloud are unacceptable — immediate edge-based decisions are mandatory
- Safety margins: Because the cost of decision errors is orders of magnitude higher, Doctrine layer constraints function as safeguards
- Mandatory fail-safe: Retreat behavior during communication loss or sensor anomalies is not optional — it is a design requirement
The nature of decisions doesn't change — what changes is the severity of consequences and latency constraints. This is the essential design challenge of Physical AI.
Correspondence with the OODA Cycle
Physical AI is the most intuitive implementation of the OODA cycle:
| OODA Phase | Three-Layer Correspondence | Physical AI Examples |
|---|---|---|
| Observe | MCP Layer (input) | Data acquisition from cameras, LiDAR, temperature sensors |
| Orient | Skills Layer + Doctrine | Referencing safety standards, classifying situations, determining priorities |
| Decide | Agent Layer | Selecting actions such as "stop," "evade," or "continue" |
| Act | MCP Layer (output) | Motor control, alert notification, communication transmission |
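The mapping above can be expressed as a single loop iteration. The callables are toy stand-ins for the real layers, wired with an illustrative overheat scenario.

```python
def ooda_step(sensors, skills, agent, actuators):
    """One pass of the OODA cycle mapped onto the three layers.
    Each argument is a plain callable standing in for a layer."""
    observation = sensors()            # Observe — MCP layer (input)
    situation = skills(observation)    # Orient  — Skills layer + doctrine
    action = agent(situation)          # Decide  — Agent layer
    return actuators(action)           # Act     — MCP layer (output)

# Toy wiring: a temperature reading classified against a safety standard.
result = ooda_step(
    sensors=lambda: {"temp_c": 85},
    skills=lambda obs: "overheat" if obs["temp_c"] > 80 else "normal",
    agent=lambda situation: "stop" if situation == "overheat" else "continue",
    actuators=lambda action: f"motor:{action}",
)
print(result)  # → motor:stop
```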
From Agent to Robot — The Control Flow
Between the Agent layer's decision and the robot's physical action, the signal passes through multiple control layers. The three-layer model does not replace the control system — it provides decision and knowledge layers above it.
Control Layer — Full Hierarchy from Agent to Robot
| Layer | Responsibility | This Document's Scope |
|---|---|---|
| Agent | High-level intent determination ("Move to Shelf A") | ✅ Three-layer model's Agent layer |
| Task Planner | Decompose intent into subtasks ("Split path into 3 stages") | ⚠️ Boundary area (uses Skills layer knowledge) |
| Motion Planner | Path planning and collision avoidance in physical space | ❌ Conventional robotics domain |
| Controller | Real-time control such as PID and torque control | ❌ Conventional robotics domain |
| Robot | Actuator drive and physical action | ❌ Hardware domain |
This document's scope primarily covers the Agent layer and its boundary with the Task Planner. Motion Planner and below remain the domain of conventional robotics engineering; the three-layer model supplies the decision and knowledge layers above them rather than replacing them.
Cloud × Edge Collaboration Patterns
In actual Physical AI systems, edge devices do not operate in isolation — collaboration with the cloud is assumed:
Collaboration Patterns
| Pattern | Description | Example |
|---|---|---|
| Knowledge Sync | Reflect cloud Skills to the edge | Safety standard updates, distributing new operating parameters |
| Decision Distribution | Real-time decisions at edge, advanced analysis in cloud | Emergency stop is local, route optimization is cloud |
| Status Reporting | Aggregate edge sensor data to the cloud | Anomaly detection log transmission, remote monitoring |
Multi-Agent Systems and A2A
In Physical AI deployments, multi-agent configurations where multiple robots or drones coordinate on tasks are becoming commonplace. Agent-to-Agent (A2A) protocols for inter-agent communication are being put into practice in warehouse robot swarm control, drone formation flight, and similar applications.
However, in the physical world, inter-agent communication is not always guaranteed. Radio interference, distance limitations, and jamming can cause communication blackouts. Whether each agent can act safely and independently during these periods becomes the most critical design challenge.
Communication available: Agent A → Agent B "Avoid Shelf 3" → Direct coordination
Communication lost: Each Agent decides independently based on shared doctrine
→ "Collision avoidance is top priority" "Stop for unknown obstacles"
"Retry connection after 30 seconds"
Here, the role of doctrine elevates from "constraints on individual agents" to "the foundation for distributed consensus." Even without communication, if all agents share the same doctrine, each agent's behavior becomes mutually predictable. This is the very archetype of military doctrine — the same structure as a unit that has lost contact with its commander acting autonomously according to pre-shared principles of action (see 07-doctrine-and-intent).
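A minimal sketch of doctrine as the basis for distributed consensus. The rule names and actions below are illustrative; the mechanism is what matters: an ordered, pre-shared rule table that every agent evaluates identically, so isolated agents remain mutually predictable.

```python
# Pre-shared doctrine: rules ordered by priority, identical on every agent.
DOCTRINE = [
    ("collision_risk", "avoid"),       # collision avoidance is top priority
    ("obstacle_unknown", "stop"),      # stop for unknown obstacles
    ("link_lost", "retry_in_30s"),     # retry connection later
]

def decide_offline(events):
    """With communication lost, the first matching doctrine rule wins."""
    for condition, action in DOCTRINE:
        if condition in events:
            return action
    return "continue_mission"

# Two isolated agents observing the same events reach the same decision,
# and higher-priority rules dominate when events overlap.
assert decide_offline({"collision_risk", "link_lost"}) == "avoid"
print(decide_offline({"link_lost"}))  # → retry_in_30s
```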
Digital Twins and MCP
By exposing a physical device's digital twin (virtual replica) as an MCP server, cloud-side Agents can reference and control the physical device's state. This is a natural extension of MCP's concept of "standardized external tool connectivity" and further blurs the boundary between the physical world and software.
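A digital twin exposed this way can be sketched as an object offering the same read interface as any other tool a cloud Agent uses. Class and field names here are hypothetical; a real deployment would serve this state through an MCP server.

```python
# Hypothetical digital twin: a virtual replica that mirrors device state.
class DeviceTwin:
    def __init__(self):
        self._state = {"battery_pct": 100, "position": (0, 0)}

    def sync(self, telemetry: dict) -> None:
        """Edge device pushes its latest telemetry into the twin."""
        self._state.update(telemetry)

    def read(self) -> dict:
        """Cloud-side Agent reads the mirrored state — never the device directly."""
        return dict(self._state)

twin = DeviceTwin()
twin.sync({"battery_pct": 42, "position": (3, 7)})
print(twin.read()["battery_pct"])  # → 42
```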
What This Means for Software Engineers
Physical AI is not "a different world." With an understanding of the three-layer model, software engineers can participate in architecture design:
The MCP server you're designing right now —
its connection target just changed from a Web API to a sensor.
The Skill domain knowledge you're writing right now —
it just changed from translation guidelines to safety standards.
The Agent decision logic you're configuring right now —
it just changed from text processing to motor control.
The three-layer structure is the same. Only the implementation details change.
Summary
Physical AI is not a "special case" of the three-layer architecture — it is its most direct extension.
| Aspect | Core Message |
|---|---|
| Structural consistency | The separation of responsibilities across Agent / Skills / MCP holds in the physical world as-is |
| Edge inference realized | BitNet b1.58 has made local inference on edge devices practical (71.4x energy efficiency vs LLaMA) |
| Addressing irreversibility | The importance of the Doctrine layer is most pronounced in the physical world, where consequences can be irreversible |
| Design framework | The OODA cycle functions naturally as a design framework for Physical AI |
| Essential difference | The nature of decisions doesn't change — what changes is the severity of consequences and latency constraints |
Architecture learned in the cloud becomes a bridge to the physical world.
Those who understand the structure can adapt regardless of the deployment target.
References
- BitNet.cpp — Microsoft Research — Official 1-bit LLM inference framework repository
- BitNet CPU Inference Optimization — Technical details on parallel kernel implementations, supported architectures, and benchmark results
- BitNet: A 1-bit Model Solving LLM Challenges — Background on BitNet and comparison with existing quantization methods
- The Mechanism and Significance of BitNet b1.58 — Naming rationale for b1.58, performance data, and edge device applicability
Related Documents
- 03-architecture — HOW: Three-layer model structure definition (the foundation for this page's edge extension)
- 04-ai-design-patterns — WHICH: Pattern selection guidelines (relevant to pattern application in edge environments)
- 05-solving-ai-limitations — REALITY: AI constraints and countermeasures (latency and safety constraints are even stricter in the physical world)
- 07-doctrine-and-intent — DOCTRINE: Doctrine and intent design (essential for autonomous decisions in the physical world)