🌐 日本語
Authority and LLM Constraints — Why Durable Delegation Is Hard
NOTE
Durably delegating authority to an agent presupposes three capabilities: retaining principles, self-judging scope, and self-reporting deviation. This page shows how the LLM's structural problems erode each of the three — and why the continued dominance of harnesses that repeat per-action permission is not design conservatism but a rational response to structural constraints.
About This Document
The sister site's Permission vs. Authority organized, in design vocabulary (What/How), the difference between what harness-type agents (repeated per-action permission) and doctrine-type agents (durable delegated authority) request at the boundary. This page covers the Why side: why it is hard to hand an LLM durable authority, explained in the vocabulary of the eight structural problems.
TIP
In three lines
- Delegating authority presupposes a stable meta-judgment: "measuring the scope of one's own principles."
- Instruction Decay, Sycophancy, and Context Rot erode precisely that presupposition.
- So the dominance of repeated permission (harness) is not conservatism — it is a rational response to structure.
Premise: Permission vs. Authority
| Permission | Authority | |
|---|---|---|
| Granularity | One-shot approval of an individual action | Durable grant of discretion over a domain |
| Basis of safety | External walls and approval flow | Principles internalized by the agent |
For the full structure, see the sister site's Permission vs. Authority. Only one point matters here: in the authority model, the basis of safety migrates into the agent's internal state (its retention and operation of principles). So "how much can the LLM's internal state be trusted" directly sets the ceiling on delegability.
The Three Capabilities Durable Authority Requires
For delegated authority to work safely, the delegate must continuously maintain:
- Retaining principles — keeping the principles that justify the delegation (objectives, constraints, judgment criteria) at the same weight across the full length of a task
- Self-judging scope — accurately asking, at each action, "Is this within the scope of my principles?"
- Self-reporting deviation — stopping work and reporting when it recognizes it is out of scope
Delegation works in human organizations because these three capabilities are (imperfectly but) stable. In LLMs, several of the eight problems erode them directly.
1. Retaining Principles × Instruction Decay / Context Rot
Through Instruction Decay, the effective weight of principles given at the start of a conversation decays over a long task. There is no structural guarantee that constraints agreed upon at delegation time still apply with the same strength fifty turns later. Context Rot compounds this: the more the context grows, the worse the precision with which the principles themselves are referenced.
The permission model does not depend on this capability. Because a human outside approves each action, safety is preserved even if the agent's retention of principles collapses. Only the authority model is directly exposed to this decay.
2. Self-Judging Scope × Sycophancy / Priority Saturation
The self-check "Is this within the scope of my principles?" is a judgment that can conflict with what the user wants. Sycophancy skews this judgment systematically toward "if the user wants it, it's probably in scope." What makes this serious is that it is not random error — it is a bias that always tips toward expanding the delegation.
Priority Saturation adds to this: as principles, constraints, and instructions accumulate, the priority over "which criterion applies right now" saturates, destabilizing scope judgment itself.
3. Self-Reporting Deviation × Sycophancy / Hallucination
Self-reporting a deviation means "reporting one's own limits and failures and stopping work" — a head-on collision with Sycophancy's preference for compliance and continuation. Hallucination adds the risk of generating post-hoc justifications that an already-taken action was in scope. A design that relies solely on the agent's self-report to detect deviation places the detector in the most fragile possible location.
Doctrine-Type Failure Arises from Structure, Not Character
In the sister site's framing, the doctrine-type failure mode is "expansive interpretation of authority — deviation." Given the above, this is not a matter of the agent's "character" or insufficient training: it is the necessary consequence of the erosion of the three capabilities.
| Eroded capability | Eroding structural problems | Resulting failure |
|---|---|---|
| Retaining principles | Instruction Decay, Context Rot | "Forgets" the constraints set at delegation and deviates |
| Self-judging scope | Sycophancy, Priority Saturation | Conveniently expands its interpretation of scope |
| Self-reporting deviation | Sycophancy, Hallucination | Keeps running while deviating, generating justifications |
IMPORTANT
Harnesses (repeated permission) dominate not because agents are distrusted, nor because the design philosophy is conservative. It is the rational choice, under structural constraints, of not letting safety depend on the agent's internal state. This is one more instance of this site's consistent pattern: the prescription is justified by the diagnosis.
Mitigations — Re-injection and External Detection That Support Delegation
The erosion of the three capabilities is structural, but it can be mitigated. Every mitigation points in the same direction: reduce dependence on internal state and support it with external devices.
| Capability supported | Mitigations (Claude Code examples) |
|---|---|
| Retaining principles | Persistent residency via CLAUDE.md, per-loop instruction re-injection, context cleanup via /compact |
| Self-judging scope | Explicit scoping of principles (MUST/SHOULD), mechanical definition of "scope" via allowlists |
| Self-reporting deviation | Mechanical detection via Hooks, sub-agent quality gates, audit logs |
CAUTION
These are mitigations, not solutions. The erosion of the three capabilities cannot be reduced to zero, so the further you move toward the right end of the delegation spectrum (broad authority), the more mitigation layers become mandatory. "We wrote a doctrine, so bypassPermissions is safe" is not a claim the structure supports.
Related Pages
- Harness and LLM Constraints — the four harness elements mapped to the eight problems (companion to this page)
- Problems × Countermeasures Map — the eight problems × Claude Code features
- Part 1: Structural Problems — overview of the eight problems
- Instruction Decay / Sycophancy / Context Rot — details of the main eroding problems
🔗 Going Deeper: Designing Gradual Delegation from Permission to Authority
This page covered why durable authority is hard to hand over (Why). For how to delegate gradually from permission to authority (What/How), see the sister site.
- ai-agent-architecture / Permission vs. Authority — the locus of boundary detection, the symmetry of failure modes, and the delegation spectrum implemented by Claude Code's permission modes
Next: Lifecycle × Config MapPrevious: Harness and LLM Constraints