Skip to content

🌐 日本語

Authority and LLM Constraints — Why Durable Delegation Is Hard

NOTE

Durably delegating authority to an agent presupposes three capabilities: retaining principles, self-judging scope, and self-reporting deviation. This page shows how the LLM's structural problems erode each of the three — and why the continued dominance of harnesses that repeat per-action permission is not design conservatism but a rational response to structural constraints.

About This Document

The sister site's Permission vs. Authority organized, in design vocabulary (What/How), the difference between what harness-type agents (repeated per-action permission) and doctrine-type agents (durable delegated authority) request at the boundary. This page covers the Why side: why it is hard to hand an LLM durable authority, explained in the vocabulary of the eight structural problems.

TIP

In three lines

  • Delegating authority presupposes a stable meta-judgment: "measuring the scope of one's own principles."
  • Instruction Decay, Sycophancy, and Context Rot erode precisely that presupposition.
  • So the dominance of repeated permission (harness) is not conservatism — it is a rational response to structure.

Premise: Permission vs. Authority

PermissionAuthority
GranularityOne-shot approval of an individual actionDurable grant of discretion over a domain
Basis of safetyExternal walls and approval flowPrinciples internalized by the agent

For the full structure, see the sister site's Permission vs. Authority. Only one point matters here: in the authority model, the basis of safety migrates into the agent's internal state (its retention and operation of principles). So "how much can the LLM's internal state be trusted" directly sets the ceiling on delegability.

The Three Capabilities Durable Authority Requires

For delegated authority to work safely, the delegate must continuously maintain:

  1. Retaining principles — keeping the principles that justify the delegation (objectives, constraints, judgment criteria) at the same weight across the full length of a task
  2. Self-judging scope — accurately asking, at each action, "Is this within the scope of my principles?"
  3. Self-reporting deviation — stopping work and reporting when it recognizes it is out of scope

Delegation works in human organizations because these three capabilities are (imperfectly but) stable. In LLMs, several of the eight problems erode them directly.

1. Retaining Principles × Instruction Decay / Context Rot

Through Instruction Decay, the effective weight of principles given at the start of a conversation decays over a long task. There is no structural guarantee that constraints agreed upon at delegation time still apply with the same strength fifty turns later. Context Rot compounds this: the more the context grows, the worse the precision with which the principles themselves are referenced.

The permission model does not depend on this capability. Because a human outside approves each action, safety is preserved even if the agent's retention of principles collapses. Only the authority model is directly exposed to this decay.

2. Self-Judging Scope × Sycophancy / Priority Saturation

The self-check "Is this within the scope of my principles?" is a judgment that can conflict with what the user wants. Sycophancy skews this judgment systematically toward "if the user wants it, it's probably in scope." What makes this serious is that it is not random error — it is a bias that always tips toward expanding the delegation.

Priority Saturation adds to this: as principles, constraints, and instructions accumulate, the priority over "which criterion applies right now" saturates, destabilizing scope judgment itself.

3. Self-Reporting Deviation × Sycophancy / Hallucination

Self-reporting a deviation means "reporting one's own limits and failures and stopping work" — a head-on collision with Sycophancy's preference for compliance and continuation. Hallucination adds the risk of generating post-hoc justifications that an already-taken action was in scope. A design that relies solely on the agent's self-report to detect deviation places the detector in the most fragile possible location.

Doctrine-Type Failure Arises from Structure, Not Character

In the sister site's framing, the doctrine-type failure mode is "expansive interpretation of authority — deviation." Given the above, this is not a matter of the agent's "character" or insufficient training: it is the necessary consequence of the erosion of the three capabilities.

Eroded capabilityEroding structural problemsResulting failure
Retaining principlesInstruction Decay, Context Rot"Forgets" the constraints set at delegation and deviates
Self-judging scopeSycophancy, Priority SaturationConveniently expands its interpretation of scope
Self-reporting deviationSycophancy, HallucinationKeeps running while deviating, generating justifications

IMPORTANT

Harnesses (repeated permission) dominate not because agents are distrusted, nor because the design philosophy is conservative. It is the rational choice, under structural constraints, of not letting safety depend on the agent's internal state. This is one more instance of this site's consistent pattern: the prescription is justified by the diagnosis.

Mitigations — Re-injection and External Detection That Support Delegation

The erosion of the three capabilities is structural, but it can be mitigated. Every mitigation points in the same direction: reduce dependence on internal state and support it with external devices.

Capability supportedMitigations (Claude Code examples)
Retaining principlesPersistent residency via CLAUDE.md, per-loop instruction re-injection, context cleanup via /compact
Self-judging scopeExplicit scoping of principles (MUST/SHOULD), mechanical definition of "scope" via allowlists
Self-reporting deviationMechanical detection via Hooks, sub-agent quality gates, audit logs

CAUTION

These are mitigations, not solutions. The erosion of the three capabilities cannot be reduced to zero, so the further you move toward the right end of the delegation spectrum (broad authority), the more mitigation layers become mandatory. "We wrote a doctrine, so bypassPermissions is safe" is not a claim the structure supports.

🔗 Going Deeper: Designing Gradual Delegation from Permission to Authority

This page covered why durable authority is hard to hand over (Why). For how to delegate gradually from permission to authority (What/How), see the sister site.


Next: Lifecycle × Config MapPrevious: Harness and LLM Constraints

Released under the CC BY 4.0 License.