
Context Rot — Output Quality Degrades as Token Count Increases

NOTE

In short: A phenomenon where LLM output quality deteriorates as the number of input tokens increases. Even with a 200K token capacity, degradation begins around 50K tokens. Because it doesn't produce errors, it's the most insidious structural constraint in LLMs.

What is Context Rot?

Context Rot is a phenomenon where performance degrades as input length increases.

Chroma's 2025 research confirmed the effect in all 18 models tested, including GPT-4.1 and Claude Opus 4. Critically, this is not context window overflow: models with a 200K-token capacity already degrade at 50K tokens. Because the degradation never surfaces as an error, it is easy to miss.

Three Mechanisms

Context Rot is not a single phenomenon but a compound of three distinct mechanisms.

1. Lost in the Middle (Information Loss in the Middle of the Context)

LLMs direct strong attention to tokens at the beginning and end of the context, while attention to the middle drops sharply (a U-shaped curve). Beyond roughly 50% context utilization, the curve shifts toward a recency bias, prioritizing the most recent tokens instead.

→ See Lost in the Middle for details

2. Attention Dilution

The Transformer self-attention mechanism performs O(N²) pairwise computations. When the token count grows 10-fold, the number of pairs grows 100-fold, and because attention weights are normalized to sum to 1, each token's average share of attention shrinks in inverse proportion to the context length.
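To make the scaling concrete, here is a minimal sketch in Python. This is illustrative arithmetic only, not a real attention implementation: it shows how pairwise computations grow quadratically while the average attention share per token shrinks as 1/N.

```python
# Illustrative arithmetic only: not a real Transformer implementation.

def attention_pairs(n_tokens: int) -> int:
    # Self-attention scores every (query, key) pair: N * N = O(N^2).
    return n_tokens * n_tokens

def avg_attention_share(n_tokens: int) -> float:
    # Softmax weights for each query sum to 1, so the average weight
    # any single token receives is 1 / N: attention dilutes as N grows.
    return 1.0 / n_tokens

for n in (5_000, 50_000):
    print(f"{n:>6} tokens: {attention_pairs(n):>13,} pairs, "
          f"avg share {avg_attention_share(n):.6f}")
```

A 10-fold jump from 5K to 50K tokens yields 100 times as many pairs and a tenth of the per-token attention share, matching the proportions described above.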

3. Distractor Interference

Unrelated but semantically similar information misleads the model. Structured text is particularly prone to generating incorrect outputs. This is especially severe in coding, where similar function names and import statements cause interference.
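A minimal sketch of this failure mode, using hypothetical file and function names and a naive substring search as a crude stand-in for the model's retrieval behavior:

```python
# Hypothetical codebase; naive substring search stands in for retrieval.
codebase = {
    "app/users.py":    "def load_user(user_id): ...",
    "app/accounts.py": "def load_users(): ...",       # near-identical distractor
    "app/logging.py":  "def write_log(message): ...", # unrelated
}

def naive_search(symbol: str) -> list[str]:
    # Return every file whose source contains the symbol.
    return [path for path, src in codebase.items() if symbol in src]

# "load_user" also matches inside "load_users", so the distractor
# is pulled in alongside the genuinely relevant definition.
print(naive_search("load_user"))  # ['app/users.py', 'app/accounts.py']
```

The unrelated `write_log` causes no collision; the near-identical `load_users` does, which is the shape of interference the text describes.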

Impact on Semantic Understanding

Context Rot hits coding tasks hardest. Understanding code requires broad semantic comprehension of the surrounding context: tracking variables, grasping dependencies, and recognizing design patterns all depend on context length.

Quantitative Evidence

| Model | Short-Context Accuracy | Long-Context Accuracy | Degradation |
| --- | --- | --- | --- |
| GPT-4.1 | High | Medium | Significant |
| Claude Opus 4 | High | Medium | Significant |
| All 18 models | — | — | Confirmed across all models |

IMPORTANT

Critical insight: The problem is not "LLMs are unintelligent" but "input design is poor."

Mitigation in Claude Code

| Mitigation | Mechanism | Addresses Mechanism(s) |
| --- | --- | --- |
| /compact | Summarizes and compresses conversation history | Attention Dilution, Distractor Interference |
| /clear | Resets the session for a fresh context | All mechanisms |
| CLAUDE.md 200-line limit | Minimizes resident context consumption | Attention Dilution |
| .claude/rules/ | Injects rules only when conditions match | Distractor Interference |
| Skills | Loads specialized knowledge only when needed | Attention Dilution, Distractor Interference |
| Agents | Execute in independent context windows | All mechanisms (fundamental mitigation) |
| Hooks | Mechanical verification outside the context | Unaffected by Context Rot |
| MCP Tool Search | Lazy-loads tool definitions | Attention Dilution |
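The compaction idea can be illustrated with a toy sketch of a /compact-style heuristic. This is a hypothetical implementation, not Claude Code's actual algorithm: it keeps recent turns verbatim and collapses everything older into a single summary placeholder.

```python
def compact(history: list[str], keep_recent: int = 4) -> list[str]:
    # Keep the newest turns verbatim; collapse everything older into
    # one summary entry to shrink the resident context.
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    summary = f"[summary of {len(older)} earlier turns]"
    return [summary] + recent

history = [f"turn {i}" for i in range(10)]
print(compact(history))
# ['[summary of 6 earlier turns]', 'turn 6', 'turn 7', 'turn 8', 'turn 9']
```

The design choice matters: summarizing the middle of the history targets exactly the region where Lost in the Middle discards attention anyway, while the recent turns the model weights most heavily survive intact.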

References

  • Hong, K., Troynikov, A., & Huber, J. (2025). "Context Rot: How Increasing Input Tokens Impacts LLM Performance." Chroma Research. research.trychroma.com — Quantitative measurement of Context Rot across 18 models


Released under the CC BY 4.0 License.