Knowledge Boundary — LLMs Cannot Admit What They Don't Know

NOTE

In a nutshell: LLMs cannot accurately identify the limits of their own knowledge. Instead of answering "I don't know" to questions about unfamiliar topics, they generate incorrect responses with high confidence. This "poor calibration" is a direct source of hallucination and creates the most dangerous failure mode in coding agents.

What is Knowledge Boundary?

Knowledge Boundary refers to the line separating what an LLM can answer correctly from what it cannot. The problem is that LLMs themselves cannot accurately recognize this boundary.

A human can say "I don't know" when a topic is unfamiliar. An LLM in the same position tends to produce a fluent, confident answer that is wrong.
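To make "poor calibration" concrete, here is a minimal sketch. It assumes each answer carries a stated confidence between 0 and 1 and a correctness label obtained from an external check. The function name `calibration_gap` and the sample records are hypothetical and exist only for illustration.

```python
def calibration_gap(records):
    """Return mean stated confidence minus observed accuracy.

    records: list of (confidence, is_correct) pairs, where confidence is in
    [0, 1] and is_correct comes from an external check, not from the model.
    A value near 0 suggests good calibration; a positive value means the
    stated confidence is higher than the accuracy actually achieved.
    """
    if not records:
        raise ValueError("records must not be empty")
    mean_confidence = sum(conf for conf, _ in records) / len(records)
    accuracy = sum(1 for _, ok in records if ok) / len(records)
    return mean_confidence - accuracy


# Hypothetical sample: four answers stated at about 90% confidence, half of them wrong.
sample = [(0.9, True), (0.9, False), (0.95, False), (0.85, True)]
gap = calibration_gap(sample)
print(f"calibration gap: {gap:.2f}")
```

In this made-up sample the mean confidence is 0.9 while accuracy is 0.5, so the gap is about 0.4. This is a single coarse number; a fuller analysis would bin answers by confidence level.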

Why Can't They Admit Ignorance?

1. The Objective Function: Next-Token Prediction

LLMs are trained to "predict the next token." This objective rewards text that resembles the training data, not text that signals uncertainty. Responses like "I don't know" are vastly underrepresented in that data, because most human-written text is structured around making claims rather than admitting ignorance.

2. Overconfidence Reinforced by RLHF

During RLHF (reinforcement learning from human feedback), human evaluators tend to rate responsive answers higher than refusals. As a result, models are rewarded for "providing answers," and the bias toward "attempting to answer even when uncertain" becomes stronger.
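The incentive described above can be illustrated with a small expected-reward calculation. This is a sketch under stated assumptions: the reward values are invented for illustration and are not measurements from any real RLHF setup, and the names `expected_reward` and `prefers_guessing` are hypothetical.

```python
def expected_reward(p_correct, r_correct, r_wrong):
    """Expected reward of attempting an answer that is right with probability p_correct."""
    return p_correct * r_correct + (1 - p_correct) * r_wrong


def prefers_guessing(p_correct, r_correct, r_wrong, r_refuse):
    """True if attempting an answer has a higher expected reward than refusing."""
    return expected_reward(p_correct, r_correct, r_wrong) > r_refuse


# Assumed rater behavior: a plausible-looking wrong answer (0.3) still
# outscores an honest refusal (0.1).
lenient = dict(r_correct=1.0, r_wrong=0.3, r_refuse=0.1)

# Assumed alternative: wrong answers are penalized, refusals are not.
strict = dict(r_correct=1.0, r_wrong=-1.0, r_refuse=0.1)

# The model is only 20% likely to be right in both cases.
for name, scheme in (("lenient", lenient), ("strict", strict)):
    print(name, prefers_guessing(0.2, **scheme))
```

Under the lenient scheme, guessing at 20% accuracy yields an expected reward of 0.44 against 0.1 for refusing, so guessing wins. Under the strict scheme the expected reward of guessing drops to -0.6 and refusing becomes the better choice.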

3. Insufficient Training Data for Expressing Uncertainty

The vast majority of training data consists of definitive claims ("X is Y"), while expressions that explicitly convey uncertainty are structurally scarce.

4. Ambiguity in Knowledge Boundaries Across Types

Knowledge can be classified into four layers:

Layer	Category	Description	Risk Level
1	Known-Known	Knowledge the model is certain about	Low
2	Prompt-Dependent Known	Answers correctly or incorrectly depending on how the question is framed	Medium
3	Known-Unknown	Areas where the model recognizes its lack of knowledge	Low
4	Unknown-Unknown	Areas where the model cannot recognize its own ignorance	Highest

The most dangerous category is Unknown-Unknown: because the LLM cannot recognize its own knowledge gap, it answers with the same confidence it shows for things it actually knows.
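One way to operationalize the four layers is to ask about the same topic with several phrasings and label each response. The sketch below assumes every response has already been graded externally as `"correct"`, `"incorrect"`, or `"abstain"`. The mapping rules, the `Layer` enum, and `classify_topic` are illustrative assumptions, not a definition taken from the source.

```python
from enum import Enum


class Layer(Enum):
    KNOWN_KNOWN = 1
    PROMPT_DEPENDENT_KNOWN = 2
    KNOWN_UNKNOWN = 3
    UNKNOWN_UNKNOWN = 4


def classify_topic(outcomes):
    """Map graded responses for one topic to one of the four layers.

    outcomes: list of "correct" / "incorrect" / "abstain" labels, one per
    phrasing of the question. The rules below are an assumed heuristic.
    """
    seen = set(outcomes)
    if not seen:
        raise ValueError("outcomes must not be empty")
    if not seen <= {"correct", "incorrect", "abstain"}:
        raise ValueError(f"unexpected labels: {seen}")
    if seen == {"correct"}:
        return Layer.KNOWN_KNOWN             # right under every phrasing
    if "correct" in seen:
        return Layer.PROMPT_DEPENDENT_KNOWN  # right only under some phrasings
    if seen == {"abstain"}:
        return Layer.KNOWN_UNKNOWN           # consistently admits the gap
    return Layer.UNKNOWN_UNKNOWN             # wrong and never right


print(classify_topic(["correct", "incorrect", "correct"]).name)
print(classify_topic(["incorrect", "incorrect"]).name)
```

The last branch deliberately treats a mix of abstentions and wrong answers as Unknown-Unknown, on the assumption that any confident wrong answer is the riskier signal.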

Impact on Code Generation

Pattern 1: Generating Code from Outdated Knowledge

If an LLM's training data covers Angular only up to version 16 but the user needs Angular 18, the model confidently generates code using the older patterns.

Pattern 2: Calling Nonexistent APIs

Without precise version-to-version knowledge, the model generates code that calls APIs or methods that don't exist.
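A cheap guard against this pattern is to look a name up in the installed library before trusting generated code that calls it. The following is a minimal sketch using only the Python standard library. The helper name `api_exists` is hypothetical, and `json.parse` stands in for a plausible-sounding call that does not exist in Python's `json` module.

```python
import importlib


def api_exists(module_name, attr_path):
    """Return True if module_name.attr_path resolves in the installed environment.

    attr_path may be dotted, for example "JSONDecoder.decode".
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False  # the module itself is not installed or does not exist
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


print(api_exists("json", "loads"))  # real function in the standard library
print(api_exists("json", "parse"))  # JavaScript-style name, absent in Python
```

This only confirms that a name exists in the currently installed version. It says nothing about whether the signature or behavior matches what the generated code expects.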

Pattern 3: Pretending to Know Internal Tooling

When asked about methods in proprietary tools, the model confidently generates type definitions for methods that don't exist.

Pattern 4: Prompt-Dependent Unstable Knowledge

The same question can be answered correctly or incorrectly depending on how it's phrased. This shows that knowledge is not a binary "know/don't know" state but a continuous confidence level dependent on the prompt.
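Prompt dependence can be probed by asking one question in several phrasings and measuring how often the answers agree. The sketch below assumes a callable `ask(prompt)` that returns the model's answer as a string. The `fake_model` stub, its canned answers, and the library name `widgetlib` are all hypothetical stand-ins so the example runs without any real model.

```python
from collections import Counter


def answer_consistency(ask, paraphrases):
    """Return (majority_answer, share of paraphrases that produced it)."""
    if not paraphrases:
        raise ValueError("paraphrases must not be empty")
    # Normalize lightly so trivial formatting differences do not count as disagreement.
    answers = [ask(p).strip().lower() for p in paraphrases]
    majority_answer, majority_count = Counter(answers).most_common(1)[0]
    return majority_answer, majority_count / len(answers)


# Hypothetical canned answers standing in for a real model call.
canned = {
    "What is the default timeout in widgetlib?": "30",
    "widgetlib: default timeout value?": "30",
    "How long does widgetlib wait before timing out by default?": "60",
}


def fake_model(prompt):
    return canned[prompt]


majority, score = answer_consistency(fake_model, list(canned))
print(majority, round(score, 2))
```

A score well below 1.0 suggests the topic sits in the prompt-dependent region. Note that a high score only shows consistency, not correctness: a model can be consistently wrong.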

Mitigation Strategies in Claude Code

Mitigation	Mechanism	Why It Works
MCP (External Knowledge Reference)	Query external trusted sources directly	Extends knowledge boundaries beyond the LLM's internal knowledge
Test Code	Externally verify generated code correctness	Detects outputs that exceed knowledge boundaries through their results
Version Declaration in CLAUDE.md	Explicitly specify library versions	Clarifies "which version's knowledge should be used?"
Agents (Knowledge Isolation)	Delegate to specialized agents	Narrows each agent's knowledge domain, reducing the chance that a task falls outside what it knows
Prompt Design	Instruct the model to verify any API it is not confident about	Explicitly permits the LLM to say "I don't know"
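The "Test Code" row can be sketched as follows: run the generated code against a known expectation and let the failure expose the knowledge gap. This is an illustration under assumptions, not how Claude Code itself runs tests. The helper `check_generated` and both generated snippets are hypothetical, and real use should execute untrusted code in a sandbox rather than with a bare `exec`.

```python
def check_generated(source, func_name, args, expected):
    """Execute generated source, call func_name(*args), and compare to expected.

    Returns (passed, detail). Any exception counts as a failure, which is how
    a call to a nonexistent API surfaces.
    """
    namespace = {}
    try:
        exec(source, namespace)  # defines the generated function; sandbox this in practice
        actual = namespace[func_name](*args)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    if actual != expected:
        return False, f"expected {expected!r}, got {actual!r}"
    return True, "ok"


# Hypothetical model output that calls a method the json module does not have.
hallucinated = """
import json

def parse_ids(text):
    return json.parse(text)
"""

# Hypothetical model output that uses the real API.
valid = """
import json

def parse_ids(text):
    return json.loads(text)
"""

for label, source in (("hallucinated", hallucinated), ("valid", valid)):
    print(label, check_generated(source, "parse_ids", ("[1, 2]",), [1, 2]))
```

The test needs no knowledge of why the code is wrong. The hallucinated call fails with an `AttributeError` at run time, which is the external signal the model could not provide about itself.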

Relationship to Other Structural Problems

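Knowledge Boundary acts as an entry point for Hallucination: when the model fails to recognize that a question lies beyond its knowledge, it answers anyway, and the confident but incorrect response is the hallucination.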

References

Pan et al. (2025). "Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks." arXiv:2510.01782 — Quantitative measurement of refusal behavior across 16 models and 5 datasets using the Refusal Index (RI) metric

