Prompt Sensitivity — Same Meaning, Different Results

NOTE

In short: LLMs generate significantly different outputs for semantically equivalent prompts. Asking the same question with different wording can result in differences of up to 76 percentage points in accuracy. This is not merely instability, but a structural constraint that reveals shallow model understanding.

What is Prompt Sensitivity?

Prompt Sensitivity is the phenomenon where LLMs produce substantially different outputs for prompts that are semantically identical but worded differently.

For example:

  • "Please refactor this function"
  • "Please improve this function"
  • "Please clean up this function"

Although these are semantically nearly equivalent, an LLM may generate different outputs for each.

Why Does It Occur?

Mathematical Explanation

Analysis based on a first-order Taylor expansion shows that the output difference is governed by:

Output Difference ≈ Gradient Norm × Embedding Difference Norm
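Spelling the approximation out: a first-order Taylor expansion of the model function around one embedding, followed by the Cauchy–Schwarz inequality, yields the bound (a sketch in our own notation, following the formulation cited in Zhuo et al., 2024, not the paper's exact symbols):

```latex
f(x') \approx f(x) + J_f(x)\,(x' - x)
\quad\Longrightarrow\quad
\lVert f(x') - f(x) \rVert \;\le\; \lVert J_f(x) \rVert \cdot \lVert x' - x \rVert
```

Here x and x' are the embedding vectors of two paraphrases and J_f is the model's Jacobian: a large gradient norm or a large embedding gap both translate directly into a large output difference.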

The crucial point: LLMs do not internally cluster semantically similar inputs. Even with the same meaning, different token sequences generate different embedding vectors, leading to different outputs.
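One way to see this concretely is to embed a few paraphrases and measure the gap. A minimal sketch using the sentence-transformers library (the model name is an illustrative assumption; any text-embedding model shows the same effect):

```python
# Minimal sketch: paraphrases land on similar but distinct embedding vectors.
# Assumes `pip install sentence-transformers`; the model name is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

prompts = [
    "Please refactor this function",
    "Please improve this function",
    "Please clean up this function",
]

embeddings = model.encode(prompts)  # one vector per paraphrase

# Pairwise cosine similarity: high, but never exactly 1.0. The residual
# gap is the ||x' - x|| term that the gradient norm can amplify.
for i in range(len(prompts)):
    for j in range(i + 1, len(prompts)):
        a, b = embeddings[i], embeddings[j]
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        print(f"{prompts[i]!r} vs {prompts[j]!r}: cosine = {cos:.3f}")
```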

Impact of Surface Form

LLMs respond largely to statistical patterns in tokens rather than meaning. As a result:

  • Imperative and interrogative phrasings of the same request produce different results
  • Bullet points and free text produce different results
  • Technical terminology and plain language produce different results
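What the model actually conditions on is a sequence of token IDs, not the meaning behind them. A minimal sketch with the tiktoken library makes the point (the encoding name is an illustrative assumption):

```python
# Minimal sketch: equivalent requests become different token sequences.
# Assumes `pip install tiktoken`; the encoding name is illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

variants = [
    "Refactor this function.",            # imperative
    "Could you refactor this function?",  # interrogative
    "- refactor this function",           # bullet point
]

for text in variants:
    print(f"{text!r} -> {enc.encode(text)}")

# Each variant is a different integer sequence, so the model conditions
# on a different input even though a human reads the same request.
```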

Quantitative Evidence

  • Accuracy differs by up to 76 percentage points between different phrasings of the same question
  • This reflects not "instability" but training on specific expression patterns

Impact on Coding

  • Rules written ambiguously in CLAUDE.md are less likely to be followed
  • Vague Skills descriptions lead to failed automatic invocations
  • The quality of generated code varies depending on how users phrase their natural language requests

Mitigation in Claude Code

| Mitigation Strategy | Mechanism | Why It Works |
| --- | --- | --- |
| CLAUDE.md writing style | Concrete, imperative language with code examples | Eliminates ambiguous expressions, improves compliance rate |
| Skills description design | Include diverse user natural-language expressions | Similar to SEO principles; improves matching accuracy across varied phrasings |
| Conditional injection via `.claude/rules/` | Reduces the number of simultaneously active instructions | Prevents sensitivity degradation (the effect grows with the number of instructions) |
| Hooks and tests | External validation independent of prompt wording | Verifies results regardless of how the prompt is written (see the sketch below) |
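The last row deserves a concrete illustration: an external check passes or fails based on the artifact itself, so it is immune to how the request was phrased. A minimal sketch of such a check (the test command and exit-code convention are hypothetical examples, not Claude Code's actual hook API):

```python
# Minimal sketch: validate generated code with an external command,
# independent of prompt wording. The command below is a hypothetical example.
import subprocess
import sys

def validate() -> bool:
    """Run the project's test suite; the verdict ignores prompt phrasing."""
    result = subprocess.run(["npm", "test"], capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout)
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0

if __name__ == "__main__":
    sys.exit(0 if validate() else 1)
```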

Writing Effective CLAUDE.md

```markdown
# ❌ Ambiguous (high sensitivity)

- Please write good tests
- I want clean code

# ✅ Concrete (low sensitivity)

- Create Jasmine tests for all public methods
- Place test files in *.spec.ts
- Structure tests with describe/it blocks
```

Writing Effective Skills Descriptions

```yaml
# ❌ Ambiguous (auto-invocation often fails)
description: Component-related tasks

# ✅ Concrete (covers diverse expressions)
description: >
  Create new Angular components. Generate scaffolding with OnPush
  change detection, NgRx Store integration, and Jasmine tests.
  Use for requests like "create a component", "add a new screen", etc.
```
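The second description works like keyword coverage in search: by enumerating the phrasings users actually type ("create a component", "add a new screen"), it gives the matcher several surface forms to hit instead of a single abstract label.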

Relationship to Other Structural Problems

Prompt Sensitivity and the other structural problems amplify each other in both directions.

TIP

[Diagram: solid arrows (→) mark the direction in which each problem amplifies Prompt Sensitivity; dashed arrows (⇢) mark feedback loops in which Prompt Sensitivity worsens each problem.]

References

  • Zhuo, J., Zhang, S., Fang, X., Duan, H., Lin, D., & Chen, K. (2024). "Assessing and Understanding the Prompt Sensitivity of LLMs." EMNLP 2024 Findings. ACL Anthology — Mathematical formulation of Prompt Sensitivity using first-order Taylor expansion and Cauchy-Schwarz inequality
  • Lu, S., Schuff, H., & Gurevych, I. (2024). "How are Prompts Different in Terms of Sensitivity?" NAACL 2024. ACL Anthology — Analysis of the mechanism by which minor prompt changes produce large output differences
