Context Engineering Fundamentals
Context is the complete input state a language model sees at inference time — system instructions, tool definitions, retrieved documents, message history, and tool outputs. Understanding context fundamentals is the prerequisite for context engineering.
Think of context as "everything the model can see right now." The more control you have over its structure and quality, the more stable and reproducible your outputs become.
- Context isn't just the prompt — it's the entire input state (system, tools, docs, history, outputs).
- Attention budget is finite. Longer context dilutes the signal.
- Progressive disclosure uses "load on demand" to keep context lean.
- Put critical information at the beginning or end.
- The goal is "minimum high-signal token set," not "maximum length."
What You'll Learn
- How to break down context into its core components and understand each one's risk profile
- Why long context gets "fuzzy" and how structure reduces that risk
- How progressive disclosure controls context cost
- How templated structure improves stability
When to Activate
Activate this skill when:
- Designing new agent systems or modifying existing architectures
- Debugging unexpected agent behavior that may relate to context
- Optimizing context usage to reduce token costs or improve performance
- Onboarding new team members to context engineering concepts
- Reviewing context-related design decisions
If you're doing agent system design, troubleshooting "why won't the model follow instructions," or dealing with token cost spikes — this chapter is your starting point.
Core Concepts
Context is composed of multiple components, each with different characteristics and constraints. The attention mechanism provides a limited attention budget, which caps how much context can be effective. Progressive disclosure manages that limit by loading information on demand. The core job of context engineering: filter down to the smallest high-signal token set.
The key insight: more context isn't better. High-signal, small set is the target.
Detailed Topics
The Anatomy of Context
System Prompts
System prompts establish the agent's identity, constraints, and behavioral guidelines. They're loaded at session start and typically persist throughout the conversation. They should be clear, direct, and pitched at the right "altitude."
Getting the altitude right means balancing two extremes: over-specifying leads to brittleness and maintenance overhead; being too abstract means the model lacks actionable signal. The sweet spot: specific enough to guide behavior, flexible enough to stay extensible.
Use XML tags or Markdown headers to section them out — split into background, instructions, tool guidance, output description. Stronger models are less format-sensitive, but clear structure still pays off.
Tool Definitions
Tool definitions specify what actions the agent can take. Each tool has a name, description, parameters, and return format. They usually sit near the front of the context, next to the system prompt.
Tool descriptions directly shape agent behavior. Poor descriptions force the model to guess; good ones include usage context, examples, and defaults. The consolidation principle: if a human engineer can't tell which tool to use, the model definitely won't pick right either.
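As a concrete sketch, here is what a well-described tool definition might look like. The tool name, parameters, and defaults are invented for illustration, and the schema shape follows common function-calling conventions rather than any specific provider's API:

```python
# Hypothetical tool definition: "search_logs" and its fields are invented.
# Note how the description states when to use the tool and what its
# defaults are, so the model doesn't have to guess.
search_logs_tool = {
    "name": "search_logs",
    "description": (
        "Search application logs by keyword and time range. "
        "Use this before reading files when you only know a symptom, "
        "not a location. Defaults to the last 24 hours if no range is given."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keyword or phrase to match in log lines",
            },
            "since_hours": {
                "type": "integer",
                "description": "Look-back window in hours",
                "default": 24,
            },
        },
        "required": ["query"],
    },
}
```

The usage guidance ("use this before reading files...") and the explicit default are what separate this from a bare signature the model must guess around.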
Retrieved Documents
Retrieved documents provide domain knowledge and task-relevant information. The core of RAG is runtime retrieval, not loading everything upfront.
The just-in-time approach keeps lightweight references (file paths, stored queries, web links) and loads full content only when needed. This mirrors how humans work: we use index systems to retrieve on demand rather than memorizing entire libraries.
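A minimal sketch of this just-in-time pattern, with hypothetical paths and summaries: only the lightweight reference lives in context, and full content is read when a reference becomes relevant.

```python
from pathlib import Path

# Just-in-time retrieval sketch: only lightweight references (a path plus
# a one-line summary) stay in context; full content is loaded on demand.
# The paths and summaries here are illustrative.

class DocRef:
    def __init__(self, path: str, summary: str):
        self.path = path        # cheap pointer that lives in context
        self.summary = summary  # one-line description the model can scan

    def load(self) -> str:
        # Full content enters the context only when this is called.
        return Path(self.path).read_text()

refs = [
    DocRef("docs/api_summary.md", "High-level API overview"),
    DocRef("docs/api/endpoints.md", "Endpoint-by-endpoint reference"),
]
```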
Message History
Message history tracks the conversation between user and agent: questions, answers, and reasoning steps. During long tasks, history often becomes the biggest context cost.
Message history is essentially scratchpad memory for tracking progress and state. Manage it poorly and it'll drag down the quality of long-running tasks.
Tool Outputs
Tool outputs are the results of agent actions: file contents, search results, command output, API responses, and so on. One analysis of agent workloads found tool outputs accounting for as much as 83.9% of context tokens.
Whether relevant or not, tool outputs consume context. That's what drives the need for observation masking, compaction, and selective retention strategies.
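One simple form of selective retention is head-and-tail truncation: keep the start and end of a long output and mark what was dropped. A minimal sketch, with illustrative thresholds:

```python
# Sketch of compacting a long tool output before it enters context.
# max_chars and keep are illustrative knobs, not recommended values.

def compact_output(text: str, max_chars: int = 2000, keep: int = 500) -> str:
    """Truncate a long tool output, preserving its start and end."""
    if len(text) <= max_chars:
        return text  # short outputs pass through untouched
    omitted = len(text) - 2 * keep
    return f"{text[:keep]}\n...[{omitted} chars omitted]...\n{text[-keep:]}"
```

Head-and-tail truncation works because command output and API responses tend to carry their signal at the edges (status lines, summaries, errors), but it is only one of several retention strategies.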
Context Windows and Attention Mechanics
The Attention Budget Constraint
Self-attention computes pairwise relationships among all tokens, so the work grows quadratically with sequence length. Longer context means more relationships to cover and a thinner attention budget per relationship.
Models see shorter sequences more often during training, so they're weaker at modeling long-range dependencies in very long contexts — that's "attention budget depletion."
Position Encoding and Context Extension
Position encoding interpolation lets models handle longer sequences, but at the cost of positional precision. Even with bigger context windows, long-range retrieval and reasoning still degrade.
The Progressive Disclosure Principle
Progressive disclosure means loading information only when it's needed. At startup, load just skill names and descriptions; when a task requires it, load the full content. This applies at multiple levels: skill selection, document loading, and tool output retrieval.
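The skill-level version of this can be sketched as a registry that keeps only names and descriptions resident. The loader function here is an assumption standing in for file or database access:

```python
# Progressive disclosure sketch: index() is cheap and always in context;
# load() fetches full content only when a task needs it. The registry
# entry and loader are illustrative.

class SkillRegistry:
    def __init__(self, loaders):
        # loaders: name -> (description, zero-arg function returning full text)
        self._loaders = loaders

    def index(self) -> str:
        """Lightweight listing that stays resident in context."""
        return "\n".join(f"- {n}: {d}" for n, (d, _) in self._loaders.items())

    def load(self, name: str) -> str:
        """Expensive full content, loaded on demand."""
        _, fetch = self._loaders[name]
        return fetch()

registry = SkillRegistry({
    "context-fundamentals": ("Core context concepts", lambda: "...full skill text..."),
})
```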
Context Quality Versus Context Quantity
The assumption that "bigger context solves memory problems" has not held up in practice. Context engineering's goal is finding the minimum high-signal token set.
As context grows, compute cost rises quadratically with length and model performance drops, even when the context window technically allows more tokens. Prefix caching reduces long-input costs but doesn't eliminate them.
The core principle is informativity over exhaustiveness: only include what's needed for the decision at hand. Fetch everything else on demand.
Context as Finite Resource
Context is a finite resource with diminishing returns. Every additional token consumes attention budget. The engineering problem: maximize utility within fixed limits.
Context engineering isn't one-time prompt writing — it's an ongoing context management process.
Practical Guidance
File-System-Based Access
Agents with filesystem access can naturally use progressive disclosure. Store materials in the file system and read them when needed, rather than stuffing everything into context.
The file system itself provides structural cues: file size, naming conventions, and timestamps all serve as proxy signals for relevance.
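A sketch of ranking candidate files by these proxy signals before reading any content. The glob pattern and the newest-first, smallest-first ordering are illustrative choices:

```python
from pathlib import Path

# Use filesystem metadata (mtime, size) as cheap relevance proxies,
# without opening any file. Pattern and sort order are illustrative.

def rank_candidates(directory: str, pattern: str = "*.md"):
    """Order files newest-first, then smallest-first, by metadata alone."""
    files = Path(directory).glob(pattern)
    return sorted(files, key=lambda p: (-p.stat().st_mtime, p.stat().st_size))
```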
Hybrid Strategies
Best practices are usually hybrid: preload a small amount of stable context (like CLAUDE.md, project rules), then explore other information on demand. Where the boundary sits depends on task nature and dynamism.
Context Budgeting
Design with an explicit context budget. Monitor token usage, set compaction triggers. Assume context will degrade — don't hope it won't.
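A minimal budget tracker with a compaction trigger in the 70-80% band might look like this. The whitespace token count is a crude stand-in for a real tokenizer, which you would use in practice:

```python
# Context budget sketch: track usage and signal when compaction should run.
# Token counting by whitespace split is an approximation for illustration.

class ContextBudget:
    def __init__(self, max_tokens: int, trigger: float = 0.75):
        self.max_tokens = max_tokens
        self.trigger = trigger  # compaction threshold as a fraction
        self.used = 0

    def add(self, text: str) -> None:
        self.used += len(text.split())  # crude stand-in for a tokenizer

    def needs_compaction(self) -> bool:
        return self.used >= self.trigger * self.max_tokens

budget = ContextBudget(max_tokens=100)
budget.add("word " * 80)  # crosses the 75% trigger
```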
Watch attention distribution: the middle of context is most likely to be ignored. Put critical information at the beginning and end.
Minimal Context Template
Here's a minimal context template you can use directly:
<SYSTEM>
Role, constraints, output format
</SYSTEM>
<TASK>
Objective, success criteria, constraints
</TASK>
<TOOLS>
Tool list + usage notes
</TOOLS>
<FACTS>
Structured facts / IDs / sources
</FACTS>
<HISTORY>
Short summary of relevant turns
</HISTORY>
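If you assemble this template in code, keeping each block explicit makes the resulting context easy to audit. A minimal sketch that fills the five blocks above:

```python
# Assemble the five-block template from structured inputs. Section names
# mirror the template; the contents are whatever your system supplies.

def build_context(system: str, task: str, tools: str,
                  facts: str, history: str) -> str:
    sections = [
        ("SYSTEM", system),
        ("TASK", task),
        ("TOOLS", tools),
        ("FACTS", facts),
        ("HISTORY", history),
    ]
    return "\n".join(f"<{tag}>\n{body}\n</{tag}>" for tag, body in sections)
```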
Examples
Example 1: Organizing System Prompts
<BACKGROUND_INFORMATION>
You are a Python expert helping a development team.
Current project: Data processing pipeline in Python 3.9+
</BACKGROUND_INFORMATION>
<INSTRUCTIONS>
- Write clean, idiomatic Python code
- Include type hints for function signatures
- Add docstrings for public functions
- Follow PEP 8 style guidelines
</INSTRUCTIONS>
<TOOL_GUIDANCE>
Use bash for shell operations, python for code tasks.
File operations should use pathlib for cross-platform compatibility.
</TOOL_GUIDANCE>
<OUTPUT_DESCRIPTION>
Provide code blocks with syntax highlighting.
Explain non-obvious decisions in comments.
</OUTPUT_DESCRIPTION>
Example 2: Progressive Document Loading
# Instead of loading all documentation at once:
# Step 1: Load summary
docs/api_summary.md # Lightweight overview
# Step 2: Load specific section as needed
docs/api/endpoints.md # Only when API calls needed
docs/api/authentication.md # Only when auth context needed
Guidelines
- Treat context as a finite resource with diminishing returns
- Place critical info at attention-favored positions (beginning/end)
- Use progressive disclosure to defer loading until needed
- Organize system prompts with clear section boundaries
- Monitor context usage during development
- Implement compaction triggers at 70-80% utilization
- Design for context degradation rather than hoping to avoid it
- Prefer smaller high-signal context over larger low-signal context
Practice Task
- Pick an agent/project you're working on and split its context into 5 blocks
- Write a "minimal context" version using the template above
- Compare against the original and flag low-signal content that can be removed
Related Pages
- Claude Code Examples
- Context Degradation Patterns
- Context Compression Strategies
- Tool Design for Agents
Integration
This skill provides foundational context that all other skills build upon. Study it first before exploring:
- context-degradation - Understanding how context fails
- context-optimization - Techniques for extending context capacity
- multi-agent-patterns - How context isolation enables multi-agent systems
- tool-design - How tool definitions interact with context
References
Related skills in this collection:
- context-degradation - Understanding context failure patterns
- context-optimization - Techniques for efficient context use
External resources:
- Research on transformer attention mechanisms
- Production engineering guides from leading AI labs
- Framework documentation on context window management
Skill Metadata
Created: 2025-12-20
Last Updated: 2025-12-20
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0