Context Engineering Fundamentals
Context is the complete input state a language model sees at inference time — system instructions, tool definitions, retrieved documents, message history, and tool outputs. Understanding context fundamentals is the prerequisite for context engineering.
Think of context as "everything the model can see right now." The more control you have over its structure and quality, the more stable and reproducible your outputs become.
- Context isn't just the prompt — it's the entire input state (system, tools, docs, history, outputs).
- Attention budget is finite. Longer context dilutes the signal.
- Progressive disclosure uses "load on demand" to keep context lean.
- Put critical information at the beginning or end.
- The goal is "minimum high-signal token set," not "maximum length."
What You'll Learn
- How to break down context into its core components and understand each one's risk profile
- Why long context gets "fuzzy" and how structure reduces that risk
- How progressive disclosure controls context cost
- How templated structure improves stability
When to Activate
Activate this skill when:
- Designing new agent systems or modifying existing architectures
- Debugging unexpected agent behavior that may relate to context
- Optimizing context usage to reduce token costs or improve performance
- Onboarding new team members to context engineering concepts
- Reviewing context-related design decisions
If you're doing agent system design, troubleshooting "why won't the model follow instructions," or dealing with token cost spikes — this chapter is your starting point.
Core Concepts
Context is composed of multiple components, each with different characteristics and constraints. The attention mechanism provides a limited attention budget, which caps how much context can be effective. Progressive disclosure manages that limit by loading information on demand. The core job of context engineering: filter down to the smallest high-signal token set.
The key insight: more context isn't better. High-signal, small set is the target.
Detailed Topics
The Anatomy of Context
System Prompts
System prompts establish the agent's identity, constraints, and behavioral guidelines. They're loaded at session start and typically persist throughout the conversation. They should be clear, direct, and pitched at the right "altitude."
Getting the altitude right means balancing two extremes: over-specifying leads to brittleness and maintenance overhead; being too abstract means the model lacks actionable signal. The sweet spot: specific enough to guide behavior, flexible enough to stay extensible.
Use XML tags or Markdown headers to section them out — split into background, instructions, tool guidance, output description. Stronger models are less format-sensitive, but clear structure still pays off.
Tool Definitions
Tool definitions specify what actions the agent can take. Each tool has a name, description, parameters, and return format. They usually sit near the front of the context, next to the system prompt.
Tool descriptions directly shape agent behavior. Poor descriptions force the model to guess; good ones include usage context, examples, and defaults. The consolidation principle: if a human engineer can't tell which tool to use, the model definitely won't pick right either.
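As a concrete sketch, here is what a well-described tool definition might look like. The tool name, parameters, and defaults are invented for illustration, and the schema shape follows common function-calling conventions rather than any specific provider's API:

```python
# Hypothetical tool definition: "search_logs" and its fields are invented.
# Note how the description states when to use the tool and what its
# defaults are, so the model doesn't have to guess.
search_logs_tool = {
    "name": "search_logs",
    "description": (
        "Search application logs by keyword and time range. "
        "Use this before reading files when you only know a symptom, "
        "not a location. Defaults to the last 24 hours if no range is given."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keyword or phrase to match in log lines",
            },
            "since_hours": {
                "type": "integer",
                "description": "Look-back window in hours",
                "default": 24,
            },
        },
        "required": ["query"],
    },
}
```

The usage guidance ("use this before reading files...") and the explicit default are what separate this from a bare signature the model must guess around.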
Retrieved Documents
Retrieved documents provide domain knowledge and task-relevant information. The core of RAG is runtime retrieval, not loading everything upfront.
The just-in-time approach keeps lightweight references (file paths, stored queries, web links) and loads full content only when needed. This mirrors how humans work: we use index systems to retrieve on demand rather than memorizing entire libraries.
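A minimal sketch of this just-in-time pattern, with hypothetical paths and summaries: only the lightweight reference lives in context, and full content is read when a reference becomes relevant.

```python
from pathlib import Path

# Just-in-time retrieval sketch: only lightweight references (a path plus
# a one-line summary) stay in context; full content is loaded on demand.
# The paths and summaries here are illustrative.

class DocRef:
    def __init__(self, path: str, summary: str):
        self.path = path        # cheap pointer that lives in context
        self.summary = summary  # one-line description the model can scan

    def load(self) -> str:
        # Full content enters the context only when this is called.
        return Path(self.path).read_text()

refs = [
    DocRef("docs/api_summary.md", "High-level API overview"),
    DocRef("docs/api/endpoints.md", "Endpoint-by-endpoint reference"),
]
```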
Message History
Message history tracks the conversation between user and agent: questions, answers, and reasoning steps. During long tasks, history often becomes the biggest context cost.
Message history is essentially scratchpad memory for tracking progress and state. Manage it poorly and it'll drag down the quality of long-running tasks.
Tool Outputs
Tool outputs are the results of agent actions: file contents, search results, command output, API responses, and so on. One analysis of agent workloads found tool outputs accounting for as much as 83.9% of context tokens.
Whether relevant or not, tool outputs consume context. That's what drives the need for observation masking, compaction, and selective retention strategies.
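One simple form of selective retention is head-and-tail truncation: keep the start and end of a long output and mark what was dropped. A minimal sketch, with illustrative thresholds:

```python
# Sketch of compacting a long tool output before it enters context.
# max_chars and keep are illustrative knobs, not recommended values.

def compact_output(text: str, max_chars: int = 2000, keep: int = 500) -> str:
    """Truncate a long tool output, preserving its start and end."""
    if len(text) <= max_chars:
        return text  # short outputs pass through untouched
    omitted = len(text) - 2 * keep
    return f"{text[:keep]}\n...[{omitted} chars omitted]...\n{text[-keep:]}"
```

Head-and-tail truncation works because command output and API responses tend to carry their signal at the edges (status lines, summaries, errors), but it is only one of several retention strategies.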
Context Windows and Attention Mechanics
The Attention Budget Constraint
Self-attention computes pairwise relationships among all tokens, so the work grows quadratically with sequence length. Longer context means more relationships to cover and a thinner attention budget per relationship.
Models see shorter sequences more often during training, so they're weaker at modeling long-range dependencies in very long contexts — that's "attention budget depletion."
Position Encoding and Context Extension
Position encoding interpolation lets models handle longer sequences, but at the cost of positional precision. Even with bigger context windows, long-range retrieval and reasoning still degrade.
The Progressive Disclosure Principle
Progressive disclosure means loading information only when it's needed. At startup, load just skill names and descriptions; when a task requires it, load the full content. This applies at multiple levels: skill selection, document loading, and tool output retrieval.
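The skill-level version of this can be sketched as a registry that keeps only names and descriptions resident. The loader function here is an assumption standing in for file or database access:

```python
# Progressive disclosure sketch: index() is cheap and always in context;
# load() fetches full content only when a task needs it. The registry
# entry and loader are illustrative.

class SkillRegistry:
    def __init__(self, loaders):
        # loaders: name -> (description, zero-arg function returning full text)
        self._loaders = loaders

    def index(self) -> str:
        """Lightweight listing that stays resident in context."""
        return "\n".join(f"- {n}: {d}" for n, (d, _) in self._loaders.items())

    def load(self, name: str) -> str:
        """Expensive full content, loaded on demand."""
        _, fetch = self._loaders[name]
        return fetch()

registry = SkillRegistry({
    "context-fundamentals": ("Core context concepts", lambda: "...full skill text..."),
})
```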
Context Quality Versus Context Quantity
The assumption that "bigger context solves memory problems" has not held up in practice. Context engineering's goal is finding the minimum high-signal token set.
As context grows, compute cost rises quadratically with length and model performance drops, even when the context window technically allows more tokens. Prefix caching reduces long-input costs but doesn't eliminate them.
The core principle is informativity over exhaustiveness: only include what's needed for the decision at hand. Fetch everything else on demand.
Context as Finite Resource
Context is a finite resource with diminishing returns. Every additional token consumes attention budget. The engineering problem: maximize utility within fixed limits.
Context engineering isn't one-time prompt writing — it's an ongoing context management process.
Practical Guidance
File-System-Based Access
Agents with filesystem access can naturally use progressive disclosure. Store materials in the file system and read them when needed, rather than stuffing everything into context.
The file system itself provides structural cues: file size, naming conventions, and timestamps all serve as proxy signals for relevance.
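A sketch of ranking candidate files by these proxy signals before reading any content. The glob pattern and the newest-first, smallest-first ordering are illustrative choices:

```python
from pathlib import Path

# Use filesystem metadata (mtime, size) as cheap relevance proxies,
# without opening any file. Pattern and sort order are illustrative.

def rank_candidates(directory: str, pattern: str = "*.md"):
    """Order files newest-first, then smallest-first, by metadata alone."""
    files = Path(directory).glob(pattern)
    return sorted(files, key=lambda p: (-p.stat().st_mtime, p.stat().st_size))
```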
Hybrid Strategies
Best practices are usually hybrid: preload a small amount of stable context (like CLAUDE.md, project rules), then explore other information on demand. Where the boundary sits depends on task nature and dynamism.
Context Budgeting
Design with an explicit context budget. Monitor token usage, set compaction triggers. Assume context will degrade — don't hope it won't.
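A minimal budget tracker with a compaction trigger in the 70-80% band might look like this. The whitespace token count is a crude stand-in for a real tokenizer, which you would use in practice:

```python
# Context budget sketch: track usage and signal when compaction should run.
# Token counting by whitespace split is an approximation for illustration.

class ContextBudget:
    def __init__(self, max_tokens: int, trigger: float = 0.75):
        self.max_tokens = max_tokens
        self.trigger = trigger  # compaction threshold as a fraction
        self.used = 0

    def add(self, text: str) -> None:
        self.used += len(text.split())  # crude stand-in for a tokenizer

    def needs_compaction(self) -> bool:
        return self.used >= self.trigger * self.max_tokens

budget = ContextBudget(max_tokens=100)
budget.add("word " * 80)  # crosses the 75% trigger
```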
Watch attention distribution: the middle of context is most likely to be ignored. Put critical information at the beginning and end.
Minimal Context Template
Here's a minimal context template you can use directly:
<SYSTEM>
Role, constraints, output format
</SYSTEM>
<TASK>
Objective, success criteria, constraints
</TASK>
<TOOLS>
Tool list + usage notes
</TOOLS>
<FACTS>
Structured facts / IDs / sources
</FACTS>
<HISTORY>
Short summary of relevant turns
</HISTORY>
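If you assemble this template in code, keeping each block explicit makes the resulting context easy to audit. A minimal sketch that fills the five blocks above:

```python
# Assemble the five-block template from structured inputs. Section names
# mirror the template; the contents are whatever your system supplies.

def build_context(system: str, task: str, tools: str,
                  facts: str, history: str) -> str:
    sections = [
        ("SYSTEM", system),
        ("TASK", task),
        ("TOOLS", tools),
        ("FACTS", facts),
        ("HISTORY", history),
    ]
    return "\n".join(f"<{tag}>\n{body}\n</{tag}>" for tag, body in sections)
```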
Examples
Example 1: Organizing System Prompts
<BACKGROUND_INFORMATION>
You are a Python expert helping a development team.
Current project: Data processing pipeline in Python 3.9+
</BACKGROUND_INFORMATION>
<INSTRUCTIONS>
- Write clean, idiomatic Python code
- Include type hints for function signatures
- Add docstrings for public functions
- Follow PEP 8 style guidelines
</INSTRUCTIONS>
<TOOL_GUIDANCE>
Use bash for shell operations, python for code tasks.
File operations should use pathlib for cross-platform compatibility.
</TOOL_GUIDANCE>
<OUTPUT_DESCRIPTION>
Provide code blocks with syntax highlighting.
Explain non-obvious decisions in comments.
</OUTPUT_DESCRIPTION>
Example 2: Progressive Document Loading
# Instead of loading all documentation at once:
# Step 1: Load summary
docs/api_summary.md # Lightweight overview
# Step 2: Load specific section as needed
docs/api/endpoints.md # Only when API calls needed
docs/api/authentication.md # Only when auth context needed
Guidelines
- Treat context as a finite resource with diminishing returns
- Place critical info at attention-favored positions (beginning/end)
- Use progressive disclosure to defer loading until needed
- Organize system prompts with clear section boundaries
- Monitor context usage during development
- Implement compaction triggers at 70-80% utilization
- Design for context degradation rather than hoping to avoid it
- Prefer smaller high-signal context over larger low-signal context
Practice Task
- Pick an agent/project you're working on and split its context into 5 blocks
- Write a "minimal context" version using the template above
- Compare against the original and flag low-signal content that can be removed
Related Pages
- Claude Code Examples
- Context Degradation Patterns
- Context Compression Strategies
- Tool Design for Agents
Integration
This skill provides foundational context that all other skills build upon. Study it first before exploring:
- context-degradation - Understanding how context fails
- context-optimization - Techniques for extending context capacity
- multi-agent-patterns - How context isolation enables multi-agent systems
- tool-design - How tool definitions interact with context
References
Related skills in this collection:
- context-degradation - Understanding context failure patterns
- context-optimization - Techniques for efficient context use
External resources:
- Research on transformer attention mechanisms
- Production engineering guides from leading AI labs
- Framework documentation on context window management
Skill Metadata
Created: 2025-12-20
Last Updated: 2025-12-20
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0