Context Compression & Optimization
Context Compression Strategies
When agent sessions generate massive conversation history, compression becomes unavoidable. The intuitive approach is to minimize tokens-per-request, but the correct target is tokens-per-task: the total tokens needed to finish the job, including the re-fetch cost incurred when compression drops critical information.
The right goal isn't "shortest single request" — it's "lowest total cost to complete the task."
- Optimize tokens-per-task, not tokens-per-request.
- Structured summaries beat aggressive compression for long tasks.
- The artifact trail is the hardest information to preserve.
- Trigger compression at 70-80% context.
- Use probe questions to evaluate quality.
What You'll Learn
- Trade-offs between three mainstream compression strategies
- Why "structured summarization" is the safest engineering practice
- How to evaluate compression quality with probe questions
When to Activate
Activate this skill when:
- Agent sessions exceed context window limits
- Designing conversation summarization strategies
- Evaluating different compression approaches for production systems
- Debugging cases where agents "forget" what files they modified
- Building evaluation frameworks for compression quality
- Optimizing long-running coding or debugging sessions
Core Concepts
Context compression is a trade-off between token savings and information loss. Three production-ready approaches:
- Anchored Iterative Summarization: Maintains a structured, continuously updated summary containing session intent, file modifications, decisions, and next steps. On trigger, only the newly truncated portion gets summarized and merged. The structure itself forces retention of critical information.
- Opaque Compression: Chases maximum compression ratios (99%+), but interpretability is low and you can't verify what was retained.
- Regenerative Full Summary: Generates a complete summary each time. Readable, but multi-round compression keeps shedding details.
Key conclusion: structured summaries "force retention" and prevent silent information drift.
Detailed Topics
Why Tokens-Per-Task Matters
Traditional metrics only look at tokens-per-request — that's the wrong optimization target. Once compression drops a file path or error message, the agent re-fetches, re-explores, and ends up consuming more tokens overall.
The correct metric is tokens-per-task: total consumption from task start to completion. Saving 0.5% on tokens but incurring 20% re-fetch overhead makes things more expensive, not less.
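The arithmetic can be sketched as follows. The token counts and the `tokens_per_task` helper are hypothetical, purely to illustrate why per-request savings can raise total cost:

```python
def tokens_per_task(request_tokens, refetch_tokens=0):
    """Total tokens consumed from task start to completion,
    including any re-fetch overhead caused by lossy compression."""
    return sum(request_tokens) + refetch_tokens

# Hypothetical session: aggressive compression shaves ~0.5% per request
# but drops a file path, forcing a re-exploration pass.
baseline = tokens_per_task([8000, 7500, 7200])           # no compression loss
aggressive = tokens_per_task([7960, 7463, 7164],         # ~0.5% smaller requests
                             refetch_tokens=4500)        # re-read the lost file

assert aggressive > baseline  # "cheaper" requests, more expensive task
```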
The Artifact Trail Problem
The artifact trail is the weakest dimension across all compression methods, scoring only 2.2–2.5/5 in evaluations. Even structured summaries struggle to consistently preserve complete file trails.
Coding agents need to know:
- Which files were created
- Which files were modified and what changed
- Which files were read but not modified
- Function names, variable names, error messages
This usually requires dedicated mechanisms beyond natural language summaries.
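One such dedicated mechanism is to track the file trail as structured data alongside the natural-language summary, so compression cannot silently drop it. A minimal sketch; the `ArtifactTrail` class and its fields are illustrative, not a standard API:

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactTrail:
    """Tracks files touched during a session, separately from the
    prose summary, so compression cannot drop them."""
    created: set = field(default_factory=set)
    modified: dict = field(default_factory=dict)   # path -> change note
    read_only: set = field(default_factory=set)

    def record_read(self, path):
        # A file counts as read-only until it is created or written.
        if path not in self.modified and path not in self.created:
            self.read_only.add(path)

    def record_write(self, path, note):
        self.read_only.discard(path)
        self.modified[path] = note

trail = ArtifactTrail()
trail.record_read("auth.controller.ts")
trail.record_write("config/redis.ts", "Fixed connection pooling")
```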
Structured Summary Sections
An effective structured summary should include:
## Session Intent
[What the user is trying to accomplish]
## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config
## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures
## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests
## Next Steps
1. Fix remaining test failures
2. Run full test suite
3. Update documentation
The point of structure is to "force coverage of critical information" and prevent omissions.
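That forcing function can be made literal in code: refuse to emit a summary that skips a section. A minimal sketch, assuming a fixed section list taken from the template above:

```python
SECTIONS = ["Session Intent", "Files Modified", "Decisions Made",
            "Current State", "Next Steps"]

def render_summary(data: dict) -> str:
    """Render a structured summary; raising on a missing section is
    what 'forces coverage' of critical information."""
    missing = [s for s in SECTIONS if s not in data]
    if missing:
        raise ValueError(f"summary missing sections: {missing}")
    return "\n\n".join(f"## {s}\n{data[s]}" for s in SECTIONS)
```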
Compression Trigger Strategies
When to compress matters as much as how:
| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple but may trigger early |
| Sliding window | Keep last N turns + summary | Predictable |
| Importance-based | Compress low-relevance first | Complex but preserves signal |
| Task-boundary | Compress at task boundaries | Readable but unpredictable |
For coding agents, sliding window + structured summary is usually the best balance.
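That combination can be sketched as below. The `should_compress` and `sliding_window` helpers and the 75% default are illustrative choices, not a prescribed API:

```python
def should_compress(used_tokens, context_limit, threshold=0.75):
    """Fixed-threshold trigger: compress at 70-80% context utilization."""
    return used_tokens / context_limit >= threshold

def sliding_window(messages, summary, keep_last=20):
    """Keep the running summary plus the last N turns; everything
    older becomes the 'newly truncated portion' to summarize and merge."""
    to_summarize = messages[:-keep_last] if len(messages) > keep_last else []
    kept = messages[-keep_last:]
    return to_summarize, ([summary] if summary else []) + kept
```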
Probe-Based Evaluation
ROUGE/embedding similarity can't measure functional fidelity. A summary might "look similar" but be missing a critical file path.
Probe-based evaluation tests retention through questions:
| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
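A probe harness can be as simple as checking whether expected facts survive in the post-compression answer. The probe set and substring-matching scorer below are a crude illustrative proxy; production systems typically use an LLM judge instead:

```python
# Hypothetical probes for the debugging session used elsewhere in this page.
PROBES = [
    {"type": "recall",   "q": "What was the original error message?",
     "expect": ["401", "/api/auth/login"]},
    {"type": "artifact", "q": "Which files have we modified?",
     "expect": ["config/redis.ts", "services/session.service.ts"]},
]

def score_probe(answer: str, expect: list) -> float:
    """Fraction of expected facts literally present in the answer."""
    hits = sum(1 for fact in expect if fact in answer)
    return hits / len(expect)
```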
Evaluation Dimensions
Six dimensions for measuring compression quality:
- Accuracy: Are technical details correct?
- Context Awareness: Does it match the current conversation state?
- Artifact Trail: Is the file trail complete?
- Completeness: Does it cover the key points?
- Continuity: Can the task resume seamlessly?
- Instruction Following: Are constraints respected?
Accuracy has the widest variance. Artifact Trail is consistently the weakest.
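Scoring along these dimensions can be mechanized. The `quality_report` helper below is a hypothetical sketch assuming 1-5 ratings per dimension, with a check that no dimension goes unscored:

```python
DIMENSIONS = ["accuracy", "context_awareness", "artifact_trail",
              "completeness", "continuity", "instruction_following"]

def quality_report(scores: dict) -> dict:
    """Aggregate 1-5 ratings per dimension and flag the weakest one."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"unscored dimensions: {sorted(missing)}")
    weakest = min(DIMENSIONS, key=lambda d: scores[d])
    return {"mean": sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS),
            "weakest": weakest}
```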
Practical Guidance
Implementing Anchored Iterative Summarization
- Define summary sections (tailored to your task type)
- First compression: generate a complete structured summary
- Subsequent compressions: only summarize the newly truncated portion and merge
- Don't regenerate from scratch — that causes detail drift
- Record summary provenance for debugging
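The merge step above can be sketched as an append-only update. `compress_incrementally` and its `summarize` callback are illustrative stand-ins for your own LLM summarization call:

```python
def compress_incrementally(anchor_summary: dict, truncated_msgs, summarize):
    """Merge a summary of only the newly truncated messages into the
    anchored summary, instead of regenerating it from scratch."""
    delta = summarize(truncated_msgs)  # dict: section -> list of new facts
    for section, facts in delta.items():
        existing = anchor_summary.get(section, [])
        # Append-only merge: earlier details are never rewritten,
        # which is what prevents detail drift across rounds.
        anchor_summary[section] = existing + [f for f in facts
                                              if f not in existing]
    return anchor_summary
```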
When to Use Each Approach
Use anchored iterative summarization when:
- Sessions are long (100+ messages)
- File tracking is critical
- You need verifiable information retention
Use opaque compression when:
- Maximum compression ratio is the priority
- Sessions are relatively short
- Re-fetch cost is low
Use regenerative summaries when:
- Summary readability is paramount
- There are clear phase boundaries
- You can accept repeated review passes
Compression Ratio Considerations
| Method | Compression Ratio | Quality Score (of 5) | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Medium quality |
| Opaque | 99.3% | 3.35 | Best compression, quality drops |
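Compression ratio here is simply the fraction of tokens removed. A quick sanity check, using the 89,000-token debugging session from Example 1 below and an assumed ~1,250-token summary (the summary size is illustrative):

```python
def compression_ratio(original_tokens, compressed_tokens):
    """Fraction of the original context eliminated by compression."""
    return 1 - compressed_tokens / original_tokens

# An 89,000-token session compressed to ~1,250 tokens is ~98.6%.
assert round(compression_ratio(89_000, 1_250), 3) == 0.986
```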
Examples
Example 1: Debugging Session Compression
Original context (89,000 tokens, 178 messages):
- 401 error on /api/auth/login endpoint
- Traced through auth controller, middleware, session store
- Found stale Redis connection
- Fixed connection pooling, added retry logic
- 14 tests passing, 2 failing
Structured summary after compression:
## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.
## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.
## Files Modified
- auth.controller.ts: No changes (read only)
- middleware/cors.ts: No changes (examined)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup
## Test Status
14 passing, 2 failing (mock setup issues)
## Next Steps
1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging
Example 2: Probe Response Quality
After compression, asking "What was the original error?":
Good response (structured summarization):
"The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."
Poor response (aggressive compression):
"We were debugging an authentication issue. The login was failing. We fixed some configuration problems."
Guidelines
- Optimize for tokens-per-task, not tokens-per-request
- Use structured summaries with explicit sections for file tracking
- Trigger compression at 70-80% context utilization
- Implement incremental merging rather than full regeneration
- Test compression quality with probe-based evaluation
- Track artifact trail separately if file tracking is critical
- Accept slightly lower compression ratios for better quality retention
- Monitor re-fetching frequency as a compression quality signal
Practice Task
- Write a structured summary template for your project using the sections above
- Design 3 probe questions to verify whether critical facts were retained
Related Pages
- Claude Code Examples
- Context Engineering Fundamentals
- Context Degradation Patterns
- Advanced Evaluation
Integration
This skill connects to:
- context-degradation - Compression is a mitigation strategy
- context-optimization - Compression is one optimization technique
- evaluation - Probe-based evaluation applies to compression testing
- memory-systems - Compression relates to scratchpad and summary memory patterns
References
Related skills:
- context-degradation - Understanding what compression prevents
- context-optimization - Broader optimization strategies
- evaluation - Building evaluation frameworks
External resources:
- Factory Research: Evaluating Context Compression for AI Agents (December 2025)
- Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)
Skill Metadata
Created: 2025-12-22
Last Updated: 2025-12-22
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0