Context Compression & Optimization
Context Compression Strategies
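When an agent session generates a large conversation history, compression becomes necessary. The intuitive approach is to push tokens-per-request as low as possible, but the right target is tokens-per-task: the total tokens needed to complete the task, including the re-fetch cost incurred when compression drops critical information.
The right goal of compression is not the shortest single request but the lowest total cost for the task.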
- Optimize tokens-per-task, not tokens-per-request.
- Structured summaries beat aggressive compression for long tasks.
- The artifact trail is the hardest information to preserve.
- Trigger compression at 70-80% context.
- Use probe questions to evaluate quality.
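What You Will Learn
- The trade-offs among three mainstream compression strategies
- Why structured summaries are the most dependable engineering practice
- How to evaluate compression quality with probe questions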
When to Activate
Activate this skill when:
- Agent sessions exceed context window limits
- Designing conversation summarization strategies
- Evaluating different compression approaches for production systems
- Debugging cases where agents "forget" what files they modified
- Building evaluation frameworks for compression quality
- Optimizing long-running coding or debugging sessions
Core Concepts
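Context compression trades token savings against information loss. Three production-ready approaches:
- Anchored Iterative Summarization: Maintain a structured, continuously updated summary covering session intent, file modifications, decisions, and next steps. When compression triggers, summarize only the newly truncated span and merge it into the existing summary. The structure itself forces key information to be preserved.
- Opaque Compression: Pursues the highest compression ratio (99%+), but interpretability is low and there is no way to verify what was preserved.
- Regenerative Full Summary: Generates a complete summary on every compression. Readability is high, but details are progressively lost across repeated compression cycles.
Key takeaway: a structured summary forces preservation and avoids silent information drift.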
Detailed Topics
Why Tokens-Per-Task Matters
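Traditional metrics look only at tokens-per-request, which is the wrong optimization target. Once compression drops file paths or error messages, the agent has to re-retrieve and re-explore, and ends up consuming more tokens.
The right metric is tokens-per-task: total consumption from the start of the task to its completion. Saving 0.5% of tokens while incurring a 20% re-fetch cost is more expensive overall.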
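The arithmetic behind that claim can be sketched as follows. This is a minimal illustration, not a measurement: the request count, the per-request sizes, and the reading of "20% re-fetch cost" as 20% of the baseline task total are all assumptions, and `tokens_per_task` is a hypothetical helper.

```python
def tokens_per_task(request_tokens: list[int], refetch_tokens: int = 0) -> int:
    """Total tokens spent from task start to completion, including re-fetches."""
    return sum(request_tokens) + refetch_tokens


# Hypothetical task: 10 requests of 10,000 tokens each with a conservative compressor.
baseline = tokens_per_task([10_000] * 10)

# A more aggressive compressor saves 0.5% on every request...
aggressive_requests = [9_950] * 10
# ...but drops file paths, so the agent spends 20% of the baseline re-reading files.
refetch = baseline * 20 // 100
aggressive = tokens_per_task(aggressive_requests, refetch)

print(baseline)    # 100000
print(aggressive)  # 119500
```

Under these assumptions the "cheaper" compressor costs 19,500 more tokens per task, even though every individual request is smaller.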
The Artifact Trail Problem
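Artifact trail is the weakest dimension across all compression approaches, scoring only 2.2-2.5 out of 5 in evaluation. Even structured summaries struggle to preserve a complete file trail over time.
Coding agents need to know:
- Which files were created
- Which files were modified, and what changed
- Which files were read but not modified
- Function names, variable names, and error messages
This usually requires a dedicated mechanism, not just a natural-language summary.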
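One possible shape for such a mechanism is a small record that lives outside the summary and is updated whenever the agent touches a file. The sketch below is an assumption about how this could look; `ArtifactTrail` and its method names are hypothetical and do not come from any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class ArtifactTrail:
    """Hypothetical side-channel record of file activity, kept outside the summary."""

    created: set[str] = field(default_factory=set)
    modified: dict[str, list[str]] = field(default_factory=dict)
    read: set[str] = field(default_factory=set)

    def record_create(self, path: str) -> None:
        self.created.add(path)

    def record_modify(self, path: str, change: str) -> None:
        self.modified.setdefault(path, []).append(change)

    def record_read(self, path: str) -> None:
        self.read.add(path)

    def read_only(self) -> set[str]:
        """Files that were read but never created or modified in this session."""
        return self.read - self.created - set(self.modified)

    def render(self) -> str:
        """Render the trail as text that can be re-attached after each compression."""
        lines = ["## Artifact Trail"]
        lines += [f"- created: {path}" for path in sorted(self.created)]
        for path in sorted(self.modified):
            lines.append(f"- modified: {path}: {'; '.join(self.modified[path])}")
        lines += [f"- read only: {path}" for path in sorted(self.read_only())]
        return "\n".join(lines)


trail = ArtifactTrail()
trail.record_read("auth.controller.ts")
trail.record_read("config/redis.ts")
trail.record_modify("config/redis.ts", "Fixed connection pooling configuration")
print(trail.render())
```

Because the trail is built from recorded events and not regenerated by a model, it cannot drift during summarization; the cost is that every file operation has to be routed through it.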
Structured Summary Sections
An effective structured summary should include:
## Session Intent
[What the user is trying to accomplish]
## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config
## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures
## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests
## Next Steps
1. Fix remaining test failures
2. Run full test suite
3. Update documentation
The point of the structure is to force coverage of the key information so that nothing is silently omitted.
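The template can also be represented as a typed object so that empty sections are detected programmatically. This is a minimal sketch assuming the five sections shown in the template; the class and field names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredSummary:
    session_intent: str = ""
    files_modified: dict[str, str] = field(default_factory=dict)
    decisions_made: list[str] = field(default_factory=list)
    current_state: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of sections that are still empty, so gaps are visible, not silent."""
        return [name for name, value in vars(self).items() if not value]

    def render(self) -> str:
        """Render the summary using the section headings from the template."""
        parts = ["## Session Intent", self.session_intent, "## Files Modified"]
        parts += [f"- {path}: {change}" for path, change in self.files_modified.items()]
        parts.append("## Decisions Made")
        parts += [f"- {decision}" for decision in self.decisions_made]
        parts.append("## Current State")
        parts += [f"- {state}" for state in self.current_state]
        parts.append("## Next Steps")
        parts += [f"{i}. {step}" for i, step in enumerate(self.next_steps, start=1)]
        return "\n".join(parts)


summary = StructuredSummary(
    session_intent="Debug 401 Unauthorized error on /api/auth/login",
    files_modified={"config/redis.ts": "Updated connection pooling"},
    current_state=["14 tests passing, 2 failing"],
)
print(summary.missing_sections())  # ['decisions_made', 'next_steps']
```

A compression step could refuse to accept a summary while `missing_sections()` is non-empty, or at least log the gap.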
Compression Trigger Strategies
When to compress matters as much as how to compress:
| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple, but may compress too early |
| Sliding window | Keep last N turns + summary | Predictable |
| Importance-based | Compress low-relevance first | Complex, but preserves more signal |
| Task-boundary | Compress at task boundaries | Readable summaries, but unpredictable timing |
For coding agents, sliding window + structured summary is usually the most balanced choice.
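A fixed-threshold trigger combined with a sliding window can be sketched in a few lines. The 0.75 default sits inside the 70-80% range from the table; the function names and the token counts in the usage example are assumptions for illustration.

```python
def should_compress(used_tokens: int, context_limit: int, threshold: float = 0.75) -> bool:
    """Fixed-threshold trigger: fire once utilization reaches the threshold."""
    return used_tokens >= threshold * context_limit


def split_sliding_window(
    messages: list[str], keep_last: int
) -> tuple[list[str], list[str]]:
    """Return (to_summarize, kept_verbatim), keeping the last `keep_last` turns."""
    cut = max(len(messages) - keep_last, 0)
    return messages[:cut], messages[cut:]


messages = [f"turn {i}" for i in range(1, 11)]
if should_compress(used_tokens=160_000, context_limit=200_000):
    old, recent = split_sliding_window(messages, keep_last=3)
    print(len(old), recent)  # 7 ['turn 8', 'turn 9', 'turn 10']
```

Only `old` would be handed to the summarizer; `recent` stays verbatim in context alongside the structured summary.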
Probe-Based Evaluation
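ROUGE and embedding similarity cannot measure functional fidelity. A summary can look very similar to the original and still be missing a critical file path.
Probe-based evaluation verifies retention by asking questions: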
| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
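A probe can be modeled as a question paired with the facts a correct answer must contain. The sketch below scores a response by case-insensitive substring matching, which is a deliberately crude stand-in for a human or model grader; the `Probe` class, the expected facts, and the two sample responses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    kind: str                  # Recall, Artifact, Continuation, or Decision
    question: str
    expected_facts: list[str]  # hypothetical ground truth captured before compression


def score_response(probe: Probe, response: str) -> float:
    """Fraction of expected facts that appear in the response (case-insensitive)."""
    if not probe.expected_facts:
        raise ValueError("a probe needs at least one expected fact")
    text = response.lower()
    hits = sum(1 for fact in probe.expected_facts if fact.lower() in text)
    return hits / len(probe.expected_facts)


probe = Probe(
    kind="Recall",
    question="What was the original error message?",
    expected_facts=["401", "/api/auth/login"],
)
detailed = "The original error was a 401 Unauthorized response from /api/auth/login."
vague = "We were debugging an authentication issue. The login was failing."
print(score_response(probe, detailed))  # 1.0
print(score_response(probe, vague))     # 0.0
```

Substring matching misses paraphrases and rewards keyword stuffing, so it is best treated as a cheap first filter before a more careful grader.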
Evaluation Dimensions
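Six dimensions measure compression quality:
- Accuracy: Are the technical details correct?
- Context Awareness: Does the response reflect the current conversation state?
- Artifact Trail: Is the file trail complete?
- Completeness: Does the response cover the key points of the question?
- Continuity: Can the task continue seamlessly?
- Instruction Following: Are the constraints respected?
Accuracy varies the most between methods; Artifact Trail is the weakest dimension.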
Practical Guidance
Implementing Anchored Iterative Summarization
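- Define summary sections that fit your task type
- On the first compression, produce a complete structured summary
- On later compressions, summarize only the newly truncated span and merge it in
- Avoid full regeneration, which lets details drift
- Record where each summary came from, for debugging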
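The steps above can be sketched as a merge function plus a thin wrapper. This is one possible reading, not a reference implementation: the summary is assumed to be a mapping from section name to entries, `REPLACE_SECTIONS` encodes the assumption that state-like sections should be overwritten while other sections accumulate, and `fake_summarize` stands in for a model call.

```python
from typing import Callable

Summary = dict[str, list[str]]  # section name -> entries

# Assumption: these sections describe the present, so the newest value wins;
# every other section accumulates across compressions.
REPLACE_SECTIONS = {"current_state", "next_steps"}


def merge_summaries(anchor: Summary, delta: Summary) -> Summary:
    """Merge a summary of the newly truncated span into the anchored summary."""
    merged = {section: list(entries) for section, entries in anchor.items()}
    for section, entries in delta.items():
        if section in REPLACE_SECTIONS:
            merged[section] = list(entries)
            continue
        existing = merged.setdefault(section, [])
        for entry in entries:
            if entry not in existing:  # skip facts the anchor already holds
                existing.append(entry)
    return merged


def compress(
    anchor: Summary,
    truncated: list[str],
    summarize_span: Callable[[list[str]], Summary],
    provenance: list[str],
) -> Summary:
    """Summarize only the truncated span, merge it, and log where it came from."""
    delta = summarize_span(truncated)
    provenance.append(f"merged summary of {len(truncated)} truncated messages")
    return merge_summaries(anchor, delta)


def fake_summarize(messages: list[str]) -> Summary:
    # Stand-in for a model call that sees only the newly truncated messages.
    return {
        "files_modified": ["services/session.service.ts: Added retry logic"],
        "current_state": ["14 tests passing, 2 failing"],
    }


anchor: Summary = {
    "files_modified": ["config/redis.ts: Fixed connection pooling"],
    "current_state": ["12 tests passing, 4 failing"],  # hypothetical earlier state
}
log: list[str] = []
anchor = compress(anchor, ["msg 41", "msg 42"], fake_summarize, log)
print(anchor["files_modified"])
print(anchor["current_state"])
```

The earlier file entry survives untouched because it is never passed back through a summarizer, which is the property that distinguishes this approach from full regeneration.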
When to Use Each Approach
Use anchored iterative summarization when:
- Sessions are long (100+ messages)
- File tracking is critical
- You need to verify what was preserved
Use opaque compression when:
- Maximum compression ratio is the priority
- Sessions are relatively short
- Re-fetch cost is low
Use regenerative summaries when:
- Summary readability is critical
- There are clear phase boundaries
- Repeated review of the summary is acceptable
Compression Ratio Considerations
| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Moderate quality |
| Opaque | 99.3% | 3.35 | Best compression, lower quality |
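To make the ratio differences concrete, the calculation below applies each ratio from the table to a single session size. It assumes that "compression ratio" means the percentage of tokens removed, and it borrows the 89,000-token session size from the debugging example in this document; the resulting figures are illustrative arithmetic, not reported results.

```python
ORIGINAL_TOKENS = 89_000  # session size borrowed from the debugging example

# Compression ratios from the table, read as percent of tokens removed (assumption).
ratios = {"Anchored Iterative": 98.6, "Regenerative": 98.7, "Opaque": 99.3}

retained = {
    method: round(ORIGINAL_TOKENS * (100 - pct) / 100)
    for method, pct in ratios.items()
}
for method, tokens in retained.items():
    print(f"{method}: {tokens} tokens retained")
# Anchored Iterative: 1246 tokens retained
# Regenerative: 1157 tokens retained
# Opaque: 623 tokens retained
```

Under these assumptions, the gap between the highest-quality and the most aggressive method is roughly 600 tokens on an 89,000-token session, a small price compared with a single re-fetch of a source file.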
Examples
Example 1: Debugging Session Compression
Original context (89,000 tokens, 178 messages):
- 401 error on /api/auth/login endpoint
- Traced through auth controller, middleware, session store
- Found stale Redis connection
- Fixed connection pooling, added retry logic
- 14 tests passing, 2 failing
Structured summary after compression:
## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.
## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.
## Files Modified
- auth.controller.ts: No changes (read only)
- middleware/cors.ts: No changes (examined)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup
## Test Status
14 passing, 2 failing (mock setup issues)
## Next Steps
1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging
Example 2: Probe Response Quality
After compression, asking "What was the original error?":
Good response (structured summarization):
"The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."
Poor response (aggressive compression):
"We were debugging an authentication issue. The login was failing. We fixed some configuration problems."
Guidelines
- Optimize for tokens-per-task, not tokens-per-request
- Use structured summaries with explicit sections for file tracking
- Trigger compression at 70-80% context utilization
- Implement incremental merging rather than full regeneration
- Test compression quality with probe-based evaluation
- Track artifact trail separately if file tracking is critical
- Accept slightly lower compression ratios for better quality retention
- Monitor re-fetching frequency as a compression quality signal
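The last guideline can be instrumented with a small counter. In this sketch a re-fetch is defined, as an assumption, as the first read of a file after a compression when that file had already been read before the compression; `RefetchMonitor` is a hypothetical helper.

```python
class RefetchMonitor:
    """Counts reads of files that were already seen before the latest compression."""

    def __init__(self) -> None:
        self.seen_before_compression: set[str] = set()
        self.seen_since_compression: set[str] = set()
        self.reads = 0
        self.refetches = 0

    def record_read(self, path: str) -> None:
        self.reads += 1
        already_known = path in self.seen_before_compression
        first_time_since = path not in self.seen_since_compression
        if already_known and first_time_since:
            self.refetches += 1  # the agent had this file once and lost it
        self.seen_since_compression.add(path)

    def record_compression(self) -> None:
        self.seen_before_compression |= self.seen_since_compression
        self.seen_since_compression = set()

    def refetch_rate(self) -> float:
        return self.refetches / self.reads if self.reads else 0.0


monitor = RefetchMonitor()
monitor.record_read("config/redis.ts")
monitor.record_read("auth.controller.ts")
monitor.record_compression()
monitor.record_read("config/redis.ts")    # re-fetch: read before the compression
monitor.record_read("tests/auth.test.ts")  # new file, not a re-fetch
monitor.record_read("config/redis.ts")    # repeat within the same window, not counted
print(monitor.refetches, monitor.reads)  # 1 5
```

A rising re-fetch rate after compression events suggests the summary is dropping information the agent still needs.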
Practice Task
- Use the structured summary template to write a summary for your own project
- Design 3 probe questions to verify that the key facts were preserved
Related Pages
- Claude Code Examples
- Context Engineering Fundamentals
- Context Degradation Patterns
- Advanced Evaluation
Integration
This skill connects to:
- context-degradation - Compression is a mitigation strategy
- context-optimization - Compression is one optimization technique
- evaluation - Probe-based evaluation applies to compression testing
- memory-systems - Compression relates to scratchpad and summary memory patterns
References
Internal reference:
- Evaluation Framework Reference - Detailed probe types and scoring rubrics
Related skills:
- context-degradation - Understanding what compression prevents
- context-optimization - Broader optimization strategies
- evaluation - Building evaluation frameworks
External resources:
- Factory Research: Evaluating Context Compression for AI Agents (December 2025)
- Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)
Skill Metadata
Created: 2025-12-22
Last Updated: 2025-12-22
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0