Context Compression & Optimization
Context Compression Strategies
When agent sessions accumulate large conversation histories, compression becomes necessary. The intuitive approach is to minimize tokens-per-request, but the correct target is tokens-per-task: the total tokens needed to complete the task, including the re-fetch costs incurred when compression drops critical information.
The right goal for compression is not "shortest single request" but "lowest total task cost."
- Optimize tokens-per-task, not tokens-per-request.
- Structured summaries beat aggressive compression for long tasks.
- Artifact trail 是最难保留的信息。
- Trigger compression at 70-80% context.
- Use probe questions to evaluate quality.
What You Will Learn
- The trade-offs among the three mainstream compression strategies
- Why structured summarization is the most robust engineering practice
- How to evaluate compression quality with probe questions
When to Activate
Activate this skill when:
- Agent sessions exceed context window limits
- Designing conversation summarization strategies
- Evaluating different compression approaches for production systems
- Debugging cases where agents "forget" what files they modified
- Building evaluation frameworks for compression quality
- Optimizing long-running coding or debugging sessions
Core Concepts
Context compression trades token savings against information loss. Three production-viable approaches:
- Anchored Iterative Summarization: Maintain a structured, continuously updated summary covering session intent, file modifications, decisions, and next steps. On each trigger, summarize only the newly truncated portion and merge it in. The structure itself forces retention of critical information.
- Opaque Compression: Pursues the highest compression ratio (99%+), but offers low interpretability; you cannot verify what was retained.
- Regenerative Full Summary: Regenerates a complete summary each time. Highly readable, but details erode over repeated compression rounds.
Key takeaway: structured summaries force retention, preventing silent information drift.
Detailed Topics
Why Tokens-Per-Task Matters
Traditional metrics look only at tokens-per-request, which is the wrong optimization target. Once compression drops file paths or error messages, the agent re-retrieves and re-explores, consuming more tokens overall.
The correct metric is tokens-per-task: total consumption from task start to completion. Saving 0.5% of tokens while incurring a 20% re-fetch cost makes the task more expensive overall.
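This accounting can be sketched in a few lines of Python. The rates below are illustrative, not measured values:

```python
def tokens_per_task(base_tokens: int, savings_rate: float, refetch_rate: float) -> float:
    """Total tokens to finish the task, not just one request.

    savings_rate: fraction of tokens removed by compression
    refetch_rate: extra tokens spent re-fetching lost information,
                  as a fraction of the uncompressed cost
    """
    return base_tokens * (1 - savings_rate) + base_tokens * refetch_rate

# Saving 0.5% per request but triggering 20% re-fetch overhead
# is a net loss versus not compressing at all.
assert tokens_per_task(100_000, savings_rate=0.005, refetch_rate=0.20) > 100_000
```

The same function shows why heavy compression still wins when re-fetching stays cheap: at 98.6% savings and near-zero re-fetch, the task cost collapses.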
The Artifact Trail Problem
The artifact trail is the weakest dimension across all compression approaches, scoring only 2.2-2.5/5 in evaluation. Even structured summaries struggle to preserve a complete file trail over time.
Coding agents need to know:
- Which files were created
- Which files were modified, and what changed
- Which files were read but not modified
- Function names, variable names, and error messages
This usually requires a dedicated mechanism, not just a natural-language summary.
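One such mechanism is to track the artifact trail as structured data alongside the natural-language summary, so compression can never silently drop it. A minimal Python sketch (class and method names are illustrative, not from any framework):

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactTrail:
    """File-operation log kept as data rather than prose."""
    created: set = field(default_factory=set)
    modified: dict = field(default_factory=dict)  # path -> change note
    read_only: set = field(default_factory=set)

    def record_read(self, path: str) -> None:
        if path not in self.created and path not in self.modified:
            self.read_only.add(path)

    def record_write(self, path: str, note: str) -> None:
        self.read_only.discard(path)  # promoted from read-only to modified
        self.modified[path] = note

trail = ArtifactTrail()
trail.record_read("middleware/cors.ts")  # examined only
trail.record_write("config/redis.ts", "Fixed connection pooling")
```

At compression time the trail is serialized verbatim into the Files Modified section instead of being re-summarized by the model.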
Structured Summary Sections
An effective structured summary should include:
## Session Intent
[What the user is trying to accomplish]
## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config
## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures
## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests
## Next Steps
1. Fix remaining test failures
2. Run full test suite
3. Update documentation
The purpose of the structure is to force coverage of critical information and prevent omissions.
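That forced coverage can be enforced mechanically: a renderer that refuses to emit a summary with a missing section. A minimal sketch, assuming the five sections shown above:

```python
REQUIRED_SECTIONS = [
    "Session Intent", "Files Modified", "Decisions Made",
    "Current State", "Next Steps",
]

def render_summary(sections: dict) -> str:
    """Render the structured summary; a missing section fails loudly
    instead of silently drifting out of the context."""
    missing = [name for name in REQUIRED_SECTIONS if not sections.get(name)]
    if missing:
        raise ValueError(f"summary is missing sections: {missing}")
    return "\n\n".join(f"## {name}\n{sections[name]}" for name in REQUIRED_SECTIONS)
```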
Compression Trigger Strategies
When to compress matters as much as how to compress:
| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple, but may fire too early |
| Sliding window | Keep last N turns + summary | Predictable |
| Importance-based | Compress low-relevance first | Complex, but preserves more signal |
| Task-boundary | Compress at task boundaries | Readable, but unpredictable |
For coding agents, a sliding window plus a structured summary is usually the best balance.
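The fixed-threshold trigger and the sliding-window strategy can be sketched together as follows. The message format and parameter names are assumptions for illustration, not a specific framework's API:

```python
def should_compress(used_tokens: int, window_tokens: int, threshold: float = 0.75) -> bool:
    """Fixed-threshold trigger: fire at 70-80% context utilization."""
    return used_tokens / window_tokens >= threshold

def apply_sliding_window(messages: list, summary: str, keep_last: int = 10) -> list:
    """Keep the last N turns verbatim; replace everything older
    with the structured summary."""
    if len(messages) <= keep_last:
        return messages
    return [{"role": "system", "content": summary}] + messages[-keep_last:]
```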
Probe-Based Evaluation
ROUGE and embedding similarity cannot measure functional fidelity. A summary can "look similar" while missing a critical file path.
Probe-based evaluation verifies retention by asking questions:
| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
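A probe harness can be as simple as a fixed question set plus a scorer. The sketch below uses substring matching as a crude stand-in; in practice an LLM-as-judge would grade the answers:

```python
PROBES = {
    "recall": "What was the original error message?",
    "artifact": "Which files have we modified?",
    "continuation": "What should we do next?",
    "decision": "What did we decide about the Redis issue?",
}

def score_probe(answer: str, required_facts: list) -> float:
    """Fraction of expected facts that survive in the post-compression answer."""
    hits = sum(1 for fact in required_facts if fact.lower() in answer.lower())
    return hits / len(required_facts)
```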
Evaluation Dimensions
Six dimensions for measuring compression quality:
- Accuracy: Are technical details correct?
- Context Awareness: Does it match the current conversation state?
- Artifact Trail: Is the file trail complete?
- Completeness: Are the key points of the problem covered?
- Continuity: Can the task continue seamlessly?
- Instruction Following: Are constraints respected?
Accuracy varies the most across methods; Artifact Trail is the weakest.
Practical Guidance
Implementing Anchored Iterative Summarization
- Define summary sections tailored to your task type
- Produce a full structured summary on the first compression
- Afterwards, summarize only the newly truncated segment and merge it in
- Avoid full regeneration, which causes detail drift
- Record summary provenance for debugging
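The merge step above can be sketched as follows, assuming the summary is held as a dict of sections where history sections append and state sections take the latest value:

```python
APPEND_SECTIONS = {"Files Modified", "Decisions Made"}  # history: never overwrite

def merge_summary(anchor: dict, new_segment: dict) -> dict:
    """Merge the summary of the newly truncated segment into the anchored
    summary, instead of regenerating the whole summary from scratch."""
    merged = dict(anchor)
    for section, content in new_segment.items():
        if section in APPEND_SECTIONS:
            merged[section] = list(anchor.get(section, [])) + list(content)
        else:
            merged[section] = content  # state sections: latest wins
    return merged
```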
When to Use Each Approach
Use anchored iterative summarization when:
- Sessions are long (100+ messages)
- File tracking is critical
- Retained information must be verifiable
Use opaque compression when:
- Maximum compression ratio is the top priority
- Sessions are relatively short
- Re-fetch costs are low
Use regenerative summaries when:
- Summary readability is paramount
- There are clear phase boundaries
- Repeated review is acceptable
Compression Ratio Considerations
| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Middling quality |
| Opaque | 99.3% | 3.35 | Best compression, degraded quality |
Examples
Example 1: Debugging Session Compression
Original context (89,000 tokens, 178 messages):
- 401 error on /api/auth/login endpoint
- Traced through auth controller, middleware, session store
- Found stale Redis connection
- Fixed connection pooling, added retry logic
- 14 tests passing, 2 failing
Structured summary after compression:
## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.
## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.
## Files Modified
- auth.controller.ts: No changes (read only)
- middleware/cors.ts: No changes (examined)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup
## Test Status
14 passing, 2 failing (mock setup issues)
## Next Steps
1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging
Example 2: Probe Response Quality
After compression, asking "What was the original error?":
Good response (structured summarization):
"The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."
Poor response (aggressive compression):
"We were debugging an authentication issue. The login was failing. We fixed some configuration problems."
Guidelines
- Optimize for tokens-per-task, not tokens-per-request
- Use structured summaries with explicit sections for file tracking
- Trigger compression at 70-80% context utilization
- Implement incremental merging rather than full regeneration
- Test compression quality with probe-based evaluation
- Track artifact trail separately if file tracking is critical
- Accept slightly lower compression ratios for better quality retention
- Monitor re-fetching frequency as a compression quality signal
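The last guideline, monitoring re-fetch frequency, can be sketched as a simple counter over the agent's tool-call log. The log format here is an assumption for illustration:

```python
def refetch_rate(tool_calls: list, compression_index: int) -> float:
    """Fraction of post-compression reads that re-open files already read
    before compression, a proxy for information lost to compression."""
    before = {c["path"] for c in tool_calls[:compression_index] if c["tool"] == "read"}
    after = [c for c in tool_calls[compression_index:] if c["tool"] == "read"]
    if not after:
        return 0.0
    return sum(1 for c in after if c["path"] in before) / len(after)
```

A rising re-fetch rate after deploying a new compression strategy is a strong signal that the summary is dropping artifact-trail information.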
Practice Task
- Write a summary for your own project using the structured summary template
- Design 3 probe questions to verify that key facts were retained
Related Pages
- Claude Code Examples
- Context Engineering Fundamentals
- Context Degradation Patterns
- Advanced Evaluation
Integration
This skill connects to:
- context-degradation - Compression is a mitigation strategy
- context-optimization - Compression is one optimization technique
- evaluation - Probe-based evaluation applies to compression testing
- memory-systems - Compression relates to scratchpad and summary memory patterns
References
Related skills:
- context-degradation - Understanding what compression prevents
- context-optimization - Broader optimization strategies
- evaluation - Building evaluation frameworks
External resources:
- Factory Research: Evaluating Context Compression for AI Agents (December 2025)
- Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)
Skill Metadata
Created: 2025-12-22 Last Updated: 2025-12-22 Author: Agent Skills for Context Engineering Contributors Version: 1.0.0
Frequently Asked Questions
Should context compression aim for the fewest possible tokens?
No. The target is tokens-per-task (total task cost), not tokens-per-request (fewest per call). Compressing too hard drops critical information such as file paths and error messages, so the agent re-retrieves and re-explores, making the task more expensive overall. Saving 0.5% of tokens at a 20% re-fetch cost is a net loss.
Which scenarios suit each of the three compression strategies?
(1) Anchored Iterative Summarization: a structured, continuously updated summary (98.6% compression, 3.70 quality score), suited to long sessions (100+ messages) where file tracking is critical. (2) Opaque Compression: 99.3% compression ratio but low interpretability, suited to short sessions with cheap re-fetches. (3) Regenerative Full Summary: highly readable but loses detail over repeated rounds, suited to tasks with clear phase boundaries.
When should compression trigger, at 70% or 90%?
70-80% is the sweet spot. Common trigger strategies: fixed threshold (70-80% utilization; simple but may fire too early), sliding window (keep the last N turns plus a summary; predictable), importance-based (compress low-relevance content first; complex but preserves more signal), and task-boundary (compress at task boundaries; readable but unpredictable). For coding agents, sliding window plus structured summary is the most balanced combination.
Which sections should a structured summary contain?
Five fixed sections: ## Session Intent (the user's goal), ## Files Modified (with concrete change notes), ## Decisions Made (key decisions), ## Current State (tests passing/failing, current progress), and ## Next Steps (action list). The point of the structure is forced coverage of critical information: the artifact trail (file history) is the lowest-scoring evaluation dimension (2.2-2.5/5) and must be listed explicitly.
How do you verify that compression did not drop critical information?
Use probe-based evaluation rather than ROUGE or embedding similarity alone (those measure surface resemblance, not functional fidelity). Ask four probe types: Recall (what was the original error message?), Artifact (which files were modified?), Continuation (what should we do next?), and Decision (what did we decide about the Redis issue?). Then score across six dimensions: Accuracy, Context Awareness, Artifact Trail, Completeness, Continuity, and Instruction Following.