08

Context Compression & Optimization

⏱️ 35 minutes

Context Compression Strategies

When an agent session accumulates a large conversation history, compression becomes necessary. The intuitive approach is to minimize tokens-per-request, but the correct target is tokens-per-task: the total tokens required to complete the task, including the re-fetch cost incurred when compression drops critical information.

The right goal of compression is not the shortest single request, but the lowest total cost for the task.

  • Optimize tokens-per-task, not tokens-per-request.
  • Structured summaries beat aggressive compression for long tasks.
  • The artifact trail is the hardest information to preserve.
  • Trigger compression at 70-80% context.
  • Use probe questions to evaluate quality.

What You'll Learn

  • The trade-offs among the three mainstream compression strategies
  • Why structured summaries are the most reliable engineering practice
  • How to evaluate compression quality with probe questions

When to Activate

Activate this skill when:

  • Agent sessions exceed context window limits
  • Designing conversation summarization strategies
  • Evaluating different compression approaches for production systems
  • Debugging cases where agents "forget" what files they modified
  • Building evaluation frameworks for compression quality
  • Optimizing long-running coding or debugging sessions

Core Concepts

Context compression trades token savings against information loss. Three production-ready approaches:

  1. Anchored Iterative Summarization: Maintain a structured, continuously updated summary covering session intent, file modifications, decisions, and next steps. On each trigger, summarize only the newly truncated portion and merge it in. The structure itself forces retention of key information.

  2. Opaque Compression: Pursues the highest compression ratio (99%+), but offers low interpretability; there is no way to verify what was retained.

  3. Regenerative Full Summary: Regenerates a complete summary each time. Highly readable, but details erode over repeated compression rounds.

Key takeaway: a structured summary forces retention and prevents silent information drift.

Detailed Topics

Why Tokens-Per-Task Matters

Traditional metrics track only tokens-per-request, which is the wrong optimization target. Once compression drops file paths or error messages, the agent re-fetches and re-explores, consuming more tokens overall.

The correct metric is tokens-per-task: total consumption from task start to completion. Saving 0.5% of tokens while incurring a 20% re-fetch cost makes the task more expensive overall.
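The arithmetic can be made concrete with a small sketch (all numbers are hypothetical):

```python
def tokens_per_task(request_tokens: int, n_requests: int, refetch_tokens: int) -> int:
    """Total tokens consumed from task start to task completion."""
    return request_tokens * n_requests + refetch_tokens

# Moderate compression: file paths and error messages survive, no re-fetching.
moderate = tokens_per_task(request_tokens=4_000, n_requests=20, refetch_tokens=0)

# Aggressive compression: 10% smaller requests, but lost context forces the
# agent to re-read files and re-run searches it already did.
aggressive = tokens_per_task(request_tokens=3_600, n_requests=20, refetch_tokens=25_000)

print(moderate, aggressive)  # the "cheaper" per-request option costs more per task
```

The per-request savings look real in isolation; only the per-task total exposes the loss.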

The Artifact Trail Problem

The artifact trail is the weakest dimension across all compression approaches, scoring only 2.2-2.5 out of 5 in evaluation. Even structured summaries struggle to preserve a complete file trail over time.

Coding agents need to know:

  • Which files were created
  • Which files were modified, and how
  • Which files were read but not modified
  • Function names, variable names, error messages

This usually requires a dedicated mechanism, not just a natural-language summary.
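One such mechanism is a small side-channel log of file operations, rendered into the summary verbatim rather than paraphrased by the summarizer. A minimal sketch (class and method names are illustrative, not from any particular framework):

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactTrail:
    """File-operation log kept outside the natural-language summary."""
    modified: dict = field(default_factory=dict)   # path -> short change note
    read_only: set = field(default_factory=set)

    def record_read(self, path: str) -> None:
        # A read counts as read-only until the file is actually written.
        if path not in self.modified:
            self.read_only.add(path)

    def record_write(self, path: str, note: str) -> None:
        self.read_only.discard(path)
        self.modified[path] = note

    def render(self) -> str:
        # Rendered verbatim into the "## Files Modified" section.
        lines = ["## Files Modified"]
        lines += [f"-   {p}: {n}" for p, n in sorted(self.modified.items())]
        lines += [f"-   {p}: No changes (read only)" for p in sorted(self.read_only)]
        return "\n".join(lines)
```

Because the trail is mechanical, compression can never silently drop a path; only the short change notes pass through the summarizer.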

Structured Summary Sections

An effective structured summary should include:

## Session Intent

[What the user is trying to accomplish]

## Files Modified

-   auth.controller.ts: Fixed JWT token generation
-   config/redis.ts: Updated connection pooling
-   tests/auth.test.ts: Added mock setup for new config

## Decisions Made

-   Using Redis connection pool instead of per-request connections
-   Retry logic with exponential backoff for transient failures

## Current State

-   14 tests passing, 2 failing
-   Remaining: mock setup for session service tests

## Next Steps

1. Fix remaining test failures
2. Run full test suite
3. Update documentation

The point of structure is to force coverage of key information so nothing is omitted.

Compression Trigger Strategies

When to compress matters as much as how:

| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple, but may fire too early |
| Sliding window | Keep last N turns + summary | Predictable |
| Importance-based | Compress low-relevance content first | Complex, but preserves more signal |
| Task-boundary | Compress at task boundaries | Readable, but unpredictable |

For coding agents, sliding window + structured summary is usually the best balance.
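The fixed-threshold and sliding-window triggers from the table combine into a few lines (threshold and window size below are illustrative defaults):

```python
def should_compress(used_tokens: int, window_tokens: int, threshold: float = 0.75) -> bool:
    """Fixed threshold: trigger at 70-80% context utilization."""
    return used_tokens >= threshold * window_tokens

def apply_sliding_window(messages: list, summary: str, keep_last: int = 20) -> list:
    """Keep the last N turns verbatim; older turns live only in the summary."""
    return [{"role": "system", "content": summary}] + messages[-keep_last:]

msgs = [{"role": "user", "content": f"turn {i}"} for i in range(100)]
if should_compress(used_tokens=76_000, window_tokens=100_000):
    msgs = apply_sliding_window(msgs, summary="## Session Intent\n...")

print(len(msgs))  # 21: one summary message plus the last 20 turns
```

The summary message here would be the anchored structured summary described above, refreshed on each trigger.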

Probe-Based Evaluation

ROUGE and embedding similarity cannot measure functional fidelity. A summary can look right while missing a critical file path.

Probe-based evaluation verifies retention by asking questions:

| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
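A probe harness can be as simple as keyword checks against the answers an agent gives from the compressed context. A sketch (the `ask` callable stands in for a real LLM call; all names and answers are illustrative):

```python
PROBES = {
    "recall": "What was the original error message?",
    "artifact": "Which files have we modified?",
    "continuation": "What should we do next?",
    "decision": "What did we decide about the Redis issue?",
}

def score_probes(ask, expected_keywords: dict) -> float:
    """Fraction of probes whose answer contains all expected keywords."""
    hits = 0
    for probe_type, question in PROBES.items():
        answer = ask(question).lower()
        if all(kw.lower() in answer for kw in expected_keywords[probe_type]):
            hits += 1
    return hits / len(PROBES)

# Stand-in for an agent answering from a compressed context.
def fake_ask(question: str) -> str:
    answers = {
        "What was the original error message?": "A 401 Unauthorized on /api/auth/login.",
        "Which files have we modified?": "config/redis.ts and session.service.ts.",
        "What should we do next?": "Fix the remaining test failures.",
        "What did we decide about the Redis issue?": "Not sure.",  # lost in compression
    }
    return answers[question]

expected = {
    "recall": ["401", "/api/auth/login"],
    "artifact": ["config/redis.ts"],
    "continuation": ["test"],
    "decision": ["connection pool"],
}
print(score_probes(fake_ask, expected))  # 0.75: the decision probe failed
```

Keyword containment is a crude stand-in for the LLM-as-judge scoring mentioned in the references, but it already catches the failure mode that ROUGE misses.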

Evaluation Dimensions

Six dimensions measure compression quality:

  1. Accuracy: Are technical details correct?
  2. Context Awareness: Does it match the current conversation state?
  3. Artifact Trail: Is the file trail complete?
  4. Completeness: Are the key points of the problem covered?
  5. Continuity: Can the task continue seamlessly?
  6. Instruction Following: Are constraints respected?

Accuracy varies the most across methods; Artifact Trail is the weakest dimension.

Practical Guidance

Implementing Anchored Iterative Summarization

  1. Define summary sections that fit your task type
  2. Produce a full structured summary on the first compression
  3. Afterwards, summarize only the newly truncated segments and merge them in
  4. Avoid full regeneration to prevent detail drift
  5. Record summary provenance for debugging
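Step 3's merge is the heart of the approach: the anchor is never regenerated, only appended to. A minimal sketch with sections held as lists of bullet lines (the data shape is illustrative):

```python
def merge_summary(anchor: dict, new_chunk: dict) -> dict:
    """Merge a summary of newly truncated messages into the anchored summary.

    Existing lines are never rewritten (no drift); only unseen lines are
    appended to their section.
    """
    merged = {section: list(lines) for section, lines in anchor.items()}
    for section, lines in new_chunk.items():
        existing = merged.setdefault(section, [])
        existing.extend(line for line in lines if line not in existing)
    return merged

anchor = {
    "Files Modified": ["config/redis.ts: Fixed connection pooling"],
    "Next Steps": ["Fix remaining test failures"],
}
new_chunk = {
    "Files Modified": ["tests/auth.test.ts: Updated mock setup"],
    "Next Steps": ["Fix remaining test failures"],  # duplicate, not re-added
}
merged = merge_summary(anchor, new_chunk)
print(len(merged["Files Modified"]), len(merged["Next Steps"]))  # 2 1
```

A real implementation would also prune stale "Next Steps" entries, which is where recording provenance (step 5) pays off.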

When to Use Each Approach

Use anchored iterative summarization when:

  • Sessions are long (100+ messages)
  • File tracking is critical
  • You need verifiable retention of information

Use opaque compression when:

  • Compression ratio is the overriding concern
  • Sessions are relatively short
  • Re-fetch cost is low

Use regenerative summaries when:

  • Summary readability is paramount
  • There are clear phase boundaries
  • Repeated review is acceptable

Compression Ratio Considerations

| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Moderate quality |
| Opaque | 99.3% | 3.35 | Best compression, lower quality |

Examples

Example 1: Debugging Session Compression

Original context (89,000 tokens, 178 messages):

  • 401 error on /api/auth/login endpoint
  • Traced through auth controller, middleware, session store
  • Found stale Redis connection
  • Fixed connection pooling, added retry logic
  • 14 tests passing, 2 failing

Structured summary after compression:

## Session Intent

Debug 401 Unauthorized error on /api/auth/login despite valid credentials.

## Root Cause

Stale Redis connection in session store. JWT generated correctly but session could not be persisted.

## Files Modified

-   auth.controller.ts: No changes (read only)
-   middleware/cors.ts: No changes (examined)
-   config/redis.ts: Fixed connection pooling configuration
-   services/session.service.ts: Added retry logic for transient failures
-   tests/auth.test.ts: Updated mock setup

## Test Status

14 passing, 2 failing (mock setup issues)

## Next Steps

1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging

Example 2: Probe Response Quality

After compression, asking "What was the original error?":

Good response (structured summarization):

"The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."

Poor response (aggressive compression):

"We were debugging an authentication issue. The login was failing. We fixed some configuration problems."

Guidelines

  1. Optimize for tokens-per-task, not tokens-per-request
  2. Use structured summaries with explicit sections for file tracking
  3. Trigger compression at 70-80% context utilization
  4. Implement incremental merging rather than full regeneration
  5. Test compression quality with probe-based evaluation
  6. Track artifact trail separately if file tracking is critical
  7. Accept slightly lower compression ratios for better quality retention
  8. Monitor re-fetching frequency as a compression quality signal
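Guideline 8 can be monitored mechanically: count how often the agent re-reads a file or re-runs a search it has already issued. A rising rate right after compression suggests key context was dropped (tool names and call shape below are illustrative):

```python
def refetch_rate(tool_calls: list) -> float:
    """Fraction of read/search calls that repeat an earlier target."""
    seen, repeats, fetches = set(), 0, 0
    for call in tool_calls:
        if call["tool"] not in ("read_file", "search"):
            continue  # edits and other tools are not fetches
        fetches += 1
        key = (call["tool"], call["target"])
        if key in seen:
            repeats += 1
        seen.add(key)
    return repeats / fetches if fetches else 0.0

calls = [
    {"tool": "read_file", "target": "config/redis.ts"},
    {"tool": "edit_file", "target": "config/redis.ts"},
    {"tool": "search", "target": "connection pool"},
    {"tool": "read_file", "target": "config/redis.ts"},  # re-fetch after compression
]
print(refetch_rate(calls))  # 1 repeat out of 3 fetches
```

Compared before and after a compression event, this gives a cheap proxy for tokens-per-task regression without running full probe evaluations.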

Practice Task

  • Write a summary for your own project using the structured summary template
  • Design 3 probe questions to verify that key facts were retained

Integration

This skill connects to:

  • context-degradation - Compression is a mitigation strategy
  • context-optimization - Compression is one optimization technique
  • evaluation - Probe-based evaluation applies to compression testing
  • memory-systems - Compression relates to scratchpad and summary memory patterns

References

Related skills:

  • context-degradation - Understanding what compression prevents
  • context-optimization - Broader optimization strategies
  • evaluation - Building evaluation frameworks

External resources:

  • Factory Research: Evaluating Context Compression for AI Agents (December 2025)
  • Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)

Skill Metadata

Created: 2025-12-22
Last Updated: 2025-12-22
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0

❓ FAQ

The most frequently searched questions about this chapter's topic

Should context compression aim for the fewest tokens?

No. The target is tokens-per-task (total task cost), not tokens-per-request (minimum per call). Compress too hard and you lose critical information like file paths and error messages; the agent then re-fetches and re-explores, making the task more expensive overall. Saving 0.5% of tokens while incurring a 20% re-fetch cost is a net loss.

Which scenarios suit each of the three compression strategies?

(1) Anchored Iterative Summarization: a continuously updated structured summary, 98.6% compression / quality score 3.70, best for long sessions (100+ messages) where file tracking is critical; (2) Opaque Compression: 99.3% compression but low interpretability, best for short sessions with low re-fetch cost; (3) Regenerative Full Summary: highly readable but loses detail over repeated rounds, best for tasks with clear phase boundaries.

When should compression trigger: 70% or 90%?

70-80% is the sweet spot. Common trigger strategies: fixed threshold (70-80% utilization, simple but may fire too early), sliding window (keep the last N turns + summary, predictable), importance-based (compress low-relevance content first, complex but preserves signal), task-boundary (compress at task boundaries, readable but unpredictable). For coding agents, sliding window + structured summary is the most balanced combination.

Which sections should a structured summary have?

Five fixed sections: ## Session Intent (the user's goal), ## Files Modified (with specific change notes), ## Decisions Made (key decisions), ## Current State (tests passing/failing, current progress), ## Next Steps (action list). The point of structure is to force coverage of key information; the artifact trail (file history) is the lowest-scoring dimension in evaluation (2.2-2.5/5) and must be listed explicitly.

How do I verify that nothing critical was lost after compression?

Use probe-based evaluation; do not rely on ROUGE / embedding similarity alone (they measure surface similarity, not functional fidelity). Ask four probe types: Recall (what was the original error message?), Artifact (which files did we modify?), Continuation (what should we do next?), Decision (what did we decide about Redis?). Then score along six dimensions: Accuracy / Context Awareness / Artifact Trail / Completeness / Continuity / Instruction Following.