08

Context Compression & Optimization

⏱️ 35 minutes

Context Compression Strategies

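When an agent session accumulates a large conversation history, compression becomes necessary. The intuitive approach is to minimize tokens-per-request, but the right target is tokens-per-task: the total tokens needed to complete the task, including the re-fetch cost incurred when compression drops critical information.

The right goal of compression is not the shortest single request but the lowest total cost per task.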

  • Optimize tokens-per-task, not tokens-per-request.
  • Structured summaries beat aggressive compression for long tasks.
  • The artifact trail is the hardest information to preserve.
  • Trigger compression at 70-80% context.
  • Use probe questions to evaluate quality.

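What You Will Learn

  • The trade-offs among the three mainstream compression strategies
  • Why structured summaries are the most dependable engineering practice
  • How to evaluate compression quality with probe questions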

When to Activate

Activate this skill when:

  • Agent sessions exceed context window limits
  • Designing conversation summarization strategies
  • Evaluating different compression approaches for production systems
  • Debugging cases where agents "forget" what files they modified
  • Building evaluation frameworks for compression quality
  • Optimizing long-running coding or debugging sessions

Core Concepts

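Context compression trades token savings against information loss. Three production-ready approaches:

  1. Anchored Iterative Summarization: Maintain a structured, continuously updated summary covering session intent, file modifications, decisions, and next steps. When compression triggers, summarize only the newly truncated span and merge it into the existing summary. The structure itself forces critical information to be preserved.

  2. Opaque Compression: Aims for the highest compression ratio (99%+), but interpretability is low and there is no way to verify what was preserved.

  3. Regenerative Full Summary: Generates a complete summary on every compression. Readability is high, but details erode over repeated compression cycles.

Key takeaway: a structured summary forces preservation and prevents silent information drift.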

Detailed Topics

Why Tokens-Per-Task Matters

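Traditional metrics look only at tokens-per-request, which is the wrong optimization target. Once compression drops file paths or error messages, the agent has to re-fetch and re-explore, and ends up consuming more tokens.

The right metric is tokens-per-task: total consumption from task start to completion. Saving 0.5% of tokens while adding a 20% re-fetch cost is more expensive overall.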
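The arithmetic behind that claim can be sketched as follows. The request counts and sizes are made-up illustration values, and `tokens_per_task` is a hypothetical helper, not part of any library.

```python
def tokens_per_task(request_tokens: list[int], refetch_tokens: int = 0) -> int:
    """Total tokens spent from task start to completion, including re-fetches."""
    return sum(request_tokens) + refetch_tokens


# Hypothetical baseline: 10 requests of 10,000 tokens each, no re-fetching.
baseline = tokens_per_task([10_000] * 10)

# Hypothetical aggressive compression: every request is 0.5% smaller, but lost
# file paths and error messages force re-fetches worth 20% of the baseline.
aggressive = tokens_per_task([9_950] * 10, refetch_tokens=baseline * 20 // 100)

print(baseline)    # 100000
print(aggressive)  # 119500
```

Under these assumed numbers, the per-request saving is 500 tokens in total while the re-fetch penalty is 20,000, so the "cheaper" strategy costs 19.5% more per task.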

The Artifact Trail Problem

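The artifact trail is the weakest dimension across all compression approaches, scoring only 2.2-2.5 out of 5 in evaluation. Even structured summaries struggle to keep a complete file trail over time.

Coding agents need to know:

  • Which files were created
  • Which files were modified, and what changed
  • Which files were read but not modified
  • Function names, variable names, and error messages

This usually calls for a dedicated mechanism beyond a natural-language summary.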
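One possible shape for such a mechanism is a small tracker that lives outside the summary and is never compressed. This is a minimal sketch: the `ArtifactTrail` class and its method names are hypothetical, and a real agent would call the `record_*` methods from its file tools.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ArtifactTrail:
    """Tracks file activity separately from the natural-language summary."""

    created: set[str] = field(default_factory=set)
    modified: dict[str, list[str]] = field(default_factory=dict)
    read: set[str] = field(default_factory=set)

    def record_create(self, path: str) -> None:
        self.created.add(path)

    def record_modify(self, path: str, change: str) -> None:
        self.modified.setdefault(path, []).append(change)

    def record_read(self, path: str) -> None:
        self.read.add(path)

    def read_only(self) -> set[str]:
        """Files that were read but never created or modified."""
        return self.read - self.created - set(self.modified)

    def render(self) -> str:
        """Render the trail as a section that can be appended to any summary."""
        lines = ["## Artifact Trail"]
        for path in sorted(self.created):
            lines.append(f"- created: {path}")
        for path in sorted(self.modified):
            lines.append(f"- modified: {path}: {'; '.join(self.modified[path])}")
        for path in sorted(self.read_only()):
            lines.append(f"- read only: {path}")
        return "\n".join(lines)


trail = ArtifactTrail()
trail.record_read("auth.controller.ts")
trail.record_read("config/redis.ts")
trail.record_modify("config/redis.ts", "Fixed connection pooling configuration")
print(trail.render())
```

Because the trail is built from tool calls rather than regenerated from text, it does not degrade when the conversation itself is summarized.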

Structured Summary Sections

An effective structured summary should include:

## Session Intent

[What the user is trying to accomplish]

## Files Modified

-   auth.controller.ts: Fixed JWT token generation
-   config/redis.ts: Updated connection pooling
-   tests/auth.test.ts: Added mock setup for new config

## Decisions Made

-   Using Redis connection pool instead of per-request connections
-   Retry logic with exponential backoff for transient failures

## Current State

-   14 tests passing, 2 failing
-   Remaining: mock setup for session service tests

## Next Steps

1. Fix remaining test failures
2. Run full test suite
3. Update documentation

The point of the structure is to force coverage of critical information so that nothing is silently omitted.
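The "forced coverage" idea can be made concrete by representing the template as a typed structure that reports empty sections. The sketch below assumes the five sections from the template above; the `StructuredSummary` class and its field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StructuredSummary:
    session_intent: str = ""
    files_modified: dict[str, str] = field(default_factory=dict)
    decisions: list[str] = field(default_factory=list)
    current_state: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of sections that are still empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]

    def render(self) -> str:
        """Render the summary using the section headings from the template."""
        parts = ["## Session Intent", self.session_intent, "## Files Modified"]
        parts += [f"- {path}: {note}" for path, note in self.files_modified.items()]
        parts.append("## Decisions Made")
        parts += [f"- {item}" for item in self.decisions]
        parts.append("## Current State")
        parts += [f"- {item}" for item in self.current_state]
        parts.append("## Next Steps")
        parts += [f"{i}. {step}" for i, step in enumerate(self.next_steps, 1)]
        return "\n".join(parts)


summary = StructuredSummary(
    session_intent="Debug 401 Unauthorized error on /api/auth/login",
    files_modified={"config/redis.ts": "Updated connection pooling"},
)
print(summary.missing_sections())  # ['decisions', 'current_state', 'next_steps']
```

A compression step could refuse to discard old turns while `missing_sections()` is non-empty, which is one way to turn the template into an enforced contract.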

Compression Trigger Strategies

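When to compress matters as much as how to compress:

| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple, but may trigger too early |
| Sliding window | Keep last N turns + summary | Predictable |
| Importance-based | Compress low-relevance first | Complex, but preserves more signal |
| Task-boundary | Compress at task boundaries | Readable, but unpredictable |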

For coding agents, sliding window + structured summary is usually the most balanced choice.
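A minimal sketch of combining a fixed threshold with a sliding window follows. The function names, the 0.75 default (a point inside the 70-80% range), and the 128,000-token limit are illustrative assumptions, not values taken from a specific model or framework.

```python
def should_compress(used_tokens: int, context_limit: int, threshold: float = 0.75) -> bool:
    """Fixed-threshold trigger: fire once utilization reaches the threshold."""
    return used_tokens >= threshold * context_limit


def split_for_compression(
    messages: list[str], keep_last: int = 10
) -> tuple[list[str], list[str]]:
    """Sliding window: return (older turns to summarize, recent turns kept verbatim)."""
    if keep_last <= 0:
        return list(messages), []
    return messages[:-keep_last], messages[-keep_last:]


messages = [f"turn {i}" for i in range(1, 31)]  # 30 hypothetical turns

if should_compress(used_tokens=96_000, context_limit=128_000):
    to_summarize, kept = split_for_compression(messages, keep_last=10)
    print(len(to_summarize), len(kept))  # 20 10
```

The older turns would then be folded into the structured summary, while the kept turns stay in context unchanged.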

Probe-Based Evaluation

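ROUGE and embedding similarity cannot measure functional fidelity. A summary can look very similar to the original and still be missing a critical file path.

Probe-based evaluation checks retention by asking questions: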

| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
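A probe can be sketched as a question plus the facts a faithful answer must contain. The scorer below uses a plain substring check as a cheap stand-in for a judge model, which is an assumption made for brevity; `Probe` and `score_answer` are hypothetical names. The two answers are the ones quoted in Example 2.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    kind: str                  # recall, artifact, continuation, or decision
    question: str
    required_facts: list[str]  # substrings a faithful answer must contain


def score_answer(probe: Probe, answer: str) -> float:
    """Fraction of required facts found in the answer (case-insensitive)."""
    if not probe.required_facts:
        return 1.0
    text = answer.lower()
    hits = sum(1 for fact in probe.required_facts if fact.lower() in text)
    return hits / len(probe.required_facts)


probe = Probe(
    kind="recall",
    question="What was the original error?",
    required_facts=["401", "/api/auth/login"],
)

good = (
    "The original error was a 401 Unauthorized response from the "
    "/api/auth/login endpoint."
)
poor = "We were debugging an authentication issue. The login was failing."

print(score_answer(probe, good))  # 1.0
print(score_answer(probe, poor))  # 0.0
```

Substring matching misses paraphrases and cannot grade reasoning, so it suits recall and artifact probes better than continuation or decision probes.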

Evaluation Dimensions

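Six dimensions measure compression quality:

  1. Accuracy: Are the technical details correct?
  2. Context Awareness: Does the response reflect the current conversation state?
  3. Artifact Trail: Is the file trail complete?
  4. Completeness: Does it cover the key points of the question?
  5. Continuity: Can the task continue without interruption?
  6. Instruction Following: Are the constraints respected?

Accuracy varies the most across methods; Artifact Trail is the weakest.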

Practical Guidance

Implementing Anchored Iterative Summarization

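  1. Define summary sections that fit your task type
  2. On the first compression, produce a complete structured summary
  3. On later compressions, summarize only the newly truncated span and merge it in
  4. Do not regenerate the full summary, to avoid detail drift
  5. Record where each summary entry came from, for debugging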
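The merge in step 3 can be sketched as below. The summary is modeled as a mapping from section name to a list of entries, and the rule that "Current State" and "Next Steps" are replaced while other sections are appended to is an assumption of this sketch, as is the `merge_summary` name. The "16 tests passing" entry is a made-up later state.

```python
def merge_summary(
    anchor: dict[str, list[str]], delta: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Merge a summary of newly truncated turns into the persistent anchor."""
    replace_sections = {"Current State", "Next Steps"}
    # Copy the anchor so the previous summary stays available for debugging.
    merged = {section: list(items) for section, items in anchor.items()}
    for section, items in delta.items():
        if section in replace_sections and items:
            merged[section] = list(items)  # the latest state supersedes the old one
            continue
        existing = merged.setdefault(section, [])
        for item in items:
            if item not in existing:       # append only entries not already recorded
                existing.append(item)
    return merged


anchor = {
    "Files Modified": ["config/redis.ts: Fixed connection pooling configuration"],
    "Current State": ["14 tests passing, 2 failing"],
}
delta = {
    "Files Modified": [
        "config/redis.ts: Fixed connection pooling configuration",
        "tests/auth.test.ts: Updated mock setup",
    ],
    "Current State": ["16 tests passing"],
}

merged = merge_summary(anchor, delta)
print(merged["Files Modified"])
print(merged["Current State"])
```

Entries already in the anchor are never rewritten, which is what keeps details from drifting across repeated compression cycles.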

When to Use Each Approach

Use anchored iterative summarization when:

  • Sessions are long (100+ messages)
  • File tracking is critical
  • You need to verify what was preserved

Use opaque compression when:

  • Maximum compression ratio is the overriding goal
  • Sessions are relatively short
  • Re-fetch cost is low

Use regenerative summaries when:

  • Summary readability is paramount
  • There are clear phase boundaries
  • Repeated review is acceptable

Compression Ratio Considerations

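| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Moderate quality |
| Opaque | 99.3% | 3.35 | Best compression, lower quality |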
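To see how small the gap between these ratios is in absolute terms, the sketch below applies them to a hypothetical 89,000-token session. The helper names are illustrative, and the results are plain arithmetic on the table's ratios, not measured values.

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of the original tokens removed by compression."""
    return 1 - compressed_tokens / original_tokens


def retained_tokens(original_tokens: int, ratio: float) -> int:
    """Tokens left after compressing at the given ratio, rounded to an integer."""
    return round(original_tokens * (1 - ratio))


session_tokens = 89_000  # hypothetical session size

anchored = retained_tokens(session_tokens, 0.986)
opaque = retained_tokens(session_tokens, 0.993)

print(anchored)           # 1246
print(opaque)             # 623
print(anchored - opaque)  # 623
```

Under these assumptions the higher-quality method keeps roughly 600 extra tokens, a small price next to the cost of a single re-fetch of a lost file.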

Examples

Example 1: Debugging Session Compression

Original context (89,000 tokens, 178 messages):

  • 401 error on /api/auth/login endpoint
  • Traced through auth controller, middleware, session store
  • Found stale Redis connection
  • Fixed connection pooling, added retry logic
  • 14 tests passing, 2 failing

Structured summary after compression:

## Session Intent

Debug 401 Unauthorized error on /api/auth/login despite valid credentials.

## Root Cause

Stale Redis connection in session store. JWT generated correctly but session could not be persisted.

## Files Modified

-   auth.controller.ts: No changes (read only)
-   middleware/cors.ts: No changes (examined)
-   config/redis.ts: Fixed connection pooling configuration
-   services/session.service.ts: Added retry logic for transient failures
-   tests/auth.test.ts: Updated mock setup

## Test Status

14 passing, 2 failing (mock setup issues)

## Next Steps

1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging

Example 2: Probe Response Quality

After compression, asking "What was the original error?":

Good response (structured summarization):

"The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."

Poor response (aggressive compression):

"We were debugging an authentication issue. The login was failing. We fixed some configuration problems."

Guidelines

  1. Optimize for tokens-per-task, not tokens-per-request
  2. Use structured summaries with explicit sections for file tracking
  3. Trigger compression at 70-80% context utilization
  4. Implement incremental merging rather than full regeneration
  5. Test compression quality with probe-based evaluation
  6. Track artifact trail separately if file tracking is critical
  7. Accept slightly lower compression ratios for better quality retention
  8. Monitor re-fetching frequency as a compression quality signal
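Guideline 8 can be approximated by counting reads of files the agent had already read before the most recent compression. This is a rough proxy under stated assumptions: some repeat reads are legitimate, and `RefetchMonitor` and its methods are hypothetical names.

```python
from collections import Counter


class RefetchMonitor:
    """Counts reads of files that were already read before the last compression."""

    def __init__(self) -> None:
        self.seen_before_compression: set[str] = set()
        self.seen_since: set[str] = set()
        self.refetches = Counter()

    def record_read(self, path: str) -> None:
        # A first read after compression of a previously seen file is a re-fetch.
        if path in self.seen_before_compression and path not in self.seen_since:
            self.refetches[path] += 1
        self.seen_since.add(path)

    def mark_compression(self) -> None:
        """Call whenever the context is compressed."""
        self.seen_before_compression |= self.seen_since
        self.seen_since = set()

    def refetch_count(self) -> int:
        return sum(self.refetches.values())


monitor = RefetchMonitor()
monitor.record_read("config/redis.ts")
monitor.record_read("services/session.service.ts")
monitor.mark_compression()
monitor.record_read("config/redis.ts")     # seen before compression: re-fetch
monitor.record_read("tests/auth.test.ts")  # new file: not a re-fetch
print(monitor.refetch_count())  # 1
```

A rising count after each compression suggests the summary is dropping details the agent still needs.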

Practice Task

  • Write a summary for your own project using the structured summary template
  • Design 3 probe questions to verify that the key facts were preserved

Integration

This skill connects to:

  • context-degradation - Compression is a mitigation strategy
  • context-optimization - Compression is one optimization technique
  • evaluation - Probe-based evaluation applies to compression testing
  • memory-systems - Compression relates to scratchpad and summary memory patterns

References

Related skills:

  • context-degradation - Understanding what compression prevents
  • context-optimization - Broader optimization strategies
  • evaluation - Building evaluation frameworks

External resources:

  • Factory Research: Evaluating Context Compression for AI Agents (December 2025)
  • Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)

Skill Metadata

Created: 2025-12-22
Last Updated: 2025-12-22
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0