Context Compression & Optimization
Context Compression Strategies
When agent sessions generate massive conversation history, compression becomes necessary. The intuitive approach is to minimize tokens-per-request, but the correct target is tokens-per-task: the total tokens needed to finish the job, including the re-fetch cost incurred when compression drops critical information.
The right goal isn't "shortest single request" — it's "lowest total cost to complete the task."
- Optimize tokens-per-task, not tokens-per-request.
- Structured summaries beat aggressive compression for long tasks.
- Artifact trail is the hardest information to preserve.
- Trigger compression at 70-80% context.
- Use probe questions to evaluate quality.
What You'll Learn
- Trade-offs between three mainstream compression strategies
- Why "structured summarization" is the safest engineering practice
- How to evaluate compression quality with probe questions
When to Activate
Activate this skill when:
- Agent sessions exceed context window limits
- Designing conversation summarization strategies
- Evaluating different compression approaches for production systems
- Debugging cases where agents "forget" what files they modified
- Building evaluation frameworks for compression quality
- Optimizing long-running coding or debugging sessions
Core Concepts
Context compression is a trade-off between token savings and information loss. Three production-ready approaches:
- Anchored Iterative Summarization: Maintains a structured, continuously updated summary containing session intent, file modifications, decisions, and next steps. On trigger, only the newly truncated portion gets summarized and merged. The structure itself forces retention of critical information.
- Opaque Compression: Chases maximum compression ratios (99%+), but interpretability is low and you can't verify what was retained.
- Regenerative Full Summary: Generates a complete summary each time. Readable, but multi-round compression keeps shedding details.
Key conclusion: structured summaries "force retention" and prevent silent information drift.
Detailed Topics
Why Tokens-Per-Task Matters
Traditional metrics only look at tokens-per-request — that's the wrong optimization target. Once compression drops a file path or error message, the agent re-fetches, re-explores, and ends up consuming more tokens overall.
The correct metric is tokens-per-task: total consumption from task start to completion. Saving 0.5% on tokens but incurring 20% re-fetch overhead makes things more expensive, not less.
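The arithmetic is easy to sketch. The numbers below are illustrative, not measurements:

```python
# Toy comparison of tokens-per-request vs tokens-per-task.
# All figures are illustrative, not benchmarked.

def tokens_per_task(request_tokens: int, requests: int, refetch_tokens: int) -> int:
    """Total tokens consumed from task start to completion."""
    return request_tokens * requests + refetch_tokens

# Aggressive compression shaves 0.5% off every request, but lost file
# paths force ~20% extra re-fetch traffic over the whole task.
baseline = tokens_per_task(request_tokens=10_000, requests=20, refetch_tokens=0)
aggressive = tokens_per_task(request_tokens=9_950, requests=20, refetch_tokens=40_000)

assert aggressive > baseline  # "cheaper" requests, more expensive task
```

The per-request win is real but tiny; the re-fetch penalty dominates, which is why tokens-per-task is the metric worth optimizing.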
The Artifact Trail Problem
The artifact trail is the weakest dimension across all compression methods, scoring only 2.2–2.5/5 in evaluations. Even structured summaries struggle to consistently preserve complete file trails.
Coding agents need to know:
- Which files were created
- Which files were modified and what changed
- Which files were read but not modified
- Function names, variable names, error messages
This usually requires dedicated mechanisms beyond natural language summaries.
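One such dedicated mechanism is to keep the file trail as structured data outside the summary and render it into the `## Files Modified` section verbatim on every compression. A minimal sketch (class and method names are illustrative, not from any particular framework):

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactTrail:
    """Tracks file interactions outside the natural-language summary,
    so compression can never silently drop them."""
    modified: dict = field(default_factory=dict)   # path -> change note
    read_only: set = field(default_factory=set)

    def record_read(self, path: str) -> None:
        if path not in self.modified:
            self.read_only.add(path)

    def record_write(self, path: str, note: str) -> None:
        self.read_only.discard(path)
        self.modified[path] = note

    def render(self) -> str:
        """Emit a '## Files Modified' section to splice into every summary."""
        lines = ["## Files Modified"]
        lines += [f"- {p}: {n}" for p, n in sorted(self.modified.items())]
        lines += [f"- {p}: No changes (read only)" for p in sorted(self.read_only)]
        return "\n".join(lines)

trail = ArtifactTrail()
trail.record_read("auth.controller.ts")
trail.record_write("config/redis.ts", "Fixed connection pooling configuration")
print(trail.render())
```

Because the trail is data rather than prose, it survives every summarization round unchanged; the summarizer only ever appends to it.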
Structured Summary Sections
An effective structured summary should include:
## Session Intent
[What the user is trying to accomplish]
## Files Modified
- auth.controller.ts: Fixed JWT token generation
- config/redis.ts: Updated connection pooling
- tests/auth.test.ts: Added mock setup for new config
## Decisions Made
- Using Redis connection pool instead of per-request connections
- Retry logic with exponential backoff for transient failures
## Current State
- 14 tests passing, 2 failing
- Remaining: mock setup for session service tests
## Next Steps
1. Fix remaining test failures
2. Run full test suite
3. Update documentation
The point of structure is to "force coverage of critical information" and prevent omissions.
Compression Trigger Strategies
When to compress matters as much as how:
| Strategy | Trigger Point | Trade-off |
|---|---|---|
| Fixed threshold | 70-80% context utilization | Simple but may trigger early |
| Sliding window | Keep last N turns + summary | Predictable |
| Importance-based | Compress low-relevance first | Complex but preserves signal |
| Task-boundary | Compress at task boundaries | Readable but unpredictable |
For coding agents, sliding window + structured summary is usually the best balance.
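The two pieces of that recommended combination can be sketched in a few lines, assuming you can count the tokens currently in context (function names are illustrative):

```python
def should_compress(used_tokens: int, context_window: int, threshold: float = 0.75) -> bool:
    """Fixed-threshold trigger: compress once 70-80% of the window is used."""
    return used_tokens / context_window >= threshold

def sliding_window(messages: list, summary: str, keep_last: int = 10) -> list:
    """Sliding-window layout: structured summary first, last N turns verbatim."""
    head = [{"role": "system", "content": summary}]
    return head + messages[-keep_last:]

# Example: 156k tokens used in a 200k window -> 78% utilization, trigger fires.
assert should_compress(156_000, 200_000)
```

The threshold leaves headroom for the compression call itself and for the next few turns, which is why 70-80% beats waiting until the window is nearly full.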
Probe-Based Evaluation
ROUGE/embedding similarity can't measure functional fidelity. A summary might "look similar" but be missing a critical file path.
Probe-based evaluation tests retention through questions:
| Probe Type | What It Tests | Example Question |
|---|---|---|
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
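A probe harness can be sketched as follows; `ask_agent` and `judge_answer` are placeholders for your agent call and an LLM-as-judge scorer (both are assumptions, not a specific API):

```python
# Hypothetical probe harness: ask each probe against the compressed
# context and have a judge score the answer for functional fidelity.

PROBES = [
    ("recall",       "What was the original error message?"),
    ("artifact",     "Which files have we modified?"),
    ("continuation", "What should we do next?"),
    ("decision",     "What did we decide about the Redis issue?"),
]

def run_probes(compressed_context, ask_agent, judge_answer):
    """Returns a score per probe type, 1 (information lost) .. 5 (faithful)."""
    scores = {}
    for probe_type, question in PROBES:
        answer = ask_agent(compressed_context, question)
        scores[probe_type] = judge_answer(question, answer)
    return scores
```

Run the same probes against the uncompressed transcript as a reference answer; the delta per probe type tells you which kind of information the compressor is losing.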
Evaluation Dimensions
Six dimensions for measuring compression quality:
- Accuracy: Are technical details correct?
- Context Awareness: Does it match the current conversation state?
- Artifact Trail: Is the file trail complete?
- Completeness: Does it cover the key points?
- Continuity: Can the task resume seamlessly?
- Instruction Following: Are constraints respected?
Accuracy has the widest variance. Artifact Trail is consistently the weakest.
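If each dimension is scored 1-5, the aggregate is a design choice rather than a fixed formula. A minimal unweighted version (weighting, e.g. up-weighting Artifact Trail for coding agents, is a reasonable variation):

```python
DIMENSIONS = ["accuracy", "context_awareness", "artifact_trail",
              "completeness", "continuity", "instruction_following"]

def quality_score(scores: dict) -> float:
    """Unweighted mean over the six evaluation dimensions."""
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
```

Reporting per-dimension scores alongside the mean matters here, since an average can hide the consistently weak artifact-trail dimension.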
Practical Guidance
Implementing Anchored Iterative Summarization
- Define summary sections (tailored to your task type)
- First compression: generate a complete structured summary
- Subsequent compressions: only summarize the newly truncated portion and merge
- Don't regenerate from scratch — that causes detail drift
- Record summary provenance for debugging
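The steps above can be sketched as a single merge function; the function names and the comment-style provenance marker are illustrative:

```python
# Minimal sketch of anchored iterative compression.
# `summarize_llm(prev_summary, truncated_turns)` is assumed to return an
# updated structured summary with the same fixed sections as the anchor.

def compress(messages, summary, summarize_llm, keep_last=10):
    """Summarize only the newly truncated turns and merge into the anchor."""
    truncated, kept = messages[:-keep_last], messages[-keep_last:]
    if not truncated:
        return messages, summary          # nothing new to fold in
    new_summary = summarize_llm(summary, truncated)
    # Provenance for debugging: record what each summary version absorbed.
    new_summary += f"\n<!-- merged {len(truncated)} turns -->"
    return kept, new_summary
```

The key property is that the previous summary is always an input, never regenerated from scratch, so details retained once stay retained unless the merge prompt explicitly supersedes them.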
When to Use Each Approach
Use anchored iterative summarization when:
- Sessions are long (100+ messages)
- File tracking is critical
- You need verifiable information retention
Use opaque compression when:
- Maximum compression ratio is the priority
- Sessions are relatively short
- Re-fetch cost is low
Use regenerative summaries when:
- Summary readability is paramount
- There are clear phase boundaries
- You can accept repeated review passes
Compression Ratio Considerations
| Method | Compression Ratio | Quality Score | Trade-off |
|---|---|---|---|
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Medium quality |
| Opaque | 99.3% | 3.35 | Best compression, quality drops |
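Compression ratio here means the fraction of tokens removed. As an illustrative calculation (the 1,250-token figure is made up for the example), an 89,000-token session reduced to roughly 1,250 tokens gives about 98.6%:

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens removed by compression."""
    return 1 - compressed_tokens / original_tokens

assert round(compression_ratio(89_000, 1_250), 3) == 0.986
```

Note how little headroom separates the methods at this scale: the gap between 98.6% and 99.3% is a few hundred tokens, which is cheap insurance for the quality difference.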
Examples
Example 1: Debugging Session Compression
Original context (89,000 tokens, 178 messages):
- 401 error on /api/auth/login endpoint
- Traced through auth controller, middleware, session store
- Found stale Redis connection
- Fixed connection pooling, added retry logic
- 14 tests passing, 2 failing
Structured summary after compression:
## Session Intent
Debug 401 Unauthorized error on /api/auth/login despite valid credentials.
## Root Cause
Stale Redis connection in session store. JWT generated correctly but session could not be persisted.
## Files Modified
- auth.controller.ts: No changes (read only)
- middleware/cors.ts: No changes (examined)
- config/redis.ts: Fixed connection pooling configuration
- services/session.service.ts: Added retry logic for transient failures
- tests/auth.test.ts: Updated mock setup
## Test Status
14 passing, 2 failing (mock setup issues)
## Next Steps
1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging
Example 2: Probe Response Quality
After compression, asking "What was the original error?":
Good response (structured summarization):
"The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."
Poor response (aggressive compression):
"We were debugging an authentication issue. The login was failing. We fixed some configuration problems."
Guidelines
- Optimize for tokens-per-task, not tokens-per-request
- Use structured summaries with explicit sections for file tracking
- Trigger compression at 70-80% context utilization
- Implement incremental merging rather than full regeneration
- Test compression quality with probe-based evaluation
- Track artifact trail separately if file tracking is critical
- Accept slightly lower compression ratios for better quality retention
- Monitor re-fetching frequency as a compression quality signal
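The last guideline can be automated: count how often the agent re-reads something it already fetched. A minimal sketch over a log of tool calls:

```python
from collections import Counter

def refetch_rate(tool_calls: list) -> float:
    """Fraction of calls that repeat an already-seen (tool, target) pair.
    A rising rate right after compression suggests the summary dropped
    information the agent still needed."""
    seen = Counter()
    repeats = 0
    for call in tool_calls:
        if seen[call]:
            repeats += 1
        seen[call] += 1
    return repeats / len(tool_calls) if tool_calls else 0.0

calls = [("read_file", "config/redis.ts"),
         ("read_file", "auth.controller.ts"),
         ("read_file", "config/redis.ts")]   # re-read after compression
assert refetch_rate(calls) == 1 / 3
```

Comparing the rate in the turns before and after each compression event gives a cheap, always-on quality signal that needs no LLM judge.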
Practice Task
- Write a structured summary template for your project using the sections above
- Design 3 probe questions to verify whether critical facts were retained
Related Pages
- Claude Code Examples
- Context Engineering Fundamentals
- Context Degradation Patterns
- Advanced Evaluation
Integration
This skill connects to:
- context-degradation - Compression is a mitigation strategy
- context-optimization - Compression is one optimization technique
- evaluation - Probe-based evaluation applies to compression testing
- memory-systems - Compression relates to scratchpad and summary memory patterns
References
Related skills:
- context-degradation - Understanding what compression prevents
- context-optimization - Broader optimization strategies
- evaluation - Building evaluation frameworks
External resources:
- Factory Research: Evaluating Context Compression for AI Agents (December 2025)
- Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)
Skill Metadata
Created: 2025-12-22 Last Updated: 2025-12-22 Author: Agent Skills for Context Engineering Contributors Version: 1.0.0
Frequently Asked Questions
The most commonly searched questions on this chapter's topic.
Should context compression aim for the fewest possible tokens?
No. The target is tokens-per-task (total cost of the task), not tokens-per-request (fewest per call). Compress too aggressively and you lose critical information like file paths and error messages; the agent then re-retrieves and re-explores, making the whole task more expensive. Saving 0.5% of tokens while incurring 20% re-fetch cost is a net loss.
Which scenarios suit each of the three compression strategies?
(1) Anchored Iterative Summarization: a continuously updated structured summary, 98.6% compression / 3.70 quality score; best for long sessions (100+ messages) where file tracking is critical. (2) Opaque Compression: 99.3% compression ratio but low interpretability; best for short sessions with low re-fetch cost. (3) Regenerative Full Summary: highly readable, but multi-round compression sheds detail; best for tasks with clear phase boundaries.
When should compression trigger? 70% or 90%?
70-80% is the sweet spot. Common trigger strategies: fixed threshold (70-80% utilization; simple but may fire early), sliding window (keep the last N turns plus a summary; predictable), importance-based (compress low-relevance content first; complex but preserves signal), and task-boundary (compress at task boundaries; readable but unpredictable). For coding agents, sliding window plus structured summary is the most balanced combination.
Which sections should a structured summary contain?
Five fixed sections: ## Session Intent (user goal), ## Files Modified (with specific change notes), ## Decisions Made (key decisions), ## Current State (tests passing/failing, current progress), and ## Next Steps (action list). The point of structure is to force coverage of critical information; the artifact trail (file trail) is the lowest-scoring evaluation dimension (2.2-2.5/5) and must be listed explicitly.
How do you verify that compression didn't lose critical information?
Use probe-based evaluation rather than ROUGE / embedding similarity alone (those measure surface similarity, not functional fidelity). Ask four probe types: Recall (what was the original error message?), Artifact (which files did we modify?), Continuation (what should we do next?), and Decision (what did we decide about Redis?). Then score across six dimensions: Accuracy / Context Awareness / Artifact Trail / Completeness / Continuity / Instruction Following.