08

Context Compression & Optimization

⏱️ 35 min

Context Compression Strategies

When agent sessions generate massive conversation histories, compression becomes necessary. The intuitive approach is to minimize tokens-per-request, but the correct target is tokens-per-task: the total tokens needed to finish the job, including the re-fetch cost of losing critical information to compression.

The right goal isn't "shortest single request" — it's "lowest total cost to complete the task."

  • Optimize tokens-per-task, not tokens-per-request.
  • Structured summaries beat aggressive compression for long tasks.
  • The artifact trail is the hardest information to preserve.
  • Trigger compression at 70-80% context.
  • Use probe questions to evaluate quality.

What You'll Learn

  • Trade-offs between three mainstream compression strategies
  • Why "structured summarization" is the safest engineering practice
  • How to evaluate compression quality with probe questions

When to Activate

Activate this skill when:

  • Agent sessions exceed context window limits
  • Designing conversation summarization strategies
  • Evaluating different compression approaches for production systems
  • Debugging cases where agents "forget" what files they modified
  • Building evaluation frameworks for compression quality
  • Optimizing long-running coding or debugging sessions

Core Concepts

Context compression is a trade-off between token savings and information loss. Three production-ready approaches:

  1. Anchored Iterative Summarization: Maintains a structured, continuously updated summary containing session intent, file modifications, decisions, and next steps. On trigger, only the newly truncated portion gets summarized and merged. The structure itself forces retention of critical information.

  2. Opaque Compression: Chases maximum compression ratios (99%+), but interpretability is low and you can't verify what was retained.

  3. Regenerative Full Summary: Generates a complete summary each time. Readable, but multi-round compression keeps shedding details.

Key conclusion: structured summaries "force retention" and prevent silent information drift.

Detailed Topics

Why Tokens-Per-Task Matters

Traditional metrics only look at tokens-per-request — that's the wrong optimization target. Once compression drops a file path or error message, the agent re-fetches, re-explores, and ends up consuming more tokens overall.

The correct metric is tokens-per-task: total consumption from task start to completion. Saving 0.5% on tokens but incurring 20% re-fetch overhead makes things more expensive, not less.
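The arithmetic above can be sketched directly. This is a minimal model, not a measurement tool; the 100,000-token task size and the rates are hypothetical numbers chosen to mirror the example in the text.

```python
def tokens_per_task(base_tokens: int, savings_rate: float, refetch_rate: float) -> float:
    """Total tokens to finish a task after per-request savings and re-fetch overhead."""
    return base_tokens * (1 - savings_rate) * (1 + refetch_rate)

# Hypothetical 100k-token task: 0.5% per-request savings, 20% re-fetch overhead.
# The "savings" make the task more expensive overall, not cheaper.
cost = tokens_per_task(100_000, savings_rate=0.005, refetch_rate=0.20)
```

Under these numbers the task costs 119,400 tokens instead of 100,000: the re-fetch overhead dwarfs the per-request savings.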

The Artifact Trail Problem

The artifact trail is the weakest dimension across all compression methods, scoring only 2.2–2.5/5 in evaluations. Even structured summaries struggle to consistently preserve complete file trails.

Coding agents need to know:

  • Which files were created
  • Which files were modified and what changed
  • Which files were read but not modified
  • Function names, variable names, error messages

This usually requires dedicated mechanisms beyond natural language summaries.
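One such dedicated mechanism is to track file state outside the natural-language summary entirely. The sketch below is an assumption about how that could look, not a prescribed design; the class and method names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactTrail:
    """Tracks file-level artifacts separately from the prose summary."""
    created: set = field(default_factory=set)
    modified: dict = field(default_factory=dict)   # path -> short change note
    read_only: set = field(default_factory=set)

    def record_edit(self, path: str, note: str) -> None:
        self.modified[path] = note
        self.read_only.discard(path)               # an edited file is no longer read-only

    def record_read(self, path: str) -> None:
        if path not in self.modified and path not in self.created:
            self.read_only.add(path)

    def render(self) -> str:
        """Render the trail as a summary section that survives compression verbatim."""
        lines = ["## Files Modified"]
        lines += [f"- {p}: {n}" for p, n in sorted(self.modified.items())]
        lines += [f"- {p}: No changes (read only)" for p in sorted(self.read_only)]
        return "\n".join(lines)
```

Because the trail is structured data, it can be re-rendered into every compressed summary instead of being re-summarized and eroded.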

Structured Summary Sections

An effective structured summary should include:

## Session Intent

[What the user is trying to accomplish]

## Files Modified

-   auth.controller.ts: Fixed JWT token generation
-   config/redis.ts: Updated connection pooling
-   tests/auth.test.ts: Added mock setup for new config

## Decisions Made

-   Using Redis connection pool instead of per-request connections
-   Retry logic with exponential backoff for transient failures

## Current State

-   14 tests passing, 2 failing
-   Remaining: mock setup for session service tests

## Next Steps

1. Fix remaining test failures
2. Run full test suite
3. Update documentation

The point of structure is to "force coverage of critical information" and prevent omissions.

Compression Trigger Strategies

When to compress matters as much as how:

| Strategy | Trigger Point | Trade-off |
| --- | --- | --- |
| Fixed threshold | 70-80% context utilization | Simple but may trigger early |
| Sliding window | Keep last N turns + summary | Predictable |
| Importance-based | Compress low-relevance first | Complex but preserves signal |
| Task-boundary | Compress at task boundaries | Readable but unpredictable |

For coding agents, sliding window + structured summary is usually the best balance.
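The sliding-window-plus-threshold combination can be sketched in a few lines. This is a simplified illustration assuming token counts are available for the session; the 0.75 default reflects the 70-80% guidance above.

```python
def should_compress(used_tokens: int, window_tokens: int, threshold: float = 0.75) -> bool:
    """Fire compression once context utilization crosses the threshold."""
    return used_tokens / window_tokens >= threshold

def sliding_window(messages: list, keep_last: int) -> tuple:
    """Split history into a portion to summarize and recent turns kept verbatim."""
    if keep_last <= 0:
        return messages, []
    return messages[:-keep_last], messages[-keep_last:]
```

On trigger, the older portion is summarized into the structured summary while the last N turns stay intact, keeping recent tool output verbatim.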

Probe-Based Evaluation

ROUGE/embedding similarity can't measure functional fidelity. A summary might "look similar" but be missing a critical file path.

Probe-based evaluation tests retention through questions:

| Probe Type | What It Tests | Example Question |
| --- | --- | --- |
| Recall | Factual retention | "What was the original error message?" |
| Artifact | File tracking | "Which files have we modified?" |
| Continuation | Task planning | "What should we do next?" |
| Decision | Reasoning chain | "What did we decide about the Redis issue?" |
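A probe harness following this table can be very small. In the sketch below, `ask_model` stands in for whatever call queries the post-compression agent, and the keyword-containment check is a deliberately crude stand-in for LLM-as-judge scoring; both are assumptions, not a real API.

```python
# Probe questions taken from the table above.
PROBES = {
    "recall": "What was the original error message?",
    "artifact": "Which files have we modified?",
    "continuation": "What should we do next?",
    "decision": "What did we decide about the Redis issue?",
}

def run_probes(ask_model, expected_facts: dict) -> dict:
    """Score each probe 1 if the expected fact appears in the answer, else 0."""
    scores = {}
    for name, question in PROBES.items():
        answer = ask_model(question)
        scores[name] = int(expected_facts[name].lower() in answer.lower())
    return scores
```

In production the containment check would typically be replaced with an LLM judge, but the shape of the harness is the same: known facts in, per-probe retention scores out.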

Evaluation Dimensions

Six dimensions for measuring compression quality:

  1. Accuracy: Are technical details correct?
  2. Context Awareness: Does it match the current conversation state?
  3. Artifact Trail: Is the file trail complete?
  4. Completeness: Does it cover the key points?
  5. Continuity: Can the task resume seamlessly?
  6. Instruction Following: Are constraints respected?

Accuracy has the widest variance. Artifact Trail is consistently the weakest.
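These six dimensions can be aggregated into a single quality score. The unweighted 1-5 average below is an assumption for illustration; the source does not specify how its quality scores were computed.

```python
DIMENSIONS = ["accuracy", "context_awareness", "artifact_trail",
              "completeness", "continuity", "instruction_following"]

def quality_score(ratings: dict) -> float:
    """Average 1-5 ratings over the six dimensions; raises KeyError if one is missing."""
    return sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)
```

Keeping per-dimension ratings around (rather than only the average) is what lets you see that the artifact trail is the consistent weak spot.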

Practical Guidance

Implementing Anchored Iterative Summarization

  1. Define summary sections (tailored to your task type)
  2. First compression: generate a complete structured summary
  3. Subsequent compressions: only summarize the newly truncated portion and merge
  4. Don't regenerate from scratch — that causes detail drift
  5. Record summary provenance for debugging
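Steps 3 and 4 above can be sketched as an incremental merge. Here `summarize` stands in for an LLM call that turns the newly truncated turns into per-section bullet lines; that signature is an assumption made for the sketch.

```python
def compress(anchor_summary: dict, truncated_messages: list, summarize) -> dict:
    """Summarize only the newly truncated turns and merge into the anchor summary."""
    delta = summarize(truncated_messages)       # dict of section name -> new lines
    for section, lines in delta.items():
        anchor_summary.setdefault(section, [])
        # Merge instead of regenerating: existing lines are never re-derived,
        # so details cannot silently drift across compression rounds.
        for line in lines:
            if line not in anchor_summary[section]:
                anchor_summary[section].append(line)
    return anchor_summary
```

Note that the anchor's existing lines pass through untouched; only the delta ever goes through the model, which is what prevents regeneration-style detail drift.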

When to Use Each Approach

Use anchored iterative summarization when:

  • Sessions are long (100+ messages)
  • File tracking is critical
  • You need verifiable information retention

Use opaque compression when:

  • Maximum compression ratio is the priority
  • Sessions are relatively short
  • Re-fetch cost is low

Use regenerative summaries when:

  • Summary readability is paramount
  • There are clear phase boundaries
  • You can accept repeated review passes

Compression Ratio Considerations

| Method | Compression Ratio | Quality Score | Trade-off |
| --- | --- | --- | --- |
| Anchored Iterative | 98.6% | 3.70 | Best quality, slightly less compression |
| Regenerative | 98.7% | 3.44 | Medium quality |
| Opaque | 99.3% | 3.35 | Best compression, quality drops |
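For intuition on what these ratios mean in absolute terms, the computation is a one-liner. The 89,000-token session size is the one from the debugging example in this skill; the resulting summary size is an illustrative approximation.

```python
def compression_ratio(original_tokens: int, summary_tokens: int) -> float:
    """Fraction of the original context removed by compression."""
    return 1 - summary_tokens / original_tokens

# A ~98.6% ratio on an 89,000-token session leaves roughly 1,250 summary tokens;
# the gap between 98.6% and 99.3% is only a few hundred tokens per compression.
ratio = compression_ratio(89_000, 1_246)
```

This is why accepting a slightly lower ratio for better quality is usually cheap: the marginal token savings between methods are tiny compared to one avoided re-exploration.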

Examples

Example 1: Debugging Session Compression

Original context (89,000 tokens, 178 messages):

  • 401 error on /api/auth/login endpoint
  • Traced through auth controller, middleware, session store
  • Found stale Redis connection
  • Fixed connection pooling, added retry logic
  • 14 tests passing, 2 failing

Structured summary after compression:

## Session Intent

Debug 401 Unauthorized error on /api/auth/login despite valid credentials.

## Root Cause

Stale Redis connection in session store. JWT generated correctly but session could not be persisted.

## Files Modified

-   auth.controller.ts: No changes (read only)
-   middleware/cors.ts: No changes (examined)
-   config/redis.ts: Fixed connection pooling configuration
-   services/session.service.ts: Added retry logic for transient failures
-   tests/auth.test.ts: Updated mock setup

## Test Status

14 passing, 2 failing (mock setup issues)

## Next Steps

1. Fix remaining test failures (mock session service)
2. Run full test suite
3. Deploy to staging

Example 2: Probe Response Quality

After compression, asking "What was the original error?":

Good response (structured summarization):

"The original error was a 401 Unauthorized response from the /api/auth/login endpoint. Users received this error with valid credentials. Root cause was stale Redis connection in session store."

Poor response (aggressive compression):

"We were debugging an authentication issue. The login was failing. We fixed some configuration problems."

Guidelines

  1. Optimize for tokens-per-task, not tokens-per-request
  2. Use structured summaries with explicit sections for file tracking
  3. Trigger compression at 70-80% context utilization
  4. Implement incremental merging rather than full regeneration
  5. Test compression quality with probe-based evaluation
  6. Track artifact trail separately if file tracking is critical
  7. Accept slightly lower compression ratios for better quality retention
  8. Monitor re-fetching frequency as a compression quality signal
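Guideline 8 can be monitored with a simple counter over the agent's file-read tool calls. This is a rough proxy, sketched under the assumption that the session's read calls can be logged as a list of paths; repeated reads of the same file suggest the compressed summary dropped something the agent needed.

```python
def refetch_rate(read_paths: list) -> float:
    """Fraction of file reads that re-open an already-read file this session."""
    seen = set()
    refetches = 0
    for path in read_paths:
        if path in seen:
            refetches += 1
        seen.add(path)
    return refetches / len(read_paths) if read_paths else 0.0
```

A rising re-fetch rate right after compression events is a strong signal that the summary is losing the artifact trail.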

Practice Task

  • Write a structured summary template for your project using the sections above
  • Design 3 probe questions to verify whether critical facts were retained

Integration

This skill connects to:

  • context-degradation - Compression is a mitigation strategy
  • context-optimization - Compression is one optimization technique
  • evaluation - Probe-based evaluation applies to compression testing
  • memory-systems - Compression relates to scratchpad and summary memory patterns

References

Related skills:

  • context-degradation - Understanding what compression prevents
  • context-optimization - Broader optimization strategies
  • evaluation - Building evaluation frameworks

External resources:

  • Factory Research: Evaluating Context Compression for AI Agents (December 2025)
  • Research on LLM-as-judge evaluation methodology (Zheng et al., 2023)

Skill Metadata

Created: 2025-12-22
Last Updated: 2025-12-22
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0