
Claude Code Internals: Context & Memory Management

⏱️ 45 min


This chapter explains how Claude Code manages context during long tasks, and translates those practices into reusable engineering patterns. The goal: when you're building agents or debugging long tasks, you can reliably control context structure, cost, and signal density.

  • Context isn't "more is better" -- it's "minimum high-signal set."
  • Claude Code prefers just-in-time reads over bulk loading.
  • Compaction isn't losing information -- it's reorganizing it.
  • The file system is cheap external memory.
  • Tool outputs are the biggest token cost source and need throttling.

What You'll Learn

  • Claude Code's context management principles and techniques
  • How to set a context budget and compaction trigger
  • How to use the file system for progressive disclosure
  • How to control token cost from tool outputs

Core Model

Claude Code's context management breaks down into three layers:

  1. Fixed layer: Long-lived stable rules (e.g., CLAUDE.md, system constraints)
  2. Task layer: Current task goals, acceptance criteria, key facts
  3. Dynamic layer: Search results, tool outputs, execution logs

The core principle: keep the fixed and task layers lightweight and stable. The dynamic layer loads on demand and gets compressed as the task progresses.
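The three layers can be sketched as a simple data structure. This is an illustrative model, not Claude Code's actual internals; the class name and the ~4-characters-per-token heuristic are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Three-layer context model: fixed and task layers stay small
    and stable; the dynamic layer grows and gets compacted."""
    fixed: list[str] = field(default_factory=list)    # CLAUDE.md, system rules
    task: list[str] = field(default_factory=list)     # goals, acceptance criteria
    dynamic: list[str] = field(default_factory=list)  # tool outputs, logs

    def token_count(self) -> int:
        # Rough heuristic: ~4 characters per token.
        parts = self.fixed + self.task + self.dynamic
        return sum(len(s) // 4 for s in parts)

ctx = Context(fixed=["Always run tests before committing."],
              task=["Goal: add retry logic to the HTTP client."])
ctx.dynamic.append("rg output: src/http/client.py:42 ...")
```

The point of the separation: when you compact, only `dynamic` is rewritten; `fixed` and `task` survive untouched.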

Key Practices

1) Context Budgeting

  • Set a budget before starting the task
  • Break the task into small phases, each with a token cost limit
  • Trigger compaction when context use exceeds a threshold (e.g., 70-80% of the window)
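A minimal sketch of a budget trigger, assuming a 200K-token window; both the window size and the 75% threshold are illustrative defaults, not fixed values:

```python
def should_compact(used_tokens: int, window: int = 200_000,
                   threshold: float = 0.75) -> bool:
    """Trigger compaction when usage crosses the threshold (70-80% band)."""
    return used_tokens / window >= threshold

assert not should_compact(100_000)  # 50% of the window: keep going
assert should_compact(160_000)      # 80% of the window: compact now
```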

2) Progressive Disclosure

Don't load big chunks all at once. Get file paths or headings first, then read on demand.

  • Use rg to find relevant files
  • Use head/tail or partial reads
  • Only keep information needed for the current decision
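A rough Python equivalent of this references-first pattern. `find_files` and `read_slice` are hypothetical helpers standing in for `rg --files` and `head`/`tail`:

```python
from pathlib import Path

def find_files(root: str, pattern: str = "*.md") -> list[str]:
    """Step 1: collect cheap references (paths only), not file bodies."""
    return [str(p) for p in Path(root).rglob(pattern)]

def read_slice(path: str, start: int = 0, limit: int = 40) -> str:
    """Step 2: pull only the lines needed for the current decision,
    like head/tail or an offset/limit read."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    return "".join(lines[start:start + limit])
```

A path costs a few dozen characters of context; the file body costs thousands of tokens, so it is read only when a decision actually depends on it.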

3) Tool Output Throttling

Most context cost comes from tool outputs. Control strategies include:

  • Summarize outputs instead of keeping full text
  • Paginate / filter / truncate
  • Prefer structured output (tables / JSON / lists)
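The truncation strategy above can be sketched as a head-plus-tail cut with an explicit marker for what was dropped; the budget numbers are arbitrary:

```python
def throttle(output: str, max_tokens: int = 2_000,
             chars_per_token: int = 4) -> str:
    """Truncate an oversized tool output, keeping the head and tail
    and noting how much was cut."""
    budget = max_tokens * chars_per_token
    if len(output) <= budget:
        return output
    half = budget // 2
    cut = len(output) - budget
    return output[:half] + f"\n... [{cut} chars truncated] ...\n" + output[-half:]
```

Keeping both ends matters in practice: error messages tend to sit at the tail of logs, while the command and context sit at the head.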

4) Compaction Strategy

Compaction's goal is "keep critical info + remove redundancy." Suggested structure:

  • Files touched
  • Decisions made
  • Open questions
  • Next actions
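That summary structure could be captured as a small template; a sketch whose field names simply mirror the list above:

```python
from dataclasses import dataclass, field

@dataclass
class CompactionSummary:
    """Structured replacement for raw history: critical info stays,
    redundancy goes."""
    files_touched: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    next_actions: list[str] = field(default_factory=list)

    def render(self) -> str:
        sections = [("Files touched", self.files_touched),
                    ("Decisions made", self.decisions),
                    ("Open questions", self.open_questions),
                    ("Next actions", self.next_actions)]
        return "\n\n".join(
            f"## {title}\n" + "\n".join(f"- {x}" for x in items)
            for title, items in sections)
```

The rendered summary replaces the raw dynamic-layer history, so a few hundred tokens stand in for tens of thousands.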

Example Workflow

  1. Load rules: CLAUDE.md + AGENTS.md
  2. Locate target files: rg --files -g "*.md" src/content/learn/ai-engineer
  3. Read only what's needed: Check similar file structures first
  4. Generate content and write
  5. Review context: Any redundant outputs?
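The budget check in step 5 can be folded into a driver loop. A sketch under the assumption that compaction reclaims roughly 70% of the tokens; all numbers are illustrative:

```python
def compact(used: int, keep_ratio: float = 0.3) -> int:
    """Stand-in for summarization: the summary retains ~30% of the tokens."""
    return int(used * keep_ratio)

def run_task(step_costs: list[int], window: int = 200_000,
             threshold: float = 0.75) -> int:
    """Walk the workflow, checking the budget after every step and
    compacting whenever the threshold is crossed."""
    used = 0
    for cost in step_costs:
        used += cost
        if used / window >= threshold:
            used = compact(used)
    return used
```

Checking after every step, rather than once at the end, is what keeps a long task from ever reaching the unmanageable state the anti-patterns below describe.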

Anti-Patterns

  • Loading an entire doc library at once
  • Letting oversized tool outputs swamp the context
  • Not compacting during long tasks until it's unmanageable
  • Re-reading the same content without summarizing

Checklist

  • Is the context budget defined?
  • Is there progressive disclosure?
  • Are tool outputs capped?
  • Is there a compaction trigger?
  • Are rules centralized in CLAUDE.md/AGENTS.md?

Practice Task

  • Use this chapter's three-layer model to map context for a real project
  • Set an 80% compaction trigger threshold and design a summary template
  • Pick a tool output scenario and implement pagination or summarization

Skill Metadata

Created: 2025-12-26 Last Updated: 2025-12-26 Author: JR Academy Version: 1.0.0

❓ FAQ

The most frequently searched questions about this chapter's topic.

What are the three layers of Claude Code's context model?

(1) Fixed layer: CLAUDE.md and long-lived system constraints, loaded on every inference pass; (2) Task layer: the current task's goals, acceptance criteria, and key facts; (3) Dynamic layer: search results, tool outputs, execution logs. The first two layers stay lightweight and stable; the dynamic layer loads on demand and is compressed as the task progresses. Nine times out of ten, a messy context comes from stuffing dynamic-layer material into the fixed layer.

Why are tool outputs the biggest token cost source?

A single rg full-text search, a cat of a large file, or an API list query can easily return 5K-20K tokens per call, several times the tokens of the conversation itself. The three-part control strategy: summarize instead of keeping full text; paginate/filter/truncate; prefer structured output (tables/JSON). Claude Code caps tool responses at 25,000 tokens by default precisely so the dynamic layer cannot swallow the entire context.

What is a good compaction trigger threshold?

Triggering at 70-80% is conservative, leaving buffer to cover the token cost of compaction itself; 90-95% is aggressive, and the model may already be losing focus by then. This chapter recommends starting at 70-80%, paired with a structured summary template (Files touched / Decisions made / Open questions / Next actions) to preserve information density while reclaiming tokens. Claude Code itself auto-compacts at 95%.

How does progressive disclosure work in Claude Code, concretely?

Three steps: use rg --files to find file paths, use head/tail or offset/limit reads to pull fragments, and keep only what the current decision requires in context. The principle is "get references first, read fragments later," treating the file system as cheap external memory: a path costs a few dozen characters, while the full file runs to thousands of lines and is pulled in only when needed. The anti-pattern is cat-ing the entire docs/ directory at the start.

How do I know my context management has hit an anti-pattern?

This chapter's four red lines: (1) loading an entire doc library at once; (2) letting oversized tool outputs swamp the context; (3) never compacting during a long task until it becomes unmanageable; (4) re-reading the same file without summarizing it. If any of these shows up, return to the checklist: Is a budget set? Is there progressive disclosure? Are tool outputs capped? Is a compaction trigger in place?