
Claude Code Internals: Context & Memory Management

⏱️ 45 min


This chapter explains how Claude Code manages context during long tasks, and translates those practices into reusable engineering patterns. The goal: when you're building agents or debugging long tasks, you can reliably control context structure, cost, and signal density.

  • Context isn't "more is better" -- it's "minimum high-signal set."
  • Claude Code prefers just-in-time reads over bulk loading.
  • Compaction isn't losing information -- it's reorganizing it.
  • The file system is cheap external memory.
  • Tool outputs are the biggest token cost source and need throttling.

What You'll Learn

  • Claude Code's context management principles and techniques
  • How to set a context budget and compaction trigger
  • How to use the file system for progressive disclosure
  • How to control token cost from tool outputs

Core Model

Claude Code's context management breaks down into three layers:

  1. Fixed layer: Long-lived stable rules (e.g., CLAUDE.md, system constraints)
  2. Task layer: Current task goals, acceptance criteria, key facts
  3. Dynamic layer: Search results, tool outputs, execution logs

The core principle: keep the fixed and task layers lightweight and stable. The dynamic layer loads on demand and gets compressed as the task progresses.
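The three layers can be sketched as a simple data structure. This is an illustrative model, not Claude Code's actual internals; the class name and the ~4-characters-per-token heuristic are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Three-layer context model: fixed and task layers stay small
    and stable; the dynamic layer grows and gets compacted."""
    fixed: list[str] = field(default_factory=list)    # CLAUDE.md, system rules
    task: list[str] = field(default_factory=list)     # goals, acceptance criteria
    dynamic: list[str] = field(default_factory=list)  # tool outputs, logs

    def token_count(self) -> int:
        # Rough heuristic: ~4 characters per token.
        parts = self.fixed + self.task + self.dynamic
        return sum(len(s) // 4 for s in parts)

ctx = Context(fixed=["Always run tests before committing."],
              task=["Goal: add retry logic to the HTTP client."])
ctx.dynamic.append("rg output: src/http/client.py:42 ...")
```

The point of the separation: when you compact, only `dynamic` is rewritten; `fixed` and `task` survive untouched.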

Key Practices

1) Context Budgeting

  • Set a budget before starting the task
  • Break the task into small phases, each with a token cost limit
  • Trigger compaction when context use exceeds a threshold (e.g., 70-80% of the window)
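A minimal sketch of a budget trigger, assuming a 200K-token window; both the window size and the 75% threshold are illustrative defaults, not fixed values:

```python
def should_compact(used_tokens: int, window: int = 200_000,
                   threshold: float = 0.75) -> bool:
    """Trigger compaction when usage crosses the threshold (70-80% band)."""
    return used_tokens / window >= threshold

assert not should_compact(100_000)  # 50% of the window: keep going
assert should_compact(160_000)      # 80% of the window: compact now
```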

2) Progressive Disclosure

Don't load big chunks all at once. Get file paths or headings first, then read on demand.

  • Use rg to find relevant files
  • Use head/tail or partial reads
  • Only keep information needed for the current decision
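A rough Python equivalent of this references-first pattern. `find_files` and `read_slice` are hypothetical helpers standing in for `rg --files` and `head`/`tail`:

```python
from pathlib import Path

def find_files(root: str, pattern: str = "*.md") -> list[str]:
    """Step 1: collect cheap references (paths only), not file bodies."""
    return [str(p) for p in Path(root).rglob(pattern)]

def read_slice(path: str, start: int = 0, limit: int = 40) -> str:
    """Step 2: pull only the lines needed for the current decision,
    like head/tail or an offset/limit read."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    return "".join(lines[start:start + limit])
```

A path costs a few dozen characters of context; the file body costs thousands of tokens, so it is read only when a decision actually depends on it.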

3) Tool Output Throttling

Most context cost comes from tool outputs. Control strategies include:

  • Summarize outputs instead of keeping full text
  • Paginate / filter / truncate
  • Prefer structured output (tables / JSON / lists)
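The truncation strategy above can be sketched as a head-plus-tail cut with an explicit marker for what was dropped; the budget numbers are arbitrary:

```python
def throttle(output: str, max_tokens: int = 2_000,
             chars_per_token: int = 4) -> str:
    """Truncate an oversized tool output, keeping the head and tail
    and noting how much was cut."""
    budget = max_tokens * chars_per_token
    if len(output) <= budget:
        return output
    half = budget // 2
    cut = len(output) - budget
    return output[:half] + f"\n... [{cut} chars truncated] ...\n" + output[-half:]
```

Keeping both ends matters in practice: error messages tend to sit at the tail of logs, while the command and context sit at the head.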

4) Compaction Strategy

Compaction's goal is "keep critical info + remove redundancy." Suggested structure:

  • Files touched
  • Decisions made
  • Open questions
  • Next actions
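That summary structure could be captured as a small template; a sketch whose field names simply mirror the list above:

```python
from dataclasses import dataclass, field

@dataclass
class CompactionSummary:
    """Structured replacement for raw history: critical info stays,
    redundancy goes."""
    files_touched: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    next_actions: list[str] = field(default_factory=list)

    def render(self) -> str:
        sections = [("Files touched", self.files_touched),
                    ("Decisions made", self.decisions),
                    ("Open questions", self.open_questions),
                    ("Next actions", self.next_actions)]
        return "\n\n".join(
            f"## {title}\n" + "\n".join(f"- {x}" for x in items)
            for title, items in sections)
```

The rendered summary replaces the raw dynamic-layer history, so a few hundred tokens stand in for tens of thousands.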

Example Workflow

  1. Load rules: CLAUDE.md + AGENTS.md
  2. Locate target files: rg --files -g "*.md" src/content/learn/ai-engineer
  3. Read only what's needed: Check similar file structures first
  4. Generate content and write
  5. Review context: Any redundant outputs?
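The budget check in step 5 can be folded into a driver loop. A sketch under the assumption that compaction reclaims roughly 70% of the tokens; all numbers are illustrative:

```python
def compact(used: int, keep_ratio: float = 0.3) -> int:
    """Stand-in for summarization: the summary retains ~30% of the tokens."""
    return int(used * keep_ratio)

def run_task(step_costs: list[int], window: int = 200_000,
             threshold: float = 0.75) -> int:
    """Walk the workflow, checking the budget after every step and
    compacting whenever the threshold is crossed."""
    used = 0
    for cost in step_costs:
        used += cost
        if used / window >= threshold:
            used = compact(used)
    return used
```

Checking after every step, rather than once at the end, is what keeps a long task from ever reaching the unmanageable state the anti-patterns below describe.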

Anti-Patterns

  • Loading an entire doc library at once
  • Letting oversized tool outputs swamp the context
  • Not compacting during long tasks until it's unmanageable
  • Re-reading the same content without summarizing

Checklist

  • Is the context budget defined?
  • Is there progressive disclosure?
  • Are tool outputs capped?
  • Is there a compaction trigger?
  • Are rules centralized in CLAUDE.md/AGENTS.md?

Practice Task

  • Use this chapter's three-layer model to map context for a real project
  • Set an 80% compaction trigger threshold and design a summary template
  • Pick a tool output scenario and implement pagination or summarization

Skill Metadata

Created: 2025-12-26 Last Updated: 2025-12-26 Author: JR Academy Version: 1.0.0

❓ FAQ

The most frequently searched questions about this chapter's topic.

What are the three layers of Claude Code's context model?

(1) Fixed layer: CLAUDE.md and long-lived system constraints, loaded on every inference pass; (2) Task layer: the current task's goals, acceptance criteria, and key facts; (3) Dynamic layer: search results, tool outputs, execution logs. The first two layers stay lightweight and stable; the dynamic layer loads on demand and is compressed as the task progresses. Nine times out of ten, a messy context comes from stuffing dynamic-layer material into the fixed layer.

Why are tool outputs the biggest token cost source?

A single rg full-text search, a cat of a large file, or an API list query can easily return 5K-20K tokens per call, several times the tokens of the conversation itself. The three-part control strategy: summarize instead of keeping full text; paginate/filter/truncate; prefer structured output (tables/JSON). Claude Code caps tool responses at 25,000 tokens by default precisely so the dynamic layer cannot swallow the entire context.

What is a good compaction trigger threshold?

Triggering at 70-80% is conservative, leaving buffer to cover the token cost of compaction itself; 90-95% is aggressive, and the model may already be losing focus by then. This chapter recommends starting at 70-80%, paired with a structured summary template (Files touched / Decisions made / Open questions / Next actions) to preserve information density while reclaiming tokens. Claude Code itself auto-compacts at 95%.

How does progressive disclosure work in Claude Code, concretely?

Three steps: use rg --files to find file paths, use head/tail or offset/limit reads to pull fragments, and keep only what the current decision requires in context. The principle is "get references first, read fragments later," treating the file system as cheap external memory: a path costs a few dozen characters, while the full file runs to thousands of lines and is pulled in only when needed. The anti-pattern is cat-ing the entire docs/ directory at the start.

How do I know my context management has hit an anti-pattern?

This chapter's four red lines: (1) loading an entire doc library at once; (2) letting oversized tool outputs swamp the context; (3) never compacting during a long task until it becomes unmanageable; (4) re-reading the same file without summarizing it. If any of these shows up, return to the checklist: Is a budget set? Is there progressive disclosure? Are tool outputs capped? Is a compaction trigger in place?