09

Multi-Agent Context Isolation

⏱️ 20 min

Context Isolation in Multi-Agent Systems

Chapter 6 solved "how does a single agent run a long task without blowing up its context". Production adds another axis of scale: a main agent dispatching sub-agents. Whether their context is shared, isolated, or partially shared decides whether the system can scale.

Three Multi-Agent Context Topologies

1. Shared Context — One LLM Runs the Whole Thing

Every step runs in the same messages array.

messages = [
  user: "Research the pros and cons of the three most popular Kubernetes ingress controllers"
  assistant: First I'll list three candidates → ingress-nginx, traefik, contour
  tool_use: WebFetch ingress-nginx docs
  tool_result: ...
  tool_use: WebFetch traefik docs
  tool_result: ...
  ... (50K of context accumulates)
  assistant: After synthesizing, ingress-nginx wins on ecosystem maturity, traefik wins on...
]

Pro: easiest to implement (no state management); zero information loss.

Con: context grows linearly, the Lost in the Middle problem bites around step 30; every step pays the accumulated token cost.

Fits: under 10 steps with strong dependencies between steps.
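The shared-context topology is a single loop over one ever-growing messages list. A minimal sketch, where `llm` and `run_tool` are hypothetical stand-ins for a chat API and a tool dispatcher (not any specific SDK):

```python
def run_shared_context(task: str, llm, run_tool, max_steps: int = 10):
    """One LLM, one messages array, every step sees the full history."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(messages)          # pays for ALL accumulated tokens
        messages.append(reply)
        if reply.get("tool_use"):
            result = run_tool(reply["tool_use"])
            # tool results are never dropped -> linear context growth
            messages.append({"role": "tool_result", "content": result})
        else:
            return reply["content"]    # final answer ends the loop
    raise RuntimeError("step budget exhausted")
```

The `max_steps` cap matters: without it, the same loop is exactly the "painful past step 30" failure mode from the table below.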

2. Isolated Context — Sub-Agents Fully Independent

The sub-agent gets a fresh context — just task description + needed reference material. After running, it returns a final answer + short summary, not the process.

Main agent context:
  user: "Research three K8s ingress controllers"
  assistant: I'll dispatch 3 sub-agents to research in parallel
  tool_use: spawn_subagent(name="research_ingress_nginx", task="...")
  tool_result: { summary: "ingress-nginx wins on ecosystem maturity...", refs: [...] }
  tool_use: spawn_subagent(name="research_traefik", task="...")
  tool_result: { summary: "traefik wins on config flexibility...", refs: [...] }
  ...
  assistant: Synthesizing the three sub-agent summaries...

# Each sub-agent's own context (isolated):
  user: "Research ingress-nginx: architecture, performance, ecosystem, gotchas"
  tool_use: WebFetch ingress-nginx docs
  ... (accumulates its own 30K of context, discarded on completion)
  assistant: hands the main agent a ~500-character summary

Pro: main agent context stays small; sub-agents can run in parallel; one blowing up doesn't affect others.

Con: lossy handoff; the main agent can't see intermediate reasoning and has to trust the result; needs spawn / coordination infrastructure.

Fits: sub-tasks independent and parallelizable; main agent needs to scale past 30 steps.
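Because each sub-agent's context is fresh and private, the spawns are trivially parallel. A sketch assuming the same hypothetical `llm(messages)` callable; the thread pool is one illustrative way to run them concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def spawn_subagent(task: str, llm) -> dict:
    """Run one sub-agent in a fresh context; hand back only the distilled result."""
    messages = [{"role": "user", "content": task}]   # no shared history
    answer = llm(messages)                           # sub-agent's private run
    # `messages` is discarded here -- only summary + refs cross the boundary
    return {"summary": answer["content"], "refs": answer.get("refs", [])}

def research(tasks: list, llm) -> list:
    # sub-agents don't see each other, so they can run fully in parallel
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(lambda t: spawn_subagent(t, llm), tasks))
```

Note what the main agent never receives: the sub-agent's tool calls and intermediate reasoning. That is the lossy handoff named above.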

3. Partial Sharing — Summary Plus Key Raw Data

Sub-agents return a summary plus the key supporting evidence verbatim. The main agent gets the summary and a few key quotes, and can verify the sub-agent's claims.

This is the pattern from Anthropic's multi-agent research system blog post: the main agent is the Lead Researcher, the sub-agents are Search Subagents, and the handoff includes source URLs plus key quotes for verification.

Cost: handoff payload 5-10× larger than pure summary, but trust goes way up.
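A sketch of what a partial-sharing handoff payload might look like: summary plus verbatim quotes with their source URLs, so the lead agent can spot-check claims. Field names here are illustrative, not Anthropic's actual schema:

```python
def make_handoff(summary: str, evidence: list) -> dict:
    """Bundle a distilled summary with raw citations the lead can verify."""
    for item in evidence:
        # each citation must carry its source so claims are traceable
        assert {"url", "quote"} <= item.keys(), "citation needs url + quote"
    return {
        "summary": summary,            # the distilled finding
        "citations": evidence,         # verbatim evidence for verification
        # rough payload size -- this is the 5-10x cost over a bare summary
        "payload_chars": len(summary) + sum(len(e["quote"]) for e in evidence),
    }
```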

Anthropic Multi-Agent Research System — Official Case

Anthropic's 2025-04 blog describes the multi-agent research feature they built into Claude.ai:

Architecture:

LeadResearcher (main)
  └─ decompose the user query → decide how many sub-agents to dispatch
  └─ dispatch SearchSubagent 1: "find the latest papers on topic X"
  └─ dispatch SearchSubagent 2: "find company Y's official docs"
  └─ dispatch SearchSubagent 3: "find user reviews and real-world cases"
  └─ collect all sub-agent summaries → synthesize the final report

Anthropic's engineering takeaways (paraphrased):

  • "Sub-agent contexts can't be shared — parallel sub-agents not knowing what the others are doing actually avoids duplicate work and groupthink"
  • "Optimal sub-agent count is 3-5; past 5, lead coordination cost outweighs the gain"
  • "Sub-agents handing raw search results to lead is wrong — they have to distill into findings first"
  • "The whole system uses 4× more tokens than a single-agent baseline, but the quality lift far outweighs the token cost"

Claude Code's Agent Tool — Sub-Agent in Production

Claude Code's built-in Agent tool is this sub-agent pattern in production (this course was very likely written with Claude Code: the main agent dispatches an Explore sub-agent to look up prompt-master configs, and instead of reading the 200-line config raw, it gets back a structured summary).

Config:

  • subagent_type — different sub-agents get different tool sets (Explore can only read, code-architect can design but not edit)
  • isolation: "worktree" — high-blast-radius tasks dispatch to a sub-agent in a separate git worktree, then merge or discard
  • Description and Prompt — sub-agent only sees its prompt + own tool list at startup

"Isolated context + lossy handoff", productized: the main agent delegates, protecting its own context.

JR Real Case: Isolation Across omni-report's 17 Routines

JR Academy's omni-report runs 17 independent routines (AI Visibility / Competitor Weekly / Marketing Topics / Growth Playbook / Daily Jobs ×4 / various daily reports). Each is its own cron job — independent context, git commit, Notion sync.

Why not merge into a single "omni-report master agent"?

Cross-routine flow is async via git commits. Marketing Topics runs Monday and writes marketing-topics/$DATE.md; Growth Playbook runs Tuesday and reads last week's report into its own context. "Filesystem as sub-agent handoff channel": no message queue — git is enough.
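The Monday-write / Tuesday-read handoff can be sketched with plain files. Paths follow the marketing-topics/$DATE.md convention above; the git commit step is omitted for brevity:

```python
from datetime import date, timedelta
from pathlib import Path

def write_report(root: Path, body: str, day: date) -> Path:
    """Monday routine: persist the report at marketing-topics/<DATE>.md."""
    out = root / "marketing-topics" / f"{day.isoformat()}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(body)
    return out

def read_report(root: Path, day: date) -> str:
    """Tuesday routine: pull a prior report into its own context."""
    return (root / "marketing-topics" / f"{day.isoformat()}.md").read_text()
```

The design choice: the file path doubles as the protocol. Any routine that knows the naming convention can consume the output, with git history as the audit log.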

Shared / Isolated / Partial — Trade-off

| Dimension | Shared context | Isolated + summary | Partial sharing |
| --- | --- | --- | --- |
| Main agent context growth | Linear, painful past step 30 | Almost flat | Slow growth (summary + a little raw) |
| Information loss | None | Large (summary only) | Medium (summary + key citations) |
| Sub-agent parallelism | No (serial only) | Perfect parallel | Perfect parallel |
| Infrastructure | None (one LLM) | spawn / coordinate / summarizer | + citation extraction |
| Token cost | Medium (linear growth) | High (multiple LLMs) | Higher (summary + citations) |
| Trust (can main verify sub?) | Perfect | Weak | Medium (can see raw evidence) |
| Fits | < 10 steps, strong dependency | 30+ steps, parallelizable | Serious reasoning, traceability needed |

JR's internal rule: under 10 steps, a single LLM; 10-30 steps, partial sharing; 30+ steps, isolated sub-agents. The same curve Anthropic verified building its own research system.
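JR's step-count thresholds as a tiny routing function — a rule of thumb, not a law, and the estimate of step count is itself the hard part:

```python
def pick_topology(estimated_steps: int) -> str:
    """Route a task to a context topology by JR's step-count rule of thumb."""
    if estimated_steps < 10:
        return "shared"      # one LLM, one messages array
    if estimated_steps <= 30:
        return "partial"     # summary + key citations back to the main agent
    return "isolated"        # fresh sub-agent contexts, summary-only handoff
```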

Takeaway

Multi-agent isn't "split as fine as possible". The optimal sub-agent count is 3-5; past 5, coordination cost eats the gain. Sub-agents have to distill findings before handing back — raw data still blows up the main agent. Anthropic's own research system uses 4× the tokens, but the quality lift far outweighs the cost, proving that "more tokens, well layered" beats "fewer tokens, tangled".


References

  1. Anthropic. (2025-04-15). How we built our multi-agent research system — Lead + Search subagent architecture and the 3-5 sub-agent rule of thumb.
  2. Anthropic. Claude Code documentation — Agent tool — sub-agent type / isolation / handoff implementation.
  3. Anthropic. (2024-12-20). Building Effective Agents — original orchestrator-workers pattern.
  4. LangGraph. Multi-agent supervisor pattern — open-source equivalent.
  5. AutoGen. GitHub — Microsoft's multi-agent framework, comparative reference.

Production case: JR Academy omni-report — 17 independent routines using git commits for async cross-agent handoff, filesystem as sub-agent communication channel.

📚 Related Resources

❓ FAQ

The most frequently searched questions on this chapter's topic.

Should sub-agent context be isolated or shared?

Decide by step count and parallelism needs: < 10 steps with strong information dependency → shared (one LLM runs the whole thing); 30+ steps and parallelizable → isolated (sub-agents get independent contexts and hand back summaries); serious reasoning that needs traceability → partial sharing (summary + key verbatim citations). Anthropic's Multi-Agent Research System uses partial sharing.

Are more sub-agents always better?

No. Anthropic's data from building its own research system: 3-5 sub-agents is optimal; past 5, the lead agent's coordination cost exceeds the gain. The whole system uses 4× more tokens than single-agent, but the quality lift on hard problems far outweighs the token cost.

How do sub-agents communicate with each other?

JR omni-report uses the filesystem as the sub-agent handoff channel: 17 independent routines pass information asynchronously via git commits (Marketing Topics finishes its Monday run and writes marketing-topics/$DATE.md; Growth Playbook reads it into its own context on Tuesday). 10× simpler than a message queue, and plenty for low-frequency cross-agent communication.

Roughly how much does a multi-agent system cost per day?

Going by Anthropic's own research system numbers: one complex query with a lead agent + 4 sub-agents burns ~80K input + ~15K output tokens ≈ $0.30/query on Sonnet. 100 queries/day = $30/day ≈ $900/month. 4× more than single-agent, but the accuracy lift on hard problems far exceeds the cost.

I'm not using Claude Code's Agent tool — can I build this myself with LangGraph?

Absolutely: LangGraph is LangChain's official multi-agent framework — nodes = single agents, edges = the handoff protocol, and the state dict can be configured for shared / partial / isolated context. OpenAI Swarm (lightweight) and CrewAI (role-driven) are also common choices. All three are decoupled from the model.

What's the most common multi-agent failure mode?

Sub-agent information loss: the lead agent farms a task out, the sub-agent finishes and returns only "done", and the lead has no idea what was done and can't reason from the result. Require every sub-agent to return a structured result (JSON: result + key_findings + sources) and refuse free text. This is a hard constraint in Anthropic's Multi-Agent Research System.
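The "no free-text handoffs" constraint can be enforced at the boundary with a small validator. A sketch using the result / key_findings / sources triple named above (the field set is from this chapter, not a published schema):

```python
import json

REQUIRED = {"result", "key_findings", "sources"}

def parse_subagent_reply(raw: str) -> dict:
    """Reject any sub-agent handoff that isn't structured JSON with the
    required fields -- a bare 'done' never reaches the lead agent."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"sub-agent returned free text, not JSON: {e}")
    missing = REQUIRED - payload.keys()
    if missing:
        raise ValueError(f"sub-agent reply missing fields: {sorted(missing)}")
    return payload
```

Failing loudly here is the point: a rejected handoff can be retried immediately, while a silently accepted "done" poisons every downstream step.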