09

Multi-Agent Context Isolation

⏱️ 20 min

Context Isolation in Multi-Agent Systems

Chapter 6 solved "how does a single agent run a long task without blowing up its context". Production adds another axis of scale: a main agent dispatching sub-agents. Whether their context is shared, isolated, or partially shared decides whether the system can scale.

Three Multi-Agent Context Topologies

1. Shared Context — One LLM Runs the Whole Thing

Every step runs in the same messages array.

messages = [
  user: "Research the pros and cons of the three most popular Kubernetes ingress controllers"
  assistant: First I'll list three candidates → ingress-nginx, traefik, contour
  tool_use: WebFetch ingress-nginx docs
  tool_result: ...
  tool_use: WebFetch traefik docs
  tool_result: ...
  ... (50K of context accumulates)
  assistant: After synthesizing, ingress-nginx wins on ecosystem maturity, traefik wins on...
]

Pro: easiest to implement (no state management); zero information loss.

Con: context grows linearly, the Lost in the Middle problem bites around step 30; every step pays the accumulated token cost.

Fits: under 10 steps with strong dependencies between steps.
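The shared-context topology is a single loop over one ever-growing messages list. A minimal sketch, where `llm` and `run_tool` are hypothetical stand-ins for a chat API and a tool dispatcher (not any specific SDK):

```python
def run_shared_context(task: str, llm, run_tool, max_steps: int = 10):
    """One LLM, one messages array, every step sees the full history."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(messages)          # pays for ALL accumulated tokens
        messages.append(reply)
        if reply.get("tool_use"):
            result = run_tool(reply["tool_use"])
            # tool results are never dropped -> linear context growth
            messages.append({"role": "tool_result", "content": result})
        else:
            return reply["content"]    # final answer ends the loop
    raise RuntimeError("step budget exhausted")
```

The `max_steps` cap matters: without it, the same loop is exactly the "painful past step 30" failure mode from the table below.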

2. Isolated Context — Sub-Agents Fully Independent

The sub-agent gets a fresh context — just task description + needed reference material. After running, it returns a final answer + short summary, not the process.

Main agent context:
  user: "Research three K8s ingress controllers"
  assistant: I'll dispatch 3 sub-agents to research in parallel
  tool_use: spawn_subagent(name="research_ingress_nginx", task="...")
  tool_result: { summary: "ingress-nginx wins on ecosystem maturity...", refs: [...] }
  tool_use: spawn_subagent(name="research_traefik", task="...")
  tool_result: { summary: "traefik wins on config flexibility...", refs: [...] }
  ...
  assistant: Synthesizing the three sub-agent summaries...

# Each sub-agent's own context (isolated):
  user: "Research ingress-nginx: architecture, performance, ecosystem, gotchas"
  tool_use: WebFetch ingress-nginx docs
  ... (accumulates its own 30K of context, discarded on completion)
  assistant: hands the main agent a ~500-character summary

Pro: main agent context stays small; sub-agents can run in parallel; one blowing up doesn't affect others.

Con: lossy handoff; the main agent can't see intermediate reasoning and has to trust the result; needs spawn / coordination infrastructure.

Fits: sub-tasks independent and parallelizable; main agent needs to scale past 30 steps.
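Because each sub-agent's context is fresh and private, the spawns are trivially parallel. A sketch assuming the same hypothetical `llm(messages)` callable; the thread pool is one illustrative way to run them concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def spawn_subagent(task: str, llm) -> dict:
    """Run one sub-agent in a fresh context; hand back only the distilled result."""
    messages = [{"role": "user", "content": task}]   # no shared history
    answer = llm(messages)                           # sub-agent's private run
    # `messages` is discarded here -- only summary + refs cross the boundary
    return {"summary": answer["content"], "refs": answer.get("refs", [])}

def research(tasks: list, llm) -> list:
    # sub-agents don't see each other, so they can run fully in parallel
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(lambda t: spawn_subagent(t, llm), tasks))
```

Note what the main agent never receives: the sub-agent's tool calls and intermediate reasoning. That is the lossy handoff named above.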

3. Partial Sharing — Summary Plus Key Raw Data

Sub-agents return a summary plus the key supporting evidence verbatim. The main agent gets the summary and a few key quotes, and can verify the sub-agent's claims.

This is the pattern from Anthropic's multi-agent research system blog post: the main agent is the Lead Researcher, the sub-agents are Search Subagents, and the handoff includes source URLs plus key quotes for verification.

Cost: handoff payload 5-10× larger than pure summary, but trust goes way up.
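A sketch of what a partial-sharing handoff payload might look like: summary plus verbatim quotes with their source URLs, so the lead agent can spot-check claims. Field names here are illustrative, not Anthropic's actual schema:

```python
def make_handoff(summary: str, evidence: list) -> dict:
    """Bundle a distilled summary with raw citations the lead can verify."""
    for item in evidence:
        # each citation must carry its source so claims are traceable
        assert {"url", "quote"} <= item.keys(), "citation needs url + quote"
    return {
        "summary": summary,            # the distilled finding
        "citations": evidence,         # verbatim evidence for verification
        # rough payload size -- this is the 5-10x cost over a bare summary
        "payload_chars": len(summary) + sum(len(e["quote"]) for e in evidence),
    }
```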

Anthropic Multi-Agent Research System — Official Case

Anthropic's 2025-04 blog describes the multi-agent research feature they built into Claude.ai:

Architecture:

LeadResearcher (main)
  └─ decompose the user query → decide how many sub-agents to dispatch
  └─ dispatch SearchSubagent 1: "find the latest papers on topic X"
  └─ dispatch SearchSubagent 2: "find company Y's official docs"
  └─ dispatch SearchSubagent 3: "find user reviews and real-world cases"
  └─ collect all sub-agent summaries → synthesize the final report

Anthropic's engineering takeaways (paraphrased):

  • "Sub-agent contexts can't be shared — parallel sub-agents not knowing what the others are doing actually avoids duplicate work and groupthink"
  • "Optimal sub-agent count is 3-5; past 5, lead coordination cost outweighs the gain"
  • "Sub-agents handing raw search results to lead is wrong — they have to distill into findings first"
  • "The whole system uses 4× more tokens than a single-agent baseline, but the quality lift far outweighs the token cost"

Claude Code's Agent Tool — Sub-Agent in Production

Claude Code's built-in Agent tool is this sub-agent pattern in production (this course was very likely written with Claude Code: the main agent dispatches an Explore sub-agent to look up prompt-master configs, and instead of reading the 200-line config raw, it gets back a structured summary).

Config:

  • subagent_type — different sub-agents get different tool sets (Explore can only read, code-architect can design but not edit)
  • isolation: "worktree" — high-blast-radius tasks dispatch to a sub-agent in a separate git worktree, then merge or discard
  • Description and Prompt — sub-agent only sees its prompt + own tool list at startup

"Isolated context + lossy handoff", productized: the main agent delegates, protecting its own context.

JR Real Case: Isolation Across omni-report's 17 Routines

JR Academy's omni-report runs 17 independent routines (AI Visibility / Competitor Weekly / Marketing Topics / Growth Playbook / Daily Jobs ×4 / various daily reports). Each is its own cron job — independent context, git commit, Notion sync.

Why not merge into a single "omni-report master agent"?

Cross-routine flow is async via git commits. Marketing Topics runs Monday and writes marketing-topics/$DATE.md; Growth Playbook runs Tuesday and reads last week's report into its own context. "Filesystem as sub-agent handoff channel": no message queue — git is enough.
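The Monday-write / Tuesday-read handoff can be sketched with plain files. Paths follow the marketing-topics/$DATE.md convention above; the git commit step is omitted for brevity:

```python
from datetime import date, timedelta
from pathlib import Path

def write_report(root: Path, body: str, day: date) -> Path:
    """Monday routine: persist the report at marketing-topics/<DATE>.md."""
    out = root / "marketing-topics" / f"{day.isoformat()}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(body)
    return out

def read_report(root: Path, day: date) -> str:
    """Tuesday routine: pull a prior report into its own context."""
    return (root / "marketing-topics" / f"{day.isoformat()}.md").read_text()
```

The design choice: the file path doubles as the protocol. Any routine that knows the naming convention can consume the output, with git history as the audit log.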

Shared / Isolated / Partial — Trade-off

| Dimension | Shared context | Isolated + summary | Partial sharing |
| --- | --- | --- | --- |
| Main agent context growth | Linear, painful past step 30 | Almost flat | Slow growth (summary + a little raw) |
| Information loss | None | Large (summary only) | Medium (summary + key citations) |
| Sub-agent parallelism | No (serial only) | Perfect parallel | Perfect parallel |
| Infrastructure | None (one LLM) | spawn / coordinate / summarizer | + citation extraction |
| Token cost | Medium (linear growth) | High (multiple LLMs) | Higher (summary + citations) |
| Trust (can main verify sub?) | Perfect | Weak | Medium (can see raw evidence) |
| Fits | < 10 steps, strong dependency | 30+ steps, parallelizable | Serious reasoning, traceability needed |

JR's internal rule: under 10 steps, a single LLM; 10-30 steps, partial sharing; 30+ steps, isolated sub-agents. The same curve Anthropic verified building its own research system.
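JR's step-count thresholds as a tiny routing function — a rule of thumb, not a law, and the estimate of step count is itself the hard part:

```python
def pick_topology(estimated_steps: int) -> str:
    """Route a task to a context topology by JR's step-count rule of thumb."""
    if estimated_steps < 10:
        return "shared"      # one LLM, one messages array
    if estimated_steps <= 30:
        return "partial"     # summary + key citations back to the main agent
    return "isolated"        # fresh sub-agent contexts, summary-only handoff
```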

Takeaway

Multi-agent isn't "split as fine as possible". The optimal sub-agent count is 3-5; past 5, coordination cost eats the gain. Sub-agents have to distill findings before handing back — raw data still blows up the main agent. Anthropic's own research system uses 4× the tokens, but the quality lift far outweighs the cost, proving that "more tokens, well layered" beats "fewer tokens, tangled".


References

  1. Anthropic. (2025-04-15). How we built our multi-agent research system — Lead + Search subagent architecture and the 3-5 sub-agent rule of thumb.
  2. Anthropic. Claude Code documentation — Agent tool — sub-agent type / isolation / handoff implementation.
  3. Anthropic. (2024-12-20). Building Effective Agents — original orchestrator-workers pattern.
  4. LangGraph. Multi-agent supervisor pattern — open-source equivalent.
  5. AutoGen. GitHub — Microsoft's multi-agent framework, comparative reference.

Production case: JR Academy omni-report — 17 independent routines using git commits for async cross-agent handoff, filesystem as sub-agent communication channel.

📚 Related Resources

❓ FAQ

The most frequently searched questions on this chapter's topic.

Should sub-agent context be isolated or shared?

Decide by step count and parallelism needs: < 10 steps with strong information dependency → shared (one LLM runs the whole thing); 30+ steps and parallelizable → isolated (sub-agents get independent contexts and hand back summaries); serious reasoning that needs traceability → partial sharing (summary + key verbatim citations). Anthropic's Multi-Agent Research System uses partial sharing.

Are more sub-agents always better?

No. Anthropic's data from building its own research system: 3-5 sub-agents is optimal; past 5, the lead agent's coordination cost exceeds the gain. The whole system uses 4× more tokens than single-agent, but the quality lift on hard problems far outweighs the token cost.

How do sub-agents communicate with each other?

JR omni-report uses the filesystem as the sub-agent handoff channel: 17 independent routines pass information asynchronously via git commits (Marketing Topics finishes its Monday run and writes marketing-topics/$DATE.md; Growth Playbook reads it into its own context on Tuesday). 10× simpler than a message queue, and plenty for low-frequency cross-agent communication.

Roughly how much does a multi-agent system cost per day?

Going by Anthropic's own research system numbers: one complex query with a lead agent + 4 sub-agents burns ~80K input + ~15K output tokens ≈ $0.30/query on Sonnet. 100 queries/day = $30/day ≈ $900/month. 4× more than single-agent, but the accuracy lift on hard problems far exceeds the cost.

I'm not using Claude Code's Agent tool — can I build this myself with LangGraph?

Absolutely: LangGraph is LangChain's official multi-agent framework — nodes = single agents, edges = the handoff protocol, and the state dict can be configured for shared / partial / isolated context. OpenAI Swarm (lightweight) and CrewAI (role-driven) are also common choices. All three are decoupled from the model.

What's the most common multi-agent failure mode?

Sub-agent information loss: the lead agent farms a task out, the sub-agent finishes and returns only "done", and the lead has no idea what was done and can't reason from the result. Require every sub-agent to return a structured result (JSON: result + key_findings + sources) and refuse free text. This is a hard constraint in Anthropic's Multi-Agent Research System.
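The "no free-text handoffs" constraint can be enforced at the boundary with a small validator. A sketch using the result / key_findings / sources triple named above (the field set is from this chapter, not a published schema):

```python
import json

REQUIRED = {"result", "key_findings", "sources"}

def parse_subagent_reply(raw: str) -> dict:
    """Reject any sub-agent handoff that isn't structured JSON with the
    required fields -- a bare 'done' never reaches the lead agent."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"sub-agent returned free text, not JSON: {e}")
    missing = REQUIRED - payload.keys()
    if missing:
        raise ValueError(f"sub-agent reply missing fields: {sorted(missing)}")
    return payload
```

Failing loudly here is the point: a rejected handoff can be retried immediately, while a silently accepted "done" poisons every downstream step.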