18. Multi-Agent Patterns


Multi-Agent Architecture Patterns

Multi-agent architectures distribute work across multiple language model instances, each with its own Context window. Done well, they break past single-agent limitations. Done poorly, they just add coordination overhead. Here's the key insight: sub-agents' core value is Context isolation, not role-playing.

If you treat multi-agent as role-playing, you will likely end up with a system that is more complex but not better. The real value is Context isolation plus parallelization.

  • Use multi-agent to isolate Context, not to role-play.
  • Supervisor / swarm / hierarchical are the mainstream patterns.
  • Token costs are high — only complex tasks justify the overhead.
  • Avoid the telephone game; allow direct pass-through.
  • Define explicit handoff and convergence rules.

What You'll Learn

  • When you need multi-agent and when you don't
  • Pros and cons of three architectural patterns
  • How to design collaboration and convergence mechanisms

When to Activate

Activate this skill when:

  • Single-agent Context limits constrain task complexity
  • Tasks decompose naturally into parallel subtasks
  • Different subtasks require different tool sets or system prompts
  • Building systems that must handle multiple domains simultaneously
  • Scaling agent capabilities beyond single-context limits
  • Designing production agent systems with multiple specialized components

Core Concepts

Multi-agent systems solve single-agent limitations through Context distribution. Three mainstream patterns: supervisor/orchestrator, peer-to-peer/swarm, and hierarchical. The core design principle is Context isolation.

Effective multi-agent systems need explicit coordination protocols, consensus mechanisms that avoid sycophancy, and awareness of bottlenecks, divergence, and error propagation.

Detailed Topics

Why Multi-Agent Architectures

The Context Bottleneck

A single agent hits ceilings in reasoning, Context management, and tool coordination. As task complexity grows, Context fills up with history, docs, and tool outputs, leading to lost-in-middle effects, attention scarcity, and Context poisoning.

Multi-agent splits tasks across multiple Context windows, reducing the load on any single Context.

The Token Economics Reality

Multi-agent consumes significantly more tokens:

Architecture              Token Multiplier   Use Case
Single agent chat         1x baseline        Simple queries
Single agent with tools   ~4x baseline       Tool-using tasks
Multi-agent system        ~15x baseline      Complex research/coordination

Research shows performance variance is driven primarily by token usage, tool calls, and model choice. Stronger models (like Claude Sonnet 4.5, GPT-5.2 thinking mode) tend to be more effective than just throwing more tokens at the problem.

The Parallelization Argument

Many tasks can be split for parallel execution: multi-source retrieval, multi-document analysis, comparing different approaches. A single agent must handle these sequentially; multi-agent can run them in parallel, with total time approaching the longest subtask rather than the sum.
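As a sketch of this claim, the fan-out can be expressed with `asyncio`; the sub-agent function here is a hypothetical stand-in for a real model call:

```python
import asyncio

async def run_subagent(name: str, subtask: str) -> str:
    """Hypothetical sub-agent call; a real system would invoke a model here."""
    await asyncio.sleep(0.01)  # stands in for model/tool latency
    return f"{name}: finished {subtask!r}"

async def run_parallel(subtasks: dict[str, str]) -> list[str]:
    # All sub-agents run concurrently, so total wall time approaches
    # the slowest subtask rather than the sum of all subtasks.
    return await asyncio.gather(
        *(run_subagent(name, task) for name, task in subtasks.items())
    )

results = asyncio.run(run_parallel({
    "retriever": "gather sources",
    "analyzer": "compare approaches",
}))
```

Each concurrent sub-agent also gets its own Context window, which is what makes the fan-out worthwhile beyond raw latency.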

The Specialization Argument

Different tasks need different system prompts and tool sets. Multi-agent allows specialization without burdening a single agent with every possible configuration.
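A minimal way to express that specialization, using an illustrative `AgentSpec` container (the names and tool strings are assumptions, not a framework API):

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """Hypothetical per-agent configuration; names here are illustrative."""
    name: str
    system_prompt: str
    tools: list[str] = field(default_factory=list)

# Each specialist carries only the prompt and tool set its subtask needs,
# rather than one agent carrying every possible configuration at once.
researcher = AgentSpec("researcher", "Find and cite primary sources.",
                       tools=["web_search", "fetch_page"])
writer = AgentSpec("writer", "Turn research notes into clear prose.")
```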

Architectural Patterns

Pattern 1: Supervisor/Orchestrator

A central supervisor controls flow, dispatches tasks, and aggregates results.

User Query -> Supervisor -> [Specialist, Specialist, Specialist] -> Aggregation -> Final Output

Good for: well-defined tasks, multi-domain coordination, human oversight requirements.

Strength: strong control.

Weakness: supervisor Context easily becomes a bottleneck; prone to the telephone game.
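A minimal supervisor loop might look like the following; the specialist functions and routing table are illustrative stubs, not any particular framework's API:

```python
def research(query: str) -> str:
    return f"notes on {query}"      # stand-in specialist

def summarize(query: str) -> str:
    return f"summary of {query}"    # stand-in specialist

SPECIALISTS = {"research": research, "summarize": summarize}

def supervise(query: str, plan: list[str]) -> str:
    """Dispatch each planned step to a specialist, then aggregate."""
    results = [SPECIALISTS[step](query) for step in plan]
    # The aggregation step is where detail loss can creep in.
    return " | ".join(results)
```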

The Telephone Game Problem and Solution

LangGraph benchmarks show supervisor architectures tend to lose detail: each time the supervisor re-summarizes a sub-agent's output, information drops out.

The fix: let sub-agents pass responses directly through:

def forward_message(message: str, to_user: bool = True):
    """
    Forward sub-agent response directly to user without supervisor synthesis.
    """
    if to_user:
        return {"type": "direct_response", "content": message}
    return {"type": "supervisor_input", "content": message}

Pattern 2: Peer-to-Peer/Swarm

No central control: agents hand off directly to each other.

def transfer_to_agent_b():
    # Returning another agent hands control to it (Swarm-style handoff)
    return agent_b

agent_a = Agent(
    name="Agent A",
    functions=[transfer_to_agent_b]
)

Good for: exploratory tasks, unstable requirements, elastic collaboration.

Strength: no single-point bottleneck.

Weakness: coordination is complex, tends to diverge.
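The handoff mechanics can be sketched as a small run loop; `Agent` here is a simplified stand-in rather than the OpenAI Swarm class:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Agent:
    """Simplified stand-in for a swarm agent, not a framework class."""
    name: str
    handle: Callable[[str], tuple[str, Optional["Agent"]]]

def run_swarm(start: Agent, task: str, max_hops: int = 5) -> str:
    """Follow direct handoffs until an agent answers or the budget runs out."""
    agent, hops = start, 0
    while hops <= max_hops:
        answer, next_agent = agent.handle(task)
        if next_agent is None:               # this agent finished the task
            return answer
        agent, hops = next_agent, hops + 1   # direct peer-to-peer handoff
    raise RuntimeError("swarm diverged: hop budget exhausted")

agent_b = Agent("B", lambda task: (f"B handled {task}", None))
agent_a = Agent("A", lambda task: ("", agent_b))  # A always hands off to B
```

The `max_hops` budget is the TTL limit mentioned under Guidelines: it turns silent divergence into a visible failure.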

Pattern 3: Hierarchical

Multi-level decomposition: strategy / planning / execution.

Strategy Layer -> Planning Layer -> Execution Layer

Good for: large-scale projects, enterprise workflows, tasks requiring long-term planning.
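A hierarchical pipeline can be sketched as three stubbed layers; the decomposition logic here is purely illustrative:

```python
def strategize(mission: str) -> list[str]:
    """Strategy layer (stubbed): break the mission into goals."""
    return [f"{mission} / goal {g}" for g in ("A", "B")]

def plan(goal: str) -> list[str]:
    """Planning layer (stubbed): break a goal into executable steps."""
    return [f"{goal} / step {i}" for i in (1, 2)]

def execute(step: str) -> str:
    """Execution layer (stubbed): carry out one step."""
    return f"done: {step}"

def run_hierarchy(mission: str) -> list[str]:
    # Each layer only sees its own level of detail, keeping Contexts small.
    return [execute(step)
            for goal in strategize(mission)
            for step in plan(goal)]
```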

Context Isolation as Design Principle

Context isolation is the core value of multi-agent. Each agent completes its subtask in a clean Context.

Isolation Mechanisms

  • Full context delegation: the sub-agent receives the parent's entire history
  • Instruction passing: the sub-agent starts clean with only a task brief
  • File system memory: agents share state through files instead of Context

The trade-offs depend on task complexity and latency requirements.
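The first two mechanisms can be sketched in a few lines (file system memory would add a third mode that reads and writes shared files instead of passing messages):

```python
def build_subagent_context(parent_history: list[str],
                           instruction: str,
                           mode: str = "instruction") -> list[str]:
    """Assemble a sub-agent's starting Context under two isolation modes.

    "full" = full context delegation: the child sees everything the
    parent saw. "instruction" = instruction passing: the child starts
    clean with only its task brief.
    """
    if mode == "full":
        return parent_history + [instruction]
    return [instruction]
```

Instruction passing is the stronger form of isolation; full delegation trades isolation for fidelity when the subtask genuinely needs the parent's history.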

Consensus and Coordination

The Voting Problem

Simple majority voting treats a weak model's hallucinations and a strong model's reasoning as equal weight.

Weighted Voting / Debate Protocols

More reliable approaches use weighted voting (weighting votes by model capability) or debate protocols in which agents challenge each other's arguments.
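A sketch of weighted voting, assuming each agent's vote arrives tagged with a capability weight:

```python
from collections import defaultdict

def weighted_vote(votes: list[tuple[str, float]]) -> str:
    """Pick the answer with the highest total capability weight.

    Each vote is (answer, weight); weights reflect model capability,
    so one strong model can outweigh several weak ones.
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for answer, weight in votes:
        totals[answer] += weight
    return max(totals, key=totals.__getitem__)

# Two weak models guess "A"; one strong model answers "B".
winner = weighted_vote([("A", 0.2), ("A", 0.2), ("B", 0.9)])
```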

Trigger-Based Intervention

Set up stall triggers (repeated identical output across rounds) and sycophancy triggers (near-unanimous agreement without debate) so the system can intervene.
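Both triggers reduce to simple checks over recent agent output; the window and threshold values below are illustrative defaults:

```python
def stall_triggered(recent_answers: list[str], window: int = 3) -> bool:
    """Stall trigger: the last `window` rounds produced identical output."""
    tail = recent_answers[-window:]
    return len(tail) == window and len(set(tail)) == 1

def sycophancy_triggered(positions: list[str], threshold: float = 0.9) -> bool:
    """Sycophancy trigger: near-unanimous agreement without real debate."""
    if not positions:
        return False
    most_common = max(positions.count(p) for p in set(positions))
    return most_common / len(positions) >= threshold
```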

Practical Guidance

Failure Modes and Mitigations

  • Supervisor bottleneck -> output schema + checkpointing
  • Coordination overhead -> clear handoff + batching
  • Divergence -> convergence checks + TTL
  • Error propagation -> validate outputs + retry
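The last mitigation, validating outputs before passing them on, can be sketched as a retry wrapper; `step` and `validate` are caller-supplied callables standing in for a sub-agent call and an output-schema check:

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def run_with_validation(step: Callable[[], T],
                        validate: Callable[[T], bool],
                        max_retries: int = 2) -> T:
    """Validate a sub-agent's output before passing it on; retry on failure."""
    last = None
    for _ in range(max_retries + 1):
        result = step()
        if validate(result):
            return result  # only validated output propagates downstream
        last = result
    raise ValueError(f"output failed validation after retries: {last!r}")
```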

Examples

Example 1: Research Team Architecture

Supervisor
├── Researcher
├── Analyzer
├── Fact-checker
└── Writer

Example 2: Handoff Protocol

def handle_customer_request(request):
    if request.type == "billing":
        return transfer_to(billing_agent)
    elif request.type == "technical":
        return transfer_to(technical_agent)
    elif request.type == "sales":
        return transfer_to(sales_agent)
    else:
        return handle_general(request)

Decision Helper: Do You Need Multi-Agent?

  • Can the task be split into parallel subtasks?
  • Is the single agent already hitting Context limits?
  • Do subtasks need different tool sets or system prompts?
  • Is the cost acceptable (tokens + latency)?

If 3 or more are "yes," then consider multi-agent.
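If you want the checklist as code, it folds into a trivial gate whose parameters mirror the four questions:

```python
def should_use_multi_agent(parallelizable: bool,
                           context_limited: bool,
                           needs_specialization: bool,
                           cost_acceptable: bool) -> bool:
    """Apply the rule of thumb above: three or more "yes" answers."""
    return sum([parallelizable, context_limited,
                needs_specialization, cost_acceptable]) >= 3
```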

Guidelines

  1. Design for Context isolation as the primary benefit
  2. Choose architecture based on coordination needs, not metaphor
  3. Implement explicit handoff protocols
  4. Use weighted voting or debate
  5. Monitor for supervisor bottlenecks
  6. Validate outputs before passing
  7. Set TTL limits
  8. Test failure scenarios

Practice Task

  • Draw a multi-agent architecture diagram for your project
  • Label each agent's Context boundaries and tool sets

Integration

This skill builds on context-fundamentals and context-degradation. It connects to:

  • memory-systems
  • tool-design
  • context-optimization

References

External resources:

  • LangGraph Documentation
  • AutoGen Framework
  • CrewAI Documentation
  • Research on Multi-Agent Coordination

Skill Metadata

Created: 2025-12-20
Last Updated: 2025-12-20
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0


❓ FAQ

The most frequently searched questions about this chapter's topic.

What is the core value of multi-agent? Is it "role-playing"?

No. The core value is Context isolation plus parallelization. Each sub-agent completes its subtask in a clean Context, avoiding the lost-in-middle effects, attention scarcity, and Context poisoning that appear once a single agent's Context fills up with history, docs, and tool outputs. If you treat multi-agent as a role game of "researcher + analyst + editor", you have only made the system more complex, not better. The real test: is each sub-agent isolating Context, or just playing a part?

How do I choose among the supervisor, swarm, and hierarchical patterns?

Supervisor/orchestrator: a central node dispatches tasks and aggregates results; strong control, but the supervisor's Context easily becomes a bottleneck and the telephone game is a risk; suited to well-defined tasks, multi-domain coordination, and human-oversight requirements. Peer-to-peer/swarm: no center; agents hand off to each other directly (OpenAI Swarm is the canonical example); no single-point bottleneck but prone to divergence; suited to exploratory tasks and unstable requirements. Hierarchical: multi-level strategy → planning → execution decomposition; suited to large projects, enterprise workflows, and long-term planning. Start with supervisor at small scale, move toward hierarchical when things diverge, and use swarm when you need high elasticity.

How many extra tokens does multi-agent cost, and when is that cost worth it?

This chapter gives the multipliers: single-agent chat at 1x baseline, single agent with tools at ~4x, and a multi-agent system at ~15x. A 15x overhead is not a rounding error: simple queries should never use multi-agent. Research shows performance variance is driven mainly by token usage, tool calls, and model choice, and a stronger model (Claude Sonnet 4.5, GPT-5.2 thinking) is often more effective than stacking sub-agents. To decide whether it is worth it: Can the task be split for parallel execution? Is the single agent really hitting Context limits? Do subtasks need different system prompts or tool sets? Only consider multi-agent with three yes answers.

What is the telephone game problem, and how do you avoid it?

In supervisor mode, a sub-agent's answer is re-summarized by the supervisor before being relayed to the user, and every synthesis drops detail; after several layers it becomes a game of telephone. LangGraph benchmarks have confirmed the problem. The fix: let sub-agents pass their responses through directly, without supervisor synthesis. The code template in this chapter, `forward_message(message, to_user=True)`, returns the sub-agent's response to the user as a `direct_response`, bypassing the supervisor. Summarize what needs summarizing, but route content that can go straight through (detailed data, long-form output) over the direct channel.

Why is simple majority voting unreliable in multi-agent systems?

Because it treats a weak model's hallucinations and a strong model's reasoning as equal weight: with three agents voting, two wild guesses from GPT-3.5 can outvote one correct answer from Claude Sonnet. Two approaches are more reliable: (1) weighted voting, weighting votes by model capability; and (2) debate protocols, where agents challenge each other's arguments. Add trigger-based intervention (a stall trigger plus a sycophancy trigger) to keep agents from collectively going along with each other. The design principle this chapter offers: monitoring for systematic bias matters more than chasing absolute agreement rates.