18. Multi-Agent Patterns


Multi-Agent Architecture Patterns

Multi-agent architectures distribute work across multiple language model instances, each with its own Context window. Done well, they break past single-agent limitations. Done poorly, they just add coordination overhead. Here's the key insight: sub-agents' core value is Context isolation, not role-playing.

If you treat multi-agent as role-playing, you will likely end up with a system that is more complex but not better. The real value is Context isolation plus parallelization.

  • Use multi-agent to isolate Context, not to role-play.
  • Supervisor / swarm / hierarchical are the mainstream patterns.
  • Token costs are high — only complex tasks justify the overhead.
  • Avoid the telephone game; allow direct pass-through.
  • Define explicit handoff and convergence rules.

What You'll Learn

  • When you need multi-agent and when you don't
  • Pros and cons of three architectural patterns
  • How to design collaboration and convergence mechanisms

When to Activate

Activate this skill when:

  • Single-agent Context limits constrain task complexity
  • Tasks decompose naturally into parallel subtasks
  • Different subtasks require different tool sets or system prompts
  • Building systems that must handle multiple domains simultaneously
  • Scaling agent capabilities beyond single-context limits
  • Designing production agent systems with multiple specialized components

Core Concepts

Multi-agent systems solve single-agent limitations through Context distribution. Three mainstream patterns: supervisor/orchestrator, peer-to-peer/swarm, and hierarchical. The core design principle is Context isolation.

Effective multi-agent systems need explicit coordination protocols, consensus mechanisms that avoid sycophancy, and awareness of bottlenecks, divergence, and error propagation.

Detailed Topics

Why Multi-Agent Architectures

The Context Bottleneck

A single agent hits ceilings in reasoning, Context management, and tool coordination. As task complexity grows, Context fills up with history, docs, and tool outputs, leading to lost-in-middle effects, attention scarcity, and Context poisoning.

Multi-agent splits tasks across multiple Context windows, reducing the load on any single Context.

The Token Economics Reality

Multi-agent consumes significantly more tokens:

Architecture              Token Multiplier   Use Case
Single agent chat         1x baseline        Simple queries
Single agent with tools   ~4x baseline       Tool-using tasks
Multi-agent system        ~15x baseline      Complex research/coordination

Research shows performance variance is driven primarily by token usage, tool calls, and model choice. Stronger models (like Claude Sonnet 4.5, GPT-5.2 thinking mode) tend to be more effective than just throwing more tokens at the problem.

The Parallelization Argument

Many tasks can be split for parallel execution: multi-source retrieval, multi-document analysis, comparing different approaches. A single agent must handle these sequentially; multi-agent can run them in parallel, with total time approaching the longest subtask rather than the sum.
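As a sketch of this claim, the fan-out can be expressed with `asyncio`; the sub-agent function here is a hypothetical stand-in for a real model call:

```python
import asyncio

async def run_subagent(name: str, subtask: str) -> str:
    """Hypothetical sub-agent call; a real system would invoke a model here."""
    await asyncio.sleep(0.01)  # stands in for model/tool latency
    return f"{name}: finished {subtask!r}"

async def run_parallel(subtasks: dict[str, str]) -> list[str]:
    # All sub-agents run concurrently, so total wall time approaches
    # the slowest subtask rather than the sum of all subtasks.
    return await asyncio.gather(
        *(run_subagent(name, task) for name, task in subtasks.items())
    )

results = asyncio.run(run_parallel({
    "retriever": "gather sources",
    "analyzer": "compare approaches",
}))
```

Each concurrent sub-agent also gets its own Context window, which is what makes the fan-out worthwhile beyond raw latency.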

The Specialization Argument

Different tasks need different system prompts and tool sets. Multi-agent allows specialization without burdening a single agent with every possible configuration.
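A minimal way to express that specialization, using an illustrative `AgentSpec` container (the names and tool strings are assumptions, not a framework API):

```python
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    """Hypothetical per-agent configuration; names here are illustrative."""
    name: str
    system_prompt: str
    tools: list[str] = field(default_factory=list)

# Each specialist carries only the prompt and tool set its subtask needs,
# rather than one agent carrying every possible configuration at once.
researcher = AgentSpec("researcher", "Find and cite primary sources.",
                       tools=["web_search", "fetch_page"])
writer = AgentSpec("writer", "Turn research notes into clear prose.")
```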

Architectural Patterns

Pattern 1: Supervisor/Orchestrator

A central supervisor controls flow, dispatches tasks, and aggregates results.

User Query -> Supervisor -> [Specialist, Specialist, Specialist] -> Aggregation -> Final Output

Good for: well-defined tasks, multi-domain coordination, human oversight requirements.

Strength: strong control.

Weakness: supervisor Context easily becomes a bottleneck; prone to the telephone game.
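A minimal supervisor loop might look like the following; the specialist functions and routing table are illustrative stubs, not any particular framework's API:

```python
def research(query: str) -> str:
    return f"notes on {query}"      # stand-in specialist

def summarize(query: str) -> str:
    return f"summary of {query}"    # stand-in specialist

SPECIALISTS = {"research": research, "summarize": summarize}

def supervise(query: str, plan: list[str]) -> str:
    """Dispatch each planned step to a specialist, then aggregate."""
    results = [SPECIALISTS[step](query) for step in plan]
    # The aggregation step is where detail loss can creep in.
    return " | ".join(results)
```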

The Telephone Game Problem and Solution

LangGraph benchmarks show supervisor architectures tend to lose detail: each time the supervisor re-summarizes a sub-agent's output, information drops out.

The fix: let sub-agents pass responses directly through:

def forward_message(message: str, to_user: bool = True):
    """
    Forward sub-agent response directly to user without supervisor synthesis.
    """
    if to_user:
        return {"type": "direct_response", "content": message}
    return {"type": "supervisor_input", "content": message}

Pattern 2: Peer-to-Peer/Swarm

No central control: agents hand off directly to each other.

def transfer_to_agent_b():
    # Returning another agent hands control to it (Swarm-style handoff)
    return agent_b

agent_a = Agent(
    name="Agent A",
    functions=[transfer_to_agent_b]
)

Good for: exploratory tasks, unstable requirements, elastic collaboration.

Strength: no single-point bottleneck.

Weakness: coordination is complex, tends to diverge.
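The handoff mechanics can be sketched as a small run loop; `Agent` here is a simplified stand-in rather than the OpenAI Swarm class:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Agent:
    """Simplified stand-in for a swarm agent, not a framework class."""
    name: str
    handle: Callable[[str], tuple[str, Optional["Agent"]]]

def run_swarm(start: Agent, task: str, max_hops: int = 5) -> str:
    """Follow direct handoffs until an agent answers or the budget runs out."""
    agent, hops = start, 0
    while hops <= max_hops:
        answer, next_agent = agent.handle(task)
        if next_agent is None:               # this agent finished the task
            return answer
        agent, hops = next_agent, hops + 1   # direct peer-to-peer handoff
    raise RuntimeError("swarm diverged: hop budget exhausted")

agent_b = Agent("B", lambda task: (f"B handled {task}", None))
agent_a = Agent("A", lambda task: ("", agent_b))  # A always hands off to B
```

The `max_hops` budget is the TTL limit mentioned under Guidelines: it turns silent divergence into a visible failure.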

Pattern 3: Hierarchical

Multi-level decomposition: strategy / planning / execution.

Strategy Layer -> Planning Layer -> Execution Layer

Good for: large-scale projects, enterprise workflows, tasks requiring long-term planning.
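A hierarchical pipeline can be sketched as three stubbed layers; the decomposition logic here is purely illustrative:

```python
def strategize(mission: str) -> list[str]:
    """Strategy layer (stubbed): break the mission into goals."""
    return [f"{mission} / goal {g}" for g in ("A", "B")]

def plan(goal: str) -> list[str]:
    """Planning layer (stubbed): break a goal into executable steps."""
    return [f"{goal} / step {i}" for i in (1, 2)]

def execute(step: str) -> str:
    """Execution layer (stubbed): carry out one step."""
    return f"done: {step}"

def run_hierarchy(mission: str) -> list[str]:
    # Each layer only sees its own level of detail, keeping Contexts small.
    return [execute(step)
            for goal in strategize(mission)
            for step in plan(goal)]
```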

Context Isolation as Design Principle

Context isolation is the core value of multi-agent. Each agent completes its subtask in a clean Context.

Isolation Mechanisms

  • Full context delegation: the sub-agent receives the parent's entire history
  • Instruction passing: the sub-agent starts clean with only a task brief
  • File system memory: agents share state through files instead of Context

The trade-offs depend on task complexity and latency requirements.
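The first two mechanisms can be sketched in a few lines (file system memory would add a third mode that reads and writes shared files instead of passing messages):

```python
def build_subagent_context(parent_history: list[str],
                           instruction: str,
                           mode: str = "instruction") -> list[str]:
    """Assemble a sub-agent's starting Context under two isolation modes.

    "full" = full context delegation: the child sees everything the
    parent saw. "instruction" = instruction passing: the child starts
    clean with only its task brief.
    """
    if mode == "full":
        return parent_history + [instruction]
    return [instruction]
```

Instruction passing is the stronger form of isolation; full delegation trades isolation for fidelity when the subtask genuinely needs the parent's history.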

Consensus and Coordination

The Voting Problem

Simple majority voting treats a weak model's hallucinations and a strong model's reasoning as equal weight.

Weighted Voting / Debate Protocols

More reliable approaches use weighted voting (weighting votes by model capability) or debate protocols in which agents challenge each other's arguments.
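A sketch of weighted voting, assuming each agent's vote arrives tagged with a capability weight:

```python
from collections import defaultdict

def weighted_vote(votes: list[tuple[str, float]]) -> str:
    """Pick the answer with the highest total capability weight.

    Each vote is (answer, weight); weights reflect model capability,
    so one strong model can outweigh several weak ones.
    """
    totals: defaultdict[str, float] = defaultdict(float)
    for answer, weight in votes:
        totals[answer] += weight
    return max(totals, key=totals.__getitem__)

# Two weak models guess "A"; one strong model answers "B".
winner = weighted_vote([("A", 0.2), ("A", 0.2), ("B", 0.9)])
```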

Trigger-Based Intervention

Set up stall triggers (repeated identical output across rounds) and sycophancy triggers (near-unanimous agreement without debate) so the system can intervene.
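Both triggers reduce to simple checks over recent agent output; the window and threshold values below are illustrative defaults:

```python
def stall_triggered(recent_answers: list[str], window: int = 3) -> bool:
    """Stall trigger: the last `window` rounds produced identical output."""
    tail = recent_answers[-window:]
    return len(tail) == window and len(set(tail)) == 1

def sycophancy_triggered(positions: list[str], threshold: float = 0.9) -> bool:
    """Sycophancy trigger: near-unanimous agreement without real debate."""
    if not positions:
        return False
    most_common = max(positions.count(p) for p in set(positions))
    return most_common / len(positions) >= threshold
```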

Practical Guidance

Failure Modes and Mitigations

  • Supervisor bottleneck -> output schema + checkpointing
  • Coordination overhead -> clear handoff + batching
  • Divergence -> convergence checks + TTL
  • Error propagation -> validate outputs + retry
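The last mitigation, validating outputs before passing them on, can be sketched as a retry wrapper; `step` and `validate` are caller-supplied callables standing in for a sub-agent call and an output-schema check:

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def run_with_validation(step: Callable[[], T],
                        validate: Callable[[T], bool],
                        max_retries: int = 2) -> T:
    """Validate a sub-agent's output before passing it on; retry on failure."""
    last = None
    for _ in range(max_retries + 1):
        result = step()
        if validate(result):
            return result  # only validated output propagates downstream
        last = result
    raise ValueError(f"output failed validation after retries: {last!r}")
```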

Examples

Example 1: Research Team Architecture

Supervisor
├── Researcher
├── Analyzer
├── Fact-checker
└── Writer

Example 2: Handoff Protocol

def handle_customer_request(request):
    if request.type == "billing":
        return transfer_to(billing_agent)
    elif request.type == "technical":
        return transfer_to(technical_agent)
    elif request.type == "sales":
        return transfer_to(sales_agent)
    else:
        return handle_general(request)

Decision Helper: Do You Need Multi-Agent?

  • Can the task be split into parallel subtasks?
  • Is the single agent already hitting Context limits?
  • Do subtasks need different tool sets or system prompts?
  • Is the cost acceptable (tokens + latency)?

If 3 or more are "yes," then consider multi-agent.
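If you want the checklist as code, it folds into a trivial gate whose parameters mirror the four questions:

```python
def should_use_multi_agent(parallelizable: bool,
                           context_limited: bool,
                           needs_specialization: bool,
                           cost_acceptable: bool) -> bool:
    """Apply the rule of thumb above: three or more "yes" answers."""
    return sum([parallelizable, context_limited,
                needs_specialization, cost_acceptable]) >= 3
```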

Guidelines

  1. Design for Context isolation as the primary benefit
  2. Choose architecture based on coordination needs, not metaphor
  3. Implement explicit handoff protocols
  4. Use weighted voting or debate
  5. Monitor for supervisor bottlenecks
  6. Validate outputs before passing
  7. Set TTL limits
  8. Test failure scenarios

Practice Task

  • Draw a multi-agent architecture diagram for your project
  • Label each agent's Context boundaries and tool sets

Integration

This skill builds on context-fundamentals and context-degradation. It connects to:

  • memory-systems
  • tool-design
  • context-optimization

References

External resources:

  • LangGraph Documentation
  • AutoGen Framework
  • CrewAI Documentation
  • Research on Multi-Agent Coordination

Skill Metadata

Created: 2025-12-20
Last Updated: 2025-12-20
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0


❓ FAQ

The most frequently searched questions about this chapter's topic.

What is the core value of multi-agent? Is it "role-playing"?

No. The core value is Context isolation plus parallelization. Each sub-agent completes its subtask in a clean Context, avoiding the lost-in-middle effects, attention scarcity, and Context poisoning that appear once a single agent's Context fills up with history, docs, and tool outputs. If you treat multi-agent as a role game of "researcher + analyst + editor", you have only made the system more complex, not better. The real test: is each sub-agent isolating Context, or just playing a part?

How do I choose among the supervisor, swarm, and hierarchical patterns?

Supervisor/orchestrator: a central node dispatches tasks and aggregates results; strong control, but the supervisor's Context easily becomes a bottleneck and the telephone game is a risk; suited to well-defined tasks, multi-domain coordination, and human-oversight requirements. Peer-to-peer/swarm: no center; agents hand off to each other directly (OpenAI Swarm is the canonical example); no single-point bottleneck but prone to divergence; suited to exploratory tasks and unstable requirements. Hierarchical: multi-level strategy → planning → execution decomposition; suited to large projects, enterprise workflows, and long-term planning. Start with supervisor at small scale, move toward hierarchical when things diverge, and use swarm when you need high elasticity.

How many extra tokens does multi-agent cost, and when is that cost worth it?

This chapter gives the multipliers: single-agent chat at 1x baseline, single agent with tools at ~4x, and a multi-agent system at ~15x. A 15x overhead is not a rounding error: simple queries should never use multi-agent. Research shows performance variance is driven mainly by token usage, tool calls, and model choice, and a stronger model (Claude Sonnet 4.5, GPT-5.2 thinking) is often more effective than stacking sub-agents. To decide whether it is worth it: Can the task be split for parallel execution? Is the single agent really hitting Context limits? Do subtasks need different system prompts or tool sets? Only consider multi-agent with three yes answers.

What is the telephone game problem, and how do you avoid it?

In supervisor mode, a sub-agent's answer is re-summarized by the supervisor before being relayed to the user, and every synthesis drops detail; after several layers it becomes a game of telephone. LangGraph benchmarks have confirmed the problem. The fix: let sub-agents pass their responses through directly, without supervisor synthesis. The code template in this chapter, `forward_message(message, to_user=True)`, returns the sub-agent's response to the user as a `direct_response`, bypassing the supervisor. Summarize what needs summarizing, but route content that can go straight through (detailed data, long-form output) over the direct channel.

Why is simple majority voting unreliable in multi-agent systems?

Because it treats a weak model's hallucinations and a strong model's reasoning as equal weight: with three agents voting, two wild guesses from GPT-3.5 can outvote one correct answer from Claude Sonnet. Two approaches are more reliable: (1) weighted voting, weighting votes by model capability; and (2) debate protocols, where agents challenge each other's arguments. Add trigger-based intervention (a stall trigger plus a sycophancy trigger) to keep agents from collectively going along with each other. The design principle this chapter offers: monitoring for systematic bias matters more than chasing absolute agreement rates.