Multi-Agent Architecture Patterns
Multi-agent architectures distribute work across multiple language model instances, each with its own Context window. Done well, they break past single-agent limitations. Done poorly, they just add coordination overhead. The key insight: a sub-agent's core value is Context isolation, not role-playing.
If you treat multi-agent as role-playing, you will likely end up with a system that is more complex but not better. The real value is Context isolation plus parallelization.
- Use multi-agent to isolate Context, not to role-play.
- Supervisor / swarm / hierarchical are the mainstream patterns.
- Token costs are high — only complex tasks justify the overhead.
- Avoid the telephone game; allow direct pass-through.
- Define explicit handoff and convergence rules.
What You'll Learn
- When you need multi-agent and when you don't
- Pros and cons of three architectural patterns
- How to design collaboration and convergence mechanisms
When to Activate
Activate this skill when:
- Single-agent Context limits constrain task complexity
- Tasks decompose naturally into parallel subtasks
- Different subtasks require different tool sets or system prompts
- Building systems that must handle multiple domains simultaneously
- Scaling agent capabilities beyond single-context limits
- Designing production agent systems with multiple specialized components
Core Concepts
Multi-agent systems solve single-agent limitations through Context distribution. Three mainstream patterns: supervisor/orchestrator, peer-to-peer/swarm, and hierarchical. The core design principle is Context isolation.
Effective multi-agent systems need explicit coordination protocols, consensus mechanisms that avoid sycophancy, and awareness of bottlenecks, divergence, and error propagation.
Detailed Topics
Why Multi-Agent Architectures
The Context Bottleneck A single agent hits ceilings in reasoning, Context management, and tool coordination. As task complexity grows, Context fills up with history, docs, and tool outputs, leading to lost-in-middle effects, attention scarcity, and Context poisoning.
Multi-agent splits tasks across multiple Context windows, reducing the load on any single Context.
The Token Economics Reality Multi-agent consumes significantly more tokens:
| Architecture | Token Multiplier | Use Case |
|---|---|---|
| Single agent chat | 1x baseline | Simple queries |
| Single agent with tools | ~4x baseline | Tool-using tasks |
| Multi-agent system | ~15x baseline | Complex research/coordination |
Research shows performance variance is driven primarily by token usage, tool calls, and model choice. Stronger models (like Claude Sonnet 4.5, GPT-5.2 thinking mode) tend to be more effective than just throwing more tokens at the problem.
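To make the table concrete, here is a toy cost estimator using the rough multipliers above. The per-token price is a placeholder assumption, not a real rate:

```python
# Back-of-envelope cost comparison using the rough multipliers from the
# table above. The dollar rate is a placeholder, not a real price.
MULTIPLIERS = {
    "single_agent_chat": 1,
    "single_agent_tools": 4,
    "multi_agent_system": 15,
}

def estimated_cost(architecture: str, baseline_tokens: int,
                   usd_per_1k_tokens: float = 0.01) -> float:
    """Estimate token spend for a task under a given architecture."""
    tokens = baseline_tokens * MULTIPLIERS[architecture]
    return tokens * usd_per_1k_tokens / 1000

# A task that costs ~$0.02 as a simple chat costs ~$0.30 as a multi-agent run.
print(estimated_cost("single_agent_chat", 2000))
print(estimated_cost("multi_agent_system", 2000))
```

The ~15x gap is why the decision helper below the fold asks about cost before anything else.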
The Parallelization Argument Many tasks can be split for parallel execution: multi-source retrieval, multi-document analysis, comparing different approaches. A single agent must handle these sequentially; multi-agent can run them in parallel, with total time approaching the longest subtask rather than the sum.
The Specialization Argument Different tasks need different system prompts and tool sets. Multi-agent allows specialization without burdening a single agent with every possible configuration.
Architectural Patterns
Pattern 1: Supervisor/Orchestrator A central supervisor controls flow, dispatches tasks, and aggregates results.
User Query -> Supervisor -> [Specialist, Specialist, Specialist] -> Aggregation -> Final Output
Good for: well-defined tasks, multi-domain coordination, human oversight requirements.
Strength: strong control.
Weakness: supervisor Context easily becomes a bottleneck; prone to the telephone game.
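A minimal supervisor loop, with toy `plan` and `aggregate` helpers standing in for real decomposition and synthesis. Note that every specialist result re-enters the supervisor's Context in `aggregate`, which is exactly where the bottleneck forms:

```python
from typing import Callable

Specialist = Callable[[str], str]

def plan(query: str) -> list[tuple[str, str]]:
    # Toy decomposition: one subtask per known domain.
    return [("research", f"find sources for: {query}"),
            ("analysis", f"analyze findings for: {query}")]

def aggregate(results: list[str]) -> str:
    # Every specialist result re-enters the supervisor's context here;
    # this synthesis step is where the telephone game starts.
    return "\n".join(results)

def supervisor(query: str, specialists: dict[str, Specialist]) -> str:
    results = [specialists[domain](subtask)
               for domain, subtask in plan(query)]
    return aggregate(results)

specialists = {"research": lambda t: f"[research] {t}",
               "analysis": lambda t: f"[analysis] {t}"}
print(supervisor("quantum error correction", specialists))
```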
The Telephone Game Problem and Solution LangGraph benchmarks show that supervisor architectures tend to lose detail: each time the supervisor re-synthesizes a sub-agent's response, information is dropped, and multi-hop relays compound the loss.
The fix: let sub-agents pass responses directly through:
```python
def forward_message(message: str, to_user: bool = True):
    """Forward a sub-agent response directly to the user,
    skipping supervisor synthesis."""
    if to_user:
        return {"type": "direct_response", "content": message}
    return {"type": "supervisor_input", "content": message}
```
Pattern 2: Peer-to-Peer/Swarm No central control — agents hand off directly to each other.
```python
def transfer_to_agent_b():
    # Returning another agent signals a handoff (Swarm-style convention).
    return agent_b

agent_a = Agent(
    name="Agent A",
    functions=[transfer_to_agent_b],
)
```
Good for: exploratory tasks, unstable requirements, elastic collaboration.
Strength: no single-point bottleneck.
Weakness: coordination is complex, tends to diverge.
Pattern 3: Hierarchical Multi-level decomposition: strategy / planning / execution.
Strategy Layer -> Planning Layer -> Execution Layer
Good for: large-scale projects, enterprise workflows, tasks requiring long-term planning.
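A toy sketch of the layering, with placeholder functions for each layer; each level narrows scope before handing down:

```python
# Hierarchical decomposition sketch. All three layer functions are toy
# placeholders; in a real system each would be its own agent.
def strategy_layer(goal: str) -> list[str]:
    return [f"milestone 1 of {goal}", f"milestone 2 of {goal}"]

def planning_layer(milestone: str) -> list[str]:
    return [f"task A for {milestone}", f"task B for {milestone}"]

def execution_layer(task: str) -> str:
    return f"done: {task}"

def run(goal: str) -> list[str]:
    # Fan out top-down: 2 milestones x 2 tasks = 4 execution results.
    return [execution_layer(t)
            for m in strategy_layer(goal)
            for t in planning_layer(m)]

print(len(run("ship v1")))
```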
Context Isolation as Design Principle
Context isolation is the core value of multi-agent. Each agent completes its subtask in a clean Context.
Isolation Mechanisms
- Full context delegation
- Instruction passing
- File system memory
The trade-offs depend on task complexity and latency requirements.
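The three mechanisms can be sketched as follows; `call_model` is a hypothetical stand-in for an LLM call:

```python
import json
import os
import tempfile

def call_model(prompt: str) -> str:
    # Hypothetical LLM call; echoes prompt size for illustration.
    return f"<response to {len(prompt)} chars of prompt>"

def full_context_delegation(history: list[str], subtask: str) -> str:
    # Sub-agent sees the entire parent context: maximal fidelity,
    # maximal token cost.
    return call_model("\n".join(history) + "\n" + subtask)

def instruction_passing(history: list[str], subtask: str) -> str:
    # Sub-agent sees only a compact instruction: cheap, but any detail
    # not in the instruction is lost.
    instruction = f"Task: {subtask} ({len(history)} prior turns summarized)"
    return call_model(instruction)

def file_system_memory(history: list[str], subtask: str) -> str:
    # Parent persists shared state; sub-agent reads only what it needs.
    path = os.path.join(tempfile.gettempdir(), "shared_state.json")
    with open(path, "w") as f:
        json.dump({"history": history}, f)
    return call_model(f"Task: {subtask}. Shared state at: {path}")
```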
Consensus and Coordination
The Voting Problem Simple majority voting gives a weak model's hallucination the same weight as a strong model's reasoning: with three voting agents, two weak-model guesses can outvote one strong model's correct answer.
Weighted Voting / Debate Protocols More reliable approaches weight votes by model capability, or run debate protocols in which agents challenge each other's arguments.
Trigger-Based Intervention Set up stall triggers (to detect agents going in circles) and sycophancy triggers (to detect agents collectively agreeing without scrutiny).
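A minimal weighted-voting sketch; the weights are illustrative assumptions, not calibrated values:

```python
from collections import defaultdict

def weighted_vote(votes: list[tuple[str, str]],
                  weights: dict[str, float]) -> str:
    """votes: (model_name, answer) pairs; returns the answer with the
    highest total weight. Unknown models default to weight 1.0."""
    totals: dict[str, float] = defaultdict(float)
    for model, answer in votes:
        totals[answer] += weights.get(model, 1.0)
    return max(totals, key=totals.__getitem__)

# Two weak-model votes for "B" lose to one strong-model vote for "A":
# "A" carries weight 3.0 vs "B" at 2.0.
weights = {"strong-model": 3.0, "weak-model": 1.0}
votes = [("weak-model", "B"), ("weak-model", "B"), ("strong-model", "A")]
print(weighted_vote(votes, weights))
```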
Practical Guidance
Failure Modes and Mitigations
- Supervisor bottleneck -> output schema + checkpointing
- Coordination overhead -> clear handoff + batching
- Divergence -> convergence checks + TTL
- Error propagation -> validate outputs + retry
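Two of these mitigations, a TTL on agent turns and output validation with retry, can be combined in one guard; `run_agent` and `is_valid` are hypothetical hooks:

```python
# Guard combining a TTL (bounds divergence) with validate-then-retry
# (bounds error propagation). `run_agent` and `is_valid` are
# caller-supplied hooks, not a real framework API.
def run_with_guards(run_agent, is_valid, max_turns: int = 5,
                    max_retries: int = 2):
    for turn in range(max_turns):            # TTL: hard cap on turns
        for _attempt in range(max_retries + 1):
            output = run_agent(turn)
            if is_valid(output):             # validate before passing on
                return output
        # All retries failed this turn; move to the next turn.
    raise RuntimeError("TTL exhausted without a valid output")

# Usage: an agent that fails twice, then produces a valid output.
attempts = iter([None, None, "ok"])
result = run_with_guards(lambda turn: next(attempts),
                         lambda out: out == "ok")
print(result)
```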
Examples
Example 1: Research Team Architecture
Supervisor
├── Researcher
├── Analyzer
├── Fact-checker
└── Writer
Example 2: Handoff Protocol
```python
def handle_customer_request(request):
    # Route by request type; each transfer hands the conversation to a
    # specialist agent with its own Context and tool set.
    if request.type == "billing":
        return transfer_to(billing_agent)
    elif request.type == "technical":
        return transfer_to(technical_agent)
    elif request.type == "sales":
        return transfer_to(sales_agent)
    else:
        return handle_general(request)
```
Decision Helper: Do You Need Multi-Agent?
- Can the task be split into parallel subtasks?
- Is the single agent already hitting Context limits?
- Do subtasks need different tool sets or system prompts?
- Is the cost acceptable (tokens + latency)?
If 3 or more are "yes," then consider multi-agent.
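The checklist can be expressed as a tiny helper that recommends multi-agent only at three or more yeses:

```python
# The four checklist questions above as boolean flags; recommend
# multi-agent only when at least three are true.
def needs_multi_agent(parallelizable: bool,
                      hitting_context_limits: bool,
                      needs_distinct_toolsets: bool,
                      cost_acceptable: bool) -> bool:
    answers = [parallelizable, hitting_context_limits,
               needs_distinct_toolsets, cost_acceptable]
    return sum(answers) >= 3

print(needs_multi_agent(True, True, True, False))   # True
print(needs_multi_agent(True, False, False, True))  # False
```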
Guidelines
- Design for Context isolation as the primary benefit
- Choose architecture based on coordination needs, not metaphor
- Implement explicit handoff protocols
- Use weighted voting or debate
- Monitor for supervisor bottlenecks
- Validate outputs before passing
- Set TTL limits
- Test failure scenarios
Practice Task
- Draw a multi-agent architecture diagram for your project
- Label each agent's Context boundaries and tool sets
Integration
This skill builds on context-fundamentals and context-degradation. It connects to:
- memory-systems
- tool-design
- context-optimization
References
External resources:
- LangGraph Documentation
- AutoGen Framework
- CrewAI Documentation
- Research on Multi-Agent Coordination
Skill Metadata
Created: 2025-12-20 Last Updated: 2025-12-20 Author: Agent Skills for Context Engineering Contributors Version: 1.0.0
Frequently Asked Questions
Commonly searched questions about this chapter's topic.
What is the core value of multi-agent? Is it "role-playing"?
No. The core value is Context isolation plus parallelization. Each sub-agent completes its subtask in a clean Context, avoiding the lost-in-middle effects, attention scarcity, and Context poisoning that appear when a single agent's Context fills up with history, docs, and tool outputs. If you treat multi-agent as a "researcher + analyst + editor" role-playing game, you have only made the system more complex, not better. The real test: is the sub-agent isolating Context, or just playing a part?
How do you choose between supervisor, swarm, and hierarchical?
Supervisor/orchestrator: a central node dispatches tasks and aggregates results; strong control, but the supervisor's Context easily becomes a bottleneck and the telephone game looms; suited to well-defined tasks, multi-domain coordination, and human oversight requirements. Peer-to-peer/swarm: no center; agents hand off to each other directly (OpenAI Swarm is the canonical example); no single-point bottleneck but prone to divergence; suited to exploratory tasks and unstable requirements. Hierarchical: strategy -> planning -> execution layers; suited to large projects, enterprise workflows, and long-term planning. At small scale start with supervisor, move toward hierarchical if it sprawls, and use swarm when elasticity matters most.
How much more token spend does multi-agent cost, and when is it worth it?
This chapter's multipliers: single-agent chat at 1x baseline, single agent with tools at ~4x, multi-agent system at ~15x. A 15x overhead is not a rounding error; simple queries should never use multi-agent. Research shows performance variance is driven mainly by token usage, tool calls, and model choice, and a stronger model (Claude Sonnet 4.5, GPT-5.2 thinking) is often more effective than piling on sub-agents. The worth-it test: can the task be split for parallel execution? Is the single agent actually hitting Context limits? Do subtasks need different system prompts or tool sets? Only consider multi-agent on three yeses.
What is the telephone game problem, and how do you avoid it?
In supervisor mode, a sub-agent's answer is re-summarized by the supervisor before reaching the user, and every synthesis drops detail; after several hops it becomes a game of telephone. LangGraph benchmarks confirm the problem. The fix: let sub-agents pass responses through directly, without supervisor synthesis. This chapter's code template, `forward_message(message, to_user=True)`, sends the sub-agent's response to the user as a `direct_response`, skipping the supervisor. Keep aggregation where it is genuinely needed, but route content that can go straight through (detailed data, long-form output) via the direct channel.
Why is simple majority voting unreliable in multi-agent systems?
Because it gives a weak model's hallucination the same weight as a strong model's reasoning: with three voting agents, two GPT-3.5 guesses can outvote one correct Claude Sonnet answer. Two more reliable approaches: (1) weighted voting, weighting votes by model capability; (2) debate protocols, in which agents challenge each other's arguments. Add trigger-based intervention (stall triggers plus sycophancy triggers) to stop agents from collectively rubber-stamping each other. The chapter's design principle: monitoring for systematic bias matters more than chasing raw agreement rates.