Tool Design Principles
Tool Design for Agents
Tools are the primary mechanism for agents to interact with the outside world. They define the contract between deterministic systems and non-deterministic agents. Unlike traditional APIs, tool APIs must be designed for language models: the model needs to understand intent from natural language, infer parameters, and generate calls. Bad tool design creates failure modes that no amount of prompt engineering can fix.
The essence of tool design is "reduce guessing for the model." Clearer tools mean more stable behavior.
- Tools are contracts, not regular APIs.
- Consolidation reduces ambiguity.
- Good descriptions answer what/when/inputs/returns.
- Error messages must be recoverable.
- Prefer minimal, general-purpose tools.
What You'll Learn
- How to write tool descriptions that models can call correctly
- When to consolidate tools vs. when to split them
- How to design response formats and error handling
When to Activate
Activate this skill when:
- Creating new tools for agent systems
- Debugging tool-related failures or misuse
- Optimizing existing tool sets for better agent performance
- Designing tool APIs from scratch
- Evaluating third-party tools for agent integration
- Standardizing tool conventions across a codebase
Core Concepts
Tools are contracts between deterministic systems and non-deterministic agents. The consolidation principle states that if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better. Effective tool descriptions are prompt engineering that shapes agent behavior.
Key principles: clear descriptions (what/when/returns), response formats for token efficiency, error messages for recovery, and consistent conventions that reduce cognitive load.
Detailed Topics
The Tool-Agent Interface
Tools as Contracts Tools are contracts. When a human calls an API, they understand the contract. But a model has to infer the contract from the description — so the description must be explicit, unambiguous, and convey correct usage through examples.
Tool Description as Prompt Tool descriptions are essentially prompt engineering. They determine how an agent picks and uses tools. Bad descriptions force guessing; good descriptions include usage context, examples, and defaults.
Namespacing and Organization As your tool collection grows, namespacing reduces selection cost. Different namespaces map to different functional domains, helping the agent locate the right tool faster.
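A minimal sketch of namespacing as a lookup aid. The registry and tool names below are hypothetical; the point is that a shared `namespace_verb_noun` prefix lets both humans and models narrow the search to one functional domain.

```python
# Hypothetical tool registry: a "namespace_verb_noun" prefix groups
# tools by functional domain so the agent can locate them faster.
TOOLS = {
    "calendar_create_event": "Create a calendar event.",
    "calendar_list_events": "List events in a date range.",
    "crm_get_customer": "Retrieve a customer record by ID.",
    "crm_update_customer": "Update fields on a customer record.",
}

def tools_in_namespace(namespace: str) -> list[str]:
    """Return all tool names under a given namespace prefix."""
    return [name for name in TOOLS if name.startswith(namespace + "_")]
```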
The Consolidation Principle
Single Comprehensive Tools Here's the consolidation principle: if a human can't decide which tool to use, the model won't do any better. Prefer one tool that handles a complete workflow over multiple fragmented tools.
Why Consolidation Works More tools mean more descriptions consuming context and more ambiguity. Consolidation cuts token consumption and reduces selection complexity.
When Not to Consolidate Don't force consolidation when behaviors differ significantly, use cases don't overlap, or tools can be called independently.
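A hypothetical illustration of the consolidation principle: one tool covering the complete "find bookable flights" workflow that would otherwise require three fragmented calls (search, availability check, fare lookup). The data layer is stubbed so the sketch is self-contained; none of these functions come from a real API.

```python
# Stubbed data layer so the sketch runs on its own.
def _search_flights(origin: str, destination: str, date: str) -> list[dict]:
    return [{"flight_id": "XX100", "seats_left": 2},
            {"flight_id": "XX200", "seats_left": 0}]

def _get_fare(flight_id: str) -> float:
    return 199.0

def find_bookable_flights(origin: str, destination: str, date: str,
                          max_results: int = 5) -> list[dict]:
    """Search flights, keep only those with seats, and attach fares:
    the full workflow in a single call the agent cannot mis-sequence."""
    flights = _search_flights(origin, destination, date)
    available = [f for f in flights if f["seats_left"] > 0]
    for f in available:
        f["fare_usd"] = _get_fare(f["flight_id"])
    return available[:max_results]
```

Exposing only `find_bookable_flights` removes the ambiguity of which of three tools to call first and cuts three descriptions down to one.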
Architectural Reduction
Push consolidation to its extreme and you get architectural reduction: fewer specialized tools, more general-purpose primitives.
The File System Agent Pattern
Instead of building complex tools, give the agent filesystem access + command execution and let the model use grep/cat/find/ls as general-purpose tools.
When Reduction Outperforms Complexity Reduction works best when:
- The data layer is well-documented
- The model's reasoning capability is strong enough
- Existing tools are constraining rather than enhancing the model
Reduction fails when: data is messy, domain knowledge is lacking, security constraints are strict, or workflows are highly complex.
Stop Constraining Reasoning Many guardrails are meant to "protect the model" but end up restricting its reasoning space. Keep asking yourself: is this tool enhancing the model, or boxing it in?
Build for Future Models Models iterate faster than tools. Over-engineered tool architectures lock you out of future improvements. Smaller architectures tend to be more resilient.
Tool Description Engineering
Description Structure A good description answers four questions:
- What does the tool do?
- When should it be used?
- What inputs does it accept?
- What does it return?
Default Parameter Selection Defaults should cover common scenarios and lower the cost of making a call.
Response Format Optimization
Response format directly affects context token usage. Offer both concise and detailed formats, and specify when to use each.
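A sketch of the dual-format idea, assuming a hypothetical customer record. "concise" is the default because most calls only need key fields; "detailed" is opt-in and costs more context tokens.

```python
# Hypothetical customer record; the long "history" list stands in for
# the bulk that a "detailed" response would drag into context.
RECORD = {"id": "CUST-000001", "name": "Ada", "tier": "gold",
          "email": "ada@example.com", "history": ["event"] * 40}

def format_customer(record: dict, format: str = "concise") -> dict:
    """Return key fields by default; the full record only on request."""
    if format == "concise":
        return {k: record[k] for k in ("id", "name", "tier")}
    return record
```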
Error Message Design
Error messages must be actionable: tell the model "what went wrong and how to fix it."
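A sketch of an actionable error payload: a machine-readable code plus a recovery hint the model can act on. The error codes and validation rule are illustrative, borrowed from the `get_customer` example later in this page.

```python
import re

def get_customer_or_error(customer_id: str) -> dict:
    """Validate input and return either a result or a recoverable error."""
    if not re.fullmatch(r"CUST-\d{6}", customer_id):
        # Code + hint: the model learns what failed and how to fix it.
        return {"error": "INVALID_FORMAT",
                "hint": ("customer_id must match CUST-###### "
                         f"(got {customer_id!r}); e.g. CUST-000001")}
    # ... real lookup would happen here ...
    return {"ok": True, "customer_id": customer_id}
```

Contrast this with returning "Internal error": the model can only retry blindly, while the hint above tells it exactly which parameter to repair.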
Tool Definition Schema
A unified schema (verb-noun naming, parameter naming, return fields) significantly reduces model misuse rates.
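An illustrative tool definition following one unified convention: a verb_noun name, snake_case parameters, and a fixed return envelope. The schema shape loosely follows common function-calling formats; the field names here are assumptions, not any specific vendor's API.

```python
# One tool definition written against a house convention. Every tool in
# the collection uses the same field names, so the model never relearns
# the shape per tool.
TOOL_DEF = {
    "name": "get_customer",                      # verb_noun naming
    "description": "Retrieve customer information by ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string",
                            "pattern": "^CUST-[0-9]{6}$"},
            "format": {"type": "string",
                       "enum": ["concise", "detailed"],
                       "default": "concise"},
        },
        "required": ["customer_id"],
    },
    "returns": {"fields": ["id", "name", "tier"],
                "errors": ["NOT_FOUND", "INVALID_FORMAT"]},
}
```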
Tool Collection Design
More tools is not necessarily better. Keep the set to 10-20 tools and use namespacing for grouping.
MCP Tool Naming Requirements
When using MCP tools, you must use fully qualified tool names:
Format: ServerName:tool_name
```
# Correct
"Use the BigQuery:bigquery_schema tool to retrieve table schemas."
"Use the GitHub:create_issue tool to create issues."

# Incorrect
"Use the bigquery_schema tool..."
```
Using Agents to Optimize Tools
You can use agents to reverse-optimize tool descriptions: improve descriptions based on failure examples, creating a feedback loop.
Testing Tool Design
Test tool calls with representative tasks, evaluating unambiguity, completeness, recoverability, and consistency.
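A minimal sketch of a tool-selection eval: representative tasks paired with the expected tool, scored against whatever the agent actually picked. The cases and `agent_pick` callable are hypothetical stand-ins for a real agent harness.

```python
# Representative tasks with the tool a well-designed set should make
# unambiguous. Ambiguity shows up as a low score here before it shows
# up as production failures.
CASES = [
    ("What is customer CUST-000042's tier?", "get_customer"),
    ("Open a support ticket for a refund", "create_ticket"),
]

def score_tool_selection(agent_pick, cases=CASES) -> float:
    """Fraction of tasks where the agent chose the expected tool."""
    hits = sum(agent_pick(task) == expected for task, expected in cases)
    return hits / len(cases)
```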
Practical Guidance
Anti-Patterns to Avoid
- Vague descriptions
- Cryptic parameter names
- Missing error handling
- Inconsistent naming
Tool Selection Framework
- Identify workflows
- Group actions into comprehensive tools
- Ensure clear purpose
- Document error cases
- Test with agent interactions
Minimal Tool Spec Template
```
Tool Name: <verb_noun>
When to use: <trigger + context>
Inputs:
  - param_a: type, constraints, example
  - param_b: type, default
Returns:
  - format: concise | detailed
Errors:
  - ERROR_CODE: recovery hint
```
Examples
Example 1: Well-Designed Tool
```python
def get_customer(customer_id: str, format: str = "concise"):
    """
    Retrieve customer information by ID.

    Use when:
    - User asks about specific customer details
    - Need customer context for decision-making
    - Verifying customer identity

    Args:
        customer_id: Format "CUST-######" (e.g., "CUST-000001")
        format: "concise" for key fields, "detailed" for complete record

    Returns:
        Customer object with requested fields

    Errors:
        NOT_FOUND: Customer ID not found
        INVALID_FORMAT: ID must match CUST-###### pattern
    """
```
Example 2: Poor Tool Design
```python
def search(query):
    """Search the database."""
    pass
```
Problems with this design:
- Vague name: "search" is ambiguous
- Missing parameters
- No return description
- No usage context
- No error handling
Guidelines
- Write descriptions that answer what/when/returns
- Use consolidation to reduce ambiguity
- Implement response formats
- Design actionable error messages
- Enforce naming conventions
- Limit tool count and use namespacing
- Test tool designs with real agent interactions
- Iterate based on observed failures
- Prefer minimal architectures when possible
Practice Task
- Write a tool spec for your own business scenario (follow the template above)
- Identify 2 potential misuse points and add them to the description and errors sections
Related Pages
- Claude Code Examples
- Context Engineering Fundamentals
- Multi-Agent Architecture Patterns
- Advanced Evaluation
Integration
This skill connects to:
- context-fundamentals
- multi-agent-patterns
- evaluation
References
External resources:
- MCP (Model Context Protocol) documentation
- Framework tool conventions
- API design best practices for agents
- Vercel v0 agent architecture case study
Skill Metadata
Created: 2025-12-20 Last Updated: 2025-12-23 Author: Agent Skills for Context Engineering Contributors Version: 1.1.0
Frequently Asked Questions
Why does the tool description affect agent call success more than the tool implementation?
Because the model never reads your code; it can only infer the contract from the description. A good description must answer four things: what (what the tool does), when (when to use it), inputs (parameters and constraints), and returns (what format comes back). A vague description (such as `def search(query): """Search the database"""`) forces the model to guess and pass bad arguments; spelling out details like "customer_id must match the CUST-###### format" immediately brings call error rates down.
When should multiple tools be merged into one, and when should they stay separate?
The consolidation principle: if even a human engineer cannot say which tool applies in a given situation, the model won't do better. Prefer merging tools that together complete a full workflow, which reduces ambiguity and description token consumption. But don't force a merge when behaviors differ significantly, use cases clearly don't overlap, and the tools can be called independently. For example, lookup_order (read-only) and create_ticket (write) should not be merged; their side effects differ too much.
How many tools is the right number, and how do you manage a growing tool set?
Keep it to roughly 10-20 tools; beyond that, use namespacing to group them. The more tools you have, the more their descriptions eat into the context budget and the more likely the model picks the wrong one. In MCP scenarios you must use fully qualified names (`ServerName:tool_name`, e.g. `BigQuery:bigquery_schema`); without the server prefix, the model will pick arbitrarily among same-named tools. Architectural reduction is another path: instead of adding specialized tools, give the model filesystem access plus command execution and let it use grep/cat/find itself.
What should a tool return on error so the agent can recover on its own?
Error messages must be actionable: tell the model what went wrong and how to fix it. The minimum is two fields: an error code (e.g. NOT_FOUND, INVALID_FORMAT) plus a recovery hint ("ID must match CUST-###### pattern"). Returning "Internal error" or "500" only makes the model retry blindly. A good error template lets the model know, after one read, whether the next step is to fix a parameter, try a different ID, or abandon the task.
Why do over-engineered tool architectures become a burden when models upgrade?
Models iterate much faster than tools. Guardrails built today for GPT-4 (restricted parameters, forced structure, limited options) end up constraining rather than enhancing stronger reasoning models like Claude 4.5 or GPT-5 thinking. Keep asking: is this tool helping the model reason, or deciding for it? Smaller architectures are more resilient across model upgrades; that is the "build for future models" lesson from the Vercel v0 team.