Tool Design Principles
Tool Design for Agents
Tools are the primary mechanism for agents to interact with the outside world. They define the contract between deterministic systems and non-deterministic agents. Unlike traditional APIs, tool APIs must be designed for language models: the model needs to understand intent from natural language, infer parameters, and generate calls. Bad tool design creates failure modes that no amount of prompt engineering can fix.
The essence of tool design is "reduce guessing for the model." Clearer tools mean more stable behavior.
- Tools are contracts, not regular APIs.
- Consolidation reduces ambiguity.
- Good descriptions answer what/when/inputs/returns.
- Error messages must be recoverable.
- Prefer minimal, general-purpose tools.
What You'll Learn
- How to write tool descriptions that models can call correctly
- When to consolidate tools vs. when to split them
- How to design response formats and error handling
When to Activate
Activate this skill when:
- Creating new tools for agent systems
- Debugging tool-related failures or misuse
- Optimizing existing tool sets for better agent performance
- Designing tool APIs from scratch
- Evaluating third-party tools for agent integration
- Standardizing tool conventions across a codebase
Core Concepts
Tools are contracts between deterministic systems and non-deterministic agents. The consolidation principle states that if a human engineer cannot definitively say which tool should be used in a given situation, an agent cannot be expected to do better. Effective tool descriptions are prompt engineering that shapes agent behavior.
Key principles: clear descriptions (what/when/returns), response formats for token efficiency, error messages for recovery, and consistent conventions that reduce cognitive load.
Detailed Topics
The Tool-Agent Interface
Tools as Contracts Tools are contracts. When a human calls an API, they understand the contract. But a model has to infer the contract from the description — so the description must be explicit, unambiguous, and convey correct usage through examples.
Tool Description as Prompt Tool descriptions are essentially prompt engineering. They determine how an agent picks and uses tools. Bad descriptions force guessing; good descriptions include usage context, examples, and defaults.
Namespacing and Organization As your tool collection grows, namespacing reduces selection cost. Different namespaces map to different functional domains, helping the agent locate the right tool faster.
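A minimal sketch of namespacing as a lookup aid. The registry and tool names below are hypothetical; the point is that a shared `namespace_verb_noun` prefix lets both humans and models narrow the search to one functional domain.

```python
# Hypothetical tool registry: a "namespace_verb_noun" prefix groups
# tools by functional domain so the agent can locate them faster.
TOOLS = {
    "calendar_create_event": "Create a calendar event.",
    "calendar_list_events": "List events in a date range.",
    "crm_get_customer": "Retrieve a customer record by ID.",
    "crm_update_customer": "Update fields on a customer record.",
}

def tools_in_namespace(namespace: str) -> list[str]:
    """Return all tool names under a given namespace prefix."""
    return [name for name in TOOLS if name.startswith(namespace + "_")]
```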
The Consolidation Principle
Single Comprehensive Tools Here's the consolidation principle: if a human can't decide which tool to use, the model won't do any better. Prefer one tool that handles a complete workflow over multiple fragmented tools.
Why Consolidation Works More tools mean more descriptions consuming context and more ambiguity. Consolidation cuts token consumption and reduces selection complexity.
When Not to Consolidate Don't force consolidation when behaviors differ significantly, use cases don't overlap, or tools can be called independently.
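A hypothetical illustration of the consolidation principle: one tool covering the complete "find bookable flights" workflow that would otherwise require three fragmented calls (search, availability check, fare lookup). The data layer is stubbed so the sketch is self-contained; none of these functions come from a real API.

```python
# Stubbed data layer so the sketch runs on its own.
def _search_flights(origin: str, destination: str, date: str) -> list[dict]:
    return [{"flight_id": "XX100", "seats_left": 2},
            {"flight_id": "XX200", "seats_left": 0}]

def _get_fare(flight_id: str) -> float:
    return 199.0

def find_bookable_flights(origin: str, destination: str, date: str,
                          max_results: int = 5) -> list[dict]:
    """Search flights, keep only those with seats, and attach fares:
    the full workflow in a single call the agent cannot mis-sequence."""
    flights = _search_flights(origin, destination, date)
    available = [f for f in flights if f["seats_left"] > 0]
    for f in available:
        f["fare_usd"] = _get_fare(f["flight_id"])
    return available[:max_results]
```

Exposing only `find_bookable_flights` removes the ambiguity of which of three tools to call first and cuts three descriptions down to one.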
Architectural Reduction
Push consolidation to its extreme and you get architectural reduction: fewer specialized tools, more general-purpose primitives.
The File System Agent Pattern
Instead of building complex tools, give the agent filesystem access + command execution and let the model use grep/cat/find/ls as general-purpose tools.
When Reduction Outperforms Complexity Reduction works best when:
- The data layer is well-documented
- The model's reasoning capability is strong enough
- Existing tools are constraining rather than enhancing the model
Reduction fails when: data is messy, domain knowledge is lacking, security constraints are strict, or workflows are highly complex.
Stop Constraining Reasoning Many guardrails are meant to "protect the model" but end up restricting its reasoning space. Keep asking yourself: is this tool enhancing the model, or boxing it in?
Build for Future Models Models iterate faster than tools. Over-engineered tool architectures lock you out of future improvements. Smaller architectures tend to be more resilient.
Tool Description Engineering
Description Structure A good description answers four questions:
- What does the tool do?
- When should it be used?
- What inputs does it accept?
- What does it return?
Default Parameter Selection Defaults should cover common scenarios and lower the cost of making a call.
Response Format Optimization
Response format directly affects context token usage. Offer both concise and detailed formats, and specify when to use each.
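A sketch of the dual-format idea, assuming a hypothetical customer record. "concise" is the default because most calls only need key fields; "detailed" is opt-in and costs more context tokens.

```python
# Hypothetical customer record; the long "history" list stands in for
# the bulk that a "detailed" response would drag into context.
RECORD = {"id": "CUST-000001", "name": "Ada", "tier": "gold",
          "email": "ada@example.com", "history": ["event"] * 40}

def format_customer(record: dict, format: str = "concise") -> dict:
    """Return key fields by default; the full record only on request."""
    if format == "concise":
        return {k: record[k] for k in ("id", "name", "tier")}
    return record
```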
Error Message Design
Error messages must be actionable: tell the model "what went wrong and how to fix it."
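A sketch of an actionable error payload: a machine-readable code plus a recovery hint the model can act on. The error codes and validation rule are illustrative, borrowed from the `get_customer` example later in this page.

```python
import re

def get_customer_or_error(customer_id: str) -> dict:
    """Validate input and return either a result or a recoverable error."""
    if not re.fullmatch(r"CUST-\d{6}", customer_id):
        # Code + hint: the model learns what failed and how to fix it.
        return {"error": "INVALID_FORMAT",
                "hint": ("customer_id must match CUST-###### "
                         f"(got {customer_id!r}); e.g. CUST-000001")}
    # ... real lookup would happen here ...
    return {"ok": True, "customer_id": customer_id}
```

Contrast this with returning "Internal error": the model can only retry blindly, while the hint above tells it exactly which parameter to repair.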
Tool Definition Schema
A unified schema (verb-noun naming, parameter naming, return fields) significantly reduces model misuse rates.
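An illustrative tool definition following one unified convention: a verb_noun name, snake_case parameters, and a fixed return envelope. The schema shape loosely follows common function-calling formats; the field names here are assumptions, not any specific vendor's API.

```python
# One tool definition written against a house convention. Every tool in
# the collection uses the same field names, so the model never relearns
# the shape per tool.
TOOL_DEF = {
    "name": "get_customer",                      # verb_noun naming
    "description": "Retrieve customer information by ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string",
                            "pattern": "^CUST-[0-9]{6}$"},
            "format": {"type": "string",
                       "enum": ["concise", "detailed"],
                       "default": "concise"},
        },
        "required": ["customer_id"],
    },
    "returns": {"fields": ["id", "name", "tier"],
                "errors": ["NOT_FOUND", "INVALID_FORMAT"]},
}
```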
Tool Collection Design
More tools is not necessarily better. Keep the set to 10-20 tools and use namespacing for grouping.
MCP Tool Naming Requirements
When using MCP tools, you must use fully qualified tool names:
Format: ServerName:tool_name
```
# Correct
"Use the BigQuery:bigquery_schema tool to retrieve table schemas."
"Use the GitHub:create_issue tool to create issues."

# Incorrect
"Use the bigquery_schema tool..."
```
Using Agents to Optimize Tools
You can use agents to reverse-optimize tool descriptions: improve descriptions based on failure examples, creating a feedback loop.
Testing Tool Design
Test tool calls with representative tasks, evaluating unambiguity, completeness, recoverability, and consistency.
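A minimal sketch of a tool-selection eval: representative tasks paired with the expected tool, scored against whatever the agent actually picked. The cases and `agent_pick` callable are hypothetical stand-ins for a real agent harness.

```python
# Representative tasks with the tool a well-designed set should make
# unambiguous. Ambiguity shows up as a low score here before it shows
# up as production failures.
CASES = [
    ("What is customer CUST-000042's tier?", "get_customer"),
    ("Open a support ticket for a refund", "create_ticket"),
]

def score_tool_selection(agent_pick, cases=CASES) -> float:
    """Fraction of tasks where the agent chose the expected tool."""
    hits = sum(agent_pick(task) == expected for task, expected in cases)
    return hits / len(cases)
```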
Practical Guidance
Anti-Patterns to Avoid
- Vague descriptions
- Cryptic parameter names
- Missing error handling
- Inconsistent naming
Tool Selection Framework
- Identify workflows
- Group actions into comprehensive tools
- Ensure clear purpose
- Document error cases
- Test with agent interactions
Minimal Tool Spec Template
```
Tool Name: <verb_noun>
When to use: <trigger + context>
Inputs:
  - param_a: type, constraints, example
  - param_b: type, default
Returns:
  - format: concise | detailed
Errors:
  - ERROR_CODE: recovery hint
```
Examples
Example 1: Well-Designed Tool
```python
def get_customer(customer_id: str, format: str = "concise"):
    """
    Retrieve customer information by ID.

    Use when:
    - User asks about specific customer details
    - Need customer context for decision-making
    - Verifying customer identity

    Args:
        customer_id: Format "CUST-######" (e.g., "CUST-000001")
        format: "concise" for key fields, "detailed" for complete record

    Returns:
        Customer object with requested fields

    Errors:
        NOT_FOUND: Customer ID not found
        INVALID_FORMAT: ID must match CUST-###### pattern
    """
```
Example 2: Poor Tool Design
```python
def search(query):
    """Search the database."""
    pass
```
Problems with this design:
- Vague name: "search" is ambiguous
- Missing parameters
- No return description
- No usage context
- No error handling
Guidelines
- Write descriptions that answer what/when/returns
- Use consolidation to reduce ambiguity
- Implement response formats
- Design actionable error messages
- Enforce naming conventions
- Limit tool count and use namespacing
- Test tool designs with real agent interactions
- Iterate based on observed failures
- Prefer minimal architectures when possible
Practice Task
- Write a tool spec for your own business scenario (follow the template above)
- Identify 2 potential misuse points and add them to the description and errors sections
Related Pages
- Claude Code Examples
- Context Engineering Fundamentals
- Multi-Agent Architecture Patterns
- Advanced Evaluation
Integration
This skill connects to:
- context-fundamentals
- multi-agent-patterns
- evaluation
References
External resources:
- MCP (Model Context Protocol) documentation
- Framework tool conventions
- API design best practices for agents
- Vercel v0 agent architecture case study
Skill Metadata
Created: 2025-12-20 Last Updated: 2025-12-23 Author: Agent Skills for Context Engineering Contributors Version: 1.1.0
Frequently Asked Questions
Why does the tool description affect agent call success more than the tool implementation?
Because the model never reads your code; it can only infer the contract from the description. A good description must answer four things: what (what the tool does), when (when to use it), inputs (parameters and constraints), and returns (what format comes back). A vague description (such as `def search(query): """Search the database"""`) forces the model to guess and pass bad arguments; spelling out details like "customer_id must match the CUST-###### format" immediately brings call error rates down.
When should multiple tools be merged into one, and when should they stay separate?
The consolidation principle: if even a human engineer cannot say which tool applies in a given situation, the model won't do better. Prefer merging tools that together complete a full workflow, which reduces ambiguity and description token consumption. But don't force a merge when behaviors differ significantly, use cases clearly don't overlap, and the tools can be called independently. For example, lookup_order (read-only) and create_ticket (write) should not be merged; their side effects differ too much.
How many tools is the right number, and how do you manage a growing tool set?
Keep it to roughly 10-20 tools; beyond that, use namespacing to group them. The more tools you have, the more their descriptions eat into the context budget and the more likely the model picks the wrong one. In MCP scenarios you must use fully qualified names (`ServerName:tool_name`, e.g. `BigQuery:bigquery_schema`); without the server prefix, the model will pick arbitrarily among same-named tools. Architectural reduction is another path: instead of adding specialized tools, give the model filesystem access plus command execution and let it use grep/cat/find itself.
What should a tool return on error so the agent can recover on its own?
Error messages must be actionable: tell the model what went wrong and how to fix it. The minimum is two fields: an error code (e.g. NOT_FOUND, INVALID_FORMAT) plus a recovery hint ("ID must match CUST-###### pattern"). Returning "Internal error" or "500" only makes the model retry blindly. A good error template lets the model know, after one read, whether the next step is to fix a parameter, try a different ID, or abandon the task.
Why do over-engineered tool architectures become a burden when models upgrade?
Models iterate much faster than tools. Guardrails built today for GPT-4 (restricted parameters, forced structure, limited options) end up constraining rather than enhancing stronger reasoning models like Claude 4.5 or GPT-5 thinking. Keep asking: is this tool helping the model reason, or deciding for it? Smaller architectures are more resilient across model upgrades; that is the "build for future models" lesson from the Vercel v0 team.