10. Function Calling & Tool Use

⏱️ 35 min

Function calling turns LLMs into orchestration engines. This chapter outlines patterns for stable tool use.

1) When to Use

  • Structured actions: DB queries, API calls, code exec, search.
  • Constrained outputs: prefer tool calls over free-text for reliability.
  • Auditable actions: log tool name/args/results.

2) Tool Schema Design

  • Clear names/descriptions; types for every param; enums for constrained values.
  • Required vs optional fields; defaults kept server-side.
  • Validate inputs server-side; reject/repair before execution.
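As a concrete sketch, here is a hypothetical tool declared in the JSON-Schema style most function-calling APIs accept, plus a minimal server-side validator. The tool name, fields, and error messages are all illustrative, not a standard API:

```python
# Hypothetical tool schema in the JSON-Schema style most function-calling
# APIs accept; the tool name and fields are illustrative only.
SEARCH_ORDERS = {
    "name": "search_orders",
    "description": "Search customer orders. Use when the user asks about "
                   "order status or history. Defaults to the last 30 days.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string",
                            "description": "Internal customer ID"},
            "status": {"type": "string",
                       "enum": ["pending", "shipped", "delivered"],
                       "description": "Filter by order status"},
            "limit": {"type": "integer",
                      "description": "Max results (server default: 20)"},
        },
        "required": ["customer_id"],
    },
}

def validate_args(schema: dict, args: dict) -> list[str]:
    """Server-side check: flag missing required fields, unknown params,
    wrong types, and out-of-enum values before the tool executes."""
    props = schema["parameters"]["properties"]
    required = schema["parameters"].get("required", [])
    errors = [f"missing required param: {p}" for p in required if p not in args]
    type_map = {"string": str, "integer": int,
                "number": (int, float), "boolean": bool}
    for key, value in args.items():
        if key not in props:
            errors.append(f"unknown param: {key}")
            continue
        spec = props[key]
        if not isinstance(value, type_map[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key}: must be one of {spec['enum']}")
    return errors
```

An empty error list means the call may proceed; a non-empty list can either abort the call or be fed back to the model for repair.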

3) Prompting for Tools

  • System: "Prefer calling tools when helpful; don't guess params; ask for missing info."
  • Few-shot: include examples of good tool calls and refusals.
  • Disallow hallucination: remind model to refuse if no suitable tool exists.
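The guidance above can be packaged directly into the conversation. A sketch in the common chat-message format, with illustrative wording and a hypothetical `search_orders` tool:

```python
# Sketch of a tool-use system prompt plus few-shot examples.
# Wording, tool names, and message shape are illustrative assumptions.
MESSAGES = [
    {"role": "system", "content": (
        "Prefer calling tools when they can answer the request. "
        "Never guess parameter values; ask the user for missing info. "
        "If no available tool fits, say so instead of inventing one.")},
    # Few-shot: a good tool call with only the params the user supplied.
    {"role": "user",
     "content": "What are the recent orders for customer c_88?"},
    {"role": "assistant",
     "tool_call": {"name": "search_orders", "args": {"customer_id": "c_88"}}},
    # Few-shot: a refusal when no suitable tool exists.
    {"role": "user", "content": "Book me a flight to Oslo."},
    {"role": "assistant", "content":
        "I don't have a flight-booking tool, so I can't do that directly."},
]
```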

4) Execution Loop

while not done:
  ask model → get tool call(s)
  validate/repair args
  run tool in sandbox with timeout
  append tool result back to model
  stop if final answer / max steps / time budget
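The loop above can be sketched in runnable form, assuming a `model` callable that returns either a tool call or a final answer, and a dict of tool functions (all names hypothetical). Note that a thread-based timeout cannot actually kill a runaway tool, which is why real sandboxes tend to use separate processes:

```python
import concurrent.futures

def run_tool(fn, args, timeout_s=10.0):
    """Run a tool in a worker thread with a hard timeout; return a result
    dict either way so the transcript stays well-formed for the model."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, **args)
    try:
        return {"ok": True, "result": future.result(timeout=timeout_s)}
    except concurrent.futures.TimeoutError:
        return {"ok": False, "error": f"tool timed out after {timeout_s}s"}
    except Exception as exc:  # surface tool failures; don't crash the loop
        return {"ok": False, "error": str(exc)}
    finally:
        pool.shutdown(wait=False, cancel_futures=True)

def agent_loop(model, tools, messages, max_steps=8):
    """Ask the model, execute any tool call, feed the result back,
    and stop on a final answer or the step budget."""
    for _ in range(max_steps):
        reply = model(messages)
        if reply.get("tool_call") is None:
            return reply["content"]            # final answer
        call = reply["tool_call"]
        outcome = run_tool(tools[call["name"]], call["args"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": outcome})
    return "Stopped: step budget exhausted."
```

Argument validation/repair (section 2) would slot in between `reply` and `run_tool`; it is omitted here to keep the loop shape visible.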

5) Safety & Limits

  • Timeouts per tool; circuit-break noisy tools.
  • Allowlist domains/APIs; no raw shell without sandbox.
  • PII stripping before tool calls; redact secrets from logs.
  • Idempotency for mutating tools; confirmation steps for risky actions.
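Two of these limits sketched in code: a per-tool circuit breaker and a deterministic idempotency key for mutating calls. Thresholds, cooldowns, and the key scheme are illustrative assumptions, not a standard:

```python
import hashlib
import json
import time

class CircuitBreaker:
    """Stop calling a tool after repeated failures; retry after a cooldown."""
    def __init__(self, max_failures=3, cooldown_s=60.0):
        self.max_failures, self.cooldown_s = max_failures, cooldown_s
        self.failures, self.opened_at = 0, None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.failures, self.opened_at = 0, None  # half-open: try again
            return True
        return False

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()

def idempotency_key(tool: str, args: dict) -> str:
    """Deterministic key so a retried mutating call can be deduplicated
    server-side instead of executing twice."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

The executor checks `allow()` before each call, passes the idempotency key to the mutating backend, and calls `record()` with the outcome.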

6) Error Handling

  • Distinguish user errors (bad params) vs system errors (tool down).
  • Provide concise tool error back to model; let model replan or ask user.
  • Retry with backoff for transient failures; limit total attempts.
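A retry helper along these lines, with exponential backoff and jitter. The exception classes are hypothetical stand-ins for however your tool layer signals user vs system errors:

```python
import random
import time

class ToolDown(Exception):   # transient system error: worth retrying
    pass

class BadParams(Exception):  # user/model error: retrying won't help
    pass

def call_with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry transient failures with exponential backoff plus jitter;
    re-raise non-transient errors immediately so the model can replan."""
    for attempt in range(attempts):
        try:
            return fn()
        except BadParams:
            raise                                  # let the model fix its args
        except ToolDown:
            if attempt == attempts - 1:
                raise                              # attempts exhausted
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Injecting `sleep` keeps the helper testable; production code just uses the default.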

7) Multimodal Tools

  • Tools that accept files/URLs: validate size/type; pre-process (OCR/transcript).
  • Return handles/IDs instead of raw blobs; store artifacts with TTL.
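The handle-instead-of-blob pattern can be sketched with an in-memory artifact store; size limits, allowed types, and the handle format are illustrative assumptions:

```python
import time
import uuid

MAX_BYTES = 10 * 1024 * 1024  # illustrative cap
ALLOWED_TYPES = {"application/pdf", "image/png", "audio/wav"}
_store: dict[str, tuple[bytes, float]] = {}

def ingest(blob: bytes, mime: str, ttl_s: float = 3600.0) -> str:
    """Validate size/type, store the artifact with a TTL, and return an
    opaque handle the model can pass between tools instead of raw bytes."""
    if len(blob) > MAX_BYTES:
        raise ValueError("file too large")
    if mime not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {mime}")
    handle = f"artifact_{uuid.uuid4().hex[:8]}"
    _store[handle] = (blob, time.monotonic() + ttl_s)
    return handle

def fetch(handle: str) -> bytes:
    """Resolve a handle back to bytes; expired artifacts are evicted."""
    blob, expires = _store[handle]
    if time.monotonic() > expires:
        del _store[handle]
        raise KeyError(f"{handle} expired")
    return blob
```

Pre-processing steps such as OCR or transcription would run inside `ingest`, storing the derived text alongside the original blob.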

8) Testing & Evals

  • Contract tests: schema compliance, required params present.
  • Golden cases: correct tool selection, refusal when no tool fits.
  • Load/chaos: inject tool errors and ensure graceful degradation.
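Contract tests can be plain assertions over the tool registry. This sketch assumes each tool is declared in the common JSON-Schema style; the example tool and checks are illustrative:

```python
# A hypothetical registry entry in the JSON-Schema style.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
]

def check_tool_contract(tool: dict) -> list[str]:
    """Schema-compliance checks: every tool and param has a description,
    every param has a type, and required params are actually declared."""
    problems = []
    if not tool.get("description"):
        problems.append(f"{tool['name']}: missing description")
    params = tool.get("parameters", {})
    props = params.get("properties", {})
    for req in params.get("required", []):
        if req not in props:
            problems.append(
                f"{tool['name']}: required param '{req}' not declared")
    for pname, spec in props.items():
        if "type" not in spec:
            problems.append(f"{tool['name']}.{pname}: missing type")
        if not spec.get("description"):
            problems.append(f"{tool['name']}.{pname}: missing description")
    return problems
```

Running this over the whole registry in CI catches schema drift before the model ever sees a broken tool.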

9) Minimal Checklist

  • Strong schemas + validation + allowlists.
  • Sandbox + timeouts + retries + circuit breakers.
  • Logs: tool, args (scrubbed), duration, success/fail, tokens.

❓ FAQ

The questions most often searched on this chapter's topic.

When should you use function calling instead of having the model generate text directly?

Three scenarios call for it: (1) structured actions, such as DB queries, API calls, code execution, and search; (2) constrained outputs, where a tool call is more reliable than free text; (3) auditable actions, where the tool name/args/result must be logged. A tool call separates the model's decision from the actual execution, and a parameter schema plus server-side validation is an order of magnitude more stable than prompting "please output JSON".

How should a tool schema be written?

Principles: clear names and descriptions, a type for every param, enums for constrained values, explicit required vs optional fields, and defaults kept server-side. Descriptions should include usage context, examples, and defaults; the consolidation principle says that if a human engineer can't tell which tool to use, the model is even less likely to pick the right one. All inputs must be validated again server-side, then rejected or repaired before execution.

What should the tool execution loop look like?

The standard loop: while not done → ask the model for tool call(s) → validate/repair args → run in a sandbox with a timeout → feed the tool result back to the model → check for a final answer, max steps, or an exhausted time budget → otherwise continue. Every tool needs a timeout, a circuit breaker (noisy tools trip automatically), an allowlist of domains/APIs, and no raw shell access (a sandbox is mandatory).

How should tool errors be handled? Should you retry?

Distinguish the error type: for user errors (bad params), return a concise error to the model so it can replan or ask the user; for system errors (tool down), retry with exponential backoff and cap the total attempts. Mutating tools must be idempotent (a retry must not double-charge); high-risk actions (deleting data, transferring money) need a confirmation step. Every tool-error log entry should include the tool name, args (scrubbed), duration, success/fail, and tokens.

How do you test a tool system?

Three kinds of tests: (1) contract tests: schema compliance, required params present; (2) golden cases: correct tool selection, and refusal when no tool fits; (3) load/chaos: inject tool errors deliberately and verify graceful degradation. Multimodal tools (accepting files/URLs) should validate size/type, pre-process (OCR/transcripts), return a handle/ID instead of a raw blob, and store artifacts with a TTL. The prompt should say: "Prefer calling tools when helpful; don't guess params; ask for missing info."