Function Calling & Tool Use
Function calling turns LLMs into orchestration engines. This chapter outlines patterns for stable tool use.
1) When to Use
- Structured actions: DB queries, API calls, code exec, search.
- Constrained outputs: prefer tool calls over free-text for reliability.
- Auditable actions: log tool name/args/results.
2) Tool Schema Design
- Clear names/descriptions; types for every param; enums for constrained values.
- Required vs optional fields; defaults kept server-side.
- Validate inputs server-side; reject/repair before execution.
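As a sketch, a schema following these rules might look like the following. The tool name, fields, and validator are hypothetical, not from any particular SDK:

```python
# Hypothetical tool schema: typed params, an enum for the constrained
# value, an explicit required list, defaults left to the server.
search_orders_schema = {
    "name": "search_orders",
    "description": "Search customer orders. Use when the user asks about order status or history.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Internal customer ID, e.g. 'C-1042'."},
            "status": {"type": "string", "enum": ["pending", "shipped", "delivered", "cancelled"]},
            "limit": {"type": "integer", "description": "Max results; server default applies if omitted."},
        },
        "required": ["customer_id"],
    },
}

def validate_args(schema: dict, args: dict) -> dict:
    """Server-side validation: reject unknown fields, missing required
    params, and out-of-enum values before anything executes."""
    props = schema["parameters"]["properties"]
    for field in schema["parameters"].get("required", []):
        if field not in args:
            raise ValueError(f"missing required param: {field}")
    for key, value in args.items():
        if key not in props:
            raise ValueError(f"unknown param: {key}")
        if "enum" in props[key] and value not in props[key]["enum"]:
            raise ValueError(f"invalid value for {key}: {value}")
    return args
```

A real system would typically delegate this to a JSON Schema validator rather than hand-rolling checks.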
3) Prompting for Tools
- System: "Prefer calling tools when helpful; don't guess params; ask for missing info."
- Few-shot: include examples of good tool calls and refusals.
- Disallow hallucination: remind model to refuse if no suitable tool exists.
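A minimal sketch of such a system message plus one few-shot refusal example. The wording and the message format are illustrative only:

```python
# Illustrative system message encoding the three rules above.
SYSTEM_PROMPT = (
    "You may call the provided tools. Prefer calling a tool when it helps.\n"
    "Never guess parameter values; ask the user for missing information.\n"
    "If no suitable tool exists, say so instead of inventing a tool call."
)

# Few-shot example of a good refusal: no tool fits, so the model
# answers directly rather than hallucinating a call.
FEW_SHOT = [
    {"role": "user", "content": "Summarize this poem for me."},
    {"role": "assistant", "content": "None of my tools apply here, so I'll answer directly."},
]
```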
4) Execution Loop
while not done:
    ask model → get tool call(s)
    validate/repair args
    run tool in sandbox with timeout
    append tool result back to model
    stop if final answer / max steps / time budget reached
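The loop above can be sketched as a driver function. `ask_model`, `run_tool`, and `check_args` are hypothetical callables standing in for your model client, sandboxed executor, and validator:

```python
import time

MAX_STEPS = 8        # step budget
TIME_BUDGET_S = 60.0 # wall-clock budget

def agent_loop(messages, ask_model, run_tool, check_args):
    """Hypothetical driver: `ask_model` returns either {"final": text}
    or {"tool": name, "args": dict}; `run_tool` executes in a sandbox
    with its own per-tool timeout."""
    start = time.monotonic()
    for _ in range(MAX_STEPS):
        if time.monotonic() - start > TIME_BUDGET_S:
            break
        reply = ask_model(messages)
        if "final" in reply:
            return reply["final"]
        try:
            args = check_args(reply["tool"], reply["args"])
            result = run_tool(reply["tool"], args)
        except Exception as exc:
            result = {"error": str(exc)}  # concise error; let the model replan
        messages.append({"role": "tool", "name": reply["tool"], "content": result})
    return None  # budget exhausted: fall back or escalate to a human
```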
5) Safety & Limits
- Timeouts per tool; circuit-break noisy tools.
- Allowlist domains/APIs; no raw shell without sandbox.
- PII stripping before tool calls; redact secrets from logs.
- Idempotency for mutating tools; confirmation steps for risky actions.
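The circuit-breaker idea can be sketched as a small state machine per tool: trip open after consecutive failures, then allow a probe after a cooldown. This is a minimal illustration, not a production implementation:

```python
class CircuitBreaker:
    """Trips open after `threshold` consecutive failures; the execution
    loop should skip a tripped tool until `cooldown_s` elapses, then
    allow a single half-open probe."""
    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker tripped

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            self.opened_at = None  # half-open: let one probe through
            self.failures = 0
            return True
        return False

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now
```

Time is passed in explicitly so the breaker is trivially testable.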
6) Error Handling
- Distinguish user errors (bad params) vs system errors (tool down).
- Provide concise tool error back to model; let model replan or ask user.
- Retry with backoff for transient failures; limit total attempts.
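A sketch of the retry policy, assuming two hypothetical exception classes to separate user errors from system errors:

```python
import time

class ToolUserError(Exception):
    """Bad params: surface to the model immediately, never retry."""

class ToolSystemError(Exception):
    """Transient failure (tool down): retry with backoff."""

def call_with_retry(fn, max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Retry only transient system errors, with exponential backoff and
    a hard cap on attempts; user errors propagate so the model can
    replan or ask the user."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ToolUserError:
            raise  # bad params: no retry, let the model fix the call
        except ToolSystemError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * (2 ** attempt))
```

`sleep` is injectable so tests run without real delays.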
7) Multimodal Tools
- Tools that accept files/URLs: validate size/type; pre-process (OCR/transcript).
- Return handles/IDs instead of raw blobs; store artifacts with TTL.
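The validate-then-return-a-handle pattern can be sketched as follows. The allowlist, size cap, and in-memory store are illustrative; a real artifact store would attach a TTL:

```python
ALLOWED_TYPES = {"application/pdf", "image/png", "image/jpeg"}  # example allowlist
MAX_BYTES = 10 * 1024 * 1024  # illustrative 10 MB cap

def register_upload(data: bytes, mime_type: str, store: dict) -> str:
    """Validate size/type, store the blob, and return an opaque handle
    for the model to pass around instead of raw bytes."""
    if mime_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {mime_type}")
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")
    handle = f"artifact-{len(store) + 1}"
    store[handle] = data  # a real store would record an expiry (TTL) here
    return handle
```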
8) Testing & Evals
- Contract tests: schema compliance, required params present.
- Golden cases: correct tool selection, refusal when no tool fits.
- Load/chaos: inject tool errors and ensure graceful degradation.
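Golden cases can be kept as a small table of request → expected tool (or `None` for "refuse"). The tool names and harness below are hypothetical:

```python
# Golden cases: each pairs a user request with the expected tool,
# or None when refusal is the correct behavior.
GOLDEN_CASES = [
    ("What's the status of order 123?", "search_orders"),
    ("Convert 5 miles to km", "unit_convert"),
    ("Write me a poem about autumn", None),  # no tool fits: expect refusal
]

def run_golden_cases(select_tool) -> list[str]:
    """`select_tool(request)` is the system under test; it returns the
    chosen tool name or None. Failures are collected rather than
    asserted one by one, so CI can report every miss at once."""
    failures = []
    for request, expected in GOLDEN_CASES:
        got = select_tool(request)
        if got != expected:
            failures.append(f"{request!r}: expected {expected}, got {got}")
    return failures
```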
9) Minimal Checklist
- Strong schemas + validation + allowlists.
- Sandbox + timeouts + retries + circuit breakers.
- Logs: tool, args (scrubbed), duration, success/fail, tokens.
❓ Frequently Asked Questions
Answers to the most commonly searched questions on this chapter's topic.
When should you use function calling instead of having the model generate free text?
Three scenarios call for it: (1) structured actions — DB queries, API calls, code execution, search; (2) constrained outputs — a tool call is more reliable than free text; (3) auditable actions — you must log tool name/args/results. A tool call separates "model decision" from "actual execution"; a parameter schema plus server-side validation is an order of magnitude more stable than prompting "please output JSON".
How should a tool schema be written?
Principles: clear name and description, a type for every param, enums for constrained values, explicit required vs optional, defaults kept server-side. The description should include usage context, examples, and defaults — the consolidation principle says: if a human engineer can't tell which tool to use, the model certainly can't choose correctly. Validate every input again server-side; reject or repair before execution.
What should the tool execution loop look like?
The standard loop: while not done → ask the model for tool call(s) → validate/repair args → run in a sandbox with a timeout → feed the tool result back to the model → check for a final answer / max steps / time budget exceeded → otherwise continue. Every tool needs a timeout, a circuit breaker (noisy tools trip automatically), an allowlist of domains/APIs, and no raw shell (a sandbox is required).
How should tool errors be handled? Should you retry?
Distinguish error types: user errors (bad params) → return a concise error to the model so it can replan or ask the user; system errors (tool down) → retry with exponential backoff, capping total attempts. Mutating tools must be idempotent (a retry must not double-charge); high-risk actions (deleting data, transferring money) get a confirmation step. Every tool error log includes the tool name, args (scrubbed), duration, success/fail, and tokens.
How do you test a tool system?
Three kinds of tests: (1) contract tests — schema compliance, required params present; (2) golden cases — correct tool selection, refusal when no tool fits; (3) load/chaos — deliberately inject tool errors and verify graceful degradation. Multimodal tools (accepting files/URLs) must validate size/type, pre-process (OCR/transcripts), return handles/IDs rather than raw blobs, and store artifacts with a TTL. The prompt should say "Prefer calling tools when helpful; don't guess params; ask for missing info".