The Context Cost of Tool Calling: How to Keep It from Blowing Up in the MCP Era

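Q: What do I do when the tool count grows? Should I just move to a 1M-context model?

No. Eager-loading 500 tools consumes about 100K tokens, half of a 200K window. A 1M-context model can hold roughly 5,000 tools, but cost rises about 5× and latency gets worse. The better direction is lazy loading (the ToolSearch pattern in Claude Code), which keeps the context from overflowing as tools are added, or the code execution approach Anthropic published in 2025 (a reported 98.7% token reduction).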

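Q: What is MCP, and how does it differ from function calling?

MCP (Model Context Protocol, released by Anthropic in November 2024) is an open protocol that defines how an LLM client discovers and calls external tools. Before MCP, each client (Cursor / Claude Code / Cline) had its own integration; with MCP, you write a server once and every client can use it. MCP does not solve the problem of many tools overflowing the context: it standardizes how tools are attached, not when their schemas are put into the prompt.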

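Q: Lazy loading vs. code execution: how do I choose?

Go by tool count: under 30, use eager loading; 50-500, use lazy loading (Claude Code's ToolSearch plus deferred tools); 500+, use code execution (Anthropic's 2025 approach with a sandbox and a Python interpreter, a reported 98.7% token saving). Engineering complexity grows roughly linearly with each step, so do not adopt code execution early for a scale you might reach someday.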

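Q: Is running my own MCP server expensive?

It is cheap. An MCP server is an ordinary stdio or HTTP process: running it locally costs nothing, and hosting it on Cloudflare Workers or AWS Lambda costs under $5/month. The real spend is the APIs the server calls out to (GitHub / Notion / Linear), billed by those downstream providers. The MCP protocol itself is free and imposes no call limits.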

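Q: I already use OpenAI function calling. Should I switch to MCP?

It depends. For a single client (your own ChatGPT app), staying on function calling is fine. Move to MCP when several clients such as Cursor / Claude Code / Cline need to share one tool set, which saves writing N wrappers for N clients. MCP does not replace function calling; it is a cross-client standardization layer on top of it.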

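Q: What is the most common pitfall as the tool count grows?

Tool descriptions that are too abstract, so the LLM picks the wrong tool: writing "Search the database" instead of "Search the customers table by email field, returns customer_id + name + signup_date". Anthropic's official guidance is to write a tool description like an onboarding document for a new hire, with every parameter spelled out.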
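The description advice above can be turned into a rough pre-flight check. The sketch below is only an illustration: `lint_tool`, the 60-character threshold, and both sample tools are assumptions made for this example, not part of any Anthropic tooling.

```python
def lint_tool(tool: dict, min_description_chars: int = 60) -> list[str]:
    """Return warnings for a tool definition whose description is too thin."""
    warnings = []
    description = tool.get("description", "")
    if len(description) < min_description_chars:
        warnings.append(
            f"{tool['name']}: description is only {len(description)} chars"
        )
    # Every parameter should carry its own description as well.
    properties = tool.get("input_schema", {}).get("properties", {})
    for param, spec in properties.items():
        if not spec.get("description"):
            warnings.append(f"{tool['name']}.{param}: parameter has no description")
    return warnings


# Hypothetical tools mirroring the vague and the specific wording above.
vague_tool = {
    "name": "search",
    "description": "Search the database",
    "input_schema": {"type": "object", "properties": {"q": {"type": "string"}}},
}
specific_tool = {
    "name": "search_customers_by_email",
    "description": (
        "Search the customers table by email field, "
        "returns customer_id + name + signup_date"
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Exact customer email address"}
        },
    },
}

for warning in lint_tool(vague_tool) + lint_tool(specific_tool):
    print(warning)
```

A length threshold is a crude proxy for quality; it catches the one-line descriptions that cause most wrong-tool selections, nothing more.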

⏱️ 20 min read

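Five tools make an LLM application, 50 make an agent platform, and 500 make a general-purpose IDE such as Cursor or Claude Code. At 100-300 tokens per schema, 500 tools start at around 100K tokens. Since the MCP protocol (Anthropic, November 2024) became widespread, this is an engineering problem every production system has to solve.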

How many tokens one tool schema takes

# tested: 2026-04-26 · anthropic@0.40.0
{
    "name": "get_weather",
    "description": "Get the current weather for a given location. Returns temperature, conditions, humidity. Use this when the user asks about weather, climate, or atmospheric conditions for a specific city or geographic location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state/country, e.g. 'San Francisco, CA' or 'Tokyo, Japan'"
            },
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit"
            }
        },
        "required": ["location"]
    }
}

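Anthropic's token counter reports ≈ 180 tokens for this schema. Simple tools run 100-150 tokens; complex ones (many enums, nesting, long descriptions) run 300-500. Cursor ships about 50 built-in tools; Claude Code with MCP servers attached can exceed 500.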

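Tool count	Total schema tokens	Share of a 200K window	Cost per API call (cached)
5	~750	0.4%	negligible
50	~10K	5%	$0.0003/call (cache hit)
500	~100K	50%	$0.003/call (cache hit)

Caching lowers the price, not the footprint: at 500 tools the schemas still occupy half the context window on a cache hit. The cost column works out to roughly $0.03 per million cached input tokens, so recompute it with your own model's cache-read rate.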
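The table's arithmetic can be reproduced in a few lines. This is a minimal sketch: `schema_budget` is a name invented here, and the default cache-read rate of $0.03 per million tokens is simply the rate implied by the table's cost column, so treat it as an assumption and substitute your model's actual price.

```python
from dataclasses import dataclass


@dataclass
class SchemaBudget:
    total_tokens: int
    context_share: float    # fraction of the context window, 0.0-1.0
    cached_cost_usd: float  # per-call cost when every schema token is a cache hit


def schema_budget(
    n_tools: int,
    tokens_per_tool: int,
    context_window: int = 200_000,
    cache_read_usd_per_mtok: float = 0.03,  # assumed rate, implied by the table
) -> SchemaBudget:
    total = n_tools * tokens_per_tool
    return SchemaBudget(
        total_tokens=total,
        context_share=total / context_window,
        cached_cost_usd=total / 1_000_000 * cache_read_usd_per_mtok,
    )


# Per-tool sizes chosen to match the table rows (150 for simple tools, 200 otherwise).
for n_tools, per_tool in [(5, 150), (50, 200), (500, 200)]:
    budget = schema_budget(n_tools, per_tool)
    print(
        f"{n_tools:>3} tools: {budget.total_tokens:>7,} tokens, "
        f"{budget.context_share:.1%} of window, "
        f"${budget.cached_cost_usd:.4f}/call"
    )
```

The useful output is the share column: the dollar cost stays small at any scale, while the window share grows linearly with the tool count.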

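MCP: the tool standard since November 2024

MCP (Model Context Protocol) is an open protocol released by Anthropic in November 2024 (modelcontextprotocol.io). It defines how an LLM client discovers and calls external tools.

Before MCP, every client (Cursor / Claude Code / Cline) had its own integration method.
After MCP, one server works with every client: write one for GitHub, and both Claude Code and Cursor can use it.

The 2025 MCP spec revisions added a streamable HTTP transport, which lets tool lists be hosted remotely and fetched at runtime. MCP standardizes only how tools are attached, not when their schemas are put into the prompt; that is the client's decision.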
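To make the "how to attach" versus "when to send the schema" split concrete, here is a sketch of the two JSON-RPC messages involved and of the client-side step that follows discovery. The `tools/list` and `tools/call` method names and the `inputSchema` field come from the MCP specification; the Slack tool, the sample response, and the helper names are hypothetical.

```python
# JSON-RPC 2.0 request an MCP client sends to discover a server's tools.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Abridged example of a server reply (hypothetical tool, for illustration only).
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "send_message",
                "description": "Send a message to a Slack channel.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "channel": {"type": "string"},
                        "text": {"type": "string"},
                    },
                    "required": ["channel", "text"],
                },
            }
        ]
    },
}


def to_anthropic_tools(mcp_tools: list[dict], server_name: str) -> list[dict]:
    """Convert MCP tool descriptors into Anthropic Messages API tool definitions.

    MCP uses the key `inputSchema`; the Messages API expects `input_schema`.
    The mcp__<server>__<tool> naming follows the pattern shown in this article.
    """
    return [
        {
            "name": f"mcp__{server_name}__{tool['name']}",
            "description": tool.get("description", ""),
            "input_schema": tool["inputSchema"],
        }
        for tool in mcp_tools
    ]


def call_request(request_id: int, name: str, arguments: dict) -> dict:
    """Build the JSON-RPC request that invokes one tool on the server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


anthropic_tools = to_anthropic_tools(list_response["result"]["tools"], "slack")
print(anthropic_tools[0]["name"])
```

Nothing in the protocol says whether `anthropic_tools` is sent on every model call or held back until needed. That choice belongs to the client and is what the loading strategies in this article differ on.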

Three tool loading strategies

1. Eager loading: send everything

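# tested: 2026-04-26 · anthropic@0.40.0
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,      # required by the Messages API
    tools=ALL_500_TOOLS,  # ~100K schema tokens sent on every call; assumed defined elsewhere
    messages=[...]
)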

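Pros: the model sees every capability. Cons: with 500 tools, the schemas still occupy 50% of the context even on a cache hit. Best for: fewer than 30 tools that are all used frequently.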
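The cache hits mentioned above depend on marking the tool list as cacheable. With Anthropic prompt caching, a `cache_control` breakpoint on the last tool definition caches the tool definitions that precede it. The helper below is a small sketch of that step; `with_cached_tools` and the two sample tools are names assumed for this example.

```python
import copy


def with_cached_tools(tools: list[dict]) -> list[dict]:
    """Return a copy of `tools` with a cache breakpoint on the last definition."""
    if not tools:
        return []
    cached = copy.deepcopy(tools)  # leave the caller's list untouched
    cached[-1]["cache_control"] = {"type": "ephemeral"}
    return cached


# Hypothetical two-tool list; a real eager setup would pass all definitions.
sample_tools = [
    {"name": "get_weather", "description": "Get current weather.",
     "input_schema": {"type": "object", "properties": {}}},
    {"name": "get_time", "description": "Get current time.",
     "input_schema": {"type": "object", "properties": {}}},
]

tools_for_request = with_cached_tools(sample_tools)
print(tools_for_request[-1]["cache_control"])
```

The result would be passed as `tools=` in the request. Caching changes what the schema tokens cost, not how much of the window they occupy.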

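2. Lazy loading: load on demand

This is Claude Code's default strategy. It exposes a meta tool, ToolSearch, and attaches only about 50 core tools at startup. For a task that needs more, the model first calls ToolSearch("send slack") to fetch the matching schema, then makes the real call on the next turn.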

# What Claude Code actually does
Startup: attach 50 core tools (Bash / Read / Write / Edit / Grep / ...)
When the user says: "send a Slack message to the team"
  Step 1: the model calls ToolSearch("send slack message")
          → returns the full schema of mcp__slack__send_message
  Step 2: the model calls mcp__slack__send_message using that schema

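Pros: the context does not overflow. Cons: each newly needed tool costs one extra LLM round trip (about +500 ms on first use); reusing a tool that is already loaded costs nothing extra. Best for: more than 50 tools with fewer than 5 used per task, which covers most general-purpose agents.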
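The two-step flow can be sketched as a small registry that holds deferred schemas back and moves them into the active set when a search matches. This is not Claude Code's implementation: `ToolRegistry`, its keyword-overlap ranking, and the sample tools are assumptions chosen to keep the example short.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; underscores act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToolRegistry:
    def __init__(self, core_tools: list[dict], deferred_tools: list[dict]):
        self.loaded = {t["name"]: t for t in core_tools}        # sent on every call
        self.deferred = {t["name"]: t for t in deferred_tools}  # held back

    def search(self, query: str, limit: int = 3) -> list[dict]:
        """Rank deferred tools by keyword overlap and load the best matches."""
        query_words = _words(query)
        scored = []
        for tool in self.deferred.values():
            overlap = query_words & _words(f"{tool['name']} {tool['description']}")
            if overlap:
                scored.append((len(overlap), tool["name"]))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        matches = [self.deferred[name] for _, name in scored[:limit]]
        for tool in matches:
            self.loaded[tool["name"]] = tool  # available from the next turn on
        return matches

    def tools_for_next_call(self) -> list[dict]:
        return list(self.loaded.values())


# Hypothetical tool set: one core tool, two deferred MCP tools.
registry = ToolRegistry(
    core_tools=[{"name": "Bash", "description": "Run a shell command."}],
    deferred_tools=[
        {"name": "mcp__slack__send_message",
         "description": "Send a message to a Slack channel."},
        {"name": "mcp__gmail__send_email",
         "description": "Send an email through Gmail."},
    ],
)

found = registry.search("send slack message", limit=1)
print([tool["name"] for tool in found])
print(len(registry.tools_for_next_call()))
```

Because a matched tool stays in `loaded`, searching for it again adds nothing, which mirrors the point that repeat use carries no extra cost. A production search would likely use embeddings or a better ranking function than word overlap.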

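3. Code execution: Anthropic's 2025 approach

In Anthropic's 2025 "Code execution with MCP" approach, the model does not receive tool schemas up front. It gets a sandboxed code environment (Python in this article's examples) with an MCP client library; the model writes code that calls the tools, and the interpreter runs it and returns the result.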

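# Code the model might write (illustrative: `mcp.connect` stands for a helper
# the sandbox provides, not the API of the official `mcp` Python SDK):
import mcp
slack = mcp.connect("slack")
result = slack.send_message(channel="#team", text="hello")
print(result)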

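Anthropic's reported result: a 98.7% token reduction on a tool-heavy workload.

The costs: you need a sandboxed interpreter, the model must write Python reliably (small models drift), and debugging means reading code traces. Best for: more than 200 tools, a strong model (Sonnet 4+), and an environment where you can run a sandbox.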
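The host side of this pattern can be sketched as follows: tool clients are injected as ordinary objects, the model's code runs, and only what it prints goes back into the context. Everything here is an assumption for illustration, including `FakeSlack` and `run_model_code`. Note in particular that `exec` is not a sandbox; a real deployment needs process or container isolation.

```python
import contextlib
import io


class FakeSlack:
    """Stand-in for an MCP-backed client; a real one would issue tools/call."""

    def send_message(self, channel: str, text: str) -> dict:
        return {"ok": True, "channel": channel, "text": text}


def run_model_code(source: str, tools: dict) -> str:
    """Run model-written code with `tools` in scope and return its stdout.

    WARNING: exec() is NOT a security boundary. It is used here only to show
    the data flow; run untrusted code in an isolated process or container.
    """
    buffer = io.StringIO()
    namespace = dict(tools)  # the model's code sees only these names plus builtins
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue()


# Code the model might write. No Slack schema was ever placed in its context.
model_code = """
result = slack.send_message(channel="#team", text="hello")
print(result["ok"])
"""

output = run_model_code(model_code, {"slack": FakeSlack()})
print(output.strip())
```

The token saving comes from two places: schemas are never serialized into the prompt, and intermediate results stay inside the interpreter unless the code chooses to print them.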

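JR case study: 17 MCP servers in Claude Code without overflowing the context

JR's Claude Code configuration (.claude/settings.json) attaches 17 MCP servers: Canva, Gmail, Notion, Google Drive/Calendar, jr-data, Chrome DevTools, Playwright, Context7, AWS Deploy/Pricing, Filesystem, and others. Each exposes 5-30 tools, so 200+ tools × ~200 tokens means eager loading would burn about 40K tokens on schemas.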

Claude Code default configuration, as recorded for JR's setup (most MCP tools are deferred by default):

{
  "permissions": {
    "deferred": true
  }
}

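deferred means the schema is not attached at startup and is fetched on demand through ToolSearch. The result: about 5K tokens of schema at startup (the 50 core tools), one extra round trip the first time a tool is used, and no extra cost on reuse.

This is the key engineering move for running a production agent with 200+ tools.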

Eager vs. lazy vs. code execution: trade-offs

Dimension	Eager loading	Lazy loading (`ToolSearch`)	Code execution
Tool count range	< 30	50-500	500+
First-call latency	0	+1 round trip	+1 round trip + sandbox startup
Repeat-call latency	0	0	0
Token savings	0%	70-90%	95-98%
Implementation complexity	Low	Medium (you write ToolSearch)	High (sandbox + code-trained model)
Debugging difficulty	Easy (read tool_use directly)	Easy (read ToolSearch + tool_use)	Medium (read code stdout)
Best for	Simple apps	General-purpose agents, Claude Code style	Enterprise agents with very many tools

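JR house rule: start a new LLM application with eager loading (under 10 tools) and switch to lazy loading past 30. Code execution is the end state for 200+ tools, not the starting point.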
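The house rule reduces to a few comparisons. The function below is one possible encoding, with thresholds taken from the rule as stated (30 and 200); the function name and the `can_run_sandbox` flag are assumptions added so the sandbox requirement is explicit.

```python
def choose_loading_strategy(n_tools: int, can_run_sandbox: bool = False) -> str:
    """Pick a tool loading strategy from the tool count.

    Thresholds follow the house rule: stay eager up to 30 tools, go lazy past
    that, and consider code execution only at 200+ tools with a sandbox.
    """
    if n_tools <= 30:
        return "eager"
    if n_tools >= 200 and can_run_sandbox:
        return "code_execution"
    return "lazy"


for count, sandbox in [(8, False), (45, False), (500, False), (500, True)]:
    print(count, sandbox, choose_loading_strategy(count, sandbox))
```

Without a sandbox the function keeps returning "lazy" at any scale, which matches the advice not to treat code execution as a starting point.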

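The one-line takeaway

Each tool schema costs 100-300 tokens. Eager-loading 500 tools consumes half of a 200K context. Lazy loading (ToolSearch plus deferred tools) raises the practical ceiling from about 30 tools to about 500, and code execution raises it to about 5,000. Engineering complexity rises roughly linearly along the way, so do not optimize prematurely.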

References

Anthropic. (2024-11-25). Introducing the Model Context Protocol. The MCP launch announcement.
MCP team. Specification 2025-11-25. The 2025 specification, covering the streamable HTTP transport.
Anthropic. Code execution with MCP. The code interpreter approach, with a measured 98.7% token reduction.
Anthropic. Tool use documentation. Tool schema format and token counting.
modelcontextprotocol. GitHub specification repo. The full MCP specification source.
Wikipedia. Model Context Protocol. MCP timeline and ecosystem background.

Production case: JR Academy Claude Code configuration (.claude/settings.json), with 17 MCP servers, 200+ tools, and lazy loading via ToolSearch by default.
