The Context Cost of Tool Calls — How Not to Blow Up in the MCP Era
Five tools make an LLM app. Fifty make an agent platform. Five hundred is where general-purpose IDE agents like Cursor and Claude Code land once MCP servers are attached. Every tool eats context: 100-300 tokens per schema, so 500 tools start you at 100K+ before the first user message. Once Anthropic's MCP protocol (November 2024) became the standard way to attach tools, this turned into a production engineering problem.
How Many Tokens Does One Tool Schema Cost?
# tested: 2026-04-26 · anthropic@0.40.0
{
  "name": "get_weather",
  "description": "Get the current weather for a given location. Returns temperature, conditions, humidity. Use this when the user asks about weather, climate, or atmospheric conditions for a specific city or geographic location.",
  "input_schema": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state/country, e.g. 'San Francisco, CA' or 'Tokyo, Japan'"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Temperature unit"
      }
    },
    "required": ["location"]
  }
}
Anthropic's token counter puts this schema at ≈180 tokens. Simple tools land at 100-150; complex ones (multiple enums, nested schemas, long descriptions) hit 300-500. Cursor ships ~50 built-in tools; Claude Code with MCP servers carries 500+.
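You can reproduce the number with the SDK's token-counting endpoint. A minimal sketch, assuming a trimmed copy of the schema above (use the full version for a faithful measurement); diffing against a tool-free request isolates the schema's share:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

GET_WEATHER = {  # trimmed copy of the schema above
    "name": "get_weather",
    "description": "Get the current weather for a given location.",
    "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

msgs = [{"role": "user", "content": "What's the weather in Tokyo?"}]
base = client.messages.count_tokens(model="claude-sonnet-4-6", messages=msgs)
with_tool = client.messages.count_tokens(
    model="claude-sonnet-4-6", tools=[GET_WEATHER], messages=msgs
)
# the diff ≈ schema cost plus a fixed tool-use system-prompt overhead
print(with_tool.input_tokens - base.input_tokens)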
| Tool count | Total schema tokens | % of 200K | Per API call (Sonnet cache read, $0.30/MTok) |
|---|---|---|---|
| 5 | ~750 | 0.4% | negligible |
| 50 | ~10K | 5% | ~$0.003 |
| 500 | ~100K | 50% | ~$0.03 |
At 500 tools you still burn half your context after caching.
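The "cache read" column assumes prompt caching is actually enabled. A minimal sketch of marking the tool block cacheable (ALL_500_TOOLS is a placeholder list of schema dicts); a cache_control marker on the last tool sets a cache breakpoint covering everything before it:

import anthropic

client = anthropic.Anthropic()

tools = list(ALL_500_TOOLS)  # placeholder: your 500 schema dicts
tools[-1]["cache_control"] = {"type": "ephemeral"}  # breakpoint after the tool block

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "..."}],
)
# first call writes the cache; subsequent calls read it at the discounted rate
print(resp.usage.cache_creation_input_tokens, resp.usage.cache_read_input_tokens)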
MCP — The Tool Standard After Nov 2024
MCP (Model Context Protocol) is the open protocol Anthropic released in November 2024 (modelcontextprotocol.io). It defines how an LLM client discovers and calls external tools.
- Before MCP, every client (Cursor / Claude Code / Cline) had its own tool plug-in path
- After MCP, you write a single server and every client can use it: one GitHub server serves Claude Code and Cursor alike
In 2025 the MCP spec added a streamable HTTP transport: tool lists can be hosted remotely and fetched at runtime.
MCP only standardizes how to plug a tool in, not when to load its schema. That part is still the client's engineering call.
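To make "write one server, every client gets it" concrete, here is a minimal sketch using the official Python SDK's FastMCP helper (pip install mcp); the weather stub is a hypothetical placeholder:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather for a given location."""
    # hypothetical stub; a real server would call a weather API here
    return f"22 degrees {unit} and clear in {location}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default

Register this once in each client's MCP config and Claude Code, Cursor, and Cline all discover get_weather through the same protocol.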
Three Tool-Loading Strategies
1. Eager Loading — Stuff Everything In
# tested: 2026-04-26 · anthropic@0.40.0
import anthropic

client = anthropic.Anthropic()
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=ALL_500_TOOLS,  # all 100K tokens of schemas stuffed in
    messages=[...],
)
Pro: the model sees every capability up front. Con: at 500 tools, schemas still occupy 50% of context even on a cache hit. Fits: under 30 tools, all used frequently.
2. Lazy Loading — Load on Demand
Claude Code's default. The client exposes a meta-tool, ToolSearch, and mounts ~50 core tools at startup. On a complex task, the model first calls ToolSearch("send slack") to fetch the matching schema, then makes the real call on the next turn.
# Claude Code's actual behavior
Startup: mount 50 core tools (Bash / Read / Write / Edit / Grep / ...)
User says: "send a slack message to the team"
Step 1: model calls ToolSearch("send slack message")
    → returns the full schema for mcp__slack__send_message
Step 2: model calls mcp__slack__send_message with the fetched schema
Pro: context never blows up. Con: each new tool costs an extra LLM round-trip (~500ms on first use); repeat use is free. Fits: more than 50 tools with fewer than 5 used per task (most general-purpose agents).
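A compressed sketch of that loop. The helpers load_all_schemas, CORE_TOOLS, and dispatch are hypothetical stand-ins for a real registry and executor; this shows the pattern, not Claude Code's actual code:

import anthropic

client = anthropic.Anthropic()

TOOL_SEARCH = {
    "name": "tool_search",
    "description": "Search the full tool catalog by natural-language query. "
                   "Matching schemas become callable on the next turn.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

TOOL_REGISTRY = load_all_schemas()    # hypothetical: name -> full schema dict
mounted = [TOOL_SEARCH] + CORE_TOOLS  # hypothetical: ~50 core schemas, not 500

def search_tools(query: str, limit: int = 3) -> list[dict]:
    # naive keyword match; production clients use embeddings or BM25
    words = query.lower().split()
    hits = [s for s in TOOL_REGISTRY.values()
            if any(w in s["description"].lower() for w in words)]
    return hits[:limit]

messages = [{"role": "user", "content": "send a slack message to the team"}]
while True:
    resp = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024,
                                  tools=mounted, messages=messages)
    calls = [b for b in resp.content if b.type == "tool_use"]
    if not calls:
        break
    messages.append({"role": "assistant", "content": resp.content})
    for call in calls:
        if call.name == "tool_search":
            hits = search_tools(call.input["query"])
            mounted += hits  # mount on demand; free for the rest of the session
            result = [h["name"] for h in hits]
        else:
            result = dispatch(call.name, call.input)  # hypothetical executor
        messages.append({"role": "user", "content": [{
            "type": "tool_result", "tool_use_id": call.id, "content": str(result)}]})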
3. Code Execution — Anthropic's 2025 Approach
In 2025 Anthropic published Code execution with MCP: the model never receives tool schemas, only a sandboxed Python environment plus an MCP client library. It writes Python that calls the tools, the interpreter runs it, and the results come back.
# code the model writes inside the sandbox:
import mcp
slack = mcp.connect("slack")
result = slack.send_message(channel="#team", text="hello")
print(result)
Anthropic's measurement: token usage drops 98.7% on tool-heavy workloads.
Cost: a sandboxed interpreter is required (not every client can run one); the model has to write decent Python (smaller models drift); debugging means reading code traces, not tool-call traces.
Fits: tools over 200, strong model (Sonnet 4+), sandbox available.
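The 98.7% makes sense once you see where intermediate data lives. In the same illustrative pseudo-API style as the snippet above (sheets.read and the field names are hypothetical), only the printed line re-enters the model's context:

import mcp

gdrive = mcp.connect("gdrive")
rows = gdrive.sheets.read("Q4 pipeline")         # 10,000 rows stay in the sandbox
big = [r for r in rows if r["amount"] > 50_000]  # filtering happens in Python, not in context
print(f"{len(big)} deals over $50K")             # one short line goes back to the model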
JR Real Case: Claude Code with 17 MCP Servers Doesn't Blow Up
JR's Claude Code config (.claude/settings.json) has 17 MCP servers — Canva, Gmail, Notion, Google Drive/Calendar, jr-data, Chrome DevTools, Playwright, Context7, AWS Deploy/Pricing, Filesystem, and more. Each exposes 5-30 tools. 200+ tools × ~200 tokens ≈ 40K tokens of schema if loaded eagerly.
Claude Code's default:
{
  "permissions": {
    "deferred": true  // most MCP tools are deferred by default
  }
}
deferred = don't mount the schema at startup; fetch it via ToolSearch on demand. Result: startup schemas ≈ 5K tokens (50 core tools); one extra round-trip per first-time tool use; repeat use within a session is free.
Lazy loading in production — the move that lets a real agent run with 200+ tools attached.
Eager vs Lazy vs Code Execution — Trade-off
| Dimension | Eager loading | Lazy loading (ToolSearch) | Code execution |
|---|---|---|---|
| Tool count ceiling | < 30 | 50-500 | 500+ |
| First-call latency | 0 | +1 round trip | +1 round trip + sandbox start |
| Repeat-call latency | 0 | 0 | 0 |
| Token savings | 0% | 70-90% | 95-98% |
| Implementation effort | Low | Medium (write ToolSearch) | High (sandbox + code-trained model) |
| Debug difficulty | Easy (read tool_use directly) | Easy (ToolSearch + tool_use) | Medium (read code stdout) |
| Fits | Simple apps | General-purpose agents / Claude Code-style | Enterprise agents with massive tool counts |
JR's internal rule: new LLM apps start eager (under 10 tools). Cross 30, switch to lazy. Code execution is the endgame for 200+ tools, not a starting point.
Takeaway
Each tool schema is 100-300 tokens. 500 tools eager-loaded eat half your 200K context. Lazy loading (ToolSearch + deferred) pushes the ceiling from ~30 to ~500 tools. Code execution pushes it toward 5,000. Engineering complexity scales up in step; don't pre-optimize.
References
- Anthropic. (2024-11-25). Introducing the Model Context Protocol — MCP launch announcement.
- MCP team. Specification 2025-11-25 — 2025 update for streamable HTTP transport.
- Anthropic. Code execution with MCP — code interpreter approach measured at 98.7% token reduction.
- Anthropic. Tool use documentation — tool schema format and token counting.
- modelcontextprotocol. GitHub specification repo — full MCP spec source.
- Wikipedia. Model Context Protocol — MCP timeline and ecosystem context.
Production case: JR Academy Claude Code config (.claude/settings.json) — 17 MCP servers / 200+ tools / lazy loading by default via ToolSearch.
❓ FAQ
The most commonly searched questions on this chapter's topic.
Too many tools: should I just move to a 1M-context model?
No. 500 tools eager-loaded eat 100K (half of a 200K window); a 1M model could hold 5,000 tools, but cost rises ~5× and latency gets worse. The right direction is lazy loading (Claude Code's ToolSearch pattern), which keeps context from ever blowing up, or the code execution approach Anthropic pushed in 2025 (98.7% token reduction).
What is MCP, and how does it differ from function calling?
MCP (released by Anthropic in November 2024) is an open protocol that defines how an LLM client discovers and calls external tools. Before MCP, every client (Cursor / Claude Code / Cline) had its own integration path; after MCP, you write one server and every client can use it. MCP does not solve context blow-up from too many tools: it standardizes how a tool plugs in, not when its schema gets loaded.
How do I choose between lazy loading and code execution?
By tool count: under 30, eager; 50-500, lazy (Claude Code's ToolSearch + deferred); 500+, code execution (Anthropic's 2025 sandbox + Python interpreter, 98.7% token savings). Engineering complexity rises in step, so don't adopt code execution for a "maybe someday".
Is it expensive to run your own MCP server?
Cheap: an MCP server is an ordinary stdio/HTTP process. Running it locally is free; a Cloudflare Worker or AWS Lambda costs under $5/month. What actually burns money is the downstream APIs the server calls (GitHub / Notion / Linear), billed by those providers. The MCP protocol itself is free and puts no limit on calls.
I'm already using OpenAI function calling. Should I switch to MCP?
Depends on the scenario: for a single client (your own ChatGPT-style app), staying on function calling is fine; adopt MCP when several clients (Cursor / Claude Code / Cline) need to share one tool set, which saves you building N wrappers for N clients. MCP doesn't replace function calling; it is the cross-client standardization layer above it.
What's the most common pitfall once tool counts grow?
Tool descriptions written too abstractly, so the LLM picks the wrong tool: write "Search the customers table by email field, returns customer_id + name + signup_date" instead of "Search the database". Anthropic's official advice: write each tool description like an onboarding doc for a new employee, spelling out every parameter.
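A quick sketch of that contrast, using a hypothetical customer-search tool:

# too abstract: indistinguishable from every other search tool
bad = {"name": "search", "description": "Search the database"}

# specific: names the table, the lookup field, the return shape, and when to use it
good = {
    "name": "search_customers",
    "description": (
        "Search the customers table by email field. "
        "Returns customer_id, name, and signup_date. "
        "Use when the user asks about a specific customer account."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Exact customer email to look up"}
        },
        "required": ["email"],
    },
}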