The Context Cost of Tool Calls — How Not to Blow Up in the MCP Era
Five tools make an LLM app. Fifty make an agent platform. Five hundred is where general-purpose IDE agents like Cursor and Claude Code land once MCP servers are attached. Every tool eats context: 100-300 tokens per schema, so 500 tools start you at 100K+ before the first user message. Once Anthropic's MCP protocol (November 2024) became the standard way to attach tools, this turned into a production engineering problem.
How Many Tokens Does One Tool Schema Cost?
# tested: 2026-04-26 · anthropic@0.40.0
{
  "name": "get_weather",
  "description": "Get the current weather for a given location. Returns temperature, conditions, humidity. Use this when the user asks about weather, climate, or atmospheric conditions for a specific city or geographic location.",
  "input_schema": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state/country, e.g. 'San Francisco, CA' or 'Tokyo, Japan'"
      },
      "unit": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Temperature unit"
      }
    },
    "required": ["location"]
  }
}
Anthropic's token counter puts this schema at ≈180 tokens. Simple tools land at 100-150; complex ones (multiple enums, nested schemas, long descriptions) hit 300-500. Cursor ships ~50 built-in tools; Claude Code with MCP servers carries 500+.
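You can reproduce the number with the SDK's token-counting endpoint. A minimal sketch, assuming a trimmed copy of the schema above (use the full version for a faithful measurement); diffing against a tool-free request isolates the schema's share:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

GET_WEATHER = {  # trimmed copy of the schema above
    "name": "get_weather",
    "description": "Get the current weather for a given location.",
    "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

msgs = [{"role": "user", "content": "What's the weather in Tokyo?"}]
base = client.messages.count_tokens(model="claude-sonnet-4-6", messages=msgs)
with_tool = client.messages.count_tokens(
    model="claude-sonnet-4-6", tools=[GET_WEATHER], messages=msgs
)
# the diff ≈ schema cost plus a fixed tool-use system-prompt overhead
print(with_tool.input_tokens - base.input_tokens)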
| Tool count | Total schema tokens | % of 200K | Per API call (Sonnet cache read, $0.30/MTok) |
|---|---|---|---|
| 5 | ~750 | 0.4% | negligible |
| 50 | ~10K | 5% | ~$0.003 |
| 500 | ~100K | 50% | ~$0.03 |
At 500 tools you still burn half your context after caching.
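The "cache read" column assumes prompt caching is actually enabled. A minimal sketch of marking the tool block cacheable (ALL_500_TOOLS is a placeholder list of schema dicts); a cache_control marker on the last tool sets a cache breakpoint covering everything before it:

import anthropic

client = anthropic.Anthropic()

tools = list(ALL_500_TOOLS)  # placeholder: your 500 schema dicts
tools[-1]["cache_control"] = {"type": "ephemeral"}  # breakpoint after the tool block

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "..."}],
)
# first call writes the cache; subsequent calls read it at the discounted rate
print(resp.usage.cache_creation_input_tokens, resp.usage.cache_read_input_tokens)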
MCP — The Tool Standard After Nov 2024
MCP (Model Context Protocol) is the open protocol Anthropic released in November 2024 (modelcontextprotocol.io). It defines how an LLM client discovers and calls external tools.
- Before MCP, every client (Cursor / Claude Code / Cline) had its own tool plug-in path
- After MCP, you write a single server and every client can use it: one GitHub server serves Claude Code and Cursor alike
In 2025 the MCP spec added a streamable HTTP transport: tool lists can be hosted remotely and fetched at runtime.
MCP only standardizes how to plug a tool in, not when to load its schema. That part is still the client's engineering call.
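To make "write one server, every client gets it" concrete, here is a minimal sketch using the official Python SDK's FastMCP helper (pip install mcp); the weather stub is a hypothetical placeholder:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather for a given location."""
    # hypothetical stub; a real server would call a weather API here
    return f"22 degrees {unit} and clear in {location}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default

Register this once in each client's MCP config and Claude Code, Cursor, and Cline all discover get_weather through the same protocol.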
Three Tool-Loading Strategies
1. Eager Loading — Stuff Everything In
# tested: 2026-04-26 · anthropic@0.40.0
import anthropic

client = anthropic.Anthropic()
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=ALL_500_TOOLS,  # all 100K tokens of schemas stuffed in
    messages=[...],
)
Pro: the model sees every capability up front. Con: at 500 tools, schemas still occupy 50% of context even on a cache hit. Fits: under 30 tools, all used frequently.
2. Lazy Loading — Load on Demand
Claude Code's default. The client exposes a meta-tool, ToolSearch, and mounts ~50 core tools at startup. On a complex task, the model first calls ToolSearch("send slack") to fetch the matching schema, then makes the real call on the next turn.
# Claude Code's actual behavior
Startup: mount 50 core tools (Bash / Read / Write / Edit / Grep / ...)
User says: "send a slack message to the team"
Step 1: model calls ToolSearch("send slack message")
    → returns the full schema for mcp__slack__send_message
Step 2: model calls mcp__slack__send_message with the fetched schema
Pro: context never blows up. Con: each new tool costs an extra LLM round-trip (~500ms on first use); repeat use is free. Fits: more than 50 tools with fewer than 5 used per task (most general-purpose agents).
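A compressed sketch of that loop. The helpers load_all_schemas, CORE_TOOLS, and dispatch are hypothetical stand-ins for a real registry and executor; this shows the pattern, not Claude Code's actual code:

import anthropic

client = anthropic.Anthropic()

TOOL_SEARCH = {
    "name": "tool_search",
    "description": "Search the full tool catalog by natural-language query. "
                   "Matching schemas become callable on the next turn.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

TOOL_REGISTRY = load_all_schemas()    # hypothetical: name -> full schema dict
mounted = [TOOL_SEARCH] + CORE_TOOLS  # hypothetical: ~50 core schemas, not 500

def search_tools(query: str, limit: int = 3) -> list[dict]:
    # naive keyword match; production clients use embeddings or BM25
    words = query.lower().split()
    hits = [s for s in TOOL_REGISTRY.values()
            if any(w in s["description"].lower() for w in words)]
    return hits[:limit]

messages = [{"role": "user", "content": "send a slack message to the team"}]
while True:
    resp = client.messages.create(model="claude-sonnet-4-6", max_tokens=1024,
                                  tools=mounted, messages=messages)
    calls = [b for b in resp.content if b.type == "tool_use"]
    if not calls:
        break
    messages.append({"role": "assistant", "content": resp.content})
    for call in calls:
        if call.name == "tool_search":
            hits = search_tools(call.input["query"])
            mounted += hits  # mount on demand; free for the rest of the session
            result = [h["name"] for h in hits]
        else:
            result = dispatch(call.name, call.input)  # hypothetical executor
        messages.append({"role": "user", "content": [{
            "type": "tool_result", "tool_use_id": call.id, "content": str(result)}]})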
3. Code Execution — Anthropic's 2025 Approach
In 2025 Anthropic published Code execution with MCP: the model never receives tool schemas, only a sandboxed Python environment plus an MCP client library. It writes Python that calls the tools, the interpreter runs it, and the results come back.
# code the model writes inside the sandbox:
import mcp
slack = mcp.connect("slack")
result = slack.send_message(channel="#team", text="hello")
print(result)
Anthropic's measurement: token usage drops 98.7% on tool-heavy workloads.
Cost: a sandboxed interpreter is required (not every client can run one); the model has to write decent Python (smaller models drift); debugging means reading code traces, not tool-call traces.
Fits: tools over 200, strong model (Sonnet 4+), sandbox available.
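The 98.7% makes sense once you see where intermediate data lives. In the same illustrative pseudo-API style as the snippet above (sheets.read and the field names are hypothetical), only the printed line re-enters the model's context:

import mcp

gdrive = mcp.connect("gdrive")
rows = gdrive.sheets.read("Q4 pipeline")         # 10,000 rows stay in the sandbox
big = [r for r in rows if r["amount"] > 50_000]  # filtering happens in Python, not in context
print(f"{len(big)} deals over $50K")             # one short line goes back to the model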
JR Real Case: Claude Code with 17 MCP Servers Doesn't Blow Up
JR's Claude Code config (.claude/settings.json) has 17 MCP servers — Canva, Gmail, Notion, Google Drive/Calendar, jr-data, Chrome DevTools, Playwright, Context7, AWS Deploy/Pricing, Filesystem, and more. Each exposes 5-30 tools. 200+ tools × ~200 tokens ≈ 40K tokens of schema if loaded eagerly.
Claude Code's default:
{
  "permissions": {
    "deferred": true  // most MCP tools are deferred by default
  }
}
deferred = don't mount the schema at startup; fetch it via ToolSearch on demand. Result: startup schemas ≈ 5K tokens (50 core tools); one extra round-trip per first-time tool use; repeat use within a session is free.
Lazy loading in production — the move that lets a real agent run with 200+ tools attached.
Eager vs Lazy vs Code Execution — Trade-off
| Dimension | Eager loading | Lazy loading (ToolSearch) | Code execution |
|---|---|---|---|
| Tool count ceiling | < 30 | 50-500 | 500+ |
| First-call latency | 0 | +1 round trip | +1 round trip + sandbox start |
| Repeat-call latency | 0 | 0 | 0 |
| Token savings | 0% | 70-90% | 95-98% |
| Implementation effort | Low | Medium (write ToolSearch) | High (sandbox + code-trained model) |
| Debug difficulty | Easy (read tool_use directly) | Easy (ToolSearch + tool_use) | Medium (read code stdout) |
| Fits | Simple apps | General-purpose agents / Claude Code-style | Enterprise agents with massive tool counts |
JR's internal rule: new LLM apps start eager (under 10 tools). Cross 30, switch to lazy. Code execution is the endgame for 200+ tools, not a starting point.
Takeaway
Each tool schema is 100-300 tokens. 500 tools eager-loaded eat half your 200K context. Lazy loading (ToolSearch + deferred) pushes the ceiling from ~30 to ~500 tools. Code execution pushes it toward 5,000. Engineering complexity scales up in step; don't pre-optimize.
References
- Anthropic. (2024-11-25). Introducing the Model Context Protocol — MCP launch announcement.
- MCP team. Specification 2025-11-25 — 2025 update for streamable HTTP transport.
- Anthropic. Code execution with MCP — code interpreter approach measured at 98.7% token reduction.
- Anthropic. Tool use documentation — tool schema format and token counting.
- modelcontextprotocol. GitHub specification repo — full MCP spec source.
- Wikipedia. Model Context Protocol — MCP timeline and ecosystem context.
Production case: JR Academy Claude Code config (.claude/settings.json) — 17 MCP servers / 200+ tools / lazy loading by default via ToolSearch.
❓ FAQ
The most commonly searched questions on this chapter's topic.
Too many tools: should I just move to a 1M-context model?
No. 500 tools eager-loaded eat 100K (half of a 200K window); a 1M model could hold 5,000 tools, but cost rises ~5× and latency gets worse. The right direction is lazy loading (Claude Code's ToolSearch pattern), which keeps context from ever blowing up, or the code execution approach Anthropic pushed in 2025 (98.7% token reduction).
What is MCP, and how does it differ from function calling?
MCP (released by Anthropic in November 2024) is an open protocol that defines how an LLM client discovers and calls external tools. Before MCP, every client (Cursor / Claude Code / Cline) had its own integration path; after MCP, you write one server and every client can use it. MCP does not solve context blow-up from too many tools: it standardizes how a tool plugs in, not when its schema gets loaded.
How do I choose between lazy loading and code execution?
By tool count: under 30, eager; 50-500, lazy (Claude Code's ToolSearch + deferred); 500+, code execution (Anthropic's 2025 sandbox + Python interpreter, 98.7% token savings). Engineering complexity rises in step, so don't adopt code execution for a "maybe someday".
Is it expensive to run your own MCP server?
Cheap: an MCP server is an ordinary stdio/HTTP process. Running it locally is free; a Cloudflare Worker or AWS Lambda costs under $5/month. What actually burns money is the downstream APIs the server calls (GitHub / Notion / Linear), billed by those providers. The MCP protocol itself is free and puts no limit on calls.
I'm already using OpenAI function calling. Should I switch to MCP?
Depends on the scenario: for a single client (your own ChatGPT-style app), staying on function calling is fine; adopt MCP when several clients (Cursor / Claude Code / Cline) need to share one tool set, which saves you building N wrappers for N clients. MCP doesn't replace function calling; it is the cross-client standardization layer above it.
What's the most common pitfall once tool counts grow?
Tool descriptions written too abstractly, so the LLM picks the wrong tool: write "Search the customers table by email field, returns customer_id + name + signup_date" instead of "Search the database". Anthropic's official advice: write each tool description like an onboarding doc for a new employee, spelling out every parameter.
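A quick sketch of that contrast, using a hypothetical customer-search tool:

# too abstract: indistinguishable from every other search tool
bad = {"name": "search", "description": "Search the database"}

# specific: names the table, the lookup field, the return shape, and when to use it
good = {
    "name": "search_customers",
    "description": (
        "Search the customers table by email field. "
        "Returns customer_id, name, and signup_date. "
        "Use when the user asks about a specific customer account."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Exact customer email to look up"}
        },
        "required": ["email"],
    },
}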