What Is Context Engineering: Karpathy's Rename

⏱️ 15 min read


On 2025-06-25, Karpathy posted on X: "+1 for 'context engineering' over 'prompt engineering'. People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step."

— @karpathy, 2025-06-25

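This picked up a post from Shopify CEO Tobi Lutke seven days earlier: "I really like the term 'context engineering' over prompt engineering" (@tobi, 2025-06-18). A production LLM app can pack 50K-200K tokens of context into a single call, and the prompt is only a small slice of that.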

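Who this is for

Jump in by the engineering problem you are stuck on:

Where you are stuck	Jump to
RAG retrieves the relevant documents but the LLM still answers wrong	Chapters 3 and 5
The agent's context overflows after a few steps	Chapters 4 and 6
Choosing between Cursor / Claude Code / Cline	Chapter 8
Building production RAG yourself	Chapter 10
Just finished Prompt Engineering	Chapters 1→2→3→4→5 in order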

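The boundary between Prompt and Context

Tobi Lutke's definition: "the art of providing all the context for the task to be plausibly solvable by the LLM".

The prompt is the few lines of instruction you type. The context is every token the LLM sees:

1. System instruction: the system prompt you set in the API
2. Tool definitions: the schema description of every function calling / MCP tool
3. Memory / conversation history: the content of earlier turns (or a summary of them)
4. Retrieved context: documents pulled in by RAG, passages from vector search, scraped web pages
5. User input: what the user actually typed this turn

Prompt engineering is about writing layer 5 well; context engineering is about deciding how layers 1-5 together fill a 200K window without overflowing it.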
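To make the five layers concrete, here is a minimal sketch that holds them in one structure and estimates how much of the window each layer takes. Everything in it is hypothetical: the `ContextLayers` name, the rough 4-characters-per-token estimate, and the 200K limit are assumptions for illustration, not any vendor's API or tokenizer.

```python
from dataclasses import dataclass, field

CONTEXT_WINDOW = 200_000  # assumed window size, in tokens


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4) if text else 0


@dataclass
class ContextLayers:
    system: str = ""
    tools: list[str] = field(default_factory=list)      # tool schema descriptions
    memory: list[str] = field(default_factory=list)     # earlier turns or summaries
    retrieved: list[str] = field(default_factory=list)  # RAG passages
    user_input: str = ""

    def usage(self) -> dict[str, int]:
        """Estimated tokens per layer, in the same order as the list above."""
        return {
            "system": estimate_tokens(self.system),
            "tools": sum(estimate_tokens(t) for t in self.tools),
            "memory": sum(estimate_tokens(m) for m in self.memory),
            "retrieved": sum(estimate_tokens(r) for r in self.retrieved),
            "user_input": estimate_tokens(self.user_input),
        }

    def fits(self, window: int = CONTEXT_WINDOW) -> bool:
        """True when the estimated total stays within the window."""
        return sum(self.usage().values()) <= window


# Two oversized retrieved passages push the total past the assumed window.
layers = ContextLayers(
    system="You are a support bot.",
    retrieved=["x" * 400_000, "y" * 500_000],
    user_input="Where is my order?",
)
print(layers.usage())
print("fits:", layers.fits())
```

A per-layer breakdown like this shows which layer is crowding the window. In a real application the estimate would come from the model's own tokenizer rather than a character count.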

JR's omni-report runs 17 routines. One of them, "JR AI Visibility Weekly", scans 20 student queries every week to check whether AI engines recommend JR Academy. Its prompt makes four context engineering decisions:

# Real excerpt: omni-report/JR AI Visibility Weekly routine prompt
# tested: 2026-04-26 · model: claude-sonnet-4-6

[Phase 1: write skeleton + commit]
Write `ai-visibility/$DATE.md` skeleton (10 _TBD_ placeholders)
commit + push: feat(ai-visibility): scaffold $DATE

[Phase 2: 4 batch × 5 query × 2 layer]
For each batch, process 5 queries → Edit the matching table → commit + push
(avoids stream idle timeout)

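Context budget: the task is split into 6 phases with a forced commit after each, so accumulated output never runs into the stream timeout
Context selection: upstream paths are put into the context with ls, so the LLM does not have to guess them
Context scaffolding: the skeleton (10 TBD placeholders) is written first, giving later Edit calls a structural anchor
Context isolation: each batch is committed on its own, so an earlier error does not contaminate later batches

None of these is about prompt wording. They are structural decisions.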
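The batching and isolation decisions can be sketched in ordinary code. The routine itself is a prompt, not a Python program, so this is only an analogy: `run_in_batches`, `process`, and `checkpoint` are hypothetical names, and the checkpoint stands in for "edit the table, commit, push".

```python
from typing import Callable, Iterator


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(
    queries: list[str],
    process: Callable[[str], str],
    checkpoint: Callable[[int, dict[str, str]], None],
    batch_size: int = 5,
) -> dict[str, str]:
    """Process queries batch by batch, saving after each batch.

    A failure in one query is recorded and does not stop later batches.
    """
    results: dict[str, str] = {}
    for index, batch in enumerate(batches(queries, batch_size), start=1):
        batch_results: dict[str, str] = {}
        for query in batch:
            try:
                batch_results[query] = process(query)
            except Exception as exc:  # keep the failure local to this query
                batch_results[query] = f"ERROR: {exc}"
        checkpoint(index, batch_results)  # stand-in for "write file and commit"
        results.update(batch_results)
    return results


# Usage with stand-in functions: 20 queries produce 4 checkpoints of 5 results.
saved: list[int] = []
queries = [f"query-{n}" for n in range(20)]
out = run_in_batches(
    queries,
    process=lambda q: q.upper(),
    checkpoint=lambda i, r: saved.append(len(r)),
)
print(saved)
```

Because each batch is saved before the next one starts, a crash or timeout loses at most one batch of work.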

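Why the term only appeared in 2025

ChatGPT launched at the end of 2022, and it took two and a half years before anyone said the name should change. Three things stacked up:

1. RAG goes mainstream (2023-2024)

LangChain shipped v0.1 in January 2024, and LlamaIndex was on a similar timeline. RAG turned the input side of an LLM call from "one passage" into "one passage + 5 retrieved chunks + 3 turns of summary". In July 2023 the Stanford-led paper Lost in the Middle (arXiv:2307.03172, Liu et al.) showed experimentally that accuracy drops when the relevant information sits in the middle of a long context rather than near the start or the end. Retrieving a document is not the same as the model using it.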
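One common mitigation is to reorder retrieved passages so the strongest ones sit at the two ends of the context. The sketch below is not a procedure from the paper. It is a minimal illustration, with a hypothetical function name, that assumes the input is already sorted most relevant first.

```python
def reorder_for_long_context(docs_by_relevance: list[str]) -> list[str]:
    """Place the most relevant documents at the two ends of the list.

    Input is sorted most relevant first. Documents are dealt alternately to
    the front and the back, so the least relevant ones end up in the middle.
    """
    front: list[str] = []
    back: list[str] = []
    for rank, doc in enumerate(docs_by_relevance):
        if rank % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    return front + back[::-1]


# d1 is the most relevant passage, d5 the least.
ordered = reorder_for_long_context(["d1", "d2", "d3", "d4", "d5"])
print(ordered)  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

Whether this helps depends on the model and the task, so it belongs in an eval run rather than being assumed.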

2. Agents take off (2024)

Anthropic's Building Effective Agents (2024-12-20) breaks agentic systems into five workflow patterns: prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer. In each one, the hard part is letting every step see only what it should see.
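As a minimal sketch of "every step sees only what it should see", the chain below passes each step only the keys it declares. The `run_chain` helper and the two stand-in steps are hypothetical; in a real chain each step would be an LLM call rather than a string formatter.

```python
from typing import Callable

# (step name, keys the step may read, function that produces the step's output)
Step = tuple[str, list[str], Callable[[dict[str, str]], str]]


def run_chain(steps: list[Step], state: dict[str, str]) -> dict[str, str]:
    """Run steps in order; each step receives only the keys it declares."""
    state = dict(state)  # do not mutate the caller's dict
    for name, needs, fn in steps:
        visible = {key: state[key] for key in needs}  # KeyError if a key is missing
        state[name] = fn(visible)
    return state


seen: list[set[str]] = []  # records what each step was actually shown


def outline(ctx: dict[str, str]) -> str:
    seen.append(set(ctx))
    return f"outline of {ctx['topic']}"


def draft(ctx: dict[str, str]) -> str:
    seen.append(set(ctx))
    return f"draft from {ctx['outline']}"


final = run_chain(
    [("outline", ["topic"], outline), ("draft", ["outline"], draft)],
    {"topic": "context engineering", "internal_notes": "not shown to any step"},
)
print(seen)
print(final["draft"])
```

Declaring the inputs per step makes the context of each call auditable: a step cannot silently depend on something it was never meant to read.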

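3. Long-context models (2024-2025)

Claude 3.5 offers 200K and Gemini 1.5 Pro goes straight to 1M. But stuffing in too much retrieved material lowers accuracy instead of raising it (the Lost in the Middle effect is reported to get worse around 100K), cost grows linearly with token count, and latency gets worse. Once there are more tokens to spend, allocating them becomes a new engineering problem.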
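The linear cost growth is easy to put into numbers. The price below is an assumed figure for illustration only, and output tokens are ignored. Check the provider's current price list before relying on it.

```python
def input_cost_usd(input_tokens: int, usd_per_million: float) -> float:
    """Input-side cost of one call: linear in the number of tokens sent."""
    return input_tokens / 1_000_000 * usd_per_million


PRICE = 3.00  # assumed USD per million input tokens, not a quoted price

for tokens in (10_000, 50_000, 200_000):
    per_call = input_cost_usd(tokens, PRICE)
    print(f"{tokens:>7} tokens: ${per_call:.2f} per call, "
          f"${per_call * 1000:.0f} per 1000 calls")
```

Under that assumed price, filling a 200K window costs about $0.60 of input per call, twenty times the 10K case, before any gain in accuracy has been shown.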

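These three threads converged in mid-2025, when Tobi and Karpathy said the same thing seven days apart in June. Simon Willison's blog post of 2025-06-27 looked back on it: this was not naming as marketing; two years of community muscle memory finally had a name.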

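Prompt Engineering vs Context Engineering: a layer above, not a replacement

Many people misread this as "prompt engineering is obsolete". The prompt is one part of the context. Here is how the two relate:

Dimension	Prompt Engineering	Context Engineering
Focus	Wording of a single instruction, few-shot examples	How all 5 layers of context are assembled
Typical setting	Typing into the ChatGPT chat box	Writing a production LLM application
Optimization target	Getting the model to answer this one message correctly	A pipeline that stays stable across 1000 real queries
Failure mode	The model misunderstands what you want	The model "understands correctly but answers wrong" (contamination, decay, overflow)
Prerequisite	None	Prompt engineering comes first
Poor fit for	One-off, stateless conversations	Toy demos

Prompt engineering solves a communication problem; context engineering solves a systems problem. The first is like writing an email, the second like designing the email protocol.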

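Why "engineering" and not "design"

Others often call it "context design" or "prompt craftsmanship". Karpathy's deliberate choice of "engineering" sends a signal:

Measurable: run 1000 queries from an eval set and read the accuracy, not "it feels fine to me"
Reusable: a context selection strategy is written as code and carried straight into the next project
It can break: change a retrieval threshold and accuracy is down 15% the next day, which is an engineering incident
Explicit trade-offs: 5 more retrieved chunks vs doubled token cost vs +800ms latency, and you have to choose among the three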
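The "measurable" and "it can break" points can be sketched as a tiny eval harness. This is an illustration under simple assumptions: answers are scored by substring match, the pipeline is a stand-in lookup, and the 2-point regression tolerance is an arbitrary choice.

```python
from typing import Callable


def accuracy(pipeline: Callable[[str], str], eval_set: list[tuple[str, str]]) -> float:
    """Fraction of eval queries whose answer contains the expected string."""
    if not eval_set:
        raise ValueError("eval_set is empty")
    hits = sum(
        1 for query, expected in eval_set
        if expected.lower() in pipeline(query).lower()
    )
    return hits / len(eval_set)


def regression(baseline: float, current: float, tolerance: float = 0.02) -> bool:
    """True when accuracy dropped by more than `tolerance` (absolute)."""
    return baseline - current > tolerance


# Stand-in pipeline and a three-item eval set, for illustration only.
eval_set = [
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
    ("largest planet?", "Jupiter"),
]
answers = {
    "capital of France?": "Paris.",
    "2 + 2?": "It is 4.",
    "largest planet?": "Saturn",
}
score = accuracy(lambda q: answers[q], eval_set)
print(f"accuracy: {score:.2f}")
print("regressed:", regression(baseline=0.90, current=score))
```

Run on every change to retrieval or context assembly, a check like this turns "accuracy dropped 15% overnight" into something a CI job reports rather than something users discover.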

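The 5-minute TTL in Anthropic's prompt caching docs is exactly this kind of engineering trade-off: a cache hit cuts the cost of the cached input tokens by about 90%, but an entry that goes unused for 5 minutes expires and has to be written again.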
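A back-of-the-envelope calculation shows where caching pays off. The multipliers below (1.25x to write the cache, 0.10x to read it) and the price are assumptions to verify against the current pricing page. The sketch also assumes every later call arrives before the TTL expires.

```python
def cost_without_cache(prefix_tokens: int, calls: int, price_per_mtok: float) -> float:
    """Every call pays full price for the shared prefix."""
    return calls * prefix_tokens / 1_000_000 * price_per_mtok


def cost_with_cache(
    prefix_tokens: int,
    calls: int,
    price_per_mtok: float,
    write_multiplier: float = 1.25,  # assumed premium for writing the cache
    read_multiplier: float = 0.10,   # assumed price of a cache hit
) -> float:
    """First call writes the cache; every later call hits it.

    Assumes each later call arrives before the TTL expires.
    """
    if calls <= 0:
        return 0.0
    unit = prefix_tokens / 1_000_000 * price_per_mtok
    return unit * write_multiplier + unit * read_multiplier * (calls - 1)


# A 50K-token shared prefix reused across 10 calls, at an assumed $3 per MTok.
plain = cost_without_cache(50_000, calls=10, price_per_mtok=3.0)
cached = cost_with_cache(50_000, calls=10, price_per_mtok=3.0)
print(f"without cache: ${plain:.4f}")
print(f"with cache:    ${cached:.4f}")
```

With a single call, caching costs more than not caching because of the write premium. The saving only appears when the same prefix is reused inside the TTL, so traffic shape matters as much as prompt design.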

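The next 9 chapters

#	Chapter	What you learn
2	The boundary with Prompt Engineering	Debugging the 5 layers of context
3	Context Selection: retrieved ≠ answered correctly	Attention decay + Lost in the Middle
4	Token Budget: how to split 200K	5 layers compete for one pool; set priorities
5	Rerank: turning retrieval into selection	bi-encoder + cross-encoder + LLM-judge
6	Three-layer Agent Memory architecture	scratchpad / working / persistent
7	The context cost of tool calls	tool schema tokens, MCP dynamic discovery
8	Cursor / Claude Code / Cline compared	80% of the difference in experience comes from context strategy
9	Multi-agent context isolation	sub-agent + summary passed back
10	Build production RAG in 7 days	eval set, monitoring, checklist

Every chapter comes with real code + a production case from JR omni-report.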

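One-line takeaway

The prompt is "what you say"; the context is "what the LLM sees". Between the two sit four more layers: system prompt, tool definitions, memory, and retrieval. Those four layers are where most of the engineering effort in a production LLM application goes, roughly 80% as a rule of thumb.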

Sources

Karpathy. (2025-06-25). Tweet on context engineering.
Tobi Lutke. (2025-06-18). Tweet on context engineering.
Anthropic. (2024-12-20). Building Effective Agents.
Liu et al. (2023-07-06). Lost in the Middle. arXiv:2307.03172.
Simon Willison. (2025-06-27). Context engineering.
Anthropic. Prompt caching docs.

Production case: JR Academy omni-report, 17 routines in practice.


❓ FAQ


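What is the difference between Context Engineering and Prompt Engineering?

Context engineering covers 5 layers: system prompt, tool definitions, memory, retrieved results, and user input. Prompt engineering only optimizes the wording of layer 5; context engineering optimizes how all 5 layers are assembled. As Karpathy's post of 2025-06-25 implies, the prompt is only a small slice of the context.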

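Why did the term only appear in 2025?

Once RAG and agents went mainstream, a single LLM input could easily reach 50K-200K tokens drawn from several sources, and selecting, ordering, and compressing context became an engineering problem of its own. Tobi Lutke (Shopify CEO) posted on 2025-06-18, Karpathy followed on 2025-06-25, and the term stuck.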

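Do I need to learn Prompt Engineering first?

Yes. The prompt is layer 5 of the 5 context layers, and learning context engineering without prompt basics is like learning piano without being able to read music. Work through JR Academy's Prompt Master track first.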

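Is Context Engineering just another hype term?

No. Anthropic's "Building Effective Agents" blog post (2024-12-20) was already treating context as an engineering problem; it just did not have the name yet. Cursor, Claude Code, and Cline can call the same model and still feel very different, and roughly 80% of that gap comes from context strategy.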

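Roughly how much does it cost to run all the demos in this track yourself?

About $15-25 in total: $5-10 for the Anthropic API (demos across the 10 chapters plus the 7-day RAG project in Chapter 10), $2-5 for Cohere rerank, and the free tier of a vector database such as Pinecone or Qdrant is enough. Chunking and eval scripts run locally at no cost; a CPU-only laptop will do.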

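I can study 1 hour a day. How long will the 10 chapters take?

3-4 weeks: each of the 10 chapters takes 15-25 minutes of reading plus 30-45 minutes of hands-on verification, and Chapter 10 adds the 7-day RAG project. At 1 hour a day that is about 21 days; at 2 hours a day it compresses to 14. Keep the Chapter 10 project for the final week and build understanding as you read the earlier chapters.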

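Can I skip context engineering and just fine-tune a model?

No. Fine-tuning addresses style, format, and domain tone. It does not address real-time data, tool calls, multi-turn memory, or retrieval over large documents, all of which are context engineering work. OpenAI's and Anthropic's official documentation both recommend starting with RAG + prompting and fine-tuning only once that is shown to fall short; most use cases (roughly 95%, as a rule of thumb) never get as far as fine-tuning.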