Context Engineering

From Prompt to Context: the next generation of LLM engineering, as named by Karpathy

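👤 Who it's for: engineers who have built RAG but found that the LLM still answers wrong even after the relevant documents were retrieved / developers who have fought with blown context windows while building agents / tech leads who want to understand where the experience gap between Cursor, Claude Code, and Cline comes from / learners who have just finished Prompt Engineering and want to move toward engineering practice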

⏱️ 3-4 weeks

📊 Intermediate

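On June 25, 2025, Karpathy posted a tweet on X that renamed the whole application layer of LLM work: he argued for calling it "context engineering" instead of "prompt engineering". His reasoning: a prompt is just the couple of sentences you type when chatting with ChatGPT day to day, but any industrial-strength LLM application feeds the model far more than a prompt. System instructions, tool definitions, retrieval results, conversation history, and user input all add up to the context, and the prompt is only a small slice of it.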

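This track does not teach "magic-spell templates". It teaches what to do when every LLM call in your application has to carry 50K to 200K tokens of context: how to decide what goes in and what stays out, in what order, how to compress, and how to remember. That is an engineering problem, not a prompt-wording problem.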

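The 10 chapters are organized around real engineering problems: how to do context selection, how to split a token budget, how to layer agent memory, where the context strategies of Cursor and Claude Code actually differ, and how to build a production RAG with an evaluation set in 7 days. Each chapter comes with real papers, official Anthropic documentation, or production cases from JR omni-report, not hello-world demos.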

30-second quick start

Below is the skeleton of a real routine prompt from the JR omni-report project. Read it, then ask yourself: which parts are prompt, and which parts are context engineering?

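# Phase 0: Prepare + read upstream
1. Read PRD_AI_VISIBILITY.md to learn the format
2. ls ai-visibility/ to check history (use last week's data for week-over-week comparison)
3. TZ='Australia/Brisbane' date +%Y-%m-%d → $DATE

# Phase 1: Write skeleton + commit
Write ai-visibility/$DATE.md skeleton (10 sections with _TBD_ placeholders)
commit + push: feat(ai-visibility): scaffold $DATE

# Phase 2: 4 batch × 5 query × 2 layer
Each batch handles 5 queries, and each query runs two layers of tests (Web + LLM self-answer).
After each batch, immediately Edit the corresponding table + commit + push.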
...

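The whole prompt looks like one long instruction, but what it actually does is: (1) split the task into 6 phases with a forced commit at each, which avoids stream idle timeouts; (2) inject upstream data into the context via the ls command; (3) parameterize the output file path so the LLM does not invent one. None of this is prompt wording; it is context budgeting, selection, and scaffolding. This is context engineering.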
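The phase structure of that prompt can also be sketched in code. This is only an illustration under stated assumptions, not the omni-report implementation: `scaffold` and `plan_batches` are hypothetical helpers, the section titles and report heading are placeholders, and the query strings are invented.

```python
from itertools import product

LAYERS = ("web", "llm_self_answer")  # the two test layers named in Phase 2


def scaffold(date: str, n_sections: int = 10) -> str:
    """Build a report skeleton whose sections are all _TBD_ placeholders."""
    lines = [f"# AI visibility report {date}", ""]  # hypothetical heading
    for i in range(1, n_sections + 1):
        lines += [f"## Section {i}", "_TBD_", ""]  # hypothetical section titles
    return "\n".join(lines)


def plan_batches(queries: list[str], batch_size: int = 5) -> list[list[tuple[str, str]]]:
    """Split queries into batches; each batch is a list of (query, layer) runs.

    Committing after every batch keeps each unit of work short and leaves
    a recoverable checkpoint if a later batch fails.
    """
    batches = []
    for start in range(0, len(queries), batch_size):
        chunk = queries[start:start + batch_size]
        batches.append(list(product(chunk, LAYERS)))
    return batches


queries = [f"query-{i}" for i in range(20)]  # made-up queries
batches = plan_batches(queries)
skeleton = scaffold("2025-01-01")  # in the prompt, the date comes from $DATE
print(len(batches), [len(b) for b in batches], skeleton.count("_TBD_"))
```

With 20 queries this yields 4 batches of 10 runs each (5 queries × 2 layers), matching the "4 batch × 5 query × 2 layer" heading, and the skeleton carries 10 placeholders to be filled in by later phases.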

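What you will learn

In this tutorial, you will learn to:

✓ Look at an LLM application, break its context into 5 layers (system / tools / memory / retrieval / user), and point out which layer will fail first
✓ Stop caring only about recall when writing RAG, and use rerank + LLM-as-judge to turn retrieval into selection
✓ Work out, when building an agent, how many tokens each tool schema costs, which tools should be lazy-loaded, and when to isolate context with a sub-agent
✓ Understand the engineering reasons why Cursor, Claude Code, and Cline feel so different on the same model, and choose tools for your team
✓ Build a production RAG with an evaluation set in 7 days, not a hello-world demo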
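As a rough illustration of the first and third points, the sketch below accounts for tokens per layer and flags the layers that exceed their budget. Everything here is an assumption made for illustration: the `Layer` class, the per-layer budgets, the sample contents, and the 4-characters-per-token estimate. For real numbers, use the model provider's tokenizer.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # crude assumption, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN) if text else 0


@dataclass
class Layer:
    name: str
    content: str
    budget: int  # max tokens this layer may use (hypothetical numbers)

    @property
    def tokens(self) -> int:
        return estimate_tokens(self.content)


def audit(layers: list[Layer]) -> list[str]:
    """Return names of over-budget layers, largest overshoot first."""
    over = [layer for layer in layers if layer.tokens > layer.budget]
    over.sort(key=lambda layer: layer.tokens - layer.budget, reverse=True)
    return [layer.name for layer in over]


layers = [
    Layer("system", "You are a support assistant." * 10, budget=200),
    Layer("tools", '{"name": "search", "parameters": {}}' * 50, budget=300),
    Layer("memory", "user prefers short answers. " * 20, budget=500),
    Layer("retrieval", "retrieved passage text " * 400, budget=1500),
    Layer("user", "Why was my order delayed?", budget=100),
]
for layer in layers:
    print(f"{layer.name:<10} {layer.tokens:>6} / {layer.budget}")
print("over budget:", audit(layers))
```

In this made-up setup the retrieval and tools layers overshoot, which is where reranking and lazy-loading tool schemas would be applied first.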

Chapter overview

A quick preview by section, so you can jump straight to the part you want to learn.

Section

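Getting started

In June 2025, Karpathy and Tobi Lutke renamed prompt engineering to context engineering. What the difference is, why it is an engineering problem, and how it relates to RAG and agents

3 lessons · reading / illustrated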


Section

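Engineering

Five context layers compete for the same 200K pool. Anthropic's 4-tier token pricing (cache hit 1× / output 50×) plus 5 budgeting techniques (stable cache prefix / summarized history / lazy-loaded tools / reranked retrieval / output cap)

4 lessons · reading / illustrated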
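To make the pricing tiers concrete, here is a small worked calculation in cache-hit units. The cache hit (1×) and output (50×) ratios come from the text above. The two middle tiers (uncached input at 10× and cache write at 12.5×) are assumptions that are consistent with those endpoints; check Anthropic's current pricing page before relying on them. The token counts are hypothetical.

```python
# Relative price per token, normalized so that a cache hit costs 1 unit.
# "cache_hit" and "output" come from the text; the other two are assumptions.
PRICE = {
    "cache_hit": 1.0,
    "input": 10.0,        # assumed: uncached input
    "cache_write": 12.5,  # assumed: first write into the cache
    "output": 50.0,
}


def call_cost(cached: int, uncached: int, written: int, output: int) -> float:
    """Cost of one call, expressed in cache-hit units."""
    return (
        cached * PRICE["cache_hit"]
        + uncached * PRICE["input"]
        + written * PRICE["cache_write"]
        + output * PRICE["output"]
    )


# Hypothetical call: 40K-token stable prefix, 10K fresh tokens, 1K output.
no_cache = call_cost(cached=0, uncached=50_000, written=0, output=1_000)
warm_cache = call_cost(cached=40_000, uncached=10_000, written=0, output=1_000)
print(no_cache, warm_cache, round(no_cache / warm_cache, 2))
```

Under these assumptions, keeping the 40K-token prefix stable enough to hit the cache cuts the cost of the call by roughly a factor of three, and 1K output tokens cost as much as 50K cached input tokens, which is why a stable cache prefix and an output cap both appear in the list of budgeting techniques.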


Section

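Applications

Same model, same task, yet the experience across the three tools differs by 80%, and the difference comes entirely from context strategy. Cursor uses vector RAG, Claude Code uses agentic search, and Cline uses automatic sub-tasks. In measured runs, Claude Code used 5.5× fewer tokens than Cursor

3 lessons · reading / illustrated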
