What is Context Engineering — Karpathy's Rename
2025-06-25, Karpathy posts on X: "+1 for 'context engineering' over 'prompt engineering'. People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step."
That tweet was a relay handoff from Shopify CEO Tobi Lutke's post 7 days earlier — "I really like the term 'context engineering' over prompt engineering" (@tobi, 2025-06-18). Once an LLM app stuffs 50K to 200K tokens of context per call, the prompt is just a small slice of the whole thing.
Pick Your Path First
Jump based on the engineering problem:
| You're stuck on | Jump to |
|---|---|
| RAG retrieves right docs but LLM answers wrong | Ch 3, Ch 5 |
| Agent burns through context window in a few steps | Ch 4, Ch 6 |
| Picking Cursor / Claude Code / Cline | Ch 8 |
| Building production RAG yourself | Ch 10 |
| Just finished Prompt Engineering | Read 1→2→3→4→5 |
Where Prompt Ends and Context Begins
Tobi Lutke's definition — "the art of providing all the context for the task to be plausibly solvable by the LLM".
Prompt is the few lines you typed. Context is every token the LLM actually sees:
1. System instruction — the system prompt set in the API
2. Tool definitions — the schema for every function-calling / MCP tool
3. Memory / chat history — earlier turns (or summaries of them)
4. Retrieved context — RAG-fetched docs, vector passages, scraped pages
5. User input — what the user typed this turn
Prompt engineering writes item 5; context engineering decides how 1-5 stack into a 200K window without blowing it up.
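A minimal sketch of that assembly, with hypothetical names and a crude 4-characters-per-token heuristic (swap in a real tokenizer such as tiktoken for production). The fixed layers are counted first; retrieval gets whatever budget is left:

```python
# Sketch of five-layer context assembly. All names are illustrative,
# not from any real SDK.

def count_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. Use a real tokenizer
    # in production.
    return max(1, len(text) // 4)

def assemble_context(system: str, tools: str, memory: str,
                     retrieved: list[str], user_input: str,
                     window: int = 200_000) -> str:
    # Layers 1, 2, 3, 5 are fixed; layer 4 (retrieval) is elastic.
    fixed = [system, tools, memory, user_input]
    budget = window - sum(count_tokens(t) for t in fixed)
    picked, used = [], 0
    for doc in retrieved:                 # assumed pre-sorted by relevance
        cost = count_tokens(doc)
        if used + cost > budget:
            break                         # drop the rest, never overflow
        picked.append(doc)
        used += cost
    # Stack order: 1 system, 2 tools, 3 memory, 4 retrieval, 5 user input
    return "\n\n".join([system, tools, memory, *picked, user_input])
```

The design choice worth noting: the window budget is enforced at assembly time, not discovered as an API error at call time.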
JR's omni-report runs 17 routines. One of them, "JR AI Visibility Weekly", sweeps 20 real student queries each week to check whether AI engines recommend JR Academy. Its prompt makes four context engineering moves:
```text
# Real excerpt: omni-report/JR AI Visibility Weekly routine prompt
# tested: 2026-04-26 · model: claude-sonnet-4-6
[Phase 1: write skeleton + commit]
Write the `ai-visibility/$DATE.md` skeleton (10 _TBD_ placeholders)
commit + push: feat(ai-visibility): scaffold $DATE
[Phase 2: 4 batches × 5 queries × 2 layers]
Each batch handles 5 queries → Edit the matching table → commit + push
(avoids stream idle timeout)
```
- Context budget — the task is split into 6 phases with forced commits, so output never piles up past the stream timeout
- Context selection — use `ls` to push upstream paths into context instead of letting the LLM guess
- Context scaffolding — the skeleton file comes first (10 _TBD_ placeholders), giving later Edits structural anchors
- Context isolation — each batch commits independently, so earlier mistakes don't pollute later batches
None of these are prompt wording. All are structural decisions.
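The isolation move can be sketched as: every batch call starts from the shared skeleton plus only its own queries, so a bad answer in batch 1 never enters batch 3's context. `make_batches` and `run_batch` below are hypothetical stand-ins, not the routine's real code:

```python
# Sketch of per-batch context isolation: each batch sees the skeleton
# plus its own 5 queries, never another batch's output.

def make_batches(queries: list[str], size: int = 5) -> list[list[str]]:
    return [queries[i:i + size] for i in range(0, len(queries), size)]

def run_batch(skeleton: str, batch: list[str]) -> str:
    # Stand-in for one LLM call + Edit + commit. Real code would call
    # the model here and git-commit the result so progress survives a
    # stream timeout.
    context = skeleton + "\n" + "\n".join(batch)   # fresh context per batch
    return f"processed {len(batch)} queries with {len(context)} chars of context"

queries = [f"query {i}" for i in range(20)]
results = [run_batch("## skeleton with 10 _TBD_ slots", b)
           for b in make_batches(queries)]          # 4 batches × 5 queries
```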
Why This Term Only Showed Up in 2025
ChatGPT shipped at the end of 2022. It took two and a half years for someone to say "we should rename this" — engineering reality shifted. Three things stacked up:
Thing 1: RAG went mainstream (2023-2024)
LangChain shipped v0.1 in January 2024, LlamaIndex around the same time. RAG turned LLM input from "a paragraph" into "a paragraph + 5 retrieved passages + a summary of the last 3 turns". Stanford's Lost in the Middle in July 2023 (arXiv:2307.03172, Liu et al.) showed that recall drops sharply for information placed in the middle of the context — having docs in the prompt ≠ the model using them.
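One common mitigation (a community pattern, not from the paper itself): interleave ranked passages so the strongest land at the edges of the prompt, where attention holds up, and the weakest land in the middle. A sketch:

```python
# "Edge reorder": place the best-ranked docs at the start and end of
# the context, pushing the weakest toward the middle, where Lost in
# the Middle shows attention is worst.

def edge_reorder(docs_by_rank: list[str]) -> list[str]:
    front, back = [], []
    for i, doc in enumerate(docs_by_rank):     # rank 0 = most relevant
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]                  # strong docs at both ends

# With 5 docs ranked d0 (best) .. d4 (worst), the output order is
# d0, d2, d4, d3, d1: best first, second-best last, worst in the middle.
```

LangChain ships a similar transformer (`LongContextReorder`) if you'd rather not hand-roll this.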
Thing 2: Agents took off (2024)
Anthropic dropped Building Effective Agents on 2024-12-20, breaking agentic systems into five patterns: prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer. The engineering challenge behind each is "make sure each step only sees what it should see".
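The routing pattern makes "only sees what it should see" concrete: classify the request first, then build a context containing only that route's tools and docs, not the whole catalogue. All names below are illustrative:

```python
# Minimal routing sketch in the spirit of Anthropic's "routing" pattern.
# Route names, tools, and docs are made up for illustration.
ROUTES = {
    "billing": {"tools": ["refund", "invoice_lookup"], "docs": ["billing_faq"]},
    "tech":    {"tools": ["log_search"],               "docs": ["runbook"]},
}

def route(user_msg: str) -> str:
    # Stand-in classifier; production would use a cheap LLM call here.
    return "billing" if "refund" in user_msg.lower() else "tech"

def build_step_context(user_msg: str) -> dict:
    r = ROUTES[route(user_msg)]
    # The downstream step sees only its route's tools and docs.
    return {"tools": r["tools"], "docs": r["docs"], "input": user_msg}
```

The payoff is twofold: fewer tokens per call, and no chance of the model grabbing a tool that was never meant for this request.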
Thing 3: Long-context models (2024-2025)
Claude 3.5 Sonnet runs a 200K context window; Gemini 1.5 Pro went straight to 1M. But being able to stuff more isn't the same as needing to: overstuffing drops retrieval accuracy (Lost in the Middle gets worse at 100K), cost climbs linearly, and latency degrades. With more tokens available, token allocation became the new engineering problem.
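The cost and latency side of that trade-off is simple arithmetic. A back-of-envelope sketch — the price and latency constants here are assumptions for illustration, not any provider's real numbers:

```python
# Assumed constants, for illustration only.
PRICE_PER_MTOK = 3.00   # input price, USD per million tokens (assumed)
MS_PER_KTOK    = 8      # prefill latency per 1K input tokens (assumed)

def call_cost_usd(input_tokens: int) -> float:
    return input_tokens / 1_000_000 * PRICE_PER_MTOK

def prefill_latency_ms(input_tokens: int) -> float:
    return input_tokens / 1_000 * MS_PER_KTOK

# Doubling context from 50K to 100K doubles both cost and prefill
# latency, whether or not the extra 50K improves the answer.
```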
These three threads converged by mid-2025, and Tobi and Karpathy said nearly the same thing within seven days of each other in June. Simon Willison's 2025-06-27 post recaps the moment: not a naming gimmick, but two years of engineering muscle memory finally getting a name.
Prompt Engineering vs Context Engineering — Not a Replacement, a Layer Above
A lot of people misread this as "prompt engineering is obsolete". The prompt is part of the context. Their relationship:
| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Focus | Wording of one instruction, few-shot | How all 5 context layers assemble |
| Typical scenario | ChatGPT chat box | Production LLM app |
| Optimization goal | Nail this one prompt | Pipeline stays reliable across 1000 queries |
| Failure mode | Model misunderstands you | Model "understood right but answered wrong" (pollution, decay, overflow) |
| Prerequisite | None | Must know prompt engineering first |
| Doesn't apply when | One-shot, stateless | Toy demo |
Prompt engineering solves the "communication problem". Context engineering solves the "system problem". First is like writing an email. Second is like designing the email protocol.
Why "Engineering" Instead of "Design"
Other people call this "context design" or "prompt craftsmanship". Karpathy deliberately picked the word engineering — that's a signal:
- Measurable — context quality runs through an eval set of 1000 queries, not "I think it looks good"
- Reusable — context selection strategy is code, drops into the next project
- Can break — one deploy tweaks retrieval threshold, accuracy drops 15% — incident, not taste
- Trade-offs explicit — 5 more passages vs cost 2x vs latency +800ms, pick one
Anthropic's prompt caching docs make the point: the 5-minute TTL is an engineering trade-off. A cache hit cuts that portion of input cost by 90%, but once the TTL lapses you pay to rebuild the cache.
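That trade-off can be put in numbers. A sketch using the multipliers from Anthropic's documented pricing model as I understand it — cache writes at 1.25x the base input price, cache reads at 0.1x; verify current figures against their docs before relying on them:

```python
# Break-even sketch for prompt caching. Multipliers assume cache writes
# cost 1.25x base input price and cache reads 0.1x (check current docs).

def cached_cost(prefix_tokens: int, calls_within_ttl: int,
                base_price_per_tok: float = 3e-6) -> float:
    write = prefix_tokens * base_price_per_tok * 1.25                      # first call
    reads = prefix_tokens * base_price_per_tok * 0.10 * (calls_within_ttl - 1)
    return write + reads

def uncached_cost(prefix_tokens: int, calls: int,
                  base_price_per_tok: float = 3e-6) -> float:
    return prefix_tokens * base_price_per_tok * calls

# With 2+ calls inside the TTL, caching already wins; with exactly 1,
# you paid the 25% write premium for nothing.
assert cached_cost(50_000, 2) < uncached_cost(50_000, 2)
assert cached_cost(50_000, 1) > uncached_cost(50_000, 1)
```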
What's Coming Across the Next 9 Chapters
| # | Chapter | What |
|---|---|---|
| 2 | Boundary with Prompt Engineering | 5 context layers debugging |
| 3 | Context Selection — retrieval ≠ correct answer | Attention decay + Lost in the Middle |
| 4 | Token Budget — split 200K | 5 layers fight for the pool |
| 5 | Rerank — retrieval to selection | bi-encoder + cross-encoder + LLM-judge |
| 6 | Agent Memory three-layer | scratchpad / working / persistent |
| 7 | Context cost of tool calls | Tool schema tokens, MCP dynamic discovery |
| 8 | Cursor / Claude Code / Cline compared | 80% experience gap is context strategy |
| 9 | Multi-agent context isolation | sub-agent + summary-back |
| 10 | Build production RAG in 7 days | Eval set, monitoring, checklist |
Every chapter unpacks a real engineering problem with code + JR omni-report cases.
Takeaway
Prompt is "what you say". Context is "what the LLM sees". Between the two sit four more layers — system prompt, tool definitions, memory, retrieval. Those four layers are where 80% of the engineering work in a production LLM app lives.
References
- Karpathy. (2025-06-25). Tweet on context engineering.
- Tobi Lutke. (2025-06-18). Tweet on context engineering.
- Anthropic. (2024-12-20). Building Effective Agents.
- Liu et al. (2023-07-06). Lost in the Middle. arXiv:2307.03172.
- Simon Willison. (2025-06-27). Context engineering.
- Anthropic. Prompt caching docs.
Production case: JR Academy omni-report — context engineering across 17 routines.
❓ FAQ
The most commonly searched questions about this chapter's topic.
What's the difference between context engineering and prompt engineering?
Context engineering covers 5 layers: system prompt, tool definitions, memory, retrieved results, user input. Prompt engineering only optimizes the wording of layer 5; context engineering optimizes how all 5 layers are assembled. Karpathy's 2025-06-25 tweet: the prompt is just a small slice of the context.
Why did this term only appear in 2025?
Once RAG and agents went mainstream, a single LLM call routinely took 50K-200K tokens of input from multiple sources, and selecting, ordering, and compressing context became a standalone engineering problem. Tobi Lutke (Shopify CEO) tweeted on 2025-06-18, Karpathy relayed on 2025-06-25, and the term stuck.
Do I need to learn prompt engineering first?
Yes. The prompt is layer 5 of the 5 context layers; skipping prompt fundamentals to learn context engineering is like learning piano without reading sheet music. Finish JR Academy's Prompt Mastery track first.
Is context engineering just another hype word?
No. Anthropic's 2024-12-20 "Building Effective Agents" blog was already doing context engineering, just without the name. Cursor / Claude Code / Cline call the same models yet differ 80% in experience, and the difference is all context strategy.
How much does it cost to run all the demos for this track myself?
$15-25 total: Anthropic API $5-10 (demos for the 10 chapters plus the 7-day RAG build in Ch 10), Cohere rerank $2-5, and the free tiers of Pinecone/Qdrant are enough for the vector store. Running the chunking and eval scripts locally costs nothing; a CPU laptop is fine.
I can study 1 hour a day — how long to finish all 10 chapters?
3-4 weeks: each of the 10 chapters takes 15-25 minutes of reading plus 30-45 minutes of hands-on verification, plus the 7-day RAG build in Ch 10. At 1 hour a day that's about 21 days; at 2 hours a day, about 14. Save the Ch 10 build for the final week and build up understanding as you read the earlier chapters.
Can I skip context engineering and just fine-tune a model?
No. Fine-tuning solves style, format, and domain tone; it doesn't solve real-time data, tool calls, multi-turn memory, or large-document retrieval — and those are all context engineering's job. OpenAI's and Anthropic's official docs both recommend the same order: do RAG + prompting first, and fine-tune only once you've proven that isn't enough. 95% of scenarios never get there.