Context Engineering
From Prompt to Context: the Next Generation of LLM Engineering, as Karpathy Named It
On 2025-06-25, Andrej Karpathy posted a tweet that renamed the entire LLM application layer: he proposed replacing "prompt engineering" with "context engineering." His reasoning: a prompt is just the few sentences you type into ChatGPT, but any industrial-strength LLM app feeds the model far more: system instructions, tool definitions, retrieval results, conversation history, and user input. All of that combined is the context; the prompt is only a small slice.
This track does not teach "magic templates." It teaches you how to decide what goes in, what stays out, in what order, how to compress, and how to remember, when every LLM call carries 50K to 200K tokens of context. That is an engineering problem, not a wording problem.
10 chapters organized around real engineering pain points: how context selection works, how to allocate a token budget, how Agent memory is layered, where exactly the context strategies of Cursor and Claude Code diverge, and how to build a production RAG with an eval set in 7 days. Each chapter is grounded in real papers, Anthropic's official docs, and JR omni-report production cases; no toy examples.
30-Second Quick Start
Below is the actual prompt scaffold from one of JR omni-report's production routines. Read it, then ask yourself: which parts are prompt, which parts are context engineering?
# Phase 0: prep + read upstream
1. Read PRD_AI_VISIBILITY.md to learn the format
2. ls ai-visibility/ to check history (pull last week's data for week-over-week comparison)
3. TZ='Australia/Brisbane' date +%Y-%m-%d → $DATE
# Phase 1: write skeleton + commit
Write the ai-visibility/$DATE.md skeleton (10 sections with _TBD_ placeholders)
commit + push: feat(ai-visibility): scaffold $DATE
# Phase 2: 4 batches × 5 queries × 2 layers
Each batch handles 5 queries; each query runs two test layers (Web + LLM self-answer).
As soon as a batch is done, Edit the matching table + commit + push.
...The whole prompt reads like one long instruction, but look at what it actually does: (1) it splits the task into 6 phases with forced commits to avoid stream idle timeouts; (2) it injects upstream data as context via ls commands; (3) it parameterizes output paths so the LLM cannot hallucinate them. None of this is prompt wording; it is context budget + selection + scaffolding. That is context engineering.
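The budget + selection logic described above can be sketched in a few lines. Everything here is illustrative: the layer names, the 4-characters-per-token heuristic, and the budget value are assumptions for the sketch, not JR omni-report's actual code.

```python
# Minimal sketch of context assembly under a token budget.
# All names and numbers are illustrative, not from any real codebase.

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def assemble_context(layers: list[tuple[str, str]], budget: int) -> str:
    """Greedily include layers in priority order; skip what doesn't fit.

    layers: (name, content) pairs, ordered by priority
            (system > tools > memory > retrieval > user).
    """
    used, parts = 0, []
    for name, content in layers:
        cost = estimate_tokens(content)
        if used + cost > budget:
            continue  # this layer would overflow the budget; drop it
        used += cost
        parts.append(f"## {name}\n{content}")
    return "\n\n".join(parts)

layers = [
    ("system", "You write weekly AI-visibility reports."),
    ("tools", "ls, read, edit, commit"),
    ("retrieval", "Last week's report data: ..." * 50),  # bulky layer
    ("user", "Generate this week's report."),
]
ctx = assemble_context(layers, budget=60)
```

With a 60-token budget the oversized retrieval layer is dropped while the cheap system, tools, and user layers survive; real systems would summarize or rerank the bulky layer instead of dropping it outright, which is exactly what later chapters cover.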
What You Will Learn
In this tutorial, you will learn:
- ✓ Look at any LLM app, decompose its context into 5 layers (system / tools / memory / retrieval / user), and identify which layer breaks first
- ✓ When building RAG, stop optimizing recall alone: use rerank + LLM-as-judge to turn recall into selection
- ✓ When building Agents, calculate the token cost of each tool schema, decide which tools to lazy-load, and know when to isolate context in a sub-agent
- ✓ Understand the engineering reasons Cursor / Claude Code / Cline diverge despite calling the same models, and pick the right one for your team
- ✓ Build a production RAG with an eval set in 7 days, not a toy demo
Chapter Overview
Quick preview by section; jump directly to what interests you.
In June 2025, Karpathy and Tobi Lutke renamed prompt engineering to context engineering. What changes, why it is engineering, and how it relates to RAG / Agents.
- What is Context Engineering — Karpathy's Rename (15 min)
- The Boundary with Prompt Engineering — Tuning Each of the 5 Context Layers (20 min)
- Context Selection — Why RAG Recall Does Not Equal Accuracy (25 min)
5 context layers fight over the same 200K pool. Anthropic's 4-tier token pricing (cache hit 1× / output 50×), plus 5 budgeting techniques: cache stable layers, summarize history, lazy-load tools, rerank retrieval, cap output.
- Token Budget — How to Allocate Your 200K Window (20 min)
- Rerank — Turning Recall into Selection (25 min)
- Agent Memory — Three Layers and Their Tool Stacks (25 min)
- ... 1 more lesson
Same model, same task, yet an 80% experience gap comes entirely from context strategy. Cursor uses vector RAG, Claude Code uses agentic search, and Cline uses auto sub-tasks. Benchmark: Claude Code uses 5.5× fewer tokens than Cursor.
- Cursor / Claude Code / Cline — Comparing Three Tools' Context Strategies (20 min)
- Multi-Agent Context Isolation (20 min)
- 7 Days to Build a Production RAG — Practice Roadmap (7 days)