
Context Engineering

From Prompt to Context: the next-gen LLM engineering discipline that Karpathy named

👤 For:
  • Engineers who built RAG and found "retrieval works, LLM still wrong"
  • Developers tortured by context window overflow when building Agents
  • Tech leads wanting to understand the UX gap between Cursor / Claude Code / Cline
  • Learners who have finished prompt engineering and are ready for the engineering layer
⏱️3-4 weeks
📊Intermediate

On 2025-06-25, Andrej Karpathy posted a tweet that renamed the entire LLM application layer, proposing the switch from "prompt engineering" to "context engineering." The reasoning: a prompt is just the few sentences you type into ChatGPT. Any industrial-strength LLM app feeds the model far more: system instructions, tool definitions, retrieval results, conversation history, and user input. All of that combined is the context; the prompt is only a small slice.

This track does not teach "magic templates." It teaches you how to decide what goes in, what stays out, in what order, how to compress, and how to remember, when every LLM call carries 50K to 200K tokens of context. That is an engineering problem, not a wording problem.
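To make that concrete, here is a minimal sketch of the core decision as code, assuming a 200K window. Nothing here is a real framework API; the Layer type, the priority numbers, and the greedy fit_context helper are illustrative assumptions:

    # Minimal sketch, not a real framework API. Layer names, priorities,
    # and the greedy strategy are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class Layer:
        name: str
        tokens: int    # measured size of this layer's content
        priority: int  # higher = survives longer when the window overflows

    def fit_context(layers: list[Layer], window: int = 200_000) -> list[Layer]:
        """Greedily keep the highest-priority layers that fit the window."""
        kept, used = [], 0
        for layer in sorted(layers, key=lambda l: l.priority, reverse=True):
            if used + layer.tokens <= window:
                kept.append(layer)
                used += layer.tokens
        return kept

    # The five layers this track keeps coming back to:
    budget = fit_context([
        Layer("system",      2_000, priority=5),
        Layer("user",        1_000, priority=5),
        Layer("tools",      15_000, priority=4),
        Layer("memory",     80_000, priority=3),
        Layer("retrieval", 150_000, priority=2),
    ])

In production you would summarize or compress an overflowing layer rather than drop it wholesale; that trade-off is exactly what the chapters below unpack.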

10 chapters organized around real engineering pain: how context selection works, how to allocate token budget, how Agent memory is layered, where exactly Cursor and Claude Code's context strategies diverge, and how to build a production RAG with eval set in 7 days. Each chapter is grounded in real papers, Anthropic's official docs, and JR omni-report production cases — no toy examples.


30-Second Quick Start

Below is the actual prompt scaffold from one of JR omni-report's production routines. Read it, then ask yourself: which parts are prompt, which parts are context engineering?

    # Phase 0: prep + read upstream
    1. Read PRD_AI_VISIBILITY.md to learn the format
    2. ls ai-visibility/ to see history (grab last week's data for week-over-week comparison)
    3. TZ='Australia/Brisbane' date +%Y-%m-%d → $DATE

    # Phase 1: write the skeleton + commit
    Write the ai-visibility/$DATE.md skeleton (10 sections with _TBD_ placeholders)
    commit + push: feat(ai-visibility): scaffold $DATE

    # Phase 2: 4 batches × 5 queries × 2 layers
    Each batch handles 5 queries; each query runs two layers of tests (Web + LLM self-answer).
    As soon as a batch is done: Edit the corresponding table + commit + push.
    ...

The whole prompt looks like one long instruction, but look at what it actually does: (1) splits the task into 6 phases with forced commits to avoid stream idle timeouts; (2) injects upstream data as context via ls commands; (3) parameterizes output paths so the LLM cannot hallucinate them. None of this is prompt wording. It is context budget + selection + scaffolding. That is context engineering.
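Here is a hedged sketch of those three moves as code. This is not the actual JR omni-report pipeline; build_scaffold_prompt, the directory name, and the prompt wording are illustrative stand-ins:

    # Illustrative sketch only; not the actual JR omni-report code.
    import datetime
    import subprocess

    def build_scaffold_prompt(report_dir: str = "ai-visibility") -> str:
        # (2) Inject upstream data: real `ls` output becomes context,
        #     so the model compares against actual history instead of guessing.
        history = subprocess.run(
            ["ls", report_dir], capture_output=True, text=True
        ).stdout
        # (3) Parameterize the output path so the model cannot hallucinate it.
        out_path = f"{report_dir}/{datetime.date.today().isoformat()}.md"
        return (
            f"Existing reports:\n{history}\n"
            f"Write the skeleton to {out_path}.\n"
            # (1) Forced commits after each phase keep the stream from idling.
            "After each phase: commit + push before continuing."
        )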


What You Will Learn

In this tutorial, you will learn:

  • Look at any LLM app and decompose its context into 5 layers (system / tools / memory / retrieval / user), and identify which layer breaks first
  • When building RAG, stop optimizing recall alone; use rerank + LLM-as-judge to turn recall into selection
  • When building Agents, calculate the token cost of each tool schema, decide which tools to lazy-load, and know when to isolate context with a sub-agent (see the sketch after this list)
  • Understand the engineering reason Cursor / Claude Code / Cline diverge despite calling the same models, and pick the right one for your team
  • Build a production RAG with an eval set in 7 days, not a toy demo
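For the tool-cost bullet above, a rough sketch under stated assumptions: count_tokens is a crude 4-chars-per-token guess (swap in your model's real tokenizer), and each schema is assumed to carry a top-level "name" field:

    # Rough sketch: rank tools by the context cost of their JSON schemas.
    # count_tokens is a ~4-chars-per-token guess; use a real tokenizer in practice.
    import json

    def count_tokens(text: str) -> int:
        return len(text) // 4

    def rank_tool_costs(tool_schemas: list[dict]) -> list[tuple[str, int]]:
        costs = [(t["name"], count_tokens(json.dumps(t))) for t in tool_schemas]
        return sorted(costs, key=lambda c: c[1], reverse=True)

    # Tools at the top of this ranking are the first candidates for
    # lazy-loading: ship a one-line stub up front, full schema on demand.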


Chapter Overview

Quick preview by section; jump directly to what interests you.

Section: Introduction

In June 2025, Karpathy and Tobi Lutke renamed prompt engineering to context engineering. What changes, why it is engineering, and how it relates to RAG / Agents.

3 lessons · Reading / Visual
Section: Engineering

5 context layers fight for the same 200K-token pool. Anthropic's 4-tier token pricing (cache hit 1×, output 50×) + 5 budgeting techniques (cache stable layers, summarize history, lazy-load tools, rerank retrieval, cap output).

4 lessons · Reading / Visual
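As a preview of that budgeting math, a hedged worked example. Normalized to cache reads = 1×, typical Anthropic-style pricing puts fresh input near 10×, cache writes near 12.5×, and output near 50×; the exact multipliers vary by model and are assumptions here:

    # Relative cost of one call, in cache-read-token units.
    # Multipliers are assumptions modeled on typical Anthropic-style pricing.
    TIER = {"cache_read": 1.0, "input": 10.0, "cache_write": 12.5, "output": 50.0}

    def call_cost(usage: dict[str, int]) -> float:
        return sum(TIER[tier] * tokens for tier, tokens in usage.items())

    # Cache a stable 50K system + tools layer: pay the write once...
    first_call = call_cost({"cache_write": 50_000, "input": 5_000, "output": 2_000})
    # ...then every later call reads it back at 1/10 the fresh-input rate.
    later_call = call_cost({"cache_read": 50_000, "input": 5_000, "output": 2_000})

On these assumptions the cached call costs roughly a quarter of the first one, which is why "cache stable layers" leads the list of techniques.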
Section: Application

Same model, same task, yet an 80% experience gap that comes entirely from context strategy. Cursor uses vector RAG / Claude Code uses agentic search / Cline uses auto sub-tasks. Benchmark: Claude Code uses 5.5× fewer tokens than Cursor.

3 lessons · Reading / Visual