
Context Engineering

From Prompt to Context: the next-gen LLM engineering discipline that Karpathy named

👤 For:
  • Engineers who built RAG and found "retrieval works, LLM still wrong"
  • Developers tortured by context window overflow when building Agents
  • Tech leads wanting to understand the UX gap between Cursor / Claude Code / Cline
  • Learners who have finished prompt engineering and are ready for the engineering layer
⏱️3-4 weeks
📊Intermediate

On 2025-06-25, Andrej Karpathy posted a tweet that renamed the entire LLM application layer, proposing the switch from "prompt engineering" to "context engineering." The reasoning: a prompt is just the few sentences you type into ChatGPT. Any industrial-strength LLM app feeds the model far more: system instructions, tool definitions, retrieval results, conversation history, and user input. All of that combined is the context; the prompt is only a small slice.

This track does not teach "magic templates." It teaches you how to decide what goes in, what stays out, in what order, how to compress, and how to remember, when every LLM call carries 50K to 200K tokens of context. That is an engineering problem, not a wording problem.
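To make that concrete, here is a minimal sketch of the core decision as code, assuming a 200K window. Nothing here is a real framework API; the Layer type, the priority numbers, and the greedy fit_context helper are illustrative assumptions:

    # Minimal sketch, not a real framework API. Layer names, priorities,
    # and the greedy strategy are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class Layer:
        name: str
        tokens: int    # measured size of this layer's content
        priority: int  # higher = survives longer when the window overflows

    def fit_context(layers: list[Layer], window: int = 200_000) -> list[Layer]:
        """Greedily keep the highest-priority layers that fit the window."""
        kept, used = [], 0
        for layer in sorted(layers, key=lambda l: l.priority, reverse=True):
            if used + layer.tokens <= window:
                kept.append(layer)
                used += layer.tokens
        return kept

    # The five layers this track keeps coming back to:
    budget = fit_context([
        Layer("system",      2_000, priority=5),
        Layer("user",        1_000, priority=5),
        Layer("tools",      15_000, priority=4),
        Layer("memory",     80_000, priority=3),
        Layer("retrieval", 150_000, priority=2),
    ])

In production you would summarize or compress an overflowing layer rather than drop it wholesale; that trade-off is exactly what the chapters below unpack.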

10 chapters organized around real engineering pain: how context selection works, how to allocate token budget, how Agent memory is layered, where exactly Cursor and Claude Code's context strategies diverge, and how to build a production RAG with eval set in 7 days. Each chapter is grounded in real papers, Anthropic's official docs, and JR omni-report production cases — no toy examples.


30-Second Quick Start

Below is the actual prompt scaffold from one of JR omni-report's production routines. Read it, then ask yourself: which parts are prompt, which parts are context engineering?

    # Phase 0: prep + read upstream
    1. Read PRD_AI_VISIBILITY.md to learn the format
    2. ls ai-visibility/ to see history (grab last week's data for week-over-week comparison)
    3. TZ='Australia/Brisbane' date +%Y-%m-%d → $DATE

    # Phase 1: write the skeleton + commit
    Write the ai-visibility/$DATE.md skeleton (10 sections with _TBD_ placeholders)
    commit + push: feat(ai-visibility): scaffold $DATE

    # Phase 2: 4 batches × 5 queries × 2 layers
    Each batch handles 5 queries; each query runs two layers of tests (Web + LLM self-answer).
    As soon as a batch is done: Edit the corresponding table + commit + push.
    ...

The whole prompt looks like one long instruction, but look at what it actually does: (1) splits the task into 6 phases with forced commits to avoid stream idle timeouts; (2) injects upstream data as context via ls commands; (3) parameterizes output paths so the LLM cannot hallucinate them. None of this is prompt wording. It is context budget + selection + scaffolding. That is context engineering.
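Here is a hedged sketch of those three moves as code. This is not the actual JR omni-report pipeline; build_scaffold_prompt, the directory name, and the prompt wording are illustrative stand-ins:

    # Illustrative sketch only; not the actual JR omni-report code.
    import datetime
    import subprocess

    def build_scaffold_prompt(report_dir: str = "ai-visibility") -> str:
        # (2) Inject upstream data: real `ls` output becomes context,
        #     so the model compares against actual history instead of guessing.
        history = subprocess.run(
            ["ls", report_dir], capture_output=True, text=True
        ).stdout
        # (3) Parameterize the output path so the model cannot hallucinate it.
        out_path = f"{report_dir}/{datetime.date.today().isoformat()}.md"
        return (
            f"Existing reports:\n{history}\n"
            f"Write the skeleton to {out_path}.\n"
            # (1) Forced commits after each phase keep the stream from idling.
            "After each phase: commit + push before continuing."
        )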


What You Will Learn

In this tutorial, you will learn:

  • Look at any LLM app and decompose its context into 5 layers (system / tools / memory / retrieval / user), and identify which layer breaks first
  • When building RAG, stop optimizing recall alone; use rerank + LLM-as-judge to turn recall into selection
  • When building Agents, calculate the token cost of each tool schema, decide which tools to lazy-load, and know when to isolate context with a sub-agent (see the sketch after this list)
  • Understand the engineering reason Cursor / Claude Code / Cline diverge despite calling the same models, and pick the right one for your team
  • Build a production RAG with an eval set in 7 days, not a toy demo
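For the tool-cost bullet above, a rough sketch under stated assumptions: count_tokens is a crude 4-chars-per-token guess (swap in your model's real tokenizer), and each schema is assumed to carry a top-level "name" field:

    # Rough sketch: rank tools by the context cost of their JSON schemas.
    # count_tokens is a ~4-chars-per-token guess; use a real tokenizer in practice.
    import json

    def count_tokens(text: str) -> int:
        return len(text) // 4

    def rank_tool_costs(tool_schemas: list[dict]) -> list[tuple[str, int]]:
        costs = [(t["name"], count_tokens(json.dumps(t))) for t in tool_schemas]
        return sorted(costs, key=lambda c: c[1], reverse=True)

    # Tools at the top of this ranking are the first candidates for
    # lazy-loading: ship a one-line stub up front, full schema on demand.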


Chapter Overview

Quick preview by section; jump directly to what interests you.

Section: Introduction

In June 2025, Karpathy and Tobi Lutke renamed prompt engineering to context engineering. What changes, why it is engineering, and how it relates to RAG / Agents.

3 lessons · Reading / Visual
Section: Engineering

5 context layers fight for the same 200K-token pool. Anthropic's 4-tier token pricing (cache hit 1×, output 50×) + 5 budgeting techniques (cache stable layers, summarize history, lazy-load tools, rerank retrieval, cap output).

4 lessons · Reading / Visual
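As a preview of that budgeting math, a hedged worked example. Normalized to cache reads = 1×, typical Anthropic-style pricing puts fresh input near 10×, cache writes near 12.5×, and output near 50×; the exact multipliers vary by model and are assumptions here:

    # Relative cost of one call, in cache-read-token units.
    # Multipliers are assumptions modeled on typical Anthropic-style pricing.
    TIER = {"cache_read": 1.0, "input": 10.0, "cache_write": 12.5, "output": 50.0}

    def call_cost(usage: dict[str, int]) -> float:
        return sum(TIER[tier] * tokens for tier, tokens in usage.items())

    # Cache a stable 50K system + tools layer: pay the write once...
    first_call = call_cost({"cache_write": 50_000, "input": 5_000, "output": 2_000})
    # ...then every later call reads it back at 1/10 the fresh-input rate.
    later_call = call_cost({"cache_read": 50_000, "input": 5_000, "output": 2_000})

On these assumptions the cached call costs roughly a quarter of the first one, which is why "cache stable layers" leads the list of techniques.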
Section: Application

Same model, same task, yet an 80% experience gap that comes entirely from context strategy. Cursor uses vector RAG / Claude Code uses agentic search / Cline uses auto sub-tasks. Benchmark: Claude Code uses 5.5× fewer tokens than Cursor.

3 lessons · Reading / Visual