07

Context Engineering & Memory

⏱️ 35 minutes

Context engineering and memory keep LLM responses relevant without blowing token budgets.

1) Goals

  • Provide just enough context (instructions + facts) for accuracy.
  • Control cost/latency by trimming or structuring history.
  • Maintain conversational continuity where needed.

2) Instruction Hierarchy

  • System: non-negotiable rules (role, language, safety).
  • Task/User: current request and constraints.
  • History: only necessary turns; summarize older content.
  • Tools: function specs and expectations.
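
The hierarchy above can be sketched as a helper that assembles a chat-style message list in priority order. This is a minimal illustration, assuming the common OpenAI-style role/content message schema; `build_messages` is a hypothetical helper, not a library API:

```python
def build_messages(system_rules, task, history=None, tool_specs=None):
    """Assemble messages in instruction-hierarchy order:
    system rules first, then trimmed history, then the current task.
    Tool specs are passed alongside rather than inside the messages."""
    messages = [{"role": "system", "content": system_rules}]
    messages.extend(history or [])  # only the turns that survived trimming
    messages.append({"role": "user", "content": task})
    return messages

msgs = build_messages(
    system_rules="You are a support bot. Answer in English. Refuse unsafe requests.",
    task="How do I reset my password?",
    history=[{"role": "assistant", "content": "Summary of earlier turns: user enabled 2FA."}],
)
```

Keeping the system rules in index 0 makes the later regression check ("are core instructions still present?") a simple lookup.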

3) History Management

  • Sliding window: keep recent N turns.
  • Summarization: compress older history into bullets; include IDs/time.
  • Topical caches: store per-topic summaries; swap in/out as topic changes.
  • Reset triggers: on a new topic, re-send core instructions and drop stale history.
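
The sliding-window and summarization strategies combine naturally in one trimming step. A minimal sketch, where the default join-based summarizer is a stand-in for a real summarization call (an LLM or extractive summarizer):

```python
def trim_history(turns, keep_last=4,
                 summarize=lambda old: "Summary: " + "; ".join(t["content"] for t in old)):
    """Keep the most recent `keep_last` turns verbatim (sliding window)
    and compress everything older into a single summary turn."""
    if len(turns) <= keep_last:
        return list(turns)
    old, recent = turns[:-keep_last], turns[-keep_last:]
    return [{"role": "system", "content": summarize(old)}] + recent

turns = [{"role": "user", "content": f"turn {i}"} for i in range(6)]
trimmed = trim_history(turns, keep_last=4)
```

Passing `summarize` as a parameter lets the same trimmer back both plain sliding windows and topical caches (swap in a per-topic summarizer on topic change).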

4) Context Packing for RAG/Chat

  • Strict budget: target ≤ 60-70% of context limit; reserve for output.
  • Ordering: instructions → constraints → retrieved snippets (with IDs) → question.
  • Deduplicate snippets; group by source; include citation IDs.
  • Dynamic selection: choose top-k by relevance + recency + source diversity.
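
A minimal packing sketch, assuming snippets arrive pre-ranked by the relevance/recency/diversity score and that a whitespace word count approximates tokens (a real system would use the model's tokenizer):

```python
def pack_context(snippets, budget_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily pack ranked (id, text) snippets until the budget is spent.
    Duplicate texts are skipped; IDs are kept inline for citation."""
    packed, used, seen = [], 0, set()
    for sid, text in snippets:
        if text in seen:                      # deduplicate identical snippets
            continue
        cost = count_tokens(text)
        if used + cost > budget_tokens:       # strict budget: never exceed it
            continue
        packed.append(f"[{sid}] {text}")
        used += cost
        seen.add(text)
    return packed

ranked = [("D1", "alpha beta"), ("D2", "alpha beta"), ("D3", "gamma delta epsilon")]
result = pack_context(ranked, budget_tokens=5)
```

The budget passed in should already be the reduced figure (≤ 60-70% of the model limit), so output headroom is reserved before packing starts.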

5) Structured Facts

  • Provide facts as bullet lists or key-value blocks, not prose.
  • Use IDs for each fact for citation/traceback.
  • For numbers/dates, keep canonical units and formats.
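
As an illustration of the ID-tagged key-value format, a small renderer (the `[F1]`-style ID scheme is an assumed convention, not a standard):

```python
def render_facts(facts):
    """Render (key, value) facts as an ID-tagged key-value block,
    so the model can cite [F1], [F2], ... instead of paraphrasing prose."""
    return "\n".join(f"[F{i}] {key}: {value}"
                     for i, (key, value) in enumerate(facts, start=1))

block = render_facts([("order_id", "A-1042"), ("ship_date", "2024-03-01")])
```

Values should already be in canonical units/formats (ISO dates, fixed currency) before rendering, so the model never has to normalize them.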

6) Session Memory Patterns

  • Short-term: recent dialog + working set.
  • Long-term: vector or key-value store of facts/preferences; retrieve by query + tenant/user.
  • Ephemeral: auto-expire or rotate; respect privacy/PII limits.
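
A toy key-value store showing the tenant/user namespacing and ephemeral-expiry patterns together; the class and method names are assumptions for illustration, not a specific library's API:

```python
import time

class SessionMemory:
    """Minimal long-term key-value memory. Keys are namespaced by
    (tenant, user) so retrieval is filtered by design; an optional TTL
    implements the ephemeral auto-expire pattern."""
    def __init__(self):
        self._store = {}

    def put(self, tenant, user, key, value, ttl=None):
        expires = time.time() + ttl if ttl is not None else None
        self._store[(tenant, user, key)] = (value, expires)

    def get(self, tenant, user, key):
        entry = self._store.get((tenant, user, key))
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and time.time() > expires:
            del self._store[(tenant, user, key)]   # lazily drop expired entries
            return None
        return value
```

Because the tenant/user pair is part of the key, a lookup for one tenant can never surface another tenant's data, which is the same isolation property the retrieval filters in section 7 enforce.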

7) Safety & Leakage Prevention

  • Drop user-provided prompt fragments from summaries to avoid prompt injection persistence.
  • Redact secrets/PII before storing/retrieving.
  • Tag data by tenant/user/region; filter on retrieval.
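
A regex-based redaction sketch for the write/read boundary. The secret pattern (an `sk-` prefix) is an assumed example format; a production system would use a dedicated secret/PII scanner rather than two regexes:

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
API_KEY = re.compile(r"\bsk-[A-Za-z0-9]{8,}\b")   # assumed secret format for illustration

def redact(text):
    """Replace emails and secret-looking tokens with placeholders
    before the text is stored in memory or retrieved into a prompt."""
    text = EMAIL.sub("[EMAIL]", text)
    text = API_KEY.sub("[SECRET]", text)
    return text

clean = redact("Contact bob@example.com, key sk-abcdef12345678")
```

Running the same `redact` on both the write path and the read path means a value that slipped into storage before the rule existed is still caught at retrieval.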

8) Testing & Validation

  • Token audits: measure context size under typical/peak conditions.
  • Regression checks: ensure core instructions remain present after packing.
  • Topic-switch tests: verify that summaries and resets behave as expected.
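
The token audit and the instruction-regression check can share one helper. `audit_context` is a hypothetical sketch using a word count as a stand-in for real tokenization:

```python
def audit_context(messages, core_rules, limit_tokens,
                  count_tokens=lambda s: len(s.split())):
    """Return (tokens_used, fill_ratio, rules_present): a token audit plus
    a regression check that every core instruction survived packing."""
    total = sum(count_tokens(m["content"]) for m in messages)
    joined = " ".join(m["content"] for m in messages)
    rules_ok = all(rule in joined for rule in core_rules)
    return total, total / limit_tokens, rules_ok

msgs = [{"role": "system", "content": "Answer in English."},
        {"role": "user", "content": "hello there"}]
tokens, ratio, ok = audit_context(msgs, ["Answer in English."], limit_tokens=10)
```

Running this under both typical and peak fixtures, and asserting `ratio <= 0.7` and `ok` in CI, covers the first two checklist items below.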

9) Minimal Checklist

  • Instruction hierarchy enforced; core rules always included.
  • History trimmed/summarized with IDs; budgeted context ≤ 70% of limit.
  • Retrieved snippets deduped, cited, and filtered by tenant.

📚 Related Resources

❓ FAQ

The most frequently searched questions about this chapter's topic, with answers below

History keeps growing and the token count explodes; how do you manage it?

Combine three strategies: (1) a sliding window keeps only the most recent N turns; (2) summarization compresses older content into bullets, preserving IDs/time; (3) topical caches store per-topic summaries and are swapped in/out on topic changes. A topic switch should also trigger a reset: re-send the core instructions and drop stale history. Target context usage ≤ 60-70% of the model limit, reserving roughly a third for output.

How should system / user / history / tools be ordered?

Instruction hierarchy: (1) System holds the non-negotiable rules (role, language, safety) at highest priority; (2) Task/User carries the current request and constraints; (3) History includes only the necessary turns, with older content summarized first; (4) Tools holds the function specs and expectations. For RAG, the order is instructions → constraints → retrieved snippets (with IDs) → question. Put key information at the beginning or end, not the middle.

How do you store long-term memory (user preferences, historical facts)?

Short-term memory is the current dialog plus the working set; long-term memory uses a vector store or key-value store for facts/preferences, retrieved by query + tenant/user. Ephemeral memory auto-expires or rotates to avoid PII accumulation. Store facts as bullet lists or key-value blocks rather than prose; give each fact an ID for citation and traceback; keep numbers/dates in canonical units and formats.

How does a memory system defend against prompt injection?

Three gates: (1) strip user-provided prompt fragments when summarizing, so injections cannot persist into memory; (2) redact secrets and PII before writing or reading; (3) tag data by tenant/user/region and filter strictly at retrieval. Production also needs token audits (context size under typical and peak conditions) and regression checks (core instructions still present after packing).

How do you verify that context packing hasn't squeezed out the core instructions?

Minimal checklist: (1) instruction hierarchy enforced, with core rules included every time; (2) history trimmed/summarized with IDs, budget ≤ 70% of the model limit; (3) retrieved snippets deduplicated, cited, and filtered by tenant. Then run topic-switch tests: deliberately change topics and verify the summary and reset behavior. Run the regression suite on every release without exception.