07
Context Engineering & Memory
Context engineering and memory keep LLM responses relevant without exceeding token budgets.
1) Goals
- Provide just enough context (instructions + facts) for accuracy.
- Control cost/latency by trimming or structuring history.
- Maintain conversational continuity where needed.
2) Instruction Hierarchy
- System: non-negotiable rules (role, language, safety).
- Task/User: current request and constraints.
- History: only necessary turns; summarize older content.
- Tools: function specs and expectations.
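A minimal sketch of assembling messages in this order, assuming a chat-style system/user/assistant message format; the names (SYSTEM_RULES, build_messages) are illustrative, not a specific API:

```python
# Illustrative prompt assembly: system rules first, then compressed history,
# then the current request. Tool/function specs are typically passed as a
# separate argument to the model call rather than inlined here.

SYSTEM_RULES = (
    "You are a support assistant. Answer in English. "
    "Refuse unsafe requests and never reveal internal identifiers."
)

def build_messages(task: str, recent_turns: list[dict], summary: str | None = None) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM_RULES}]
    if summary:  # older turns compressed into bullets (see History Management)
        messages.append({"role": "system", "content": "Conversation summary:\n" + summary})
    messages.extend(recent_turns)  # only the turns that are still needed, verbatim
    messages.append({"role": "user", "content": task})
    return messages
```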
3) History Management
- Sliding window: keep recent N turns.
- Summarization: compress older history into bullets; include IDs/timestamps.
- Topical caches: store per-topic summaries; swap in/out as topic changes.
- Reset triggers: on a topic switch, re-send core instructions and drop stale history (see the sketch below).
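A rough sketch of a sliding window that folds evicted turns into a running summary; the summarize_fn (e.g. an LLM call) and the turn threshold are assumptions supplied by the caller:

```python
from typing import Callable

class HistoryBuffer:
    """Keeps the last N turns verbatim; older turns are compressed into a summary."""

    def __init__(self, max_turns: int = 8,
                 summarize_fn: Callable[[list[dict]], str] | None = None):
        self.turns: list[dict] = []
        self.summary: str = ""            # compressed bullets for evicted turns
        self.max_turns = max_turns
        self.summarize_fn = summarize_fn

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        if len(self.turns) > self.max_turns:
            overflow = self.turns[:-self.max_turns]
            self.turns = self.turns[-self.max_turns:]
            if self.summarize_fn:         # fold evicted turns into the running summary
                self.summary = self.summarize_fn(
                    [{"role": "system", "content": self.summary}] + overflow)

    def reset(self) -> None:
        """Topic switch: drop stale history; core instructions are re-sent upstream."""
        self.turns.clear()
        self.summary = ""
```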
4) Context Packing for RAG/Chat
- Strict budget: target roughly 60-70% of the context limit for input; reserve the rest for output.
- Ordering: instructions → constraints → retrieved snippets (with IDs) → question.
- Deduplicate snippets; group by source; include citation IDs.
- Dynamic selection: choose top-k by relevance + recency + source diversity (packing sketch below).
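One way the packing step could look, assuming snippets arrive as dicts with id/source/text/score fields; the 4-characters-per-token estimate is a crude stand-in for a real tokenizer:

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)         # placeholder; use a real tokenizer in practice

def pack_snippets(snippets: list[dict], context_limit: int,
                  budget_ratio: float = 0.65) -> list[dict]:
    """Dedupe, keep the highest-scoring snippets, stop at ~65% of the window."""
    budget = int(context_limit * budget_ratio)
    seen, packed, used = set(), [], 0
    for snip in sorted(snippets, key=lambda s: s["score"], reverse=True):
        key = snip["text"].strip().lower()
        if key in seen:                   # drop exact duplicates
            continue
        cost = estimate_tokens(snip["text"])
        if used + cost > budget:          # stop once the next snippet would overflow
            break
        seen.add(key)
        packed.append(snip)
        used += cost
    packed.sort(key=lambda s: (s["source"], -s["score"]))  # group by source for citation
    return packed
```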
5) Structured Facts
- Provide facts as bullet lists or key-value blocks, not prose.
- Use IDs for each fact for citation/traceback.
- For numbers/dates, keep canonical units and formats.
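A small example of rendering facts as an ID'd key-value block; the field names and the [F1]-style IDs are conventions assumed here, not a standard:

```python
facts = [
    {"id": "F1", "key": "order_total", "value": "129.90", "unit": "EUR"},
    {"id": "F2", "key": "ship_date", "value": "2024-06-03", "unit": "ISO-8601"},
]

def render_facts(facts: list[dict]) -> str:
    lines = ["Facts (cite by ID):"]
    for f in facts:
        lines.append(f"- [{f['id']}] {f['key']} = {f['value']} ({f['unit']})")
    return "\n".join(lines)

# render_facts(facts) ->
# Facts (cite by ID):
# - [F1] order_total = 129.90 (EUR)
# - [F2] ship_date = 2024-06-03 (ISO-8601)
```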
6) Session Memory Patterns
- Short-term: recent dialog + working set.
- Long-term: vector or key-value store of facts/preferences; retrieve by query + tenant/user.
- Ephemeral: auto-expire or rotate; respect privacy/PII limits.
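An in-memory stand-in for a long-term store keyed by (tenant, user) with a TTL for ephemeral entries; a real deployment would back this with a vector or key-value service:

```python
import time

class MemoryStore:
    def __init__(self, ttl_seconds: int | None = None):
        self.ttl = ttl_seconds
        self._data: dict[tuple[str, str], dict[str, tuple[str, float]]] = {}

    def put(self, tenant: str, user: str, key: str, value: str) -> None:
        self._data.setdefault((tenant, user), {})[key] = (value, time.time())

    def get(self, tenant: str, user: str, key: str) -> str | None:
        entry = self._data.get((tenant, user), {}).get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.ttl is not None and time.time() - stored_at > self.ttl:
            del self._data[(tenant, user)][key]   # auto-expire ephemeral facts
            return None
        return value

prefs = MemoryStore(ttl_seconds=30 * 24 * 3600)   # rotate after ~30 days
prefs.put("acme", "u42", "preferred_language", "de")
```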
7) Safety & Leakage Prevention
- Drop user-provided prompt fragments from stored summaries so prompt injections do not persist across turns.
- Redact secrets/PII before storing/retrieving.
- Tag data by tenant/user/region; filter on retrieval.
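An illustrative pre-storage redaction pass and tenant filter; the regex patterns are simplistic examples, not a complete PII/secret detector:

```python
import re

REDACTIONS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),    # email addresses
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "<API_KEY>"),      # API-key-shaped strings
    (re.compile(r"\b\d{13,19}\b"), "<CARD_NUMBER>"),            # long digit runs
]

def redact(text: str) -> str:
    """Run before writing anything to long-term memory or logs."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

def filter_by_tenant(records: list[dict], tenant: str) -> list[dict]:
    """Never let retrieval cross tenant boundaries, regardless of relevance score."""
    return [r for r in records if r.get("tenant") == tenant]
```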
8) Testing & Validation
- Token audits: measure context size under typical/peak conditions.
- Regression checks: ensure core instructions remain present after packing.
- Topic-switch tests: verify summaries and resets behave as expected (see the checks below).
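Two cheap checks along these lines, reusing build_messages, SYSTEM_RULES, and estimate_tokens from the sketches above; the context limit and 70% threshold are illustrative:

```python
CONTEXT_LIMIT = 8000   # assumed model window, in tokens

def test_token_budget():
    messages = build_messages("What is my order status?", recent_turns=[], summary=None)
    total = sum(estimate_tokens(m["content"]) for m in messages)
    assert total <= int(CONTEXT_LIMIT * 0.7), f"context too large: {total} tokens"

def test_core_instructions_present():
    messages = build_messages("New topic: billing question", recent_turns=[],
                              summary="old topic notes")
    assert any(SYSTEM_RULES in m["content"] for m in messages if m["role"] == "system")
```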
9) Minimal Checklist
- Instruction hierarchy enforced; core rules always included.
- History trimmed/summarized with IDs; budgeted context ≤ 70% of limit.
- Retrieved snippets deduped, cited, and filtered by tenant.