25
安全与威胁建模
Security threat modeling ensures LLM systems don’t become attack surfaces.
1) Common Threats
- Prompt injection: user/asset tries to override instructions or exfiltrate data.
- Data exfiltration: leaking PII/secrets via outputs or tool calls.
- Abuse: generating harmful/toxic content; bypassing policies.
- Supply chain: compromised tools, models, or dependencies.
2) Trust Boundaries
- UI ↔ backend ↔ LLM provider ↔ tools/datastores.
- Multi-tenant isolation: tenant_id filters everywhere; no cross-tenant context.
- Frontend inputs are untrusted; documents/uploads are untrusted.
3) Controls
- Input filters: size/type limits; strip HTML/script; sanitize URLs.
- Output filters: secrets/PII regex; safety classifiers; allowlist formats.
- Tool allowlists: only approved domains/APIs; sandbox code; rate-limit tools.
- Memory hygiene: redact before storage; TTL; encrypt at rest; region-aware.
4) Prompt Safety
- System prompt hardening: state scope/limits; refuse to follow user overrides.
- Context tagging: label user-provided text; instruct model not to treat as instructions.
- Provenance: include source IDs; separate instructions from content blocks.
5) Secrets & Keys
- Never ship provider keys to clients; use server-side proxy.
- Rotate keys; scope by env/tenant; least privilege.
- Don’t log secrets or raw prompts with PII.
6) Tooling Risks
- Code exec: sandbox, time/memory/IO limits; no network unless required.
- Web search/scrape: URL allowlist/denylist; fetch via server, not model.
- DB tools: parameterized queries; RBAC per tenant; audit queries.
7) Monitoring & Response
- Alerts for spikes in refusals, safety filter hits, 429/5xx, and unusual token use.
- Incident playbooks for suspected exfiltration or jailbreak; revoke keys, rotate creds.
- Logging: trace IDs, tool calls (scrubbed), safety decisions; avoid raw user content where possible.
8) Testing & Red Teaming
- Adversarial prompts: jailbreaks, injections, self-referential instructions.
- Data exfil tests: attempt to pull secrets/PII; ensure refusal.
- Tool abuse tests: malicious URLs, SQL injection payloads, resource exhaustion.
9) Minimal Checklist
- Hardened system prompt + context tagging + input/output filters.
- Tenant-scoped data/tools; allowlists; sandbox risky tools.
- Keys server-side only; rotation + audit; alerts on anomalies.