logo
25

安全与威胁建模

⏱️ 35分钟

Security threat modeling ensures LLM systems don’t become attack surfaces.

1) Common Threats

  • Prompt injection: user/asset tries to override instructions or exfiltrate data.
  • Data exfiltration: leaking PII/secrets via outputs or tool calls.
  • Abuse: generating harmful/toxic content; bypassing policies.
  • Supply chain: compromised tools, models, or dependencies.

2) Trust Boundaries

  • UI ↔ backend ↔ LLM provider ↔ tools/datastores.
  • Multi-tenant isolation: tenant_id filters everywhere; no cross-tenant context.
  • Frontend inputs are untrusted; documents/uploads are untrusted.

3) Controls

  • Input filters: size/type limits; strip HTML/script; sanitize URLs.
  • Output filters: secrets/PII regex; safety classifiers; allowlist formats.
  • Tool allowlists: only approved domains/APIs; sandbox code; rate-limit tools.
  • Memory hygiene: redact before storage; TTL; encrypt at rest; region-aware.

4) Prompt Safety

  • System prompt hardening: state scope/limits; refuse to follow user overrides.
  • Context tagging: label user-provided text; instruct model not to treat as instructions.
  • Provenance: include source IDs; separate instructions from content blocks.

5) Secrets & Keys

  • Never ship provider keys to clients; use server-side proxy.
  • Rotate keys; scope by env/tenant; least privilege.
  • Don’t log secrets or raw prompts with PII.

6) Tooling Risks

  • Code exec: sandbox, time/memory/IO limits; no network unless required.
  • Web search/scrape: URL allowlist/denylist; fetch via server, not model.
  • DB tools: parameterized queries; RBAC per tenant; audit queries.

7) Monitoring & Response

  • Alerts for spikes in refusals, safety filter hits, 429/5xx, and unusual token use.
  • Incident playbooks for suspected exfiltration or jailbreak; revoke keys, rotate creds.
  • Logging: trace IDs, tool calls (scrubbed), safety decisions; avoid raw user content where possible.

8) Testing & Red Teaming

  • Adversarial prompts: jailbreaks, injections, self-referential instructions.
  • Data exfil tests: attempt to pull secrets/PII; ensure refusal.
  • Tool abuse tests: malicious URLs, SQL injection payloads, resource exhaustion.

9) Minimal Checklist

  • Hardened system prompt + context tagging + input/output filters.
  • Tenant-scoped data/tools; allowlists; sandbox risky tools.
  • Keys server-side only; rotation + audit; alerts on anomalies.

📚 相关资源