Data Governance & Privacy

⏱️ 30 minutes

Data governance and privacy are critical for shipping enterprise-grade LLM features.

1) Principles

  • Least data: send the minimum needed; redact PII/secrets before model calls.
  • Purpose limitation: only use data for the stated task; log consent where relevant.
  • Separation: env-separated keys, storage, and logs; avoid prod data in dev.

2) Data Handling Pipeline

  • Ingress filters: reject unsupported file types and inputs that exceed size or duration limits.
  • Redaction: email/phone/account IDs; configurable patterns per tenant/region.
  • Classification: tag sensitivity level (public/internal/secret/PII).
  • Egress filters: scrub outputs that include sensitive source snippets unless allowed.
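
The redaction step above can be sketched as follows. This is a minimal illustration, not a complete PII detector: the `DEFAULT_PATTERNS` table, the `redact` function, and the `ACC-` account ID format are all hypothetical, and the regular expressions are deliberately simple. Real deployments usually need locale-specific rules and a review of false negatives.

```python
import re

# Hypothetical default patterns; intentionally simple, not exhaustive.
DEFAULT_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text, extra_patterns=None):
    """Replace each match with a [LABEL] placeholder; return text and hit counts."""
    patterns = dict(DEFAULT_PATTERNS)
    patterns.update(extra_patterns or {})  # per-tenant or per-region additions
    counts = {}
    for label, pattern in patterns.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts

# Assumed tenant-specific account ID format, for illustration only.
tenant_patterns = {"ACCOUNT_ID": re.compile(r"\bACC-\d{6}\b")}
clean, hits = redact(
    "Mail jane@example.com or call +1 415 555 0100 about ACC-123456.",
    tenant_patterns,
)
print(clean)  # Mail [EMAIL] or call [PHONE] about [ACCOUNT_ID].
```

Returning the hit counts alongside the redacted text lets the classification step tag the request without keeping the original values.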

3) Storage & Retention

  • Set a TTL on temporary artifacts (transcripts, intermediate parses).
  • Encryption at rest & in transit; rotate keys.
  • Avoid storing raw prompts/responses if they contain PII; hash user IDs.
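
A rough sketch of two of the points above: hashing user IDs and expiring temporary artifacts. The text only says "hash"; a keyed hash (HMAC-SHA256) is assumed here because a plain hash of a short, guessable ID can be brute-forced. The function names and the idea of passing the key in as `secret_key` are illustrative assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional

def pseudonymize_user_id(user_id: str, secret_key: bytes) -> str:
    """Keyed hash of a user ID (assumption: HMAC-SHA256 with a managed secret)."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def is_expired(created_at: float, ttl_seconds: float,
               now: Optional[float] = None) -> bool:
    """True once an artifact has lived for at least ttl_seconds."""
    current = time.time() if now is None else now
    return current - created_at >= ttl_seconds

# Example: the same ID and key always map to the same 64-character token.
token = pseudonymize_user_id("user-42", b"demo-key-do-not-use")
stale = is_expired(created_at=0.0, ttl_seconds=3600, now=7200.0)
```

Rotating `secret_key` changes every token, so key rotation for pseudonyms needs its own migration plan, separate from rotating encryption keys.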

4) Regionality & Residency

  • Route by region; keep data in-region where required.
  • Per-tenant policies: some tenants opt out of training or logging.
  • Document data flows for compliance reviews.
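
One possible shape for region-aware routing with per-tenant policies is shown below. Everything here is hypothetical: the endpoint URLs, the `TenantPolicy` fields, and the decision to fail closed when a tenant requires residency but its region has no endpoint.

```python
from dataclasses import dataclass

# Placeholder endpoints; substitute the real per-region deployments.
REGION_ENDPOINTS = {
    "eu": "https://llm.eu.example.internal",
    "us": "https://llm.us.example.internal",
}

@dataclass(frozen=True)
class TenantPolicy:
    region: str
    residency_required: bool = True
    allow_logging: bool = True
    allow_training: bool = False

def resolve_endpoint(policy: TenantPolicy, fallback_region: str = "us") -> str:
    """Pick the in-region endpoint; never fall back across regions if residency is required."""
    endpoint = REGION_ENDPOINTS.get(policy.region)
    if endpoint is not None:
        return endpoint
    if policy.residency_required:
        raise ValueError(f"no in-region endpoint for {policy.region!r}")
    return REGION_ENDPOINTS[fallback_region]

eu_endpoint = resolve_endpoint(TenantPolicy(region="eu"))
```

Failing closed turns a missing region into a visible error instead of a silent cross-border transfer.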

5) Access Control & Auditing

  • RBAC/ABAC on datasets and tools; tenant_id filters on retrieval.
  • Audit logs: who accessed what, when; config changes to prompts/models.
  • Break-glass procedures for emergency access with approvals.
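
The tenant filter and audit log can be combined in the retrieval path, roughly as sketched here. The in-memory `AUDIT_LOG` list, the document dictionaries, and the substring match are stand-ins for an append-only log sink and a real vector or keyword search. The point is only that the `tenant_id` filter is applied inside the retrieval function, not left to callers.

```python
import json
import time

AUDIT_LOG = []  # stand-in for an append-only audit sink

def retrieve(docs, query, tenant_id, actor):
    """Return matching documents for one tenant and record who accessed what, when."""
    needle = query.lower()
    hits = [
        d for d in docs
        if d["tenant_id"] == tenant_id and needle in d["text"].lower()
    ]
    # The raw query is left out of the log because it may itself contain PII.
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "actor": actor,
        "tenant_id": tenant_id,
        "doc_ids": [d["id"] for d in hits],
    }))
    return hits

DOCS = [
    {"id": "a1", "tenant_id": "acme", "text": "Refund policy for Acme"},
    {"id": "b1", "tenant_id": "globex", "text": "Refund policy for Globex"},
]
results = retrieve(DOCS, "refund", tenant_id="acme", actor="svc-chat")
```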

6) Third-Party Models/Tools

  • Review the provider's data processing agreement (DPA) and data retention settings; disable training on your data if possible.
  • Sanitize tool inputs/outputs; allowlist domains/APIs.
  • For self-hosted models: patch cadence, network egress controls, isolated VPC.
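
A domain allowlist for tool calls might look like the following sketch. The `ALLOWED_HOSTS` set and the HTTPS-only rule are assumptions. Matching on the parsed hostname instead of a substring of the URL avoids lookalikes such as `api.example.com.evil.net`.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; exact host matches only.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}

def is_allowed_url(url: str) -> bool:
    """Allow only HTTPS URLs whose parsed hostname is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    return host in ALLOWED_HOSTS

ok = is_allowed_url("https://api.example.com/v1/search?q=llm")
blocked = is_allowed_url("https://api.example.com.evil.net/v1")
```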

7) Safety Filters

  • Prompt-injection detection for user-supplied docs/inputs.
  • Content moderation (toxicity/abuse); refuse unsafe requests.
  • Output filters for secrets/credentials patterns.
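
As an illustration of the output filter for secrets, the sketch below scans model output for a few credential-shaped strings and masks them. The pattern list is an assumption and far from complete. It covers the documented AWS access key ID prefix, PEM private key headers, and long bearer tokens. The function names are hypothetical.

```python
import re

# Illustrative patterns only; extend per provider and secret type.
SECRET_PATTERNS = [
    ("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("private_key_block", re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")),
    ("bearer_token", re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}=*")),
]

def filter_output(text, replacement="[REDACTED]"):
    """Mask credential-like strings; return the masked text and the pattern names that hit."""
    findings = []
    for name, pattern in SECRET_PATTERNS:
        text, n = pattern.subn(replacement, text)
        if n:
            findings.append(name)
    return text, findings

# AKIAIOSFODNN7EXAMPLE is the example key ID used in AWS documentation.
masked, found = filter_output("Use key AKIAIOSFODNN7EXAMPLE for the upload.")
print(masked)  # Use key [REDACTED] for the upload.
```

The returned pattern names can feed alerting, since a secret appearing in model output usually means one leaked into the prompt or retrieval corpus upstream.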

8) Minimal Checklist

  • Data classification + redaction before LLM.
  • Region-aware routing; tenant filters on retrieval.
  • Encrypted storage with TTL; audited access.
  • Provider settings reviewed (no training, retention limits).

📚 Related Resources