27. Deployment and Cost Optimization
Deploying LLM features requires resilient rollout, multi-provider strategies, and cost discipline.
1) Deployment Patterns
- Config-driven: prompts/models/routing live in config and reload without a redeploy (see the sketch after this list).
- Blue-green / canary: ship new configs to a slice of traffic; roll back fast.
- Feature flags: per-tenant/user gating for risky features.
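A minimal sketch of the config-driven and feature-flag/canary patterns above, assuming a JSON file named `routing.json` with hypothetical keys (`chat_model`, `canary_model`, `canary_fraction`, `flag_tenants`); the reload loop and gating logic are illustrative, not a specific framework's API.

```python
import json
import random
import threading
import time


class RoutingConfig:
    """Reloads routing config from disk so prompts/models change without a redeploy."""

    def __init__(self, path: str, reload_seconds: int = 30):
        self._path = path
        self._lock = threading.Lock()
        self._config = {}
        self.reload()
        # Background thread keeps the in-memory copy fresh.
        watcher = threading.Thread(target=self._watch, args=(reload_seconds,), daemon=True)
        watcher.start()

    def reload(self) -> None:
        with open(self._path) as f:
            fresh = json.load(f)
        with self._lock:
            self._config = fresh

    def _watch(self, interval: int) -> None:
        while True:
            time.sleep(interval)
            try:
                self.reload()
            except (OSError, json.JSONDecodeError):
                pass  # keep serving the last good config

    def model_for(self, tenant: str) -> str:
        """Feature-flag and canary gating: a slice of tenants gets the canary model."""
        with self._lock:
            cfg = self._config
        if tenant in cfg.get("flag_tenants", []):
            return cfg["canary_model"]
        if random.random() < cfg.get("canary_fraction", 0.0):  # e.g. 0.05 = 5% canary
            return cfg["canary_model"]
        return cfg["chat_model"]


# Usage: routing = RoutingConfig("routing.json"); model = routing.model_for("tenant-42")
```

Rolling back a bad canary then means setting `canary_fraction` back to 0 in config, not redeploying.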
2) Multi-Provider Strategy
- Aliases: `chat-default`, `embed-default` mapped to concrete providers/models.
- Fallback: primary fails → secondary; log every fallback (see the sketch after this list).
- Regional routing: keep data in-region if required; respect compliance.
- Capability matrix: track context length, tool support, multimodal support per model.
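A sketch of alias-based routing with fallback, assuming a hypothetical `call_provider` helper that wraps whatever SDK or HTTP client is actually in use; provider and model names are placeholders.

```python
import logging

log = logging.getLogger("llm.routing")

ALIASES = {
    # alias -> ordered candidates, primary first
    "chat-default": [("provider_a", "chat-large"), ("provider_b", "chat-medium")],
    "embed-default": [("provider_a", "embed-small"), ("provider_c", "embed-base")],
}


def call_provider(provider: str, model: str, payload: dict) -> dict:
    raise NotImplementedError  # wrap the real SDK/HTTP call here


def complete(alias: str, payload: dict) -> dict:
    """Try the primary mapping, fall back on failure, and log every fallback."""
    last_error = None
    for i, (provider, model) in enumerate(ALIASES[alias]):
        try:
            result = call_provider(provider, model, payload)
            if i > 0:
                log.warning("fallback used: alias=%s provider=%s model=%s", alias, provider, model)
            return result
        except Exception as exc:  # narrow to provider-specific errors in practice
            last_error = exc
    raise RuntimeError(f"all providers failed for alias {alias}") from last_error
```

The capability matrix fits the same structure: attach context length, tool support, and modality flags to each candidate and filter on them before trying it.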
3) Cost Controls
- Model tiers: use small/cheap by default; escalate to strong models only when needed.
- Max tokens: cap input/output; trim history; use summarization for long threads.
- Caching: deterministic tasks cached; retrieval cache for repeated queries.
- Budgets & alerts: per-env/per-tenant daily budgets; alert on spikes (see the sketch after this list).
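One way to combine model tiers with per-tenant daily budgets and alerts; the prices, budget, and `alert` hook below are placeholder assumptions, not real rates.

```python
from collections import defaultdict
from datetime import date

PRICE_PER_1K_TOKENS = {"small": 0.0005, "strong": 0.01}  # placeholder prices, USD
DAILY_BUDGET_USD = 5.00

_spend = defaultdict(float)  # (tenant, day) -> dollars spent


def alert(message: str) -> None:
    print(f"[ALERT] {message}")  # swap in the real paging/alerting integration


def record_usage(tenant: str, tier: str, tokens: int) -> None:
    """Accumulate spend per tenant per day and alert on budget breaches."""
    key = (tenant, date.today().isoformat())
    _spend[key] += tokens / 1000 * PRICE_PER_1K_TOKENS[tier]
    if _spend[key] > DAILY_BUDGET_USD:
        alert(f"daily budget exceeded for {tenant}: ${_spend[key]:.2f}")


def pick_tier(task_complexity: float, tenant: str) -> str:
    """Default to the cheap tier; escalate only for hard tasks while within budget."""
    key = (tenant, date.today().isoformat())
    over_budget = _spend[key] > DAILY_BUDGET_USD
    if task_complexity > 0.8 and not over_budget:
        return "strong"
    return "small"
```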
4) Performance & Reliability
- Connection pooling; HTTP/2; gzip/br compression where allowed.
- Retries/backoff + circuit breakers for flaky providers (see the sketch after this list).
- Warm paths: preflight DNS/TLS; keep-alive connections to reduce cold-start latency.
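A sketch of retries with exponential backoff plus a simple circuit breaker; the thresholds, cooldown, and jitter values are illustrative defaults.

```python
import random
import time


class CircuitBreaker:
    """Open the circuit after consecutive failures; allow a probe after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at > self.cooldown_seconds

    def record(self, success: bool) -> None:
        if success:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()


def call_with_retries(fn, breaker: CircuitBreaker, max_attempts: int = 3):
    """Exponential backoff with jitter; stop immediately if the circuit is open."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open: provider marked unhealthy")
        try:
            result = fn()
            breaker.record(success=True)
            return result
        except Exception:
            breaker.record(success=False)
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt + random.uniform(0, 0.5))
```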
5) Compliance & Data Handling
- Env-separated keys; no prod keys in dev.
- Data residency: route to compliant regions; scrub PII before sending.
- Logging hygiene: avoid storing sensitive payloads; hash IDs.
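A sketch of the PII and logging-hygiene points: hash identifiers with a per-environment salt and redact sensitive fields before anything reaches the logs; the field names and salt handling are assumptions.

```python
import hashlib

SENSITIVE_FIELDS = {"prompt", "completion", "email", "phone"}


def hash_id(user_id: str, salt: str) -> str:
    """Stable pseudonymous ID for correlation without storing the raw value."""
    # The salt should come from per-environment secret storage, not source code.
    return hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()[:16]


def scrub_for_logging(event: dict) -> dict:
    """Redact sensitive payload fields; keep only metadata needed for debugging."""
    return {k: ("[redacted]" if k in SENSITIVE_FIELDS else v) for k, v in event.items()}
```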
6) Rollout Checklist
- Health and latency SLOs monitored; alert on 5xx/429 surges.
- Kill switch ready per model/prompt/config (see the sketch after this checklist).
- Backward compatibility: schema/version tags in requests; old clients still work.
- Post-deploy review: cost delta, latency delta, error delta.
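A sketch tying the kill-switch and backward-compatibility items together, assuming kill-switch sets populated from config/flags and a `schema_version` tag on every request; all names here are illustrative.

```python
KILL_SWITCHES = {"models": set(), "prompts": set()}  # populate from config / feature flags


def is_killed(model: str, prompt_id: str) -> bool:
    return model in KILL_SWITCHES["models"] or prompt_id in KILL_SWITCHES["prompts"]


def build_request(model: str, prompt_id: str, payload: dict, schema_version: str = "v2") -> dict:
    """Refuse killed model/prompt combos; version-tag requests so old clients keep working."""
    if is_killed(model, prompt_id):
        raise RuntimeError(f"kill switch active for {model}/{prompt_id}")
    return {"schema_version": schema_version, "model": model, "prompt_id": prompt_id, **payload}
```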