24
Workflows and Automation
Workflow automation handles long-running jobs, retries, and persisted state so that LLM apps stay reliable when work spans more than a single request.
1) When You Need Workflows
- Long tasks: multi-step research, batch summarization, doc pipelines.
- Tool calls: scrape → parse → embed → answer.
- Human-in-the-loop: review or labeling steps.
2) Building Blocks
- Queue + workers: decouple ingestion and processing; control concurrency.
- Scheduler: cron-like triggers for refresh/reindex.
- State machine: explicit step states (pending/running/success/failed/needs_review).
- Idempotency: give every task a stable ID so a retried or duplicated step causes no duplicate side effects.
- Dead letters: capture poison messages; alert for triage.
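
The building blocks above can be sketched together in a few lines. This is a minimal in-memory illustration, not a production queue: the `Worker` class, the `TRANSITIONS` table, and the `max_attempts` parameter are hypothetical names chosen for this example, and a real system would back the queue and state with a durable store.

```python
from collections import deque
from enum import Enum


class State(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    NEEDS_REVIEW = "needs_review"


# Allowed transitions (an assumption for this sketch); anything else is a caller bug.
TRANSITIONS = {
    State.PENDING: {State.RUNNING},
    State.RUNNING: {State.SUCCESS, State.FAILED, State.NEEDS_REVIEW},
    State.FAILED: {State.RUNNING, State.NEEDS_REVIEW},
    State.NEEDS_REVIEW: {State.PENDING},
    State.SUCCESS: set(),
}


class Worker:
    def __init__(self, handler, max_attempts=3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.queue = deque()      # pending work: (task_id, payload, attempt)
        self.dead_letters = []    # tasks that exhausted their attempts
        self.states = {}
        self.results = {}

    def submit(self, task_id, payload):
        # Idempotency: resubmitting a known task ID is a no-op.
        if task_id in self.states:
            return False
        self.states[task_id] = State.PENDING
        self.queue.append((task_id, payload, 1))
        return True

    def _move(self, task_id, new_state):
        current = self.states[task_id]
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current.value} -> {new_state.value}")
        self.states[task_id] = new_state

    def run(self):
        while self.queue:
            task_id, payload, attempt = self.queue.popleft()
            self._move(task_id, State.RUNNING)
            try:
                self.results[task_id] = self.handler(payload)
            except Exception as exc:
                self._move(task_id, State.FAILED)
                if attempt < self.max_attempts:
                    self.queue.append((task_id, payload, attempt + 1))
                else:
                    # Poison message: park it for triage instead of retrying forever.
                    self.dead_letters.append((task_id, payload, repr(exc)))
            else:
                self._move(task_id, State.SUCCESS)
```

Keeping the transition table explicit means an unexpected state change fails loudly rather than silently corrupting task status.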
3) Patterns for LLM Workflows
- Fan-out/fan-in: split a corpus → parallel summarize → merge.
- Map-reduce summarization: chunk → summarize → synthesize.
- Tool+LLM loop: detect needed tool, call tool, feed result back to LLM.
- Checkpointing: persist intermediate context for resume after failure.
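
As a rough illustration of fan-out/fan-in, map-reduce summarization, and checkpointing working together, the sketch below splits text into chunks, summarizes them in parallel, and saves progress to a JSON file so a rerun skips finished chunks. The `summarize` function is a stand-in for an LLM call (it only keeps the first three words), and `map_reduce`, `chunk`, and the checkpoint layout are assumptions made for this example. A real synthesize step would likely be another LLM call rather than a string join.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def chunk(text, size):
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def summarize(piece):
    """Stand-in for an LLM call: keep the first three words."""
    return " ".join(piece.split()[:3])


def map_reduce(text, checkpoint_path, size=40, workers=4):
    path = Path(checkpoint_path)
    # Resume: load summaries finished by an earlier run, keyed by chunk index.
    done = json.loads(path.read_text()) if path.exists() else {}
    pieces = chunk(text, size)
    todo = [i for i in range(len(pieces)) if str(i) not in done]

    # Fan-out: summarize the remaining chunks in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = pool.map(lambda i: summarize(pieces[i]), todo)
        for i, summary in zip(todo, summaries):
            done[str(i)] = summary
            path.write_text(json.dumps(done))  # checkpoint after each result

    # Fan-in: merge partial summaries in original chunk order.
    return " | ".join(done[str(i)] for i in range(len(pieces)))
```

If the process dies midway, calling `map_reduce` again with the same checkpoint path only recomputes the chunks that are missing from the file.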
4) Reliability & Timeouts
- Set a timeout on every tool and LLM call, plus a global deadline that keeps the whole task within its SLA.
- Retries with backoff; cap attempts; mark for human review after N failures.
- Circuit breakers: pause a tool or model that keeps failing intermittently; route traffic to a fallback.
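
One possible shape for capped retries with exponential backoff and a simple circuit breaker is sketched below. The names `retry`, `CircuitBreaker`, and `CircuitOpen`, along with the thresholds and delays, are assumptions for this example and not a specific library's API. The `sleep` and `clock` parameters are injectable so the logic can be exercised without real waiting. Per-call timeouts are left out to keep the sketch short.

```python
import time


class CircuitOpen(Exception):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0       # consecutive failures since the last success
        self.opened_at = None   # time the breaker opened, or None if closed

    def call(self, fn, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpen("dependency paused; use fallback")
            self.opened_at = None  # cooldown elapsed: allow one trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def retry(fn, max_attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Call fn until it succeeds or attempts run out; the delay doubles each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except CircuitOpen:
            raise  # do not hammer a dependency the breaker has paused
        except Exception:
            if attempt == max_attempts:
                raise  # caller can now mark the task for human review
            sleep(base_delay_s * 2 ** (attempt - 1))
```

A caller might wrap a tool invocation as `retry(lambda: breaker.call(tool, query))` and switch to a fallback model or tool when `CircuitOpen` is raised.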
5) Observability
- Trace per task: task_id, steps, attempts, durations.
- Metrics: success rate, P95 per step, retries, DLQ size, queue lag.
- Alerts: queue lag high, DLQ growth, success rate drop.
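
To make the per-step metrics concrete, here is a small sketch that aggregates trace records into success rate, P95 duration, and retry count per step. The `StepRecord` fields mirror the trace fields listed above, but the exact schema and the `summarize_steps` helper are assumptions for illustration. P95 uses the nearest-rank method; a metrics backend may use a different percentile definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepRecord:
    task_id: str
    step: str
    attempt: int
    duration_s: float
    ok: bool


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def summarize_steps(records):
    """Group records by step and compute basic health metrics for each."""
    by_step = defaultdict(list)
    for r in records:
        by_step[r.step].append(r)
    report = {}
    for step, rows in by_step.items():
        report[step] = {
            "success_rate": sum(r.ok for r in rows) / len(rows),
            "p95_s": p95([r.duration_s for r in rows]),
            "retries": sum(1 for r in rows if r.attempt > 1),
        }
    return report
```

Alert rules can then compare these values against thresholds, for example flagging a step whose success rate falls below an agreed floor.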
6) Human-in-the-Loop (HITL)
- Surfaces: a review UI for flagged tasks (low confidence, schema validation failure).
- Capture feedback: corrections feed back to models/prompts/evals.
- Prioritize: define SLA tiers; high-priority tasks skip the normal queue or get a larger worker pool.
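
A minimal sketch of the flag-and-prioritize flow might look like the following. The confidence threshold, the `needs_review` helper, the `ReviewQueue` class, and the convention that a lower tier number means higher priority are all assumptions for this example.

```python
import heapq
import itertools

CONFIDENCE_FLOOR = 0.7  # assumed threshold; tune per task


def needs_review(result, required_keys):
    """Return a reason string if the result should go to a human, else None."""
    if not required_keys.issubset(result):
        return "schema_fail"
    if result.get("confidence", 0.0) < CONFIDENCE_FLOOR:
        return "low_confidence"
    return None


class ReviewQueue:
    """Priority queue: lower tier number is served first, FIFO within a tier."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, task_id, tier, reason):
        heapq.heappush(self._heap, (tier, next(self._counter), task_id, reason))

    def pop(self):
        tier, _, task_id, reason = heapq.heappop(self._heap)
        return task_id, tier, reason

    def __len__(self):
        return len(self._heap)
```

Reviewer corrections collected from this queue can be logged alongside the original output so they can later feed prompt revisions or eval sets.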
7) Data & Storage
- Store intermediate artifacts (parsed text, embeddings, tool outputs) with TTL if possible.
- Version data: record source doc version and processing code version.
- Clean-up jobs: expire temp data and logs per policy.
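
The storage points above could be sketched as a small in-memory store that attaches a TTL and version metadata to each artifact. `ArtifactStore`, `Artifact`, and their field names are hypothetical; a real deployment would more likely rely on the TTL or lifecycle features of its database or object store.

```python
import time
from dataclasses import dataclass


@dataclass
class Artifact:
    value: object
    source_version: str  # version of the source document that was processed
    code_version: str    # version of the processing code that produced this
    expires_at: float


class ArtifactStore:
    def __init__(self, clock=time.time):
        self._items = {}
        self._clock = clock  # injectable so expiry can be tested without waiting

    def put(self, key, value, source_version, code_version, ttl_s):
        expires_at = self._clock() + ttl_s
        self._items[key] = Artifact(value, source_version, code_version, expires_at)

    def get(self, key):
        """Return the artifact, or None if it is missing or expired."""
        item = self._items.get(key)
        if item is None or item.expires_at <= self._clock():
            return None
        return item

    def cleanup(self):
        """Delete expired entries; return how many were removed."""
        now = self._clock()
        expired = [k for k, a in self._items.items() if a.expires_at <= now]
        for k in expired:
            del self._items[k]
        return len(expired)
```

Running `cleanup()` from a scheduled job keeps temporary data from accumulating, and the stored versions make it possible to tell which artifacts are stale after a source or code change.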
8) Minimal Checklist
- Idempotent steps + request IDs.
- Per-step timeouts + retries + DLQ.
- Metrics/alerts for lag, retries, failures.
- HITL path for low-confidence or repeated failures.