What are the responsibilities of the coordinator and the workers in the Execution Service? How do you handle worker failures to avoid job loss or duplicate execution?
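Question type: technical interview question
This is a technical interview question that commonly comes up in interviews at Australian IT companies.
Difficulty: hard
Categories: system-design, reliability
Tags: execution-service, coordinator, worker, requeue, failure-handling, job-checkpointing
Reference answer summary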
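TL;DR: The Execution Service consists of a coordinator plus a worker pool. The coordinator assigns jobs from the queue, tracks worker health and load, detects failures, and reassigns work; workers execute jobs and update the Job Store. When a worker fails, first determine whether each of its jobs was pending or in-progress: pending jobs are re-queued directly, in-prog...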
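The split of responsibilities above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the reference answer: the class and method names (`Coordinator`, `assign`, `report`, `detect_failures`) are hypothetical, and heartbeat timeouts are assumed as the failure detector. Because the summary is cut off before it says how in-progress jobs are handled, the in-progress path here is an assumption based on the `job-checkpointing` tag: the last checkpoint is kept, and an attempt number acts as a fencing token so a stale worker cannot write results after its job has been reassigned.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(Enum):
    PENDING = "pending"          # queued or assigned, not yet started
    IN_PROGRESS = "in_progress"  # a worker has reported progress
    DONE = "done"


@dataclass
class Job:
    job_id: str
    state: JobState = JobState.PENDING
    worker_id: Optional[str] = None
    checkpoint: int = 0  # hypothetical: last step recorded in the Job Store
    attempt: int = 0     # hypothetical fencing token, bumped on every assignment


class Coordinator:
    def __init__(self, heartbeat_timeout: float = 10.0):
        self.queue = deque()      # job ids waiting for a worker
        self.job_store = {}       # job_id -> Job (stands in for the Job Store)
        self.last_heartbeat = {}  # worker_id -> time of last heartbeat
        self.heartbeat_timeout = heartbeat_timeout

    def submit(self, job: Job) -> None:
        self.job_store[job.job_id] = job
        self.queue.append(job.job_id)

    def heartbeat(self, worker_id: str, now: float) -> None:
        self.last_heartbeat[worker_id] = now

    def assign(self, worker_id: str) -> Optional[Job]:
        """Hand the next queued job to a worker, or None if the queue is empty."""
        if not self.queue:
            return None
        job = self.job_store[self.queue.popleft()]
        job.worker_id = worker_id
        job.state = JobState.PENDING
        job.attempt += 1
        return job

    def report(self, job_id: str, worker_id: str, attempt: int,
               checkpoint: int, done: bool = False) -> bool:
        """Worker-side update. Rejected if the job was reassigned meanwhile."""
        job = self.job_store[job_id]
        if job.worker_id != worker_id or job.attempt != attempt:
            return False  # stale worker: ignoring it avoids duplicate results
        job.checkpoint = checkpoint
        job.state = JobState.DONE if done else JobState.IN_PROGRESS
        return True

    def detect_failures(self, now: float) -> list:
        """Treat workers with an expired heartbeat as failed and recover their jobs."""
        dead = [w for w, t in self.last_heartbeat.items()
                if now - t > self.heartbeat_timeout]
        for worker_id in dead:
            del self.last_heartbeat[worker_id]
            self.handle_worker_failure(worker_id)
        return dead

    def handle_worker_failure(self, worker_id: str) -> None:
        for job in self.job_store.values():
            if job.worker_id != worker_id or job.state is JobState.DONE:
                continue
            # Pending jobs never started, so they are simply re-queued.
            # ASSUMPTION: in-progress jobs are also re-queued, but their
            # checkpoint is kept so the next worker can resume from it.
            job.worker_id = None
            job.state = JobState.PENDING
            self.queue.append(job.job_id)
```

In this sketch a job is never lost, because it stays in the Job Store until it is marked done, and a late write from the failed worker is dropped by the attempt check in `report`. A real system would also need the job's side effects to be idempotent, which this example does not model.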