System Design Case Studies

Design WhatsApp


Let's design a WhatsApp-like instant messaging service, comparable to Facebook Messenger and WeChat.

What is WhatsApp?

WhatsApp is an instant messaging application available in 180+ countries with 2 billion users. It is also accessible on the web.

Requirements

Functional requirements

  • One-to-one chat
  • Group chats (up to 100 people)
  • File sharing (images, videos, etc.)

Non-functional requirements

  • High availability and low latency
  • Scalable and efficient

Extended requirements

  • Sent / Delivered / Read receipts
  • Last seen
  • Push notifications

Estimation and Constraints

Note: confirm the scale assumptions with your interviewer.

Traffic

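Assume 50 million daily active users (DAU), each sending 10 messages a day to each of 4 people (40 messages per user), for a total of 2 billion messages per day: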

$$ 50 \space million \times 40 \space messages = 2 \space billion/day $$

Assume 5% of messages are media files, which gives about 100 million files per day:

$$ 5 \space percent \times 2 \space billion = 100 \space million/day $$

RPS

$$ \frac{2 \space billion}{(24 \space hrs \times 3600 \space seconds)} = \sim 24K \space requests/second $$

Storage

Assume each message is 100 bytes on average:

$$ 2 \space billion \times 100 \space bytes = \sim 200 \space GB/day $$

Assume each media file is 100 KB on average:

$$ 100 \space million \times 100 \space KB = 10 \space TB/day $$

Over 10 years, this comes to about 38 PB:

$$ (10 \space TB + 0.2 \space TB) \times 10 \space years \times 365 \space days = \sim 38 \space PB $$

Bandwidth

With about 10.2 TB of ingress per day:

$$ \frac{10.2 \space TB}{(24 \space hrs \times 3600 \space seconds)} = \sim 120 \space MB/second $$

High-level estimate

| Type                      | Estimate   |
| ------------------------- | ---------- |
| Daily active users (DAU)  | 50 million |
| Requests per second (RPS) | 24K/s      |
| Storage (per day)         | ~10.2 TB   |
| Storage (10 years)        | ~38 PB     |
| Bandwidth                 | ~120 MB/s  |
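The arithmetic behind these estimates can be reproduced with a short script. This is only a sketch of the calculation under the assumptions stated above (decimal units, so 1 KB = 1000 bytes); the variable names are illustrative. Note that the exact quotients come out slightly below the rounded figures quoted in this section (roughly 23.1K requests/second, 37.2 PB over 10 years, and 118 MB/second).

```python
# Back-of-the-envelope capacity estimate using the assumptions from this section.
DAU = 50_000_000                  # daily active users
MESSAGES_PER_USER = 10 * 4        # 10 messages to each of 4 people
MEDIA_PERCENT = 5                 # 5% of messages carry a media file
MESSAGE_BYTES = 100               # average text message size
MEDIA_BYTES = 100 * 1000          # average media file size (100 KB, decimal)
SECONDS_PER_DAY = 24 * 3600

messages_per_day = DAU * MESSAGES_PER_USER                  # 2 billion
media_per_day = messages_per_day * MEDIA_PERCENT // 100     # 100 million
requests_per_second = messages_per_day / SECONDS_PER_DAY    # about 23,148

message_storage = messages_per_day * MESSAGE_BYTES          # 200 GB/day
media_storage = media_per_day * MEDIA_BYTES                 # 10 TB/day
daily_storage = message_storage + media_storage             # 10.2 TB/day
ten_year_storage = daily_storage * 365 * 10                 # about 37.2 PB
bandwidth = daily_storage / SECONDS_PER_DAY                 # about 118 MB/s

print(f"RPS: {requests_per_second:,.0f}")
print(f"Storage per day: {daily_storage / 1e12:.1f} TB")
print(f"Storage (10 years): {ten_year_storage / 1e15:.1f} PB")
print(f"Bandwidth: {bandwidth / 1e6:.0f} MB/s")
```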

Data model design

[Figure: WhatsApp data model]

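users: user information such as name and phoneNumber.

messages: each message has a type, content, timestamps, and a corresponding chatID or groupID.

chats: one-to-one chats.

users_chats: maps users to chats (an N:M relationship).

groups: group chats.

users_groups: maps users to groups (an N:M relationship).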
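The tables above could be sketched as plain Python dataclasses. This is an illustrative sketch, not a schema taken from the source: the `id`, `senderID`, and `sentAt` fields, the `Group.name` field, and the rule that a message belongs to exactly one of a chat or a group are assumptions added for the example. `deliveredAt` and `seenAt` follow the receipt fields mentioned later in this document.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class User:
    id: UUID
    name: str
    phoneNumber: str


@dataclass
class Chat:
    id: UUID            # a one-to-one conversation


@dataclass
class Group:
    id: UUID
    name: str           # assumed field, not listed in the source


@dataclass
class UserChat:         # users_chats: N:M join between users and chats
    userID: UUID
    chatID: UUID


@dataclass
class UserGroup:        # users_groups: N:M join between users and groups
    userID: UUID
    groupID: UUID


@dataclass
class Message:
    id: UUID
    senderID: UUID      # assumed field
    type: str           # e.g. "text", "image", "video"
    content: str
    sentAt: datetime
    chatID: Optional[UUID] = None
    groupID: Optional[UUID] = None
    deliveredAt: Optional[datetime] = None
    seenAt: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Assumption: a message targets exactly one chat or one group.
        if (self.chatID is None) == (self.groupID is None):
            raise ValueError("exactly one of chatID or groupID must be set")


alice = User(id=uuid4(), name="Alice", phoneNumber="+10000000000")
chat = Chat(id=uuid4())
hello = Message(id=uuid4(), senderID=alice.id, type="text", content="hi",
                sentAt=datetime(2022, 7, 1, 14, 32, 50), chatID=chat.id)
```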

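What kind of database should we use?

The data model looks relational, but the tables can be split across several services, each owning its own tables, which avoids a single-database bottleneck. PostgreSQL or Apache Cassandra are both options.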

API design

Get all chats or groups

getAll(userID: UUID): Chat[] | Group[]

Get messages

getMessages(userID: UUID, channelID: UUID): Message[]

Send message

sendMessage(userID: UUID, channelID: UUID, message: Message): boolean

Join / Leave group

joinGroup(userID: UUID, channelID: UUID): boolean
leaveGroup(userID: UUID, channelID: UUID): boolean
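To make the semantics of these calls concrete, here is a minimal in-memory sketch of the interface. It is a toy under stated assumptions, not a real backend: `InMemoryChatAPI` is a hypothetical name, `getAll` returns channel IDs rather than full `Chat` or `Group` objects, messages are stored as `(userID, text)` tuples, only members may read or post, and the 100-member cap comes from the functional requirements.

```python
from typing import Dict, List, Set, Tuple
from uuid import UUID, uuid4

MAX_GROUP_SIZE = 100  # from the functional requirements


class InMemoryChatAPI:
    """Toy implementation of the API above, backed by dictionaries."""

    def __init__(self) -> None:
        self._members: Dict[UUID, Set[UUID]] = {}
        self._messages: Dict[UUID, List[Tuple[UUID, str]]] = {}

    def getAll(self, userID: UUID) -> List[UUID]:
        # Every channel (chat or group) the user belongs to.
        return [cid for cid, users in self._members.items() if userID in users]

    def getMessages(self, userID: UUID, channelID: UUID) -> List[Tuple[UUID, str]]:
        if userID not in self._members.get(channelID, set()):
            return []  # non-members see nothing
        return list(self._messages.get(channelID, []))

    def sendMessage(self, userID: UUID, channelID: UUID, message: str) -> bool:
        if userID not in self._members.get(channelID, set()):
            return False
        self._messages.setdefault(channelID, []).append((userID, message))
        return True

    def joinGroup(self, userID: UUID, channelID: UUID) -> bool:
        users = self._members.setdefault(channelID, set())
        if userID in users or len(users) >= MAX_GROUP_SIZE:
            return False
        users.add(userID)
        return True

    def leaveGroup(self, userID: UUID, channelID: UUID) -> bool:
        users = self._members.get(channelID, set())
        if userID not in users:
            return False
        users.remove(userID)
        return True


api = InMemoryChatAPI()
alice, bob, group = uuid4(), uuid4(), uuid4()
api.joinGroup(alice, group)
api.joinGroup(bob, group)
api.sendMessage(bob, group, "hello")
```

The boolean return values mirror the signatures above: `True` means the operation was applied, `False` means it was rejected.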

High-level design

Architecture

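We will use a microservices architecture.

User Service: authentication and user information.

Chat Service: maintains WebSocket connections and handles chat and group chat logic. It keeps a cache of active connections.

Notification Service: sends push notifications.

Presence Service: tracks last seen status.

Media Service: handles file uploads.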

Inter-service communication

Services can communicate over REST/HTTP or the more efficient gRPC. Service discovery or a service mesh is also needed.

Real-time messaging

Pull model: Long polling

Push model: WebSockets / SSE

The push model offers lower latency and scales better, and WebSockets support full-duplex communication.
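The push model can be illustrated without a network: the server keeps a registry of open connections and writes to the recipient's connection as soon as a message arrives, instead of waiting for the client to ask. In this sketch an `asyncio.Queue` stands in for a WebSocket connection, and `ConnectionRegistry` is a hypothetical name. A real chat service would hold actual WebSocket objects here.

```python
import asyncio
from typing import Dict


class ConnectionRegistry:
    """Tracks online users; each queue is a stand-in for an open WebSocket."""

    def __init__(self) -> None:
        self._connections: Dict[str, asyncio.Queue] = {}

    def connect(self, user_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._connections[user_id] = queue
        return queue

    def disconnect(self, user_id: str) -> None:
        self._connections.pop(user_id, None)

    def push(self, user_id: str, message: str) -> bool:
        """Deliver immediately if the user is connected; report False if offline."""
        queue = self._connections.get(user_id)
        if queue is None:
            return False
        queue.put_nowait(message)
        return True


async def demo() -> tuple:
    registry = ConnectionRegistry()
    inbox = registry.connect("user-b")

    delivered = registry.push("user-b", "hello")     # server-initiated delivery
    received = await inbox.get()                     # client receives without polling

    registry.disconnect("user-b")
    offline = registry.push("user-b", "still there?")  # no connection: not delivered
    return delivered, received, offline


result = asyncio.run(demo())
```

A `False` return from `push` is the point where an offline recipient would be handed over to the notification path.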

Last seen

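Clients send a periodic heartbeat, and the last active timestamp is written to a cache:

| Key    | Value               |
| ------ | ------------------- |
| User A | 2022-07-01T14:32:50 |

Alternatively, track the user's most recent action and treat the user as offline once it is older than a threshold.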
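Both ideas above fit in a few lines: heartbeats overwrite a per-user timestamp, and a threshold decides whether the user still counts as online. This is a sketch with a plain dictionary standing in for the cache. `PresenceTracker` and the 30-second threshold are assumptions made for illustration, and the current time is passed in explicitly so the behavior is easy to test.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


class PresenceTracker:
    """Stores the last heartbeat per user; a dict stands in for the cache."""

    def __init__(self, offline_after: timedelta = timedelta(seconds=30)) -> None:
        self._last_seen: Dict[str, datetime] = {}
        self._offline_after = offline_after  # assumed threshold

    def heartbeat(self, user_id: str, now: datetime) -> None:
        self._last_seen[user_id] = now

    def last_seen(self, user_id: str) -> Optional[datetime]:
        return self._last_seen.get(user_id)

    def is_online(self, user_id: str, now: datetime) -> bool:
        seen = self._last_seen.get(user_id)
        return seen is not None and now - seen <= self._offline_after


presence = PresenceTracker()
t0 = datetime(2022, 7, 1, 14, 32, 50)
presence.heartbeat("User A", t0)

online_soon_after = presence.is_online("User A", t0 + timedelta(seconds=10))
online_much_later = presence.is_online("User A", t0 + timedelta(minutes=5))
```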

Notifications

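After a message is sent, if the recipient is offline, the chat service adds an event to a message queue, and the notification service then forwards it to FCM or APNS.

Why use a queue: message queues typically provide best-effort ordering and at-least-once delivery.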
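The flow above can be sketched with an in-process queue. This is a simplified illustration, not how a production broker works: `NotificationQueue` and `on_message_sent` are hypothetical names, a `deque` stands in for the message queue, and the push provider is just a callable. An event is only dropped from the queue once the handler succeeds, which is what gives at-least-once behavior: a failed send is retried, so the handler should tolerate duplicates.

```python
from collections import deque
from typing import Callable, Deque, Dict


class NotificationQueue:
    """In-memory stand-in for a message queue with at-least-once delivery."""

    def __init__(self) -> None:
        self._events: Deque[Dict[str, str]] = deque()

    def publish(self, event: Dict[str, str]) -> None:
        self._events.append(event)

    def pending(self) -> int:
        return len(self._events)

    def consume(self, handler: Callable[[Dict[str, str]], None]) -> int:
        """Try every queued event once; re-queue the ones whose handler fails."""
        delivered = 0
        for _ in range(len(self._events)):
            event = self._events.popleft()
            try:
                handler(event)
                delivered += 1
            except Exception:
                self._events.append(event)  # not acknowledged: retry on next pass
        return delivered


def on_message_sent(queue: NotificationQueue, recipient_online: bool,
                    event: Dict[str, str]) -> None:
    # Chat service side: only offline recipients need a push notification.
    if not recipient_online:
        queue.publish(event)


queue = NotificationQueue()
on_message_sent(queue, recipient_online=False, event={"to": "user-b", "body": "hi"})
on_message_sent(queue, recipient_online=True, event={"to": "user-c", "body": "yo"})

sent = []
attempts = {"count": 0}


def flaky_push_provider(event: Dict[str, str]) -> None:
    # Stand-in for a call to FCM or APNS that fails on the first attempt.
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("provider unavailable")
    sent.append(event)


first_pass = queue.consume(flaky_push_provider)   # fails, event stays queued
second_pass = queue.consume(flaky_push_provider)  # retry succeeds
```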

Read receipts

Acknowledgments (ACKs) from the recipient update the deliveredAt and seenAt fields.
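A small sketch of how ACKs might drive the receipt state. The ACK kinds `"delivered"` and `"seen"`, the `MessageStatus` type, and the rule that a seen ACK also implies delivery are assumptions for illustration. Timestamps are set only once, so a duplicated ACK does not change the recorded time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MessageStatus:
    sentAt: datetime
    deliveredAt: Optional[datetime] = None
    seenAt: Optional[datetime] = None

    def state(self) -> str:
        if self.seenAt is not None:
            return "read"
        if self.deliveredAt is not None:
            return "delivered"
        return "sent"


def apply_ack(status: MessageStatus, kind: str, at: datetime) -> MessageStatus:
    """Update receipt timestamps from an ACK; repeated ACKs are ignored."""
    if kind == "delivered":
        if status.deliveredAt is None:
            status.deliveredAt = at
    elif kind == "seen":
        if status.deliveredAt is None:
            status.deliveredAt = at  # assumption: seen implies delivered
        if status.seenAt is None:
            status.seenAt = at
    else:
        raise ValueError(f"unknown ACK kind: {kind}")
    return status


status = MessageStatus(sentAt=datetime(2022, 7, 1, 14, 32, 50))
states = [status.state()]
apply_ack(status, "delivered", datetime(2022, 7, 1, 14, 32, 51))
states.append(status.state())
apply_ack(status, "seen", datetime(2022, 7, 1, 14, 33, 0))
states.append(status.state())
```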

Design

[Figure: WhatsApp basic design]

Detailed design

Data Partitioning

Scale out the databases with sharding, and use consistent hashing to map data to shards.
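As a sketch of how consistent hashing might assign chats to shards: each shard is placed on a hash ring many times (virtual nodes), and a key belongs to the first virtual node at or after its own hash, wrapping around at the end. The shard names, the use of MD5, and the count of 100 virtual nodes are illustrative assumptions. The property that matters is that removing a shard only moves the keys that lived on it.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class ConsistentHashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Place several virtual nodes per shard to even out the distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position on the ring.
        index = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if index == len(self._ring):
            index = 0  # wrap around
        return self._ring[index][1]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
keys = [f"chat-{i}" for i in range(200)]
before = {key: ring.get_node(key) for key in keys}

ring.remove_node("shard-b")
after = {key: ring.get_node(key) for key in keys}

# Keys that were not on the removed shard keep their placement.
moved = [key for key in keys if before[key] != after[key]]
```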

Caching

Cache only older messages, and paginate message retrieval. Use LRU eviction, and on a cache miss fall back to the database.
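A minimal sketch of that policy: pages of older messages are cached by `(channel, page)`, the least recently used page is evicted when the cache is full, and a miss falls through to a loader that stands in for the database. `LRUPageCache`, the page size of 2, and the `load_page` function are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple

PAGE_SIZE = 2  # assumed page size, kept tiny for the example


class LRUPageCache:
    def __init__(self, capacity: int,
                 load_page: Callable[[str, int], List[str]]) -> None:
        self._capacity = capacity
        self._load_page = load_page  # called on a cache miss
        self._pages: "OrderedDict[Tuple[str, int], List[str]]" = OrderedDict()
        self.misses = 0

    def get(self, channel_id: str, page: int) -> List[str]:
        key = (channel_id, page)
        if key in self._pages:
            self._pages.move_to_end(key)      # mark as most recently used
            return self._pages[key]
        self.misses += 1
        messages = self._load_page(channel_id, page)
        self._pages[key] = messages
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)   # evict least recently used page
        return messages


HISTORY = {"group-1": [f"message {i}" for i in range(6)]}


def load_page(channel_id: str, page: int) -> List[str]:
    # Stand-in for a paginated database query.
    start = page * PAGE_SIZE
    return HISTORY[channel_id][start:start + PAGE_SIZE]


cache = LRUPageCache(capacity=2, load_page=load_page)
cache.get("group-1", 0)   # miss
cache.get("group-1", 1)   # miss
cache.get("group-1", 0)   # hit, page 0 becomes most recently used
cache.get("group-1", 2)   # miss, evicts page 1
cache.get("group-1", 1)   # miss again, because page 1 was evicted
```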

Media access and storage

Use object storage or HDFS for media files. WhatsApp deletes media from its servers once the user has downloaded it.

CDN

Use a CDN to speed up delivery of static content.

API gateway

Because the system uses multiple protocols (HTTP, WebSocket), an API Gateway is recommended.

Identify and resolve bottlenecks

[Figure: WhatsApp advanced design]

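To make the system more resilient:

  • Run multiple instances of each service
  • Load balancers
  • DB read replicas
  • Multiple replicas for the distributed cache
  • A standby API Gateway
  • Kafka / NATS for the notification system
  • Media compression
  • A separate group service split out from the chat service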
