What BigQuery topics are tested on the PDE exam, and how deeply?

BigQuery is the most heavily tested service on the PDE exam, spanning nearly all domains. Key topics include: partitioned tables (time-based/integer-range) and clustered tables for optimization, materialized views, BigQuery ML for basic model training, Authorized Views for data sharing, slot reservations for cost control, and migration strategies from on-prem data warehouses using BigQuery Data Transfer Service. Focus on query cost optimization and data governance.

How do I choose between Dataflow and Dataproc on the exam?

Dataflow (Apache Beam-based) is ideal for: real-time streaming ETL, serverless auto-scaling, and new unified batch/stream pipelines. Dataproc (managed Hadoop/Spark) is ideal for: migrating existing Hadoop/Spark jobs, using Spark ecosystem tools (MLlib, GraphX), and short-lived batch clusters. Key rule: if the question mentions "existing Spark jobs" or "Hadoop migration," choose Dataproc; if it mentions "new real-time pipeline" or "auto-scaling," choose Dataflow.

How deeply does the PDE exam test ML/AI knowledge?

The exam does not require deep ML algorithm expertise, but you need to understand: when to use BigQuery ML vs Vertex AI AutoML vs custom training, basic feature engineering concepts, data preparation and storage for model training (e.g., Feature Store), and MLOps workflows using Vertex AI Pipelines. Approximately 15% of questions involve data preparation and ML service selection.

What data security and compliance topics are commonly tested?

Key security topics include: Cloud DLP for PII data de-identification and detection, CMEK/CSEK encryption key management, VPC Service Controls to prevent data exfiltration, BigQuery column-level and row-level security, and Data Catalog tags and policy management. These topics account for roughly 15-20% of exam questions.

What is the difference between Pub/Sub and Datastream use cases?

Pub/Sub is a general-purpose messaging service for asynchronous inter-application communication and event-driven architectures. It delivers at-least-once by default, offers exactly-once delivery as an opt-in, and is often paired with Dataflow for real-time stream processing. Datastream is a CDC (Change Data Capture) service for capturing real-time changes from source databases such as MySQL, PostgreSQL, or Oracle and replicating them to destinations such as BigQuery or Cloud Storage. Key rule: choose Datastream for database replication, Pub/Sub for application event streams.

What does the PDE exam focus on regarding Cloud Composer (Airflow)?

Cloud Composer questions focus on workflow orchestration: scheduling and dependency management for cross-service data pipelines, basic DAG concepts (Task, Operator, Sensor), and when to choose Cloud Composer vs Cloud Workflows vs Cloud Scheduler. Common question patterns involve: choosing the right orchestration service for pipelines that coordinate multiple GCP services (e.g., Dataflow processing followed by BigQuery loading).

GCP · Professional · 📊 Data

Google Cloud Certified - Professional Data Engineer

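Validate your ability to design, build, and operate data processing systems on Google Cloud. The most recognized professional certification in GCP data engineering.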


Exam Fee: $200
Exam Duration: 120m
Passing Score: 70/100

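✓ Bottom line · Worth it

The flagship GCP data engineering certificate. The Global Knowledge / Skillsoft salary rankings have listed it among the top 5 highest-paying IT certifications worldwide for several years (US median $165K+ USD), and for data engineers already working with BigQuery / Dataflow it is the certificate with the highest ROI.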

What this certification covers

This page is structured for quick scanning first: exam format, fit, prep time, and the actual study scope.

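Google Cloud Certified - Professional Data Engineer (PDE) is the only Professional-level data engineering certificate in the GCP certification lineup: $200 USD, 50-60 questions, 120 minutes, valid for 2 years. Alongside Professional Cloud Architect (PCA), it is one of the two best-known GCP Professional certificates, but the emphasis is completely different: PCA tests architecture selection and case studies, while PDE drills into the five core data pipeline services: BigQuery, Dataflow (Apache Beam), Pub/Sub, Dataproc, and Cloud Composer (Airflow).

The real reason PDE is worth taking is the salary data. The Global Knowledge / Skillsoft IT Skills and Salary Report has for several consecutive years listed Google Cloud Professional Data Engineer among the top 5 highest-paying IT certifications worldwide, with the median salary of US holders holding steady above $165,000 USD. That puts it in the same tier as PCA and well above competing certifications in the same space such as AWS Data Analytics and Azure DP-203. There are two reasons. First, the engineering experience of GCP's data and ML ecosystem (BigQuery + Vertex AI + Looker) is widely regarded as a generation more modern than that of AWS/Azure. Second, BigQuery is widely seen as GCP's "killer product": large customers such as Spotify, Snap, Twitter, PayPal, and Home Depot run most of their data platforms on BigQuery, so the supply-demand gap for PDE holders has stayed tight.

The exam guide has 5 domains: Designing data processing systems (22%), Ingesting and processing data (25%), Storing the data (20%), Preparing and using data for analysis (15%), and Maintaining and automating data workloads (18%). BigQuery appears in every domain; as a rough estimate, 40-50% of questions relate to BigQuery directly or indirectly. Partitioned vs clustered tables, slot reservation vs on-demand billing, materialized views, BigQuery ML, Authorized Views, column-level / row-level security: if you are shaky on these details, you will lose points quickly.

Like PCA, the PDE certificate is valid for only 2 years; every 24 months you must retake the full $200 exam, with no simplified renewal option. Google recommends at least 3 years of industry experience in data engineering, including 1+ year designing and managing data solutions on GCP. This is not a hard prerequisite, but it is well founded: unlike a foundational certificate such as ACE, PDE cannot be passed by drilling practice questions alone. Its questions on BigQuery performance tuning, Dataflow windowing, and Pub/Sub delivery semantics require real project experience to answer reliably.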

You will work with

BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage, Cloud SQL, Spanner, Bigtable, Datastream, Cloud Composer, Looker, Data Catalog

After preparation

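Earn the official Google Cloud Professional Data Engineer certification
Master the design and implementation of data processing systems on GCP
Be able to ingest, store, and analyze data at scale
Be able to automate and optimize data workloads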

Exam details

Exam Code: PDE
Provider: Google Cloud Platform
Duration: 120 minutes
Question Count: 50-60 questions
Passing Score: 70/100
Validity: 2 years
Exam Fee: $200 USD
Question Types: Single choice, Multiple select, Case study questions
Languages: English, Japanese, Korean
Official Page: Open Google Cloud page

Who should take it

Good fit

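Data engineers and big data engineers
ETL/ELT developers
Data architects
Data analysts who want to build engineering skills
Software engineers looking to move into data engineering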

Before you start

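3+ years of data engineering experience (including 1+ year hands-on with GCP)
Familiarity with SQL and at least one programming language (Python/Java)
Understanding of big data processing frameworks (Hadoop, Spark)
Recommended: learn the basic GCP services first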

Is it worth it? Career value

Salary ranges, target job titles, and the real career impact of holding GCP Professional Data Engineer.

United States: $155K-220K USD
Australia: $140K-195K AUD
Singapore: $125K-185K SGD
United Kingdom: £85K-135K GBP
China (tier-1 cities): ¥400K-850K CNY

Target job titles: GCP Data Engineer, Senior Data Engineer, Analytics Engineer, Data Platform Engineer, BigQuery Engineer, ETL Developer (GCP), ML Data Engineer, Data Architect (GCP), Data Engineer (GCP track)

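Data engineer: one of the hardest technical roles to hire for right now

Over the past three years, LinkedIn Talent Insights reports, the Dice Tech Salary Report, and Burtch Works' data science and engineering salary surveys have all pointed to the same conclusion: the supply-demand gap for data engineers is tighter than for software engineers, ML engineers, or DevOps. The reason is that data engineering sits between "writing code" and "understanding the business". A competent data engineer needs SQL + Python + Scala, an understanding of distributed computing principles (Spark, Beam, windowing), and the ability to negotiate metric definitions with business stakeholders. People with this combination of skills are naturally scarce.

Within the GCP ecosystem this scarcity is amplified further. The annual Global Knowledge / Skillsoft IT Skills and Salary Report has listed Google Professional Data Engineer among the top 5 highest-paying IT certifications worldwide every year since 2021, with the median salary of US holders holding steady above $165K USD. In Dice's 2024 tech salary report, the average salary of a GCP Data Engineer is about 8-12% higher than that of an AWS Data Engineer and 10-15% higher than that of Azure DP-203 holders.

The people best suited to taking PDE:

Data engineers / analytics engineers already using BigQuery: your company's warehouse is on BigQuery and you write SQL + dbt + Airflow every day. PDE is the shortest path to making that daily experience systematic and externally verifiable. Once you pass, it goes straight onto your resume and LinkedIn headline, and it is immediately useful when negotiating a job change.
Traditional developers moving from ETL to data engineering: you used to build ETL with Informatica / SSIS / Talend, and now your company is moving to the cloud. PDE is the fastest springboard from "legacy ETL developer" to "cloud data engineer". Dataflow's Apache Beam model differs considerably from the traditional ETL mindset, and studying for PDE systematically closes that gap.
BI analysts / data analysts who want to move toward engineering: you already write SQL and use Looker but want exposure to pipelines and automation. The Cloud Composer, Dataflow, and Pub/Sub material that PDE covers is the core threshold for analysts crossing into engineering.
AWS / Azure data engineers adding a second cloud: multi-cloud data teams are increasingly common at large companies (for example, Netflix uses both AWS and GCP), and adding PDE gets you straight into the higher-paid "multi-cloud data engineer" pool.

Who should probably skip it:

Software engineers with no data engineering experience at all: the depth at which PDE tests windowing, watermarks, slot quotas, and partition vs cluster goes beyond what practice questions alone can get you through, and studying without real project experience will be very painful. Use BigQuery + Dataflow at work for six months first, then take the exam.
People aiming for ML research / data science: PDE is an engineering certificate, not a modeling certificate. It tests how to make data pipelines run reliably, fast, and cheaply; it does not test hyperparameter tuning, algorithm selection, or paper reproduction. If you want to be an ML scientist, look at the Vertex AI-related ML Engineer certification instead.
Heavy AWS users with no plans to touch GCP in the near term: as with PCA, PDE means retaking a $200 exam every 2 years, and the return on that spend is low for anyone not deeply tied to GCP.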

Exam domains

Use this breakdown to decide where to spend study time first instead of reading chapters evenly.

Content Distribution

1. Design Data Processing Systems (22%)
Core Knowledge: Dataflow, Dataproc, Pub/Sub, BigQuery, Apache Beam

2. Ingest and Process Data (25%)
Core Knowledge: Pub/Sub, Dataflow, Dataproc, Datastream, Data Fusion

3. Store the Data (20%)
Core Knowledge: BigQuery, Cloud SQL, Spanner, Bigtable, Firestore

4. Prepare and Use Data for Analysis (15%)
Core Knowledge: Data Catalog, Looker, DLP, Dataplex, BigQuery ML

5. Maintain and Automate Data Workloads (18%)
Core Knowledge: Cloud Composer, Cloud Monitoring, IAM, Encryption

Study preparation

With hands-on GCP: 6-10 weeks
From scratch: 12-16 weeks
Daily pace: 1.5-2 hours/day

Learning path preview

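7 chapters

PDE exam overview and preparation guide: 45 min
Design data processing systems: 150 min
Ingest and process data: 140 min
Store the data: 120 min
Prepare and use data for analysis: 100 min
Maintain and automate data workloads: 120 min

+ 1 more chapter inside the full path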

Step-by-step preparation

A concrete week-by-week plan from past test-takers — not generic advice.

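Phase 1: Hands-on BigQuery deep dive (3-4 weeks)

BigQuery carries far more weight in the PDE exam guide than any other service, so the first phase of preparation **must start with hands-on BigQuery work**, not with watching videos. Open a GCP free tier account, pull up public datasets (bigquery-public-data.stackoverflow, google_trends, covid19), and run a few real projects: create a partitioned table (partitioned by DATE(created_at)), create a clustered table (cluster by user_id, country), compare the bytes scanned (cost) by the two tables for the same SQL, create a materialized view and observe its refresh behavior, run a BigQuery ML CREATE MODEL to train a logistic_reg model, and use an Authorized View to simulate cross-project data sharing. Many BigQuery exam questions ask "which table structure is cheaper for the same SQL"; if you have never run a few dry runs and looked at bytes processed, rote memorization alone will not get you there.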

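Phase 2: Dataflow + Apache Beam pipeline practice (2-3 weeks)

Dataflow is the second most heavily tested service on PDE, and it is where AWS/Azure data engineers stumble most often, because Apache Beam's programming model follows a different logic from Spark's. Focus on four things: **(1) Windowing**: when to use fixed, sliding, and session windows; **(2) Watermarks and late data**: how late data is handled and how to configure withAllowedLateness; **(3) Triggers**: event time triggers vs processing time triggers, and combinations of EarlyFirings / LateFirings; **(4) The unified streaming and batch model**: how the same Beam code runs as both batch and streaming. A good approach is to work through Google's official Coursera course "Serverless Data Processing with Dataflow" end to end, then write a small pipeline yourself with the Python SDK that reads from Pub/Sub, aggregates over 1-minute windows, and writes to BigQuery.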
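To make the fixed-window, watermark, and allowed-lateness ideas concrete, here is a minimal pure-Python sketch. It is not Beam code. It assumes a 60-second fixed window, a hypothetical 30-second allowed lateness, and a simplified watermark equal to the largest event time seen so far (real runners estimate the watermark differently). The names `window_start` and `aggregate` are made up for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 60        # seconds: a 1-minute fixed window
ALLOWED_LATENESS = 30   # seconds: hypothetical value for illustration


def window_start(event_time):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SIZE)


def aggregate(event_times):
    """Count events per fixed window, dropping data that arrives too late.

    event_times lists event timestamps (seconds) in arrival order.
    The watermark is simplified to the largest event time seen so far.
    """
    counts = defaultdict(int)
    dropped = []
    watermark = 0
    for event_time in event_times:
        start = window_start(event_time)
        window_end = start + WINDOW_SIZE
        # A window stops accepting data once the watermark has passed
        # its end plus the allowed lateness.
        if watermark > window_end + ALLOWED_LATENESS:
            dropped.append(event_time)
        else:
            counts[start] += 1
        watermark = max(watermark, event_time)
    return dict(counts), dropped


counts, dropped = aggregate([5, 20, 65, 50, 130, 170, 10])
print(counts)   # events per window start
print(dropped)  # events that arrived after their window was closed
```

In this run the event at time 50 arrives after the watermark reached 65, so it is late, but it still lands in its window because the window has not yet exceeded its allowed lateness. The event at time 10 arrives after the watermark reached 170 and is dropped.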

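Phase 3: Systematic pass over Pub/Sub + Cloud Composer + storage selection (2-3 weeks)

This phase ties the remaining core services together. For **Pub/Sub**, focus on delivery semantics (at-least-once is the default; exactly-once is a later, opt-in feature that must be enabled explicitly), choosing between pull and push subscriptions, how to configure a dead letter topic, and message retention (7 days by default, up to 31 days). For **Cloud Composer**, focus on when to use it vs Cloud Workflows vs Cloud Scheduler: Composer is fully managed Airflow and suits complex DAGs; Workflows suits simple serverless orchestration; Scheduler is only a cron replacement. **Storage selection** is dense with exam points: Cloud SQL (small-to-medium OLTP), Spanner (globally consistent OLTP, expensive), Bigtable (low-latency, high-throughput NoSQL, suited to IoT time series data and recommendation-system feature stores), Firestore (document database for mobile apps), and BigQuery (OLAP analytics). For each one, you should be able to state within 30 seconds where it fits and where it does not.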

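Phase 4: Mock exams + real case review (2 weeks)

In the final phase, learn nothing new; focus on practice questions and case studies. **Question banks**: Whizlabs' PDE question bank (200+ questions, acceptable quality), the ExamTopics PDE discussion board (free but uneven in quality; always read the discussion in the comments to correct the answers), and Google's official PDE sample questions (20 free questions, closest in difficulty to the real exam). **Real cases**: watch a few customer stories on the Google Cloud Next YouTube channel, such as Spotify's BigQuery data platform, Twitter's Hadoop → Dataproc route when migrating to GCP, and PayPal's Dataflow fraud detection pipeline. Quite a few PDE questions are reverse-engineered from real customer architectures, and after watching these cases, "which architecture is the correct answer" becomes intuitive. Target: score a steady 85%+ on your last 3 mock exams before sitting the real one.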

Real test-taker experiences

What it actually took for real candidates to pass — prep time, scores, and lessons learned.

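Our company's warehouse runs entirely on BigQuery + dbt + Airflow, so for me most of PDE was "checking what I write every day against the exam guide". The hardest part was actually Dataflow: I had never written Apache Beam, windowing and watermarks were entirely new concepts, and I spent 2 weeks grinding through the Coursera course. After passing, my salary went from $135K to $162K AUD, and HR's exact words were "a GCP data engineer in Australia is basically a unicorn".

S. Kumar · Pass

Data Engineer · Australian fintech startup · 2 years of BigQuery production experience · 7 weeks prep

I spent 8 years in traditional ETL, mainly with Informatica PowerCenter. When the company moved to the cloud and required everyone to know Dataflow + BigQuery within six months, I went straight for PDE. The most painful part was Apache Beam's programming model, which is a different world from Informatica's drag-and-drop GUI; for the first 3 weeks I wanted to give up. It only clicked after I forced myself to write 5 small pipelines following Google's official Dataflow quickstart. After passing PDE my internal transfer went through, my title changed from ETL Developer to Data Engineer, and my package went up 35%.

D. Zhang · Pass

Informatica ETL developer turned GCP Data Engineer · Shanghai · 12 weeks prep

I used to know only SQL and Looker and had never touched a data pipeline. In the three and a half months I spent preparing for PDE, I essentially relearned the foundations of the profession: Apache Beam, Airflow DAGs, Pub/Sub message semantics, and consistency models in distributed systems were each a new world. I scored 52% on my first mock exam and nearly fell apart. Once I got through it, I realized these are simply the everyday vocabulary of data engineers; I just had not been exposed to them before. After passing, I moved internally from the BI team to the data platform team. My base salary did not change, but I received the engineering-track equity package, which should pay off much better over the long term. I sincerely recommend that fellow analysts take this exam seriously once: it pushes you from "can write SQL" toward "understands data systems".

J. Thompson · Pass

BI Analyst moving toward data engineering · London · 14 weeks prep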

Certification comparison

	GCP Professional Data Engineer	GCP Associate Cloud Engineer	GCP Professional Cloud Architect
Provider	GCP	GCP	GCP
Level	Professional	Associate	Professional
Fee	$200	$125	$200
Duration	120 min	120 min	120 min
Question count	60	50	60
Validity	2 yrs	2 yrs	2 yrs

Study tips and common mistakes

💡

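**Run the quickstart for every core service for real on the GCP free tier**: BigQuery (load a public dataset and run a few queries, comparing dry-run bytes), Dataflow (run a batch job with the WordCount template), Pub/Sub (create a topic → subscription → publish a few messages with gcloud), Cloud Composer (create a minimal environment and run one DAG), Dataproc (spin up an ephemeral cluster and run a PySpark job). You will almost never pick the wrong answer for a service you have run with your own hands.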

💡

Map keywords to services: **"petabyte-scale SQL analytics" → BigQuery**; **"low latency key-value + time series + IoT" → Bigtable**; **"globally consistent relational + horizontal scale" → Spanner**; **"managed Hadoop / Spark migration" → Dataproc**; **"streaming ETL + auto-scaling + Apache Beam" → Dataflow**; **"CDC from MySQL/Oracle to BigQuery" → Datastream**; **"workflow orchestration with DAG" → Cloud Composer**. This keyword mapping resolves at least 30% of the questions.
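As a study aid, the keyword mapping can be written down as a small lookup. The sketch below is only a memory drill, not an exam oracle: the rule list paraphrases the tip, the whole-word matching is a simplifying assumption, and `RULES` and `suggest_services` are hypothetical names.

```python
import re

# Keyword rules paraphrased from the tip; whole-word matching is a
# simplification chosen for this sketch.
RULES = [
    ("BigQuery", ["petabyte-scale", "sql analytics"]),
    ("Bigtable", ["key-value", "time series", "iot"]),
    ("Spanner", ["globally consistent"]),
    ("Dataproc", ["hadoop", "spark"]),
    ("Dataflow", ["streaming etl", "apache beam"]),
    ("Datastream", ["cdc"]),
    ("Cloud Composer", ["workflow orchestration", "dag"]),
]


def suggest_services(question):
    """Return the services whose keywords appear in the question text."""
    text = question.lower()
    hits = []
    for service, keywords in RULES:
        if any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords):
            hits.append(service)
    return hits


print(suggest_services("We need to migrate existing Spark jobs with minimal code changes"))
print(suggest_services("Replicate changes with CDC from MySQL to BigQuery"))
```

Real questions often contain keywords for several services at once, in which case the function returns more than one candidate and the remaining constraints in the question decide between them.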

💡

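**For BigQuery cost optimization, think of three moves first**: (1) Can a partitioned table prune the scan range? (2) Can a clustered table cut the data scanned for the columns you filter or group on? (3) If the query runs frequently, can a materialized view precompute the result? Whenever a question mentions "reduce query cost" or "improve performance", these three ideas usually cover the correct answer.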

💡

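**In Dataflow questions, "late data" should immediately bring to mind windowing + watermark + trigger**: withAllowedLateness controls how late data may arrive, the trigger controls when results are emitted, and accumulatingFiredPanes vs discardingFiredPanes controls whether fired panes accumulate or are discarded. The combination of these three settings is the most frequently tested Dataflow detail.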

💡

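**Time allocation in the exam**: 50-60 questions / 120 minutes ≈ 2 minutes per question, but case study questions may take 3-4 minutes each. Go through the exam once at about 90 seconds per question, flagging anything you are unsure of, then come back for a careful second pass. Keep the last 15-20 minutes for flagged questions and case study questions.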

💡

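**Put the retake date 2 years out in your calendar right after you pass**: PDE's 2-year expiry is a hard rule. Google sends an email 60 days before expiry, but many people miss it. After expiry, the certification entry on LinkedIn automatically shows Expired, which is worse than having no certificate.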

⚠️

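**Confusing the cost models of BigQuery on-demand and slot reservation**: on-demand is billed by bytes scanned ($6.25/TB, so scanning 100 TB costs $625 for a single query), while a slot reservation is a fixed number of slots paid monthly (100 slots ≈ $2000/month). For teams with **frequent, steady workloads**, slots are much cheaper; for teams with **occasional large queries and mostly idle time**, on-demand is the better deal. The exam often asks which option is cheaper given "X TB per day, Y times per month", so you must be able to do the math.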
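The arithmetic behind this comparison is short enough to sketch. The example below uses only the two prices quoted in this tip ($6.25/TB on-demand, roughly $2000/month for 100 slots) and assumes, for simplicity, that 100 slots are enough capacity for the workload. The function names are hypothetical.

```python
ON_DEMAND_USD_PER_TB = 6.25         # on-demand rate quoted in the tip
RESERVATION_USD_PER_MONTH = 2000.0  # approximate price of 100 slots quoted in the tip


def on_demand_cost(tb_per_query, queries_per_month):
    """Monthly on-demand cost: every query pays for the bytes it scans."""
    return tb_per_query * queries_per_month * ON_DEMAND_USD_PER_TB


def break_even_tb_per_month():
    """Monthly scan volume at which both pricing models cost the same."""
    return RESERVATION_USD_PER_MONTH / ON_DEMAND_USD_PER_TB


def cheaper_option(tb_per_query, queries_per_month):
    """Pick the cheaper model, assuming 100 slots can handle the workload."""
    if on_demand_cost(tb_per_query, queries_per_month) < RESERVATION_USD_PER_MONTH:
        return "on-demand"
    return "slot reservation"


print(on_demand_cost(100, 1))      # one 100 TB query
print(break_even_tb_per_month())   # TB scanned per month at break-even
print(cheaper_option(1, 30))       # 1 TB per day
print(cheaper_option(20, 30))      # 20 TB per day
```

Under these figures the break-even is 320 TB scanned per month: a team scanning 1 TB a day stays far below it, while a team scanning 20 TB a day is well above it.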

⚠️

**Mixing up Dataflow and Dataproc**: plenty of people see "Hadoop / Spark" and immediately think of Dataflow, but Dataproc is GCP's managed Hadoop/Spark. The test: if the question mentions "existing Spark job", "lift and shift Hadoop cluster", "MLlib / GraphX", or "need HDFS compatibility" → Dataproc; if it mentions "new pipeline", "unified batch and streaming", "auto-scaling", or "Apache Beam" → Dataflow.

⚠️

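**Conflating partitioned and clustered tables**: a partitioned table **physically separates** data into partitions by time, integer range, or ingestion time, so a query whose WHERE clause filters on the partition column scans only the matching partitions. A clustered table **sorts data by column within each partition**, supports up to 4 clustering columns, and significantly speeds up queries that filter, group, or order on those columns. **The two can be combined**: partitioning by DATE and then clustering by user_id is the most common combination in BigQuery warehouses. Exam questions often ask for "the cheapest table structure", and the correct answer is frequently the combination.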
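A toy model can show why the combination scans the least data. The sketch below is a deliberately simplified illustration, not how BigQuery stores data internally: the table layout, block sizes, and the names `Block`, `TABLE`, and `bytes_scanned` are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    min_user: int    # smallest user_id stored in the block
    max_user: int    # largest user_id stored in the block
    size_bytes: int


# Hypothetical table: one partition per date; inside each partition the rows
# are sorted (clustered) by user_id and stored in blocks with known ranges.
TABLE = {
    "2024-01-01": [Block(1, 100, 500), Block(101, 200, 500)],
    "2024-01-02": [Block(1, 100, 400), Block(101, 200, 600)],
}


def bytes_scanned(date=None, user_id=None):
    """Bytes read by a query that optionally filters on date and user_id."""
    total = 0
    for partition_date, blocks in TABLE.items():
        if date is not None and partition_date != date:
            continue  # partition pruning: skip the whole partition
        for block in blocks:
            if user_id is not None and not (block.min_user <= user_id <= block.max_user):
                continue  # cluster pruning: skip blocks outside the range
            total += block.size_bytes
    return total


print(bytes_scanned())                                # full scan
print(bytes_scanned(date="2024-01-02"))               # partition filter only
print(bytes_scanned(user_id=150))                     # cluster filter only
print(bytes_scanned(date="2024-01-02", user_id=150))  # both filters
```

With both filters the query touches a single block, which is the intuition behind "partition by date, then cluster by user_id".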

⚠️

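**Assuming Pub/Sub is exactly-once by default**: Pub/Sub defaults to **at-least-once** delivery (every message is delivered at least once and may be duplicated). Exactly-once delivery is an opt-in subscription feature that must be enabled explicitly and comes with extra restrictions (such as applying within a single region). The standard exam answer to "how do you guarantee messages are not duplicated" is: **either use an exactly-once subscription, or deduplicate idempotently on the consumer side using message_id**. Treating the default as exactly-once is always wrong.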
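The consumer-side option can be illustrated without any Pub/Sub client library. The following is a minimal sketch of idempotent processing keyed on message_id; it assumes an in-memory set of seen IDs, whereas a real consumer would need a durable, bounded store. The class `IdempotentConsumer` is a hypothetical name.

```python
class IdempotentConsumer:
    """Processes each message_id at most once, even if it is redelivered."""

    def __init__(self):
        self._seen_ids = set()   # in-memory only: an assumption for this sketch
        self.processed = []      # payloads that were actually processed

    def handle(self, message_id, payload):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen_ids:
            return False         # duplicate delivery: skip the side effect
        self._seen_ids.add(message_id)
        self.processed.append(payload)
        return True


consumer = IdempotentConsumer()
# At-least-once delivery may redeliver "m1".
deliveries = [("m1", "order-created"), ("m2", "order-paid"), ("m1", "order-created")]
results = [consumer.handle(mid, body) for mid, body in deliveries]
print(results)
print(consumer.processed)
```

The third delivery is recognized as a repeat of "m1" and skipped, so the downstream effect happens once per message even though delivery happened twice.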

⚠️

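**Overlooking the differences between BigQuery column-level / row-level security and Authorized Views**: an Authorized View is itself granted access to the underlying tables, so you can share the view with users who have no permission on those tables; it suits **cross-project data sharing**. Column-level security tags sensitive columns with Data Catalog policy tags, and users without permission get NULL or an error. Row-level security creates row access policies on a table so that different users see different rows. On the exam, "how do you let the analytics team see only its own department's data" is row-level security, "how do you hide the SSN column" is column-level security + DLP, and "how do you share a de-identified view with an external project" is an Authorized View. Keep the three apart.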

⚠️

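**Reaching for Cloud Composer in every scenario**: Cloud Composer is fully managed Airflow, and **even the minimum configuration costs $300+/month** (even if you run no DAGs), so it is a poor fit for simple scheduled jobs. If all you need is a daily "read a file from Cloud Storage → write to BigQuery", Cloud Scheduler + Cloud Function or a BigQuery scheduled query is enough and about 10 times cheaper. Exam questions often set traps around "the cheapest orchestration option"; Composer is not the default answer.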

⚠️

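**Ignoring the 2-year validity**: like PCA, PDE expires after 2 years, shorter than the 3 years of AWS Data Analytics. There is no simplified renewal; when it expires you must retake the full $200 PDE exam. When planning your budget, remember to include this "$200 every 24 months".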


Related certifications

GCP Associate Cloud Engineer

GCP Professional Cloud Architect

Azure Data Engineer Associate