
AI Ethics & Compliance: Safety & Governance

⏱️ 45 min


The biggest compliance risk for AI products isn't "forgetting to write a disclaimer." It's the team defaulting to "we'll add safety later." In practice, AI features that ship first and bolt safety on afterward often pay a steep price, because risk doesn't appear only in the output -- it spreads across the entire chain, from data collection, prompt design, and tool access to user expectations.

So this page isn't about memorizing legal articles. It's about practical AI safety and compliance thinking from a PM's perspective.

AI Compliance Guardrail Map


Bottom Line: AI Compliance Isn't a Review Step -- It's a Design Constraint

The more practical view:

  • It's not a one-time audit before launch
  • It's not one team's (legal's) problem
  • It's not only for sensitive industries

Any generative AI product involving user input, knowledge output, automation, or content distribution already has compliance and safety risks.


4 Risk Types PMs Must Identify First

| Risk | Common manifestation |
| --- | --- |
| Accuracy risk | Confidently wrong, and the user believes it |
| Privacy risk | Data that shouldn't enter the model gets sent in |
| Misuse risk | Users use the product for things it shouldn't be used for |
| IP / copyright risk | Generated results or training materials have copyright issues |

Among these 4, the most dangerous isn't the one with highest probability -- it's the one where a single occurrence is extremely costly.


Risk Isn't Handled Uniformly -- Classify by Use Case

A practical classification:

| Use case | Risk level | Why |
| --- | --- | --- |
| Brainstorming | Low | Errors have limited impact |
| Draft generation | Medium | Users might send it out directly |
| Support answers | Medium-high | Wrong answers affect trust and ops cost |
| Hiring / finance / legal / medical | High | One error can cause serious consequences |

If risk classification isn't done clearly, all downstream guardrails will be either too loose or too heavy.
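One way to keep this classification from staying theoretical is to encode it as an explicit mapping from use case to risk tier, where the tier decides which guardrails are mandatory. A minimal sketch in Python; all tier, use-case, and guardrail names below are illustrative, not a standard:

```python
# Hypothetical risk-tier registry: each use case maps to a tier,
# and the tier decides which guardrails are non-negotiable.
RISK_TIERS = {
    "brainstorming": "low",
    "draft_generation": "medium",
    "support_answers": "medium_high",
    "hiring_screening": "high",
}

# Stricter tiers carry everything the looser tiers require, plus more.
REQUIRED_GUARDRAILS = {
    "low": ["output_filter"],
    "medium": ["output_filter", "ai_disclosure"],
    "medium_high": ["output_filter", "ai_disclosure", "source_display"],
    "high": ["output_filter", "ai_disclosure", "source_display", "human_review"],
}

def guardrails_for(use_case: str) -> list[str]:
    # Unknown use cases default to the strictest tier, not the loosest.
    tier = RISK_TIERS.get(use_case, "high")
    return REQUIRED_GUARDRAILS[tier]
```

The key design choice is the default: a use case nobody classified gets treated as high risk until someone argues it down, which keeps the guardrails from silently being "too loose."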


Guardrails Should Cover Input, Processing, and Output

A complete guardrail system doesn't just filter output -- it has 3 layers:

| Layer | Core question | Typical approach |
| --- | --- | --- |
| Input guardrail | Can user input go straight into the system? | Length limits, sensitive content detection, prompt injection defense |
| Processing guardrail | How is the model constrained? | System prompt, tool permissions, retrieval boundaries |
| Output guardrail | Can the final result go straight to users? | Policy check, source display, human review, fallback |

Output moderation alone usually isn't enough.
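The three layers can be sketched as a single request pipeline. This is a hedged illustration of the structure only -- real injection defense needs far more than the substring check shown here:

```python
def input_guardrail(user_text: str) -> str:
    # Layer 1: reject oversized input and obvious injection attempts.
    if len(user_text) > 4000:
        raise ValueError("input too long")
    if "ignore previous instructions" in user_text.lower():
        raise ValueError("possible prompt injection")
    return user_text

def output_guardrail(model_text: str, has_source: bool) -> str:
    # Layer 3: unsourced answers fall back instead of going straight to users.
    if not has_source:
        return "I'm not sure about this; escalating to a human agent."
    return model_text

def answer(user_text: str, call_model) -> str:
    safe_input = input_guardrail(user_text)
    # Layer 2 (processing) lives inside call_model: the system prompt,
    # tool permissions, and retrieval boundaries constrain what it can do.
    model_text, has_source = call_model(safe_input)
    return output_guardrail(model_text, has_source)
```

The point of the shape is that no layer trusts the one before it: even a request that passes the input check still has its output inspected before anything reaches the user.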


Privacy Issues Are Often "Collecting Too Much by Default," Not "Leaking"

PMs designing AI features easily default to "give more context, model performs better." But this often directly breaks data boundaries.

The more stable principle remains data minimization:

| Scenario | Actually needed data | Often over-collected data |
| --- | --- | --- |
| AI summary | The text content itself | User's full profile |
| AI support | Question context, necessary order fields | Entire CRM history |
| AI writing | Writing goal and style | Unrelated browsing history |

If a field isn't necessary for this task, don't send it in.
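Data minimization is easiest to enforce as a per-task field allowlist applied before anything is sent to the model. A minimal sketch; the task and field names are hypothetical examples matching the table above:

```python
# Per-task field allowlists: anything not listed never reaches the model.
# Task and field names here are illustrative, not a real schema.
ALLOWED_FIELDS = {
    "ai_summary": {"document_text"},
    "ai_support": {"question", "order_id", "order_status"},
    "ai_writing": {"goal", "style"},
}

def minimize(task: str, context: dict) -> dict:
    # Unknown task -> send nothing, rather than sending everything.
    allowed = ALLOWED_FIELDS.get(task, set())
    return {k: v for k, v in context.items() if k in allowed}
```

An allowlist beats a blocklist here for the same reason as elsewhere in security: a new field added to the context object is excluded by default instead of leaking by default.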


Prompt Injection and Tool Abuse Aren't Just Engineering Problems

PMs also need to know what can happen at the product level.

Typical issues include:

  • Users tricking the model into ignoring its rules
  • Accessing unauthorized data through tool calling
  • Using your product to generate prohibited content

This means when designing tool-enabled AI features, you're not just designing "what it can do" -- you're also designing "what it absolutely must not do."
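"What it absolutely must not do" can be written down as a deny-by-default permission table for tool calls, checked before any tool executes. A sketch with hypothetical tool and role names:

```python
# Deny-by-default tool permissions: a call only goes through if the
# current role is explicitly allowlisted for that tool.
# Tool and role names are illustrative.
TOOL_PERMISSIONS = {
    "search_docs": {"viewer", "agent", "admin"},
    "read_order": {"agent", "admin"},
    "issue_refund": {"admin"},  # high-impact tool: narrowest access
}

def authorize_tool_call(role: str, tool: str) -> bool:
    # Unknown tools and unknown roles are rejected, never allowed through.
    return role in TOOL_PERMISSIONS.get(tool, set())
```

Crucially, this check runs outside the model: even if a user tricks the model into requesting `issue_refund`, the call is still rejected unless the session's role permits it.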


Source Grounding Is Key to Trust

In medium-to-high risk scenarios, having the model "answer like it's right" isn't nearly enough. The more stable direction:

  • Cite sources whenever possible
  • Explicitly say "not sure" when uncertain
  • Refuse to answer or escalate to human when out of scope

These mechanisms sacrifice a bit of "smoothness" but typically earn more long-term trust.
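The three behaviors above (cite, admit uncertainty, refuse) can be combined into one response policy. A minimal sketch assuming the system produces a draft answer, a list of retrieved sources, and a confidence score; the 0.6 threshold is an arbitrary placeholder:

```python
def grounded_answer(draft: str, sources: list[str], confidence: float) -> str:
    # Out of scope / nothing retrieved: refuse rather than guess.
    if not sources:
        return "I couldn't find a reliable source for this, so I won't guess."
    # Low confidence: say "not sure" and hand off to a human.
    if confidence < 0.6:
        return "I'm not sure about this one; routing to a human reviewer."
    # Otherwise, answer with the sources displayed.
    return f"{draft}\n\nSources: " + "; ".join(sources)
```

Each branch trades fluency for trust: the refusal and the hand-off both read worse than a confident answer, but neither can be confidently wrong.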


What Compliance Review Should Ask

Before launch, PMs should at least be able to answer:

  1. How badly can this feature fail in the worst case?
  2. What data enters the model?
  3. Does the user know AI is involved?
  4. Do we need source display, disclaimers, or a human escalation path?
  5. How do we handle and track bad outputs when they occur?

If all 5 questions are still fuzzy, compliance design isn't finished.
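The five questions work best as an explicit launch gate rather than a vibe check: every answer must be concrete before the feature ships. A sketch; the question keys are illustrative:

```python
# The five review questions as a hard launch gate.
# Keys are hypothetical names for "this question has a concrete answer."
LAUNCH_QUESTIONS = [
    "worst_case_defined",
    "data_flow_documented",
    "ai_disclosure_present",
    "escalation_path_decided",
    "bad_output_tracking_ready",
]

def ready_for_launch(review: dict) -> bool:
    # Any missing or False answer blocks launch -- fuzzy counts as no.
    return all(review.get(q, False) for q in LAUNCH_QUESTIONS)
```

A question that nobody answered is treated the same as a "no," which is exactly the "still fuzzy means not finished" rule in executable form.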


4 Most Dangerous Mindsets

| Mindset | Why it's dangerous |
| --- | --- |
| Ship first, add guardrails later | Risk reaches users first |
| The model provider will cover us | Responsibility doesn't automatically transfer out |
| It's just an internal tool, no need to be strict | Internal tools can also process sensitive data |
| Low-frequency risks can wait | AI risks are often low-frequency but high-impact |

The thing AI PMs should avoid most is treating safety issues as "we'll deal with it later" technical debt.


A Sufficient PM Checklist

  • Is this use case low, medium, or high risk?
  • Is there unnecessary sensitive data in the input?
  • What real consequences come from a wrong output?
  • Is there a source, disclaimer, or escalation path?
  • Is there bad-case logging and a rollback mechanism?

This checklist isn't complex, but it's practical. Many incidents could've been caught by running through it once.


Practice

Take an AI feature you're building. Write these 4 lines:

  1. Worst case -- how could it harm users or the business?
  2. What data actually shouldn't be sent to the model?
  3. What questions should the model refuse to answer?
  4. After a bad output -- who discovers it, and who handles it?

Once you can articulate these 4 lines clearly, compliance design has actually begun.

❓ FAQ

The most frequently searched questions about this chapter's topic

Why can't AI compliance wait for a single review before launch?

Compliance isn't a review step; it's a design constraint. Risk doesn't live only in the output -- it spreads across the whole chain: data collection, prompt design, tool access, and user expectations. Shipping first and patching safety later is extremely costly, because problems reach users first and responsibility doesn't automatically transfer to the model provider.

What are the 4 risk types an AI PM must identify?

(1) Accuracy risk (confidently wrong, and the user believes it), (2) privacy risk (data that shouldn't enter the model gets sent in), (3) misuse risk (users use the product for things it shouldn't be used for), (4) IP / copyright risk (generated output or training material has copyright issues). The most dangerous isn't the one with the highest probability -- it's the one where a single occurrence is extremely costly.

How many layers should AI guardrails have?

Three: input guardrails (length limits, sensitive content detection, prompt injection defense), processing guardrails (system prompt, tool permissions, retrieval boundaries), and output guardrails (policy checks, source display, human review, fallback). Output moderation alone usually isn't enough.

Why are AI privacy problems often "collecting too much by default" rather than "leaking"?

PMs easily treat "more context makes the model perform better" as the default strategy, which directly breaks data boundaries. The principle is data minimization: AI summary sends only the text itself, not the user's full profile; AI support sends only the question context and necessary order fields, not the entire CRM history; AI writing sends only the writing goal and style, not browsing history. If a field isn't necessary for this task, don't send it.

What are the 5 compliance questions a PM must answer before launch?

(1) How badly can this feature fail in the worst case? (2) What data enters the model? (3) Does the user know AI is involved? (4) Do we need source display, disclaimers, or a human escalation path? (5) How do we handle and track bad outputs? If these 5 are still fuzzy, compliance design isn't finished -- and don't treat safety as "we'll deal with it later" technical debt.