
AI Ethics & Compliance: Safety & Governance

⏱️ 45 min


The biggest compliance risk for AI products isn't "forgetting to write a disclaimer." It's the team defaulting to "we'll add safety later." In practice, AI features that ship first and bolt safety on afterward often pay a steep price, because risk doesn't appear only in the output -- it spreads across the entire chain, from data collection, prompt design, and tool access to user expectations.

So this page isn't about memorizing legal articles. It's about practical AI safety and compliance thinking from a PM's perspective.

AI Compliance Guardrail Map


Bottom Line: AI Compliance Isn't a Review Step -- It's a Design Constraint

The more practical view:

  • It's not a one-time audit before launch
  • It's not one team's (legal's) problem
  • It's not only for sensitive industries

Any generative AI product involving user input, knowledge output, automation, or content distribution already has compliance and safety risks.


4 Risk Types PMs Must Identify First

| Risk | Common manifestation |
| --- | --- |
| Accuracy risk | Confidently wrong, and the user believes it |
| Privacy risk | Data that shouldn't enter the model gets sent in |
| Misuse risk | Users use the product for things it shouldn't be used for |
| IP / copyright risk | Generated results or training materials have copyright issues |

Among these 4, the most dangerous isn't the one with highest probability -- it's the one where a single occurrence is extremely costly.


Risk Isn't Handled Uniformly -- Classify by Use Case

A practical classification:

| Use case | Risk level | Why |
| --- | --- | --- |
| Brainstorming | Low | Errors have limited impact |
| Draft generation | Medium | Users might send it out directly |
| Support answers | Medium-high | Wrong answers affect trust and ops cost |
| Hiring / finance / legal / medical | High | One error can cause serious consequences |

If risk classification isn't done clearly, all downstream guardrails will be either too loose or too heavy.
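One way to keep this classification from staying theoretical is to encode it as an explicit mapping from use case to risk tier, where the tier decides which guardrails are mandatory. A minimal sketch in Python; all tier, use-case, and guardrail names below are illustrative, not a standard:

```python
# Hypothetical risk-tier registry: each use case maps to a tier,
# and the tier decides which guardrails are non-negotiable.
RISK_TIERS = {
    "brainstorming": "low",
    "draft_generation": "medium",
    "support_answers": "medium_high",
    "hiring_screening": "high",
}

# Stricter tiers carry everything the looser tiers require, plus more.
REQUIRED_GUARDRAILS = {
    "low": ["output_filter"],
    "medium": ["output_filter", "ai_disclosure"],
    "medium_high": ["output_filter", "ai_disclosure", "source_display"],
    "high": ["output_filter", "ai_disclosure", "source_display", "human_review"],
}

def guardrails_for(use_case: str) -> list[str]:
    # Unknown use cases default to the strictest tier, not the loosest.
    tier = RISK_TIERS.get(use_case, "high")
    return REQUIRED_GUARDRAILS[tier]
```

The key design choice is the default: a use case nobody classified gets treated as high risk until someone argues it down, which keeps the guardrails from silently being "too loose."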


Guardrails Should Cover Input, Processing, and Output

A complete guardrail system doesn't just filter output -- it has 3 layers:

| Layer | Core question | Typical approach |
| --- | --- | --- |
| Input guardrail | Can user input go straight into the system? | Length limits, sensitive content detection, prompt injection defense |
| Processing guardrail | How is the model constrained? | System prompt, tool permissions, retrieval boundaries |
| Output guardrail | Can the final result go straight to users? | Policy check, source display, human review, fallback |

Output moderation alone usually isn't enough.
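The three layers can be sketched as a single request pipeline. This is a hedged illustration of the structure only -- real injection defense needs far more than the substring check shown here:

```python
def input_guardrail(user_text: str) -> str:
    # Layer 1: reject oversized input and obvious injection attempts.
    if len(user_text) > 4000:
        raise ValueError("input too long")
    if "ignore previous instructions" in user_text.lower():
        raise ValueError("possible prompt injection")
    return user_text

def output_guardrail(model_text: str, has_source: bool) -> str:
    # Layer 3: unsourced answers fall back instead of going straight to users.
    if not has_source:
        return "I'm not sure about this; escalating to a human agent."
    return model_text

def answer(user_text: str, call_model) -> str:
    safe_input = input_guardrail(user_text)
    # Layer 2 (processing) lives inside call_model: the system prompt,
    # tool permissions, and retrieval boundaries constrain what it can do.
    model_text, has_source = call_model(safe_input)
    return output_guardrail(model_text, has_source)
```

The point of the shape is that no layer trusts the one before it: even a request that passes the input check still has its output inspected before anything reaches the user.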


Privacy Issues Are Often "Collecting Too Much by Default," Not "Leaking"

PMs designing AI features easily default to "give more context, model performs better." But this often directly breaks data boundaries.

The more stable principle remains data minimization:

| Scenario | Actually needed data | Often over-collected data |
| --- | --- | --- |
| AI summary | The text content itself | User's full profile |
| AI support | Question context, necessary order fields | Entire CRM history |
| AI writing | Writing goal and style | Unrelated browsing history |

If a field isn't necessary for this task, don't send it in.
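Data minimization is easiest to enforce as a per-task field allowlist applied before anything is sent to the model. A minimal sketch; the task and field names are hypothetical examples matching the table above:

```python
# Per-task field allowlists: anything not listed never reaches the model.
# Task and field names here are illustrative, not a real schema.
ALLOWED_FIELDS = {
    "ai_summary": {"document_text"},
    "ai_support": {"question", "order_id", "order_status"},
    "ai_writing": {"goal", "style"},
}

def minimize(task: str, context: dict) -> dict:
    # Unknown task -> send nothing, rather than sending everything.
    allowed = ALLOWED_FIELDS.get(task, set())
    return {k: v for k, v in context.items() if k in allowed}
```

An allowlist beats a blocklist here for the same reason as elsewhere in security: a new field added to the context object is excluded by default instead of leaking by default.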


Prompt Injection and Tool Abuse Aren't Just Engineering Problems

PMs also need to know what can happen at the product level.

Typical issues include:

  • Users tricking the model into ignoring its rules
  • Accessing unauthorized data through tool calling
  • Using your product to generate prohibited content

This means when designing tool-enabled AI features, you're not just designing "what it can do" -- you're also designing "what it absolutely must not do."
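"What it absolutely must not do" can be written down as a deny-by-default permission table for tool calls, checked before any tool executes. A sketch with hypothetical tool and role names:

```python
# Deny-by-default tool permissions: a call only goes through if the
# current role is explicitly allowlisted for that tool.
# Tool and role names are illustrative.
TOOL_PERMISSIONS = {
    "search_docs": {"viewer", "agent", "admin"},
    "read_order": {"agent", "admin"},
    "issue_refund": {"admin"},  # high-impact tool: narrowest access
}

def authorize_tool_call(role: str, tool: str) -> bool:
    # Unknown tools and unknown roles are rejected, never allowed through.
    return role in TOOL_PERMISSIONS.get(tool, set())
```

Crucially, this check runs outside the model: even if a user tricks the model into requesting `issue_refund`, the call is still rejected unless the session's role permits it.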


Source Grounding Is Key to Trust

In medium-to-high risk scenarios, having the model "answer like it's right" isn't nearly enough. The more stable direction:

  • Cite sources whenever possible
  • Explicitly say "not sure" when uncertain
  • Refuse to answer or escalate to human when out of scope

These mechanisms sacrifice a bit of "smoothness" but typically earn more long-term trust.
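The three behaviors above (cite, admit uncertainty, refuse) can be combined into one response policy. A minimal sketch assuming the system produces a draft answer, a list of retrieved sources, and a confidence score; the 0.6 threshold is an arbitrary placeholder:

```python
def grounded_answer(draft: str, sources: list[str], confidence: float) -> str:
    # Out of scope / nothing retrieved: refuse rather than guess.
    if not sources:
        return "I couldn't find a reliable source for this, so I won't guess."
    # Low confidence: say "not sure" and hand off to a human.
    if confidence < 0.6:
        return "I'm not sure about this one; routing to a human reviewer."
    # Otherwise, answer with the sources displayed.
    return f"{draft}\n\nSources: " + "; ".join(sources)
```

Each branch trades fluency for trust: the refusal and the hand-off both read worse than a confident answer, but neither can be confidently wrong.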


What Compliance Review Should Ask

Before launch, PMs should at least be able to answer:

  1. How badly can this feature fail in the worst case?
  2. What data enters the model?
  3. Does the user know AI is involved?
  4. Do we need source display, disclaimers, or a human escalation path?
  5. How do we handle and track bad outputs when they occur?

If all 5 questions are still fuzzy, compliance design isn't finished.
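The five questions work best as an explicit launch gate rather than a vibe check: every answer must be concrete before the feature ships. A sketch; the question keys are illustrative:

```python
# The five review questions as a hard launch gate.
# Keys are hypothetical names for "this question has a concrete answer."
LAUNCH_QUESTIONS = [
    "worst_case_defined",
    "data_flow_documented",
    "ai_disclosure_present",
    "escalation_path_decided",
    "bad_output_tracking_ready",
]

def ready_for_launch(review: dict) -> bool:
    # Any missing or False answer blocks launch -- fuzzy counts as no.
    return all(review.get(q, False) for q in LAUNCH_QUESTIONS)
```

A question that nobody answered is treated the same as a "no," which is exactly the "still fuzzy means not finished" rule in executable form.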


4 Most Dangerous Mindsets

| Mindset | Why it's dangerous |
| --- | --- |
| Ship first, add guardrails later | Risk reaches users first |
| The model provider will cover us | Responsibility doesn't automatically transfer out |
| It's just an internal tool, no need to be strict | Internal tools can also process sensitive data |
| Low-frequency risks can wait | AI risks are often low-frequency but high-impact |

The thing AI PMs should avoid most is treating safety issues as "we'll deal with it later" technical debt.


A Sufficient PM Checklist

  • Is this use case low, medium, or high risk?
  • Is there unnecessary sensitive data in the input?
  • What real consequences come from a wrong output?
  • Is there a source, disclaimer, or escalation path?
  • Is there bad-case logging and a rollback mechanism?

This checklist isn't complex, but it's practical. Many incidents could've been caught by running through it once.


Practice

Take an AI feature you're building. Write these 4 lines:

  1. Worst case -- how could it harm users or the business?
  2. What data actually shouldn't be sent to the model?
  3. What questions should the model refuse to answer?
  4. After a bad output -- who discovers it, and who handles it?

Once you can articulate these 4 lines clearly, compliance design has actually begun.

❓ FAQ

The most frequently searched questions about this chapter's topic

Why can't AI compliance wait for a single review before launch?

Compliance isn't a review step; it's a design constraint. Risk doesn't live only in the output -- it spreads across the whole chain: data collection, prompt design, tool access, and user expectations. Shipping first and patching safety later is extremely costly, because problems reach users first and responsibility doesn't automatically transfer to the model provider.

What are the 4 risk types an AI PM must identify?

(1) Accuracy risk (confidently wrong, and the user believes it), (2) privacy risk (data that shouldn't enter the model gets sent in), (3) misuse risk (users use the product for things it shouldn't be used for), (4) IP / copyright risk (generated output or training material has copyright issues). The most dangerous isn't the one with the highest probability -- it's the one where a single occurrence is extremely costly.

How many layers should AI guardrails have?

Three: input guardrails (length limits, sensitive content detection, prompt injection defense), processing guardrails (system prompt, tool permissions, retrieval boundaries), and output guardrails (policy checks, source display, human review, fallback). Output moderation alone usually isn't enough.

Why are AI privacy problems often "collecting too much by default" rather than "leaking"?

PMs easily treat "more context makes the model perform better" as the default strategy, which directly breaks data boundaries. The principle is data minimization: AI summary sends only the text itself, not the user's full profile; AI support sends only the question context and necessary order fields, not the entire CRM history; AI writing sends only the writing goal and style, not browsing history. If a field isn't necessary for this task, don't send it.

What are the 5 compliance questions a PM must answer before launch?

(1) How badly can this feature fail in the worst case? (2) What data enters the model? (3) Does the user know AI is involved? (4) Do we need source display, disclaimers, or a human escalation path? (5) How do we handle and track bad outputs? If these 5 are still fuzzy, compliance design isn't finished -- and don't treat safety as "we'll deal with it later" technical debt.