
Identify hallucination

verify claims against given facts

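#TL;DR

  • This prompt performs a `truthfulness` check: split the model output into claims, then compare each claim against the input facts.
  • It is especially suited to `closed-domain` scenarios (medical/legal/policy), where the instruction is "use exclusively the information above" and any content beyond the facts counts as a risk.
  • Treat it as a general-purpose `evaluation` template: any generation task can get a claim-by-claim review.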

#Background

This prompt tests an LLM's ability to identify hallucination in the context of closed-domain question answering.

⚠️ Note: due to the challenging nature of the task, LLMs are likely to hallucinate when they have no knowledge or context regarding the question.

The prompt below uses labels ("Patient’s facts:", "Question:", "GPT-4 Response:") to show which section holds which information.

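#How to Apply

You can attach this check to any task that generates a document from facts:

  1. Prepare the facts: use a bullet list, table, or JSON (ideally numbered, so each fact can be referenced)
  2. Generate the output: have the model write a note / summary / report, using only the given facts
  3. Run the truthfulness check: split the output into individual claims and compare each one against the facts
  4. Report the differences: list every piece of information "not contained in facts" as an item to fix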
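The four steps above can be sketched as two prompt builders. This is a minimal illustration, not a fixed API: the function names and the `F1.`, `F2.` numbering scheme are assumptions made here, and the wording of the check instruction is adapted from the prompt in the Prompt section.

```python
def number_facts(facts):
    """Return the facts as a numbered list so claims can cite fact IDs."""
    return "\n".join(f"F{i}. {fact}" for i, fact in enumerate(facts, start=1))


def build_generation_prompt(facts, task):
    """Steps 1-2: present the facts and ask for output restricted to them."""
    return (
        f"Facts:\n{number_facts(facts)}\n\n"
        f"Question: {task}, using exclusively the information above."
    )


def build_check_prompt(facts, output):
    """Steps 3-4: ask for a claim-by-claim comparison against the facts."""
    return (
        f"Facts:\n{number_facts(facts)}\n\n"
        f"Response:\n{output}\n\n"
        "Please read the above response and verify that each claim is "
        "exactly contained in the facts. Report any information which is "
        "not contained in the facts list."
    )


facts = ["20 year old female", "blood pressure 100/50, pulse 50"]
check_prompt = build_check_prompt(facts, "She is severely underweight (BMI 16.6).")
print(check_prompt)
```

Numbering the facts is optional, but it lets the checker cite a fact ID for each supported claim, which makes the report easier to audit.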

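#How to Iterate

  1. Add constraints at generation time: explicitly forbid filling in missing details (for example, fields that do not exist in the facts, such as BMI, complications, or treatment recommendations)
  2. Require structured output at the checking stage: `UnsupportedClaims` / `SupportedClaims` / `NeedsClarification`
  3. Label each claim: `supported` / `unsupported` / `unclear`
  4. Introduce an "ask-first" policy: when a key gap is found, ask clarifying questions before generating the final version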
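A structured report of this kind could be assembled as follows. This sketch assumes the checker is asked to reply with a JSON list of `{"claim", "label"}` objects; that reply format, the helper names, and the rule that any `unclear` claim triggers the ask-first path are all assumptions made for illustration.

```python
import json

# Maps each per-claim label to the section name used in the report.
SECTION_FOR_LABEL = {
    "supported": "SupportedClaims",
    "unsupported": "UnsupportedClaims",
    "unclear": "NeedsClarification",
}


def group_claims(raw_reply):
    """Parse a JSON list of {"claim", "label"} objects into report sections."""
    report = {section: [] for section in SECTION_FOR_LABEL.values()}
    for item in json.loads(raw_reply):
        label = item.get("label")
        if label not in SECTION_FOR_LABEL:
            # An unknown label is treated as unclear rather than dropped.
            label = "unclear"
        report[SECTION_FOR_LABEL[label]].append(item["claim"])
    return report


def should_ask_first(report):
    """Ask-first policy: hold the final version while anything is unclear."""
    return bool(report["NeedsClarification"])


# Hypothetical checker reply, hand-written for illustration.
reply = json.dumps([
    {"claim": "20 year old female", "label": "supported"},
    {"claim": "BMI 16.6", "label": "unsupported"},
    {"claim": "reports feeling depressed and hopeless", "label": "unclear"},
])
report = group_claims(reply)
print(report["UnsupportedClaims"])
print(should_ask_first(report))
```

Routing unknown labels to `NeedsClarification` is a conservative choice: a malformed label then surfaces for human review instead of silently counting as supported.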

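#Self-check rubric

  • Does the check catch content that clearly goes beyond the facts (for example BMI, complications, treatment recommendations)?
  • Does it wrongly flag content that is in the facts, but phrased differently, as unsupported?
  • Is the output actionable: can it be used directly to revise the original text (does it point to specific sentences or passages)?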
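The first two rubric items can be turned into numbers if you hand-label a small test case. The sketch below is one possible scoring scheme, not part of the original prompt: the function name, the exact-string matching of claims, and the sample labels are all assumptions. The third item (actionability) still needs a human read.

```python
def score_checker(predicted_unsupported, gold_unsupported, paraphrased_facts):
    """Score one checker run against hand-labeled expectations.

    predicted_unsupported: claims the checker flagged as unsupported
    gold_unsupported: claims a human marked as truly beyond the facts
    paraphrased_facts: claims that restate a fact in different words
    """
    predicted = set(predicted_unsupported)
    gold = set(gold_unsupported)
    paraphrased = set(paraphrased_facts)
    return {
        # Rubric item 1: share of real fabrications that were caught.
        "recall": len(predicted & gold) / len(gold) if gold else 1.0,
        # Rubric item 2: paraphrases wrongly flagged as unsupported.
        "false_flags": sorted(predicted & paraphrased),
    }


# Hypothetical labels for the medical-note example.
scores = score_checker(
    predicted_unsupported=["BMI 16.6", "claims to eat normally"],
    gold_unsupported=["BMI 16.6", "risk of cardiac arrhythmias"],
    paraphrased_facts=["claims to eat normally"],
)
print(scores)
```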

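#Practice

Exercise: take one of your own document-generation tasks and compare two rounds:

  • Round 1: generate with no constraints → observe the common `hallucination` types
  • Round 2: add "use exclusively facts" plus a fixed output template → run the truthfulness check again

Record your three most common failure modes and add them to your system prompt / checklist.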
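Tallying failure modes across runs can be as simple as a counter. In this sketch the failure-mode tags, the log entries, and the checklist wording are hypothetical; substitute whatever categories you observe in your own runs.

```python
from collections import Counter


def top_failure_modes(flagged, n=3):
    """Return the n most frequent failure-mode tags across all runs."""
    return Counter(tag for _, tag in flagged).most_common(n)


# Hypothetical log of (unsupported claim, failure-mode tag) pairs.
flagged = [
    ("BMI 16.6", "derived metric"),
    ("risk of cardiac arrhythmias", "added prognosis"),
    ("needs urgent intervention", "added recommendation"),
    ("risk of osteoporosis", "added prognosis"),
    ("feels depressed and hopeless", "added symptom"),
    ("risk of infertility", "added prognosis"),
    ("needs psychiatric intervention", "added recommendation"),
]

# Turn the top three tags into checklist lines for a system prompt.
checklist = [f"Do not include: {tag}" for tag, _ in top_failure_modes(flagged)]
print("\n".join(checklist))
```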

#Prompt

```markdown
Patient’s facts:
- 20 year old female
- with a history of anerxia nervosa and depression
- blood pressure 100/50, pulse 50, height 5’5’’
- referred by her nutrionist but is in denial of her illness
- reports eating fine but is severely underweight

Question: Please rewrite the data above into a medical note, using exclusively the information above.

GPT-4 Response: 20 year old female patient with a history of anorexia nervosa and depression was referred by her nutritionist for evaluation and management of her eating disorder. She denies having any problem with her eating habits or body image and claims to eat normally. However, she is severely underweight for her height (BMI 16.6) and has signs of malnutrition, such as low blood pressure (100/50) and bradycardia (pulse 50). She also reports feeling depressed and hopeless. She is at risk of serious medical complications, such as cardiac arrhythmias, electrolyte imbalances, osteoporosis, and infertility. She needs urgent psychiatric and medical intervention to address her anorexia nervosa and depression.

Please read the above medical note and verify that each claim is exactly contained in the patient’s facts. Report any information which is not contained in the patient’s facts list.
```
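One way to see why "BMI 16.6" should be flagged: BMI is weight in kilograms divided by height in meters squared, and the facts give a height but no weight, so the figure cannot be derived from them. The sketch below inverts the formula to show the weight the response silently assumed; the helper name is made up for illustration.

```python
INCHES_TO_METERS = 0.0254


def implied_weight_kg(bmi, height_in):
    """Invert BMI = weight_kg / height_m**2 to recover the implied weight."""
    height_m = height_in * INCHES_TO_METERS
    return bmi * height_m ** 2


# Height 5'5" is in the facts; BMI 16.6 appears only in the response.
weight = implied_weight_kg(bmi=16.6, height_in=5 * 12 + 5)
print(f"BMI 16.6 at 5'5\" implies a weight of about {weight:.1f} kg")
```

A weight of roughly 45 kg appears nowhere in the patient's facts, so the BMI value is an unsupported claim even though it is consistent with "severely underweight".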

#Code / API

#OpenAI (Python)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            # Facts, question, response under test, and the verification request.
            "content": (
                "Patient’s facts:\n"
                "- 20 year old female\n"
                "- with a history of anerxia nervosa and depression\n"
                "- blood pressure 100/50, pulse 50, height 5’5’’\n"
                "- referred by her nutrionist but is in denial of her illness\n"
                "- reports eating fine but is severely underweight\n\n"
                "Question: Please rewrite the data above into a medical note, using exclusively the information above.\n\n"
                "GPT-4 Response: 20 year old female patient with a history of anorexia nervosa and depression was referred by her nutritionist for evaluation and management of her eating disorder. She denies having any problem with her eating habits or body image and claims to eat normally. However, she is severely underweight for her height (BMI 16.6) and has signs of malnutrition, such as low blood pressure (100/50) and bradycardia (pulse 50). She also reports feeling depressed and hopeless. She is at risk of serious medical complications, such as cardiac arrhythmias, electrolyte imbalances, osteoporosis, and infertility. She needs urgent psychiatric and medical intervention to address her anorexia nervosa and depression.\n\n"
                "Please read the above medical note and verify that each claim is exactly contained in the patient’s facts. Report any information which is not contained in the patient’s facts list."
            ),
        }
    ],
    temperature=1,
    max_tokens=250,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

# Print the model's claim-by-claim verification.
print(response.choices[0].message.content)
```

#Fireworks (Python)

```python
import fireworks.client

fireworks.client.api_key = "<FIREWORKS_API_KEY>"

completion = fireworks.client.ChatCompletion.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {
            "role": "user",
            # Facts, question, response under test, and the verification request.
            "content": (
                "Patient’s facts:\n"
                "- 20 year old female\n"
                "- with a history of anerxia nervosa and depression\n"
                "- blood pressure 100/50, pulse 50, height 5’5’’\n"
                "- referred by her nutrionist but is in denial of her illness\n"
                "- reports eating fine but is severely underweight\n\n"
                "Question: Please rewrite the data above into a medical note, using exclusively the information above.\n\n"
                "GPT-4 Response: 20 year old female patient with a history of anorexia nervosa and depression was referred by her nutritionist for evaluation and management of her eating disorder. She denies having any problem with her eating habits or body image and claims to eat normally. However, she is severely underweight for her height (BMI 16.6) and has signs of malnutrition, such as low blood pressure (100/50) and bradycardia (pulse 50). She also reports feeling depressed and hopeless. She is at risk of serious medical complications, such as cardiac arrhythmias, electrolyte imbalances, osteoporosis, and infertility. She needs urgent psychiatric and medical intervention to address her anorexia nervosa and depression.\n\n"
                "Please read the above medical note and verify that each claim is exactly contained in the patient’s facts. Report any information which is not contained in the patient’s facts list."
            ),
        }
    ],
    stop=["<|im_start|>", "<|im_end|>", "<|endoftext|>"],
    stream=True,
    n=1,
    top_p=1,
    top_k=40,
    presence_penalty=0,
    frequency_penalty=0,
    prompt_truncate_len=1024,
    context_length_exceeded_behavior="truncate",
    temperature=0.9,
    max_tokens=4000,
)

# With stream=True the call yields chunks; print each piece as it arrives.
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
```
