What is the factuality problem in LLMs — is it the same as hallucination?

They are two sides of the same coin. The chapter defines factuality issues as the model producing answers `that sound coherent and convincing but are sometimes fabricated` — which is hallucination. Factuality looks at the output (is it true?), hallucination looks at the behaviour (is the model inventing?). The mitigations are the same set of techniques.

What are the three mitigations the chapter recommends to reduce hallucination?

(1) Inject ground truth into context — relevant article passages or Wikipedia entries — so the model has something real to lean on; this is the seed of RAG. (2) Lower the sampling parameters (drop `temperature`) and tell the model to say `I don't know` when unsure. (3) Provide question-answer pairs as examples, including ones the model should refuse — that turns `decline` into a demonstrated, legal action.
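The first two mitigations can be combined in a single prompt template. The following is a minimal sketch, not the chapter's own code: the function name `build_grounded_prompt`, the instruction wording, and the `generation_settings` dictionary are all illustrative assumptions, and the name of the temperature parameter varies by provider.

```python
def build_grounded_prompt(context: str, question: str) -> str:
    """Assemble a prompt that pins the model to the supplied evidence.

    Hypothetical helper: the instruction wording is an illustration only.
    """
    return (
        "Answer the question using only the context below. "
        'If the context does not contain the answer, reply "I don\'t know".\n\n'
        f"Context:\n{context.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Assumed request settings; check your provider's API for the real parameter names.
generation_settings = {"temperature": 0.0}

prompt = build_grounded_prompt(
    context="Mars has two moons, Phobos and Deimos.",
    question="How many moons does Mars have?",
)
print(prompt)
```

The prompt string and the settings would then be passed to whichever model client you use; that call is left out because it depends on the provider.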

What does the `Neto Beto Roberto` example in the chapter actually demonstrate?

The author mixes two real questions (atom, moons of Mars) with two fake ones (Alvan Muntz, Kozar-09), and answers the fake ones with `?`. When asked about a freshly-invented person, `Neto Beto Roberto`, the model also answers `?`. The lesson: a few-shot demonstration of `admit when you don't know` steers the model to decline rather than fabricate.

Why does lowering temperature sometimes make the model more confidently fabricate?

Temperature controls sampling randomness, not truthfulness. Lowering it makes the model emit its `most likely` answer more consistently — if that prior is wrong to begin with, low temperature simply locks in the wrong answer. That is why the chapter pairs `lower temperature` with `tell the model to say I don't know` and `provide factual context` — no single dial is enough on its own.
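A small numeric sketch can make this concrete. It assumes the common formulation in which scores (logits) are divided by the temperature before a softmax; the candidate answers and their scores below are made up for illustration, with the highest-scoring candidate deliberately chosen to be the wrong one.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores: the model's prior favours the wrong answer.
candidates = ["wrong answer", "right answer", "I don't know"]
logits = [2.0, 1.5, 0.5]

for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    summary = ", ".join(f"{c}: {p:.2f}" for c, p in zip(candidates, probs))
    print(f"temperature={t}: {summary}")
```

With these numbers, the share of the wrong answer rises from roughly 0.55 at temperature 1.0 to roughly 0.92 at temperature 0.2. Lowering the temperature sharpens whatever the model already prefers; it does not change which answer that is.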

How do the chapter's mitigations relate to RAG?

The chapter's first mitigation — `provide factual context` — is the core idea of RAG. Production RAG industrialises that step: user query → vector-search relevant document chunks → splice them into the prompt as evidence → the LLM answers within that evidence. Manually pasting passages, as the chapter does, is the entry-level version; RAG is its automated, scalable form.
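The pipeline described above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector store, and all function names are hypothetical. A production system would swap in real components at each step.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_rag_prompt(query, chunks, k=1):
    """Splice the retrieved chunks into the prompt as evidence."""
    evidence = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the evidence below. "
        'If the evidence is insufficient, say "I don\'t know".\n\n'
        f"Evidence:\n{evidence}\n\n"
        f"Question: {query}\nAnswer:"
    )


chunks = [
    "Mars has two moons, Phobos and Deimos.",
    "An atom is a tiny particle that makes up everything.",
]
print(build_rag_prompt("How many moons does Mars have?", chunks))
```

Only the Mars passage shares words with the query, so it is the one spliced into the prompt; the atom passage is left out.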

Factuality

Reduce hallucinations and improve response reliability

LLMs tend to generate responses that sound coherent and convincing but are sometimes completely made up. Improving prompts can help the model produce more accurate/factual answers and reduce the likelihood of inconsistent and fabricated responses.

Some solutions include:

Provide ground truth in context (e.g., a relevant article paragraph or Wikipedia entry) to reduce the chance of the model generating fabricated text.
Configure the model to generate less "creative" responses by lowering sampling parameters such as temperature, and instruct it to admit when it doesn't know the answer (e.g., "I don't know").
Provide few-shot examples that combine questions and answers, including both known and unknown Q&A pairs.

Here's a simple example:

Prompt:

Q: What is an atom?
A: An atom is a tiny particle that makes up everything.

Q: Who is Alvan Muntz?
A: ?

Q: What is Kozar-09?
A: ?

Q: How many moons does Mars have?
A: Two, Phobos and Deimos.

Q: Who is Neto Beto Roberto?

Output:

A: ?

I made up "Neto Beto Roberto," so the model got this one right. Try tweaking the question slightly and see if you can still get it to work. Based on everything you've learned so far, there are different ways to improve this further.
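If you want to experiment with variations, it helps to generate the prompt from a list of question-answer pairs rather than editing it by hand. The sketch below rebuilds the prompt shown above; the helper name `build_few_shot_prompt` and the `UNKNOWN` constant are illustrative assumptions, not part of the chapter.

```python
UNKNOWN = "?"  # the marker the prompt uses for "the model should decline"

# Known and unknown Q&A pairs, in the same order as the prompt above.
examples = [
    ("What is an atom?", "An atom is a tiny particle that makes up everything."),
    ("Who is Alvan Muntz?", UNKNOWN),
    ("What is Kozar-09?", UNKNOWN),
    ("How many moons does Mars have?", "Two, Phobos and Deimos."),
]


def build_few_shot_prompt(examples, question):
    """Join the demonstrations and append the new question without an answer."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question}")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(examples, "Who is Neto Beto Roberto?")
print(prompt)
```

Changing the final question, or the balance of known and unknown pairs in `examples`, is then a one-line edit.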


