Model Settings
Parameters like temperature, top_p, max length, and stop sequences
Try the "Parameter Lab" at the top of the page to see how temperature changes affect generation results in real time.
Prompts are the "instructions" you give to AI. Parameters are the AI's "personality settings." In production, tweaking these parameters often fixes "unstable output" problems faster than rewriting your prompt.
1. The Creativity Engine: Temperature vs. Top_p
The key insight here: AI doesn't understand word meanings. It's just doing probability prediction.
Temperature (Sampling Temperature)
Under the hood: When picking the next token, the model scales the probability distribution across its entire vocabulary.
- Low temperature (T -> 0): "Probability amplifier." High-probability tokens get even higher, low-probability ones basically vanish. The model becomes extremely conservative, always picking the safest word.
- High temperature (T -> 1.5+): "Probability equalizer." Lower-probability tokens get a real shot. The model becomes adventurous -- output gets more surprising and creative.
Practical rules:
- Need accuracy (writing code, extracting JSON): Lock it at 0.0.
- Need diversity (writing fiction, brainstorming names): Try 1.2 or higher.
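The scaling described above can be sketched in a few lines of Python. This is a toy illustration, not any vendor's implementation -- the three-token "vocabulary" and logit values are made up:

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by T, then softmax into a probability distribution.
    Low T sharpens the distribution; high T flattens it."""
    if temperature == 0:
        # T -> 0 degenerates to greedy decoding: all mass on the argmax.
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                   # toy vocabulary of three tokens
print(apply_temperature(logits, 0.5))      # sharper: top token dominates
print(apply_temperature(logits, 1.5))      # flatter: the long tail gets a shot
```

Running this with T=0.5 vs T=1.5 shows exactly the "amplifier vs equalizer" behavior: the top token's probability shrinks as T rises while the others grow.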
Top_p (Nucleus Sampling)
Under the hood: Sort tokens by probability, then only sample from the smallest set whose cumulative probability exceeds P.
- What it does: Acts as a "noise filter."
- Use case: Setting Top_p = 0.1 means the AI won't touch tokens in the bottom 90% of the probability distribution, even in "creative mode."
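Nucleus sampling can be sketched the same way. Again a toy illustration with a made-up four-token distribution:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. Everything outside that 'nucleus'
    is dropped before sampling."""
    ranked = sorted(enumerate(probs), key=lambda t: t[1], reverse=True)
    kept, cum = [], 0.0
    for idx, pr in ranked:
        kept.append((idx, pr))
        cum += pr
        if cum >= p:
            break
    total = sum(pr for _, pr in kept)
    filtered = [0.0] * len(probs)
    for idx, pr in kept:
        filtered[idx] = pr / total         # renormalize the survivors
    return filtered

probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.8))            # nucleus = first two tokens only
```

With p = 0.8 the bottom two tokens are zeroed out entirely -- that's the "noise filter" effect.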
Parameter tuning rules of thumb:
- Don't crank both at the same time.
- Adjust Temperature first. If the AI starts producing gibberish (grammar errors, nonsense), lower Top_p to filter out long-tail noise.
2. Presence and Diversity: Penalties
When the AI gets stuck in loops or keeps using the same words, these are your surgical tools:
- Presence Penalty: "Topic expander." Once a token has appeared, it gets penalized. This forces the AI to bring up new topics, increasing the "breadth" of the text.
- Frequency Penalty: "Vocabulary enricher." The more a token appears, the heavier the penalty. This forces the AI to use synonyms, increasing the "texture" of the text.
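The two penalties can be sketched as logit adjustments. This follows the commonly documented OpenAI-style formula (a flat presence penalty once a token has appeared, plus a frequency penalty per occurrence); other vendors may implement it differently:

```python
def penalize(logits, token_counts, presence_penalty, frequency_penalty):
    """Subtract penalties from the logits of tokens already generated.
    Presence: flat penalty if a token has appeared at all.
    Frequency: penalty scales with how often it has appeared."""
    adjusted = list(logits)
    for token_id, count in token_counts.items():
        if count > 0:
            adjusted[token_id] -= presence_penalty           # appeared at all
            adjusted[token_id] -= frequency_penalty * count  # per occurrence
    return adjusted

logits = [3.0, 2.0, 1.0]
counts = {0: 4, 1: 1}          # token 0 used 4 times, token 1 once
print(penalize(logits, counts, presence_penalty=0.5, frequency_penalty=0.3))
```

Note how the heavily repeated token 0 loses far more logit mass than token 1 -- that's why Frequency Penalty breaks "one word on repeat" loops.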
3. Length and Boundaries (Constraints)
Context Window -- a core 2026 concept
In 2026, mainstream models (Gemini 3, GPT-5) support million-token or even unlimited context. But you still need to watch out:
- Lost in the Middle: Models tend to have the weakest recall for content in the middle of long texts.
- Cost control: Longer context means higher inference cost and latency -- cost scales roughly linearly with token count, and attention compute can grow quadratically with sequence length.
- Max Output Tokens: This only limits the reply length -- it doesn't affect how much the model can "see."
Stop Sequences
These aren't just for preventing AI rambling -- they're logic control.
- Pro tip: In few-shot prompting, set \n as a stop sequence to force the AI to generate only one line at a time. Great for batch data generation.
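Conceptually, a stop sequence cuts generation at the first match, and the match itself is not returned. A minimal sketch of that truncation logic (the sample text is invented):

```python
def truncate_at_stop(text, stop_sequences):
    """Cut generated text at the earliest stop sequence.
    The stop sequence itself is not included in the result."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Few-shot batch generation: stop at "\n" so only one line comes back,
# even if the model would have kept rambling.
raw = "apple, banana, cherry\nQ: give me three more fruits"
print(truncate_at_stop(raw, ["\n"]))
```

In practice the API server stops generating as soon as the sequence appears, which also saves you the tokens it would otherwise have emitted.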
The Parameter Mental Model: Dashboard Method
We split tasks into three "quadrants" -- find yours:
Quadrant A: Hard Logic (Code, Logic, Math)
- Config: Temp: 0.0, Top_p: 1.0, Penalty: 0.0
- Mindset: You're a strict instructor. 1+1 must equal 2. Zero randomness allowed.
Quadrant B: Content Creation (Email, Summary, Translation)
- Config: Temp: 0.7, Top_p: 0.9, Penalty: 0.1
- Mindset: You're an editor-in-chief. You want smooth prose with professional depth and enough variation to not sound robotic.
Quadrant C: Creative Explosion (Brainstorming, Fiction)
- Config: Temp: 1.3, Top_p: 1.0, Penalty: 0.5
- Mindset: You're the client saying "show me something I haven't seen before" -- and you're okay with the occasional miss.
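The three quadrants fit naturally into code as a small preset table. The names and structure here are my own; the parameter keys mirror the OpenAI-style API, but check your vendor's docs for exact names:

```python
# Hypothetical preset table mirroring the three quadrants above.
PRESETS = {
    "hard_logic": {"temperature": 0.0, "top_p": 1.0, "frequency_penalty": 0.0},
    "content":    {"temperature": 0.7, "top_p": 0.9, "frequency_penalty": 0.1},
    "creative":   {"temperature": 1.3, "top_p": 1.0, "frequency_penalty": 0.5},
}

def params_for(task: str) -> dict:
    """Look up the sampling parameters for a task quadrant."""
    return PRESETS[task]

print(params_for("hard_logic"))
```

Keeping presets in one place like this also makes the "control your variables" advice below easy to follow: you change one named preset, not scattered magic numbers.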
Advanced: Scientific Parameter Tuning
- Control your variables: Keep the prompt constant, only change parameters.
- Fix the Seed: During testing, use a fixed Seed value so you know improvements come from parameter changes, not random luck.
- Parallel sampling: At Temperature = 1.0, use n > 1 (generate multiple results at once) and pick the best one.
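The seed-plus-parallel-sampling workflow can be sketched locally. Here sample_n is a stand-in for a real API call (real APIs such as OpenAI's expose seed and n parameters, though seed support and determinism guarantees vary by vendor), and the candidate strings are invented:

```python
import random

def sample_n(prompt, n, seed):
    """Stand-in for an API that returns n sampled completions.
    A fixed seed makes the batch reproducible, so A/B tests of
    parameters aren't confounded by sampling luck."""
    rng = random.Random(seed)
    candidates = ["draft-a", "draft-b", "draft-c", "draft-d"]
    return [rng.choice(candidates) for _ in range(n)]

def best_of(completions, score):
    """Pick the highest-scoring completion under some scoring function."""
    return max(completions, key=score)

runs = sample_n("name my app", n=3, seed=42)
assert runs == sample_n("name my app", n=3, seed=42)  # reproducible
print(best_of(runs, score=len))                       # toy scorer: longest wins
```

The pattern generalizes: swap len for a real scorer (a validator, a judge model, a unit test) and best-of-n becomes a cheap quality boost on creative tasks.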
Next up: Now that you've got the "personality settings" down, we're heading into the real arena -- Core Techniques, where you'll learn how prompt structure directly changes how AI thinks.
❓ FAQ
The most frequently searched questions about this chapter's topic.
How big is the difference between Temperature 0 and 1.5?
Under the hood it's probability-distribution scaling: at T=0, high-probability tokens are amplified and low-probability ones flattened away, so the model always picks the safest word -- mandatory for writing code or extracting JSON. At T=1.5+, the distribution is "equalized" and long-tail tokens get a real shot -- that's when you write fiction or brainstorm names. The tutorial's rule of thumb: hold the line at 0.0 for accuracy, push past 1.2 for diversity.
Should Temperature and Top_p be tuned together?
No. The tutorial is explicit: don't crank both at the same time. Adjust Temperature first; if the model starts producing gibberish at high temperature (grammar errors, nonsense), then lower Top_p as a noise filter -- for example, Top_p = 0.1 cuts off the bottom 90% of the probability distribution. If you move both at once, you can't tell which change caused which effect.
What's the difference between Presence Penalty and Frequency Penalty?
Presence Penalty is the "topic expander": a token is penalized once it has appeared at all, the same amount regardless of count, forcing the model onto new topics and increasing the text's breadth. Frequency Penalty is the "vocabulary enricher": the penalty grows with each occurrence, forcing the model toward synonyms instead of repeating one word. When the AI gets stuck in loops, these two are your surgical tools.
What is Lost in the Middle? Do long-context models still have it?
Lost in the Middle is the phenomenon where large models recall the middle of a long text worst -- recall is high at the beginning and end, while the middle gets overlooked. Even in 2026, with Gemini 3 and GPT-5 supporting million-token or even unlimited context, the problem persists. Put important information at the start or end of the prompt, not buried in the middle.
What parameter combos fit the three task quadrants?
Quadrant A, hard logic (code/math): Temp 0.0 / Top_p 1.0 / Penalty 0.0. Quadrant B, content creation (email/summary/translation): Temp 0.7 / Top_p 0.9 / Penalty 0.1. Quadrant C, creative explosion (brainstorming/fiction): Temp 1.3 / Top_p 1.0 / Penalty 0.5. Pair these with a fixed Seed for controlled-variable experiments -- that's what makes tuning scientific.