Model Settings
Parameters like temperature, top_p, max length, and stop sequences
Try the "Parameter Lab" at the top of the page to see how temperature changes affect generation results in real time.
Prompts are the "instructions" you give to AI. Parameters are the AI's "personality settings." In production, tweaking these parameters often fixes "unstable output" problems faster than rewriting your prompt.
1. The Creativity Engine: Temperature vs. Top_p
The key insight here: AI doesn't understand word meanings. It's just doing probability prediction.
Temperature (Sampling Temperature)
Under the hood: When picking the next token, the model scales the probability distribution across its entire vocabulary.
- Low temperature (T -> 0): "Probability amplifier." High-probability tokens get even higher, low-probability ones basically vanish. The model becomes extremely conservative, always picking the safest word.
- High temperature (T -> 1.5+): "Probability equalizer." Lower-probability tokens get a real shot. The model becomes adventurous -- output gets more surprising and creative.
Practical rules:
- Need accuracy (writing code, extracting JSON): Lock it at 0.0.
- Need diversity (writing fiction, brainstorming names): Try 1.2 or higher.
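The scaling described above can be sketched in a few lines of Python. This is a toy illustration, not any vendor's implementation -- the three-token "vocabulary" and logit values are made up:

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by T, then softmax into a probability distribution.
    Low T sharpens the distribution; high T flattens it."""
    if temperature == 0:
        # T -> 0 degenerates to greedy decoding: all mass on the argmax.
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                   # toy vocabulary of three tokens
print(apply_temperature(logits, 0.5))      # sharper: top token dominates
print(apply_temperature(logits, 1.5))      # flatter: the long tail gets a shot
```

Running this with T=0.5 vs T=1.5 shows exactly the "amplifier vs equalizer" behavior: the top token's probability shrinks as T rises while the others grow.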
Top_p (Nucleus Sampling)
Under the hood: Sort tokens by probability, then only sample from the smallest set whose cumulative probability exceeds P.
- What it does: Acts as a "noise filter."
- Use case: Setting Top_p = 0.1 means the AI won't touch tokens in the bottom 90% of the probability distribution, even in "creative mode."
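Nucleus sampling can be sketched the same way. Again a toy illustration with a made-up four-token distribution:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. Everything outside that 'nucleus'
    is dropped before sampling."""
    ranked = sorted(enumerate(probs), key=lambda t: t[1], reverse=True)
    kept, cum = [], 0.0
    for idx, pr in ranked:
        kept.append((idx, pr))
        cum += pr
        if cum >= p:
            break
    total = sum(pr for _, pr in kept)
    filtered = [0.0] * len(probs)
    for idx, pr in kept:
        filtered[idx] = pr / total         # renormalize the survivors
    return filtered

probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.8))            # nucleus = first two tokens only
```

With p = 0.8 the bottom two tokens are zeroed out entirely -- that's the "noise filter" effect.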
Parameter tuning rules of thumb:
- Don't crank both at the same time.
- Adjust Temperature first. If the AI starts producing gibberish (grammar errors, nonsense), lower Top_p to filter out long-tail noise.
2. Presence and Diversity: Penalties
When the AI gets stuck in loops or keeps using the same words, these are your surgical tools:
- Presence Penalty: "Topic expander." Once a token has appeared, it gets penalized. This forces the AI to bring up new topics, increasing the "breadth" of the text.
- Frequency Penalty: "Vocabulary enricher." The more a token appears, the heavier the penalty. This forces the AI to use synonyms, increasing the "texture" of the text.
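The two penalties can be sketched as logit adjustments. This follows the commonly documented OpenAI-style formula (a flat presence penalty once a token has appeared, plus a frequency penalty per occurrence); other vendors may implement it differently:

```python
def penalize(logits, token_counts, presence_penalty, frequency_penalty):
    """Subtract penalties from the logits of tokens already generated.
    Presence: flat penalty if a token has appeared at all.
    Frequency: penalty scales with how often it has appeared."""
    adjusted = list(logits)
    for token_id, count in token_counts.items():
        if count > 0:
            adjusted[token_id] -= presence_penalty           # appeared at all
            adjusted[token_id] -= frequency_penalty * count  # per occurrence
    return adjusted

logits = [3.0, 2.0, 1.0]
counts = {0: 4, 1: 1}          # token 0 used 4 times, token 1 once
print(penalize(logits, counts, presence_penalty=0.5, frequency_penalty=0.3))
```

Note how the heavily repeated token 0 loses far more logit mass than token 1 -- that's why Frequency Penalty breaks "one word on repeat" loops.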
3. Length and Boundaries (Constraints)
Context Window -- a core 2026 concept
In 2026, mainstream models (Gemini 3, GPT-5) support million-token or even unlimited context. But you still need to watch out:
- Lost in the Middle: Models tend to have the weakest recall for content in the middle of long texts.
- Cost control: Longer context means higher inference cost and latency -- cost scales roughly linearly with token count, and attention compute can grow quadratically with sequence length.
- Max Output Tokens: This only limits the reply length -- it doesn't affect how much the model can "see."
Stop Sequences
These aren't just for preventing AI rambling -- they're logic control.
- Pro tip: In few-shot prompting, set \n as a stop sequence to force the AI to generate only one line at a time. Great for batch data generation.
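Conceptually, a stop sequence cuts generation at the first match, and the match itself is not returned. A minimal sketch of that truncation logic (the sample text is invented):

```python
def truncate_at_stop(text, stop_sequences):
    """Cut generated text at the earliest stop sequence.
    The stop sequence itself is not included in the result."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Few-shot batch generation: stop at "\n" so only one line comes back,
# even if the model would have kept rambling.
raw = "apple, banana, cherry\nQ: give me three more fruits"
print(truncate_at_stop(raw, ["\n"]))
```

In practice the API server stops generating as soon as the sequence appears, which also saves you the tokens it would otherwise have emitted.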
The Parameter Mental Model: Dashboard Method
We split tasks into three "quadrants" -- find yours:
Quadrant A: Hard Logic (Code, Logic, Math)
- Config: Temp: 0.0, Top_p: 1.0, Penalty: 0.0
- Mindset: You're a strict instructor. 1+1 must equal 2. Zero randomness allowed.
Quadrant B: Content Creation (Email, Summary, Translation)
- Config: Temp: 0.7, Top_p: 0.9, Penalty: 0.1
- Mindset: You're an editor-in-chief. You want smooth prose with professional depth and enough variation to not sound robotic.
Quadrant C: Creative Explosion (Brainstorming, Fiction)
- Config: Temp: 1.3, Top_p: 1.0, Penalty: 0.5
- Mindset: You're the client saying "show me something I haven't seen before" -- and you're okay with the occasional miss.
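The three quadrants fit naturally into code as a small preset table. The names and structure here are my own; the parameter keys mirror the OpenAI-style API, but check your vendor's docs for exact names:

```python
# Hypothetical preset table mirroring the three quadrants above.
PRESETS = {
    "hard_logic": {"temperature": 0.0, "top_p": 1.0, "frequency_penalty": 0.0},
    "content":    {"temperature": 0.7, "top_p": 0.9, "frequency_penalty": 0.1},
    "creative":   {"temperature": 1.3, "top_p": 1.0, "frequency_penalty": 0.5},
}

def params_for(task: str) -> dict:
    """Look up the sampling parameters for a task quadrant."""
    return PRESETS[task]

print(params_for("hard_logic"))
```

Keeping presets in one place like this also makes the "control your variables" advice below easy to follow: you change one named preset, not scattered magic numbers.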
Advanced: Scientific Parameter Tuning
- Control your variables: Keep the prompt constant, only change parameters.
- Fix the Seed: During testing, use a fixed Seed value so you know improvements come from parameter changes, not random luck.
- Parallel sampling: At Temperature = 1.0, use n > 1 (generate multiple results at once) and pick the best one.
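The seed-plus-parallel-sampling workflow can be sketched locally. Here sample_n is a stand-in for a real API call (real APIs such as OpenAI's expose seed and n parameters, though seed support and determinism guarantees vary by vendor), and the candidate strings are invented:

```python
import random

def sample_n(prompt, n, seed):
    """Stand-in for an API that returns n sampled completions.
    A fixed seed makes the batch reproducible, so A/B tests of
    parameters aren't confounded by sampling luck."""
    rng = random.Random(seed)
    candidates = ["draft-a", "draft-b", "draft-c", "draft-d"]
    return [rng.choice(candidates) for _ in range(n)]

def best_of(completions, score):
    """Pick the highest-scoring completion under some scoring function."""
    return max(completions, key=score)

runs = sample_n("name my app", n=3, seed=42)
assert runs == sample_n("name my app", n=3, seed=42)  # reproducible
print(best_of(runs, score=len))                       # toy scorer: longest wins
```

The pattern generalizes: swap len for a real scorer (a validator, a judge model, a unit test) and best-of-n becomes a cheap quality boost on creative tasks.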
Next up: Now that you've got the "personality settings" down, we're heading into the real arena -- Core Techniques, where you'll learn how prompt structure directly changes how AI thinks.
❓ FAQ
The most frequently searched questions about this chapter's topic.
How big is the difference between Temperature 0 and 1.5?
Under the hood it's probability-distribution scaling: at T=0, high-probability tokens are amplified and low-probability ones flattened away, so the model always picks the safest word -- mandatory for writing code or extracting JSON. At T=1.5+, the distribution is "equalized" and long-tail tokens get a real shot -- that's when you write fiction or brainstorm names. The tutorial's rule of thumb: hold the line at 0.0 for accuracy, push past 1.2 for diversity.
Should Temperature and Top_p be tuned together?
No. The tutorial is explicit: don't crank both at the same time. Adjust Temperature first; if the model starts producing gibberish at high temperature (grammar errors, nonsense), then lower Top_p as a noise filter -- for example, Top_p = 0.1 cuts off the bottom 90% of the probability distribution. If you move both at once, you can't tell which change caused which effect.
What's the difference between Presence Penalty and Frequency Penalty?
Presence Penalty is the "topic expander": a token is penalized once it has appeared at all, the same amount regardless of count, forcing the model onto new topics and increasing the text's breadth. Frequency Penalty is the "vocabulary enricher": the penalty grows with each occurrence, forcing the model toward synonyms instead of repeating one word. When the AI gets stuck in loops, these two are your surgical tools.
What is Lost in the Middle? Do long-context models still have it?
Lost in the Middle is the phenomenon where large models recall the middle of a long text worst -- recall is high at the beginning and end, while the middle gets overlooked. Even in 2026, with Gemini 3 and GPT-5 supporting million-token or even unlimited context, the problem persists. Put important information at the start or end of the prompt, not buried in the middle.
What parameter combos fit the three task quadrants?
Quadrant A, hard logic (code/math): Temp 0.0 / Top_p 1.0 / Penalty 0.0. Quadrant B, content creation (email/summary/translation): Temp 0.7 / Top_p 0.9 / Penalty 0.1. Quadrant C, creative explosion (brainstorming/fiction): Temp 1.3 / Top_p 1.0 / Penalty 0.5. Pair these with a fixed Seed for controlled-variable experiments -- that's what makes tuning scientific.