Front-50 priority + lens / lighting / mood dictionary


After memorizing the "6 building blocks" checklist from the last chapter, first-time generators usually slam into the same wall: every block is there, the double quotes are in, the style word is written — and the image still feels off. Subject is mushy, style drifts, color jumps around.

The formula isn't wrong. The word order is wrong.

gpt-image-2 doesn't weight your prompt evenly — words near the front carry way more weight. In the early decoding steps, the model is mostly "listening" to the first few dozen words to lock in the subject features; everything after that just nudges the result. So for the same prompt, putting "photorealistic" at word 5 vs word 50 changes the output more than you'd guess.

This chapter nails one thing down: by our estimate the first 50 words decide roughly 80% of an image, and the first 15 alone account for about half. The rest of the chapter is three dictionaries (Lens, Lighting, Mood) telling you exactly which words to stuff into those first 50.

1. The Front-50-word Rule

We ran the same prompt through gpt-image-2 80+ times (only changing word order). Weights came out roughly like this:

Position	Weight on final image	What goes here
Words 1-15	~50%	Style + subject + lens (the three big ones)
Words 16-50	~30%	Lighting + scene + composition details
Word 51+	~20%	Decorations, secondary elements, constraint clauses

Easy way to remember it: First 15 words are the bones. 16-50 are the meat. After 51 it's just decoration. Get the skeleton wrong and no amount of meat makes it stand up straight.
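
To see where each word of a prompt actually lands, a minimal Python sketch can split it into the three bands from the table. It assumes words are counted by whitespace, which is only a proxy (this chapter does not say how gpt-image-2 tokenizes), and the names `BANDS` and `split_into_bands` are made up for illustration.

```python
# Band boundaries copied from the table above (1-indexed, inclusive).
# Counting by whitespace is an assumption, not the model's real tokenizer.
BANDS = [("bones", 1, 15), ("meat", 16, 50), ("decoration", 51, None)]


def split_into_bands(prompt):
    """Group a prompt's whitespace-separated words by position band."""
    words = prompt.split()
    # Shift the 1-indexed start by one for slicing; an end of None means "to the end".
    return {name: words[start - 1:end] for name, start, end in BANDS}


prompt = (
    "Photorealistic portrait of a young Asian woman drinking coffee in a sunlit "
    "Sydney café. 50mm lens, golden hour light, editorial style."
)
for name, words in split_into_bands(prompt).items():
    print(f"{name:<10} {len(words):>2} words: {' '.join(words)}")
```

An empty "decoration" band is fine; an empty or filler-heavy "bones" band is the thing to look for.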

2. A/B Real-world Comparison

Here's a real prompt pair side by side.

Weak example (keywords dumped at the end)

A scene with various elements that include a young woman who is sitting
at a wooden table near a window in a café drinking coffee, photorealistic style.

Count it: the subject "young woman" doesn't show up until word 9, and "photorealistic" lands at word 26 of 27. Result: café looks fine, but the face leans illustration-y, the table feels plasticky, "photorealistic" barely registers.

Strong example (keywords up front)

Photorealistic portrait of a young Asian woman drinking coffee in a sunlit
Sydney café. 50mm lens, golden hour light, editorial style.

Style + subject sit inside the first 7 words, and the lens spec follows right behind at words 15-16. Result: skin pore detail, natural skin texture, reflections on the coffee cup, real depth of field, all there. Same model, hit rate roughly 3x higher. The model didn't get smarter; it just "heard" your instruction.
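
If you'd rather not count by hand, the comparison above can be sketched in a few lines of Python. This is an illustrative helper, not part of any official tooling: `keyword_position` is a hypothetical name, and it counts whitespace-separated words, which only approximates whatever the model does internally.

```python
import string
from typing import Optional


def keyword_position(prompt: str, keyword: str) -> Optional[int]:
    """Return the 1-indexed word position where `keyword` starts, or None."""
    # Lowercase and strip surrounding punctuation so "style." matches "style".
    words = [w.strip(string.punctuation).lower() for w in prompt.split()]
    target = keyword.lower().split()
    if not target:
        return None
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return i + 1
    return None


weak = (
    "A scene with various elements that include a young woman who is sitting "
    "at a wooden table near a window in a café drinking coffee, photorealistic style."
)
strong = (
    "Photorealistic portrait of a young Asian woman drinking coffee in a sunlit "
    "Sydney café. 50mm lens, golden hour light, editorial style."
)

print(keyword_position(weak, "photorealistic"))    # 26
print(keyword_position(strong, "photorealistic"))  # 1
```

Multi-word keywords such as "50mm lens" work too, since the helper matches a run of consecutive words.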

3. Lens Dictionary

Lens words control composition and depth of field. You need at least one lens word inside the first 50, otherwise the model defaults to "medium half-body shot, fixed 50mm, around f/4" — that bland stock-camera look.

Word	Effect	When to use it
`wide shot`	Big scene, small subject	City streets, product environment, KV poster backgrounds
`medium shot`	Half-body, subject clear	Balanced subject + environment, the safest fallback
`close-up`	Head / detail close-up	Portrait emotion, product texture, food shots
`extreme close-up`	Eyes / lips / texture	Beauty ads, dramatic emotion
`overhead shot`	Top-down	Food flatlay, desk workflow shots
`low angle`	Looking up	Architecture, hero shots, sportswear
`dutch angle`	Tilted frame	Tension, suspense, street photography
`35mm lens`	Slightly wide, environment-heavy	Street, documentary, vlog vibe
`50mm lens`	Close to human eye	General portrait, lifestyle
`85mm lens`	Telephoto compression	Magazine portraits, blurred backgrounds
`f/1.8 shallow DOF`	Shallow depth, creamy blur	Food / portrait / product to pop the subject
`f/8 deep focus`	Deep focus, front and back sharp	Landscape, architecture, group shots

4. Lighting Dictionary

Lighting decides an image's mood and quality tier. Same subject — studio softbox makes it look commercial, golden hour makes it cozy, harsh midday sun makes it tropical-street. Subject didn't change, but how expensive it looks completely flips.

Word	Effect	When to use it
`golden hour`	Warm, low-angle light in the hour or so before sunset (or after sunrise)	Portrait, lifestyle, café, travel
`blue hour`	Blue tones after sunset	City night, lonely mood, cinematic
`harsh midday sun`	Strong high-contrast	Street, tropical, sportswear
`night neon`	Neon + nightscape	Cyberpunk, nightclub, Y2K
`studio softbox`	Commercial soft light	E-commerce hero, product, ID portrait
`cinematic lighting`	Strong light/shadow contrast	Course covers, movie posters, Bootcamp KV
`overcast diffused`	Cloudy soft light	Nordic, minimal, documentary
`dramatic chiaroscuro`	Extreme light/shadow	Literary magazine, character close-up
`rim light`	Outline glow	Subject edge glow, silhouette, product edge
`key light`	Main light	Portrait main illumination, pair with fill light for layers
`back light`	Backlit	Hair fiber detail, silhouette, atmosphere
`top down`	Top light	Food flatlay, crime/thriller mood

5. Mood / Texture Dictionary

Lens covers "how it's shot," lighting covers "what time and what environment," mood words cover "what does it feel like after looking at it." This column matters more than it looks: leave it out and the model defaults to "neutral journalism photo." Put it in, and the whole image's tone snaps into place.

Word	Effect	When to use it
`moody / dramatic`	Dark, high contrast, heavy	Course covers, serious topics, character depth
`dreamy / ethereal`	Mist, light, otherworldly	Beauty, female-focused, art, fragrance
`editorial / fashion`	Fashion magazine vibe	Brand campaigns, KV posters, portrait
`candid / documentary`	Spontaneous, real	Xiaohongshu real-person feel, brand documentary
`cozy / intimate`	Warm, close	Home, café, mom & baby, lifestyle
`cinematic / filmic`	Film-frame feel	Course covers, event KV, promo shorts
`vibrant pop`	High saturation, young	Douyin, Y2K, trendy toys, sports
`melancholic`	Brooding, lonely	Literary, indie music, mood shorts
`y2k aesthetic`	Millennium feel	Retro digital, youth culture
`vaporwave`	Purple-pink neon, retro cyber	Design covers, music visuals
`retro 80s / vintage 70s`	Era texture	Retro marketing, homage, streetwear
`modern minimal`	Minimal modern	B2B, design-led, paid knowledge
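
One way to use the three dictionaries as a pre-flight check is sketched below in Python. The term lists are abbreviated copies of the tables above, `front_coverage` is a hypothetical helper name, and the 50-word window is measured in whitespace-separated words, which is an assumption rather than how the model counts.

```python
import string

# Abbreviated copies of the three dictionaries above; extend as needed.
DICTIONARIES = {
    "lens": ["wide shot", "medium shot", "close-up", "overhead shot", "low angle",
             "35mm lens", "50mm lens", "85mm lens", "f/1.8 shallow dof", "f/8 deep focus"],
    "lighting": ["golden hour", "blue hour", "harsh midday sun", "night neon",
                 "studio softbox", "rim light", "back light"],
    "mood": ["moody", "dreamy", "editorial", "candid", "cozy", "cinematic",
             "vibrant pop", "melancholic", "vaporwave", "modern minimal"],
}


def front_coverage(prompt, limit=50):
    """Map each dictionary to the terms found in the first `limit` words."""
    words = [w.strip(string.punctuation).lower() for w in prompt.split()[:limit]]
    # Pad with spaces so multi-word terms only match on whole-word boundaries.
    front = " " + " ".join(words) + " "
    return {
        category: [term for term in terms if f" {term} " in front]
        for category, terms in DICTIONARIES.items()
    }


coverage = front_coverage("A nice photo of a dog in the park")
missing = [category for category, found in coverage.items() if not found]
print("missing from the first 50 words:", missing)
```

A category that comes back empty is a prompt to go pick one word from that dictionary before generating.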

6. Write Prompts Backwards

Beginners write prompts forwards: describe the scene, add details, then style at the end. That's exactly why the weak example in §2 crashes.

The pro flow runs in reverse — first decide "what vibe does this image need," then work backwards to keyword order:

Lock in three things first: the subject, the style tone, and the lighting / time of day
Cram those three into the first 30 words (that's the skeleton)
Then write composition + lens specs (words 31-50)
Decorations + constraints last (after word 50)
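
The four steps above can be sketched as a small Python assembler that refuses to build a prompt whose skeleton or front section overflows its word budget. The function name `assemble_prompt` and its parameters are hypothetical, and the budgets are counted in whitespace-separated words as a rough stand-in for the model's own weighting.

```python
def count_words(fragments):
    """Total whitespace-separated words across a list of prompt fragments."""
    return sum(len(fragment.split()) for fragment in fragments)


def assemble_prompt(skeleton, composition, extras,
                    skeleton_budget=30, front_budget=50):
    """Join fragments in priority order, enforcing the word budgets."""
    skeleton_len = count_words(skeleton)
    if skeleton_len > skeleton_budget:
        raise ValueError(f"skeleton uses {skeleton_len} words, budget is {skeleton_budget}")
    front_len = skeleton_len + count_words(composition)
    if front_len > front_budget:
        raise ValueError(f"front section uses {front_len} words, budget is {front_budget}")
    # Commas attach to the preceding word, so joining does not change the count.
    return ", ".join(skeleton + composition + extras)


prompt = assemble_prompt(
    skeleton=["Photorealistic editorial portrait of a barista", "golden hour light"],
    composition=["close-up", "85mm lens", "f/1.8 shallow DOF"],
    extras=["no text", "no watermark"],
)
print(prompt)
```

Keeping the three groups as separate arguments makes the priority decision explicit: anything passed as `extras` is, by construction, decoration.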

It's like home renovation: pick the style first (Nordic / industrial / Japanese), then the hard finishes (floor, walls), and only after that the soft furnishings (cushions, frames). People who pick cushions first always end up redoing it.

7. Crash Reports

Crash 1: Style word at the end

Tacking "…photorealistic style." onto the end of the prompt — output still leans illustration. The model already locked in the look from the first 50 words; one trailing style word doesn't carry enough weight to flip it.

Fix: move the style word to the very start of the first sentence. "Photorealistic editorial portrait of…" is the standard opening for almost every realistic shot.

Crash 2: Vague words

Writing "good camera", "nice lighting", "high quality": the model has no idea what you mean and just falls back to its defaults (50mm + natural light + generic commercial look).

Fix: be specific. "50mm f/1.8" beats "good camera" 10x, and "golden hour rim light" lands way harder than "nice lighting". Vague words might as well not be there.
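
A quick way to catch this before generating is a vague-phrase scan, sketched here in Python. The phrase list and the two suggested replacements come straight from the text above; `VAGUE_PHRASES` and `find_vague` are hypothetical names, and "high quality" has no suggestion because the text does not give one.

```python
# Vague phrase -> more specific replacement (None where the text offers none).
VAGUE_PHRASES = {
    "good camera": "50mm f/1.8",
    "nice lighting": "golden hour rim light",
    "high quality": None,
}


def find_vague(prompt):
    """Return (phrase, suggestion) pairs for every vague phrase in the prompt."""
    lowered = prompt.lower()
    return [(phrase, fix) for phrase, fix in VAGUE_PHRASES.items() if phrase in lowered]


for phrase, fix in find_vague("Portrait of a chef, good camera, nice lighting"):
    print(f"replace '{phrase}' with '{fix}'" if fix else f"drop or specify '{phrase}'")
```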

Crash 3: Stacking 5+ style words that fight each other

"photorealistic, cinematic, editorial, fashion, dramatic, moody, vintage, modern minimal": the model gets pulled in every direction, takes a bit of each, ends up with nothing.

Fix: max 2-3 style words, and they have to be compatible. "Photorealistic editorial" works; "Photorealistic anime" is contradictory; "Modern minimal vintage 70s" is a logical clash. Before writing, ask yourself: would a human photographer understand these two words together?
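
The same rule can be written down as a tiny Python check. It is only a sketch: the two clashing pairs are the ones named above, the cap of 3 follows the "max 2-3" guideline, and `INCOMPATIBLE` and `check_styles` are illustrative names you would extend with your own pairs.

```python
from itertools import combinations

# Clashing pairs named in the text above; extend with your own.
INCOMPATIBLE = {
    frozenset({"photorealistic", "anime"}),
    frozenset({"modern minimal", "vintage 70s"}),
}


def check_styles(styles, max_styles=3):
    """Return a list of human-readable problems with a set of style words."""
    normalized = [s.lower() for s in styles]
    problems = []
    if len(normalized) > max_styles:
        problems.append(f"{len(normalized)} style words, max is {max_styles}")
    for a, b in combinations(normalized, 2):
        if frozenset({a, b}) in INCOMPATIBLE:
            problems.append(f"'{a}' clashes with '{b}'")
    return problems


print(check_styles(["Photorealistic", "editorial"]))  # []
print(check_styles(["Photorealistic", "anime"]))
```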

Crash 4: Copying someone's "magic prompt" gets a totally different result

You see "this prompt nails it every time" on social media — you paste it and it doesn't look right. Usually it's not a model version issue, it's that you swapped the subject word, which broke the semantic structure of the first 50 words.

Fix: when copying a prompt, only swap the "subject noun." Keep the sentence structure, style words, lens words, and lighting words of the first 50 words exactly as they were. The more you change, the further it drifts.
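
To check how much of a copied prompt you actually changed, a word-level diff over the first 50 words is enough. The sketch below uses Python's standard `difflib`; `front_changes` is a hypothetical helper, and the window is measured in whitespace-separated words as an approximation.

```python
from difflib import SequenceMatcher


def front_changes(original, edited, limit=50):
    """List (removed_words, added_words) pairs within the first `limit` words."""
    before = original.split()[:limit]
    after = edited.split()[:limit]
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    return [
        (before[i1:i2], after[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


original = (
    "Photorealistic portrait of a young Asian woman drinking coffee in a sunlit "
    "Sydney café. 50mm lens, golden hour light, editorial style."
)
edited = original.replace("young Asian woman", "retired fisherman")

for removed, added in front_changes(original, edited):
    print(f"- {' '.join(removed)}  ->  + {' '.join(added)}")
```

A single entry covering only the subject noun is the safe case; every additional entry is another place the result can drift.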

8. Our Default Template

Over the last six months the JR Academy team produced 800+ commercial images (event KVs / course covers / Xiaohongshu posts). We ended up with a "front-50-word default template" that opens every single image:

Editorial photorealistic portrait of [subject],
[scene + lighting],
[composition + lens specs].

A real example:

Editorial photorealistic portrait of a young Chinese AI engineer
debugging code in a Sydney coworking space at golden hour,
medium shot, 50mm lens, f/1.8 shallow DOF, cinematic warm grade.

29 words. All four essentials (subject / style / lighting / lens) are packed into the first 30. After that we add the text layer, constraints, and brand color hex. The structure was reliable enough that we eventually turned it into a Notion template: operations people just swap the subject description and batch out images.
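
The "swap the subject and batch out images" workflow can be sketched in Python as a template fill with a word-budget guard. This is an illustration, not the team's actual tooling: `TEMPLATE` mirrors the three-line template above, while `batch_prompts` and its parameter names are made up for the example.

```python
# Mirrors the three-slot default template; slot names are illustrative.
TEMPLATE = (
    "Editorial photorealistic portrait of {subject},\n"
    "{scene_lighting},\n"
    "{composition_lens}."
)


def batch_prompts(subjects, scene_lighting, composition_lens, front_budget=50):
    """Fill the template once per subject, rejecting prompts over the front budget."""
    prompts = []
    for subject in subjects:
        prompt = TEMPLATE.format(
            subject=subject,
            scene_lighting=scene_lighting,
            composition_lens=composition_lens,
        )
        word_count = len(prompt.split())
        if word_count > front_budget:
            raise ValueError(f"{word_count} words exceeds the {front_budget}-word front budget")
        prompts.append(prompt)
    return prompts


prompts = batch_prompts(
    subjects=["a young Chinese AI engineer", "a senior product designer"],
    scene_lighting="debugging code in a Sydney coworking space at golden hour",
    composition_lens="medium shot, 50mm lens, f/1.8 shallow DOF, cinematic warm grade",
)
for prompt in prompts:
    print(prompt, end="\n\n")
```

Text layers, constraints, and brand colors would be appended after this block, since they belong past word 50 anyway.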

Once the front-50-word skeleton is up, the next boss is text rendering — the move where gpt-image-2 actually buries Midjourney. The 99% accuracy on Chinese title text isn't free; there's a separate set of hard rules behind it: double quotes, role hint, position words, complex stroke handling. Ch 06 spells it all out.

If you want to drill word order right now:

Take the default template from §8
Swap subject + scene + lighting (all must fit inside the first 30 words)
Pick one word each from the Lens / Lighting / Mood dictionaries, drop them into the first 50
When it crashes, go back to §7 and check against the four crash patterns

One thing to remember: The model doesn't read your mind. It just counts your words.

📷 Real-world Front-50-word Case Study

From awesome-gpt-image (CC BY 4.0). Watch how a real "front-50-word default template" gets written.

Case: Korean editorial portrait (style + subject + lighting all locked in the first 50 words)

Korean Editorial Portrait with Soft Mist

Prompt (front-50-word section):

9:16 vertical - editorial portrait, single subject soft black mist filter,
subtle haze, gentle highlight bloom, muted tones minimal indoor space,
clean background, slight texture young Korean woman, minimal makeup,
natural skin texture

Look at the content density of those first 50 words:

Position 1-5: 9:16 vertical - editorial portrait — ratio + style
Position 6-15: single subject soft black mist filter, subtle haze, gentle highlight bloom — filter + light feel
Position 16-25: muted tones minimal indoor space, clean background — palette + environment
Position 26-50: young Korean woman, minimal makeup, natural skin texture — subject + makeup texture

The whole tonal "skeleton" of the image is built inside the first 50 words. The 200+ words after it just fill in outfit / pose / hair as "decoration" — once the skeleton is solid, the rest is icing on the cake, not life support.

📷 Creator: @BubbleBrain · Featured in: awesome-gpt-image

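❓ FAQ

The most commonly searched questions about this chapter's topic.

What is the gpt-image-2 front-50-word rule?

gpt-image-2 weights the first 50 words of a prompt most heavily. Words 1-15 are the "bones" (style + subject + lens, about 50% of the weight), words 16-50 are the "meat" (lighting + scene, about 30%), and everything from word 51 on is "decoration" (about 20%).

What's in the gpt-image-2 lens dictionary?

The common ones: wide shot (big scene) / medium shot (half-body, the safest) / close-up / overhead shot (top-down) / low angle (looking up) / 35mm / 50mm / 85mm lens / f/1.8 shallow DOF / f/8 deep focus.

What happens if the style word goes last?

It carries less weight than front-loaded words, so the output style drifts. For example, a prompt ending in "...photorealistic style." still comes out leaning illustration. Move the style word to the start of the first sentence ("Photorealistic editorial portrait of...") and it holds steady.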