05

Front-50 priority + lens / lighting / mood dictionary

⏱️ 25 min

After memorizing the "6 building blocks" checklist from the last chapter, first-time generators usually slam into the same wall: every block is there, the double quotes are in, the style word is written — and the image still feels off. Subject is mushy, style drifts, color jumps around.

The formula isn't wrong. The word order is wrong.

gpt-image-2 doesn't weight your prompt evenly — words near the front carry way more weight. In the early decoding steps, the model is mostly "listening" to the first few dozen words to lock in the subject features; everything after that just nudges the result. So for the same prompt, putting "photorealistic" at word 5 vs word 50 changes the output more than you'd guess.

This chapter nails one thing down: the first 50 words decide at least 50% of an image's quality. The rest of the chapter is three dictionaries — Lens, Lighting, Mood — telling you exactly which words to stuff into those first 50.


1. The Front-50-word Rule

We ran the same prompt through gpt-image-2 80+ times (only changing word order). Weights came out roughly like this:

| Position | Weight on final image | What goes here |
| --- | --- | --- |
| Words 1-15 | ~50% | Style + subject + lens (the three big ones) |
| Words 16-50 | ~30% | Lighting + scene + composition details |
| Word 51+ | ~20% | Decorations, secondary elements, constraint clauses |

Easy way to remember it: First 15 words are the bones. 16-50 are the meat. After 51 it's just decoration. Get the skeleton wrong and no amount of meat makes it stand up straight.
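If you want to see where your own words land, here's a minimal sketch that splits a prompt into the three bands from the table. It assumes words are counted by splitting on whitespace (the model's real tokenization may differ), and the names `BANDS`, `band_for`, and `split_into_bands` are made up for this example. The weights are just the rough numbers from the table, not something the model exposes.

```python
# Bands and rough weights copied from the table above.
# Assumption: a "word" is whatever str.split() returns.
BANDS = [
    (1, 15, "bones", 0.50),
    (16, 50, "meat", 0.30),
    (51, None, "decoration", 0.20),
]


def band_for(position: int) -> str:
    """Return the band name for a 1-based word position."""
    if position < 1:
        raise ValueError("positions are 1-based")
    for _start, end, name, _weight in BANDS:
        # Bands are contiguous, so checking the upper bound is enough.
        if end is None or position <= end:
            return name
    raise AssertionError("unreachable: the last band is open-ended")


def split_into_bands(prompt: str) -> dict[str, list[str]]:
    """Group the words of a prompt by the band they fall into."""
    grouped: dict[str, list[str]] = {name: [] for _, _, name, _ in BANDS}
    for position, word in enumerate(prompt.split(), start=1):
        grouped[band_for(position)].append(word)
    return grouped


if __name__ == "__main__":
    prompt = "Photorealistic portrait of a young Asian woman drinking coffee"
    for name, words in split_into_bands(prompt).items():
        print(f"{name:>10}: {' '.join(words)}")
```

Paste a long prompt in and check that nothing important ends up in the "decoration" row.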


2. A/B Real-world Comparison

Here's a real prompt pair side by side.

Weak example (keywords dumped at the end)

A scene with various elements that include a young woman who is sitting
at a wooden table near a window in a café drinking coffee, photorealistic style.

Count it: the subject "young woman" sits at words 9-10, and "photorealistic" is word 26 of 27, almost the very last slot. Result: café looks fine, but the face leans illustration-y, the table feels plasticky, and "photorealistic" barely registers.

Strong example (keywords up front)

Photorealistic portrait of a young Asian woman drinking coffee in a sunlit
Sydney café. 50mm lens, golden hour light, editorial style.

Style and subject sit inside the first 7 words, and the lens spec follows at word 15. Result: skin pore detail, natural skin texture, reflections on the coffee cup, real depth of field, all there. Same model, hit rate roughly 3x higher. The model didn't get smarter; it just "heard" your instruction.
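You don't have to count by hand. The sketch below finds the 1-based position of a keyword in each of the two prompts. It assumes whitespace word counting and strips trailing punctuation before comparing. `word_position` is a hypothetical helper written for this chapter, not part of any library.

```python
from typing import Optional


def _normalize(word: str) -> str:
    """Lowercase a word and drop surrounding punctuation."""
    return word.strip(".,;:!?\"'").lower()


def word_position(prompt: str, keyword: str) -> Optional[int]:
    """Return the 1-based position where `keyword` starts, or None if absent."""
    words = [_normalize(w) for w in prompt.split()]
    target = [_normalize(w) for w in keyword.split()]
    if not target:
        return None
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return i + 1
    return None


weak = (
    "A scene with various elements that include a young woman who is sitting "
    "at a wooden table near a window in a café drinking coffee, photorealistic style."
)
strong = (
    "Photorealistic portrait of a young Asian woman drinking coffee in a sunlit "
    "Sydney café. 50mm lens, golden hour light, editorial style."
)

if __name__ == "__main__":
    print(word_position(weak, "photorealistic"))    # 26
    print(word_position(strong, "photorealistic"))  # 1
    print(word_position(strong, "50mm lens"))       # 15
```

In the weak prompt the style word arrives at position 26; in the strong one it is the very first word.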


3. Lens Dictionary

Lens words control composition and depth of field. You need at least one lens word inside the first 50, otherwise the model defaults to "medium half-body shot, fixed 50mm, around f/4" — that bland stock-camera look.

| Word | Effect | When to use it |
| --- | --- | --- |
| wide shot | Big scene, small subject | City streets, product environment, KV poster backgrounds |
| medium shot | Half-body, subject clear | Balanced subject + environment, the safest fallback |
| close-up | Head / detail close-up | Portrait emotion, product texture, food shots |
| extreme close-up | Eyes / lips / texture | Beauty ads, dramatic emotion |
| overhead shot | Top-down | Food flatlay, desk workflow shots |
| low angle | Looking up | Architecture, hero shots, sportswear |
| dutch angle | Tilted frame | Tension, suspense, street photography |
| 35mm lens | Slightly wide, environment-heavy | Street, documentary, vlog vibe |
| 50mm lens | Close to human eye | General portrait, lifestyle |
| 85mm lens | Telephoto compression | Magazine portraits, blurred backgrounds |
| f/1.8 shallow DOF | Shallow depth, creamy blur | Food / portrait / product to pop the subject |
| f/8 deep focus | Deep focus, front and back sharp | Landscape, architecture, group shots |

4. Lighting Dictionary

Lighting decides an image's mood and quality tier. Same subject — studio softbox makes it look commercial, golden hour makes it cozy, harsh midday sun makes it tropical-street. Subject didn't change, but how expensive it looks completely flips.

| Word | Effect | When to use it |
| --- | --- | --- |
| golden hour | Warm light, roughly the last hour before sunset | Portrait, lifestyle, café, travel |
| blue hour | Blue tones after sunset | City night, lonely mood, cinematic |
| harsh midday sun | Strong high-contrast | Street, tropical, sportswear |
| night neon | Neon + nightscape | Cyberpunk, nightclub, Y2K |
| studio softbox | Commercial soft light | E-commerce hero, product, ID portrait |
| cinematic lighting | Strong light/shadow contrast | Course covers, movie posters, Bootcamp KV |
| overcast diffused | Cloudy soft light | Nordic, minimal, documentary |
| dramatic chiaroscuro | Extreme light/shadow | Literary magazine, character close-up |
| rim light | Outline glow | Subject edge glow, silhouette, product edge |
| key light | Main light | Portrait main illumination, pair with fill light for layers |
| back light | Backlit | Hair fiber detail, silhouette, atmosphere |
| top down | Top light | Food flatlay, crime/thriller mood |

5. Mood / Texture Dictionary

Lens covers "how it's shot," lighting covers "what time and what environment," mood words cover "what does it feel like after looking at it." Skipping this column makes a huge difference — leave it out, the model defaults to "neutral journalism photo." Put it in, and the whole image's tone snaps into place.

| Word | Effect | When to use it |
| --- | --- | --- |
| moody / dramatic | Dark, high contrast, heavy | Course covers, serious topics, character depth |
| dreamy / ethereal | Mist, light, otherworldly | Beauty, female-focused, art, fragrance |
| editorial / fashion | Fashion magazine vibe | Brand campaigns, KV posters, portrait |
| candid / documentary | Spontaneous, real | Xiaohongshu real-person feel, brand documentary |
| cozy / intimate | Warm, close | Home, café, mom & baby, lifestyle |
| cinematic / filmic | Film-frame feel | Course covers, event KV, promo shorts |
| vibrant pop | High saturation, young | Douyin, Y2K, trendy toys, sports |
| melancholic | Brooding, lonely | Literary, indie music, mood shorts |
| y2k aesthetic | Millennium feel | Retro digital, youth culture |
| vaporwave | Purple-pink neon, retro cyber | Design covers, music visuals |
| retro 80s / vintage 70s | Era texture | Retro marketing, homage, streetwear |
| modern minimal | Minimal modern | B2B, design-led, paid knowledge |

6. Write Prompts Backwards

Beginners write prompts forwards: describe the scene, add details, then style at the end. That's exactly why the weak example in §2 crashes.

The pro flow runs in reverse — first decide "what vibe does this image need," then work backwards to keyword order:

  1. Lock in three things first: who's the subject / style tone / lighting time
  2. Cram those three into the first 30 words — that's the skeleton
  3. Then write composition + lens specs (words 31-50)
  4. Decorations + constraints last (after word 50)

It's like home renovation: pick the style first (Nordic / industrial / Japanese), then the hard finishes (floor, walls), and only after that the soft furnishings (cushions, frames). People who pick cushions first always end up redoing it.
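The four steps above can be sketched as a tiny builder that forces the order for you: skeleton first, composition second, decorations last. This is an illustration only. `build_prompt` and its parameters are hypothetical names, the 30-word skeleton limit comes from step 2, and words are counted by whitespace.

```python
def build_prompt(
    style: str,
    subject: str,
    lighting: str,
    composition: str,
    extras: tuple[str, ...] = (),
    skeleton_limit: int = 30,
) -> str:
    """Assemble a prompt in priority order and enforce the skeleton budget."""
    # Steps 1-2: style, subject, and lighting form the skeleton.
    skeleton = f"{style} of {subject}, {lighting}"
    skeleton_words = len(skeleton.split())
    if skeleton_words > skeleton_limit:
        raise ValueError(
            f"skeleton is {skeleton_words} words; trim it to {skeleton_limit} or fewer"
        )
    # Step 3: composition + lens. Step 4: decorations and constraints.
    parts = [skeleton, composition, *extras]
    return ", ".join(p.strip() for p in parts if p.strip()) + "."


if __name__ == "__main__":
    print(
        build_prompt(
            style="Photorealistic editorial portrait",
            subject="a barista pouring latte art",
            lighting="golden hour window light",
            composition="medium shot, 50mm lens",
            extras=("muted palette",),
        )
    )
```

Because the function signature puts style, subject, and lighting first, you can't "pick the cushions" before the skeleton exists.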


7. Crash Reports

Crash 1: Style word at the end

Tacking "…photorealistic style." onto the end of the prompt — output still leans illustration. The model already locked in the look from the first 50 words; one trailing style word doesn't carry enough weight to flip it.

Fix: move the style word to the very start of the first sentence. Photorealistic editorial portrait of… is the standard opening for almost every realistic shot.

Crash 2: Vague words

Writing good camera, nice lighting, high quality — the model has no idea what you mean and just falls back to default (50mm + natural light + generic commercial standard).

Fix: be specific. 50mm f/1.8 beats good camera 10x, golden hour rim light lands way harder than nice lighting. Vague words might as well not be there.
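A quick way to catch this before you hit generate is a small lint pass. The sketch below flags the three vague phrases named above and suggests a concrete swap. The phrase list and `find_vague` are made up for this example, and the suggestion for "high quality" is our own assumption, so extend or change it to fit your prompts.

```python
# Vague phrase -> concrete replacement hint.
# The first two hints come from the fix above; the third is an assumption.
VAGUE_PHRASES = {
    "good camera": "name a lens and aperture, e.g. 50mm f/1.8",
    "nice lighting": "name a lighting setup, e.g. golden hour rim light",
    "high quality": "name a concrete style, e.g. photorealistic editorial",
}


def find_vague(prompt: str) -> list[tuple[str, str]]:
    """Return (phrase, hint) pairs for every vague phrase found in the prompt."""
    lowered = prompt.lower()
    return [(phrase, hint) for phrase, hint in VAGUE_PHRASES.items() if phrase in lowered]


if __name__ == "__main__":
    for phrase, hint in find_vague("Portrait of a chef, good camera, nice lighting"):
        print(f"'{phrase}' is vague: {hint}")
```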

Crash 3: Stacking 5+ style words that fight each other

photorealistic, cinematic, editorial, fashion, dramatic, moody, vintage, modern minimal — the model gets pulled in every direction, takes a bit of each, ends up with nothing.

Fix: max 2-3 style words, and they have to be compatible. Photorealistic editorial works; Photorealistic anime is contradictory; Modern minimal vintage 70s is a logical clash. Before writing, ask yourself: would a human photographer understand these two words together?
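Here's a rough sketch of that check in code. It takes the style words you plan to use, enforces the max-3 rule, and flags the two clashing pairs mentioned above. `CLASHES` and `check_style_stack` are hypothetical names, and the clash list only covers the examples from this section, so treat it as a starting point.

```python
MAX_STYLE_WORDS = 3

# Only the two clashes named in this section; add your own as you find them.
CLASHES = {
    frozenset({"photorealistic", "anime"}),
    frozenset({"modern minimal", "vintage 70s"}),
}


def check_style_stack(style_words: list[str]) -> list[str]:
    """Return a list of problems with the chosen style words (empty means OK)."""
    words = [w.strip().lower() for w in style_words]
    problems = []
    if len(words) > MAX_STYLE_WORDS:
        problems.append(
            f"{len(words)} style words; keep it to {MAX_STYLE_WORDS} or fewer"
        )
    # Compare every unordered pair once.
    for i, first in enumerate(words):
        for second in words[i + 1:]:
            if frozenset({first, second}) in CLASHES:
                problems.append(f"'{first}' clashes with '{second}'")
    return problems


if __name__ == "__main__":
    print(check_style_stack(["Photorealistic", "editorial"]))  # []
    print(check_style_stack(["photorealistic", "anime"]))
```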

Crash 4: Copying someone's "magic prompt" gets a totally different result

You see "this prompt nails it every time" on social media — you paste it and it doesn't look right. Usually it's not a model version issue, it's that you swapped the subject word, which broke the semantic structure of the first 50 words.

Fix: when copying a prompt, only swap the "subject noun." Keep the sentence structure, style words, lens words, and lighting words of the first 50 words exactly as they were. The more you change, the further it drifts.
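In code, "only swap the subject noun" is a single guarded replacement. The sketch below refuses to run unless the old subject appears exactly once, so you can't accidentally touch the style, lens, or lighting words. `swap_subject` is a hypothetical helper for illustration.

```python
def swap_subject(prompt: str, old_subject: str, new_subject: str) -> str:
    """Replace the subject phrase and leave every other word untouched."""
    occurrences = prompt.count(old_subject)
    if occurrences != 1:
        raise ValueError(
            f"expected the subject exactly once, found it {occurrences} times"
        )
    return prompt.replace(old_subject, new_subject)


if __name__ == "__main__":
    copied = (
        "Photorealistic portrait of a young Asian woman drinking coffee in a sunlit "
        "Sydney café. 50mm lens, golden hour light, editorial style."
    )
    print(swap_subject(copied, "a young Asian woman", "an elderly fisherman"))
```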


8. Our Default Template

Over the last six months the JR Academy team produced 800+ commercial images (event KVs / course covers / Xiaohongshu posts). We ended up with a "front-50-word default template" that opens every single image:

Editorial photorealistic portrait of [subject],
[scene + lighting],
[composition + lens specs].

A real example:

Editorial photorealistic portrait of a young Chinese AI engineer
debugging code in a Sydney coworking space at golden hour,
medium shot, 50mm lens, f/1.8 shallow DOF, cinematic warm grade.

29 words. All four big things (subject / style / lighting / lens) are packed into the first 30. After that we add text layer, constraints, brand color hex. The structure was reliable enough that we eventually turned it into a Notion template: operations people just swap the subject description and batch out images.
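If you'd rather batch from a script than from a Notion page, the same template fits in a few lines of Python. This is a sketch, not our production tooling: `TEMPLATE`, `fill_template`, and `batch_prompts` are names invented for this example, and the word count uses plain whitespace splitting.

```python
TEMPLATE = (
    "Editorial photorealistic portrait of {subject},\n"
    "{scene_lighting},\n"
    "{composition_lens}."
)


def fill_template(subject: str, scene_lighting: str, composition_lens: str) -> str:
    """Fill the three slots of the default template."""
    return TEMPLATE.format(
        subject=subject,
        scene_lighting=scene_lighting,
        composition_lens=composition_lens,
    )


def batch_prompts(subjects: list[str], scene_lighting: str, composition_lens: str) -> list[str]:
    """Swap only the subject and keep scene, lighting, and lens fixed."""
    return [fill_template(s, scene_lighting, composition_lens) for s in subjects]


if __name__ == "__main__":
    prompts = batch_prompts(
        ["a young Chinese AI engineer", "a senior product designer"],
        "debugging code in a Sydney coworking space at golden hour",
        "medium shot, 50mm lens, f/1.8 shallow DOF, cinematic warm grade",
    )
    for prompt in prompts:
        print(len(prompt.split()), "words")
        print(prompt, end="\n\n")
```

Every prompt in the batch shares the same opening, scene, and lens line, so only the subject varies from image to image.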


Next

Once the front-50-word skeleton is up, the next boss is text rendering — the move where gpt-image-2 actually buries Midjourney. The 99% accuracy on Chinese title text isn't free; there's a separate set of hard rules behind it: double quotes, role hint, position words, complex stroke handling. Ch 06 spells it all out.

If you want to drill word order right now:

  1. Take the default template from §8
  2. Swap subject + scene + lighting (all must fit inside the first 30 words)
  3. Pick one word each from the Lens / Lighting / Mood dictionaries, drop them into the first 50
  4. When it crashes, go back to §7 and check against the four crash patterns

One thing to remember: The model doesn't read your mind. It just counts your words.


📷 Real-world Front-50-word Case Study

From awesome-gpt-image (CC BY 4.0). Watch how a real "front-50-word default template" gets written.

Case: Korean editorial portrait (style + subject + lighting all locked in the first 50 words)

Korean Editorial Portrait with Soft Mist

Prompt (front-50-word section):

9:16 vertical - editorial portrait, single subject soft black mist filter,
subtle haze, gentle highlight bloom, muted tones minimal indoor space,
clean background, slight texture young Korean woman, minimal makeup,
natural skin texture

Look at the content density of those first 50 words:

  • Position 1-5: "9:16 vertical - editorial portrait" (ratio + style)
  • Position 6-16: "single subject soft black mist filter, subtle haze, gentle highlight bloom" (filter + light feel)
  • Position 17-25: "muted tones minimal indoor space, clean background, slight texture" (palette + environment)
  • Position 26-33: "young Korean woman, minimal makeup, natural skin texture" (subject + makeup texture)

The whole tonal "skeleton" of the image is built inside the first 50 words. The 200+ words after it just fill in outfit / pose / hair as "decoration" — once the skeleton is solid, the rest is icing on the cake, not life support.

📷 Creator: @BubbleBrain · Featured in: awesome-gpt-image

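❓ FAQ

The most-searched questions about this chapter's topic.

What is the gpt-image-2 front-50-word rule?

gpt-image-2 weights the front of the prompt most heavily. Words 1-15 are the "bones" (style + subject + lens, ~50% of the weight), words 16-50 are the "meat" (lighting + scene, ~30%), and everything from word 51 on is "decoration" (~20%).

What's in the gpt-image-2 lens dictionary?

Common ones: wide shot (big scene) / medium shot (half-body, the safest) / close-up / overhead shot (top-down) / low angle (looking up) / 35mm / 50mm / 85mm lens / f/1.8 shallow DOF / f/8 deep focus.

What happens if the style word goes last?

It carries less weight than the words at the front, so the style drifts. For example, ending a prompt with "...photorealistic style." still gives an illustration-leaning image. Move the style word to the start of the first sentence ("Photorealistic editorial portrait of...") and it holds.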