Text rendering — gpt-image-2's killer feature
The hottest designer rant of the last two years: making a Chinese poster, 30% of the work goes into the text layer. Midjourney spits out a base image in 30 seconds, then you import it into Photoshop, hunt for fonts, tune kerning, change colors, add shadows, and the client comes back with "make it bigger" — there goes your afternoon.
Why so painful? Because in the MJ / Flux era, AI couldn't render Chinese characters. What came out looked like Chinese-shaped scribbles, and you had to layer real text on top in PS by hand.
gpt-image-2 killed that workflow. 99% character-level accuracy across four major writing systems: Latin, CJK, Devanagari (Hindi), and Bengali. This is the biggest differentiator vs Midjourney, and it's the first capability the OpenAI Cookbook hammers on.
But to actually pin that 99% down, gluing text into your prompt isn't enough. You have to follow four iron rules — skip any one and accuracy drops to about 75%. This chapter breaks down all four, gives right-vs-wrong comparisons, and ends with a complete mixed Chinese/English layout prompt template.
Rule 1: Literal text must go in double quotes
❌ Headline says 30 days to learn ChatGPT
✅ Headline (top, bold): "30 天学会 ChatGPT"
In the first version, the model treats "30 days to learn ChatGPT" as a description — it'll "decide for itself" how to render that line, maybe translate it to Chinese, maybe rephrase it, maybe drop it entirely.
Double quotes are a hard instruction to the model: "Render this exactly as written. No translating, no warping, no extras." The OpenAI Cookbook spends a lot of ink on this in the text rendering section, and community testing agrees: wrap literal text in double quotes and accuracy jumps from 60% to 90%+.
This matters even more for Chinese titles. Without quotes, Headline says 30 天学会 ChatGPT has a 30% chance of coming out as "30天学会chatgpt" lowercase, or with English mixed in.
Rule 2: Use role hints to control size and hierarchy
No role hint = model improvises = font sizes go everywhere.
The point of a role hint is to tell the model what role this text plays in the layout. The model then uses standard print-design intuition to pick size, weight, position, and alignment.
| Role Hint | Use case | Typical font size |
|---|---|---|
| headline | Main title, biggest type | 1/8 to 1/3 of image height |
| subhead | Subtitle, second-biggest | 50-60% of headline |
| body / caption | Body text / explanation | Medium |
| footer | Bottom small text / date / copyright | Smallest |
| stat card | Data block | Big number, small label |
| sidebar item | Sidebar entry | List style |
❌ Add a big title and a small subtitle
✅ Headline (top center): "AI 训练营"
✅ Subhead (below headline): "30 天交付一个 AI 应用"
✅ Footer (bottom): "2026.05.20 开班"
The first version makes the model guess — "big" how big? Headline or display? Spell it out and the hierarchy lands on the first try.
Rule 3: Spell out position + color + font style
❌ Title at the top
✅ Headline (top center, large bold, white with subtle shadow)
This rule has the highest failure rate. "Title at the top" — top where? Top-left, top-center, top-right? Big or small? What color? — the model gives you 8 different positions across 8 images.
Done right, you specify all five: position + size + weight + color + shadow/outline.
Full example:
Headline (top center, large bold, white with thin black outline)
Subhead (directly below headline, centered, medium gray)
Footer (bottom right, small dark gray)
Use hex codes (#FF5757) over adjectives (bright red) for color — covered in detail in the poster chapter, same applies to text layers.
Rule 4: Add constraint phrases to block extra text
Four lines, mandatory at the end of your prompt:
Exact text only.
No extra words.
No duplicate text.
No background watermarks.
Why mandatory? The model has seen too many text-heavy training images — posters, ads, movie subtitles, copyright watermarks — so it has a habit of "decoratively adding text". You ask for one headline, it sneakily drops a blurred English tagline in the corner that looks like a watermark.
Adding these four lines blocks 90% of those misfires. Our JR team didn't include them the first month, and we were deleting extra text every day. Once we added them, that whole class of problem basically disappeared.
An analogy: double quotes tell the model "say this line", role hints tell the model "play this character", position and color tell it "where to stand and what to wear", constraint phrases tell it "don't ad-lib". All four together — that's when the model knows what you actually want.
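The four rules are mechanical enough to automate. Below is a minimal sketch (a hypothetical helper, not from the Cookbook; all names are illustrative) of a prompt builder that applies all four: it double-quotes literal text, leads each line with a role hint, attaches the position/color spec, and appends the constraint block.

```python
# Hypothetical helper: assembles a gpt-image-2 prompt that follows
# all four iron rules. Function and variable names are illustrative only.

CONSTRAINTS = (
    "Exact text only.\n"
    "No extra words.\n"
    "No duplicate text.\n"
    "No background watermarks."
)

def text_layer(role: str, text: str, spec: str) -> str:
    # Rule 1: double-quote the literal text.
    # Rule 2: lead with the role hint.
    # Rule 3: spell out position + size + weight + color in `spec`.
    return f'{role} ({spec}): "{text}"'

def build_prompt(scene: str, layers: list[tuple[str, str, str]]) -> str:
    lines = [scene]
    lines += [text_layer(role, text, spec) for role, text, spec in layers]
    lines.append(CONSTRAINTS)  # Rule 4: constraint block at the end.
    return "\n".join(lines)

prompt = build_prompt(
    "Poster, 16:9 banner. Style: minimal flat design, dark navy + warm yellow.",
    [
        ("Headline", "AI 视觉创作",
         "top center, large bold, white with thin black outline"),
        ("Subhead", "30-Day GPT-Image-2 Cover Mastery",
         "directly below headline, centered, medium gray"),
        ("Footer", "2026.05.01", "bottom right, small dark gray"),
    ],
)
print(prompt)
```

Keeping the constraint block as a single constant means you can never half-drop it, which is exactly the failure mode described in Rule 4.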
Chinese-specific notes (important)
Chinese rendering is harder than English. The reason: Chinese character structures are complex, with high stroke density, so the model has to allocate more "detail budget" to each glyph. Memorize these five points:
1. Use the high quality tier — clean Chinese strokes need more detail budget. Medium tier handles "AI 训练营" fine, but for stroke-dense characters like "匠人学院" or "鬱", medium occasionally breaks strokes or misaligns them. Don't even try low tier with Chinese. A high-tier image is $0.211 (about ¥1.5) — for a real poster, don't skimp on this.
2. Don't "translate" — paste the Chinese glyphs directly. Don't write Chinese title that says 30 days to learn ChatGPT. That makes the model translate first, then render — extra step, double the failure rate. Just write Headline: "30 天学会 ChatGPT".
3. Font hint vocabulary works — gpt-image-2 understands Chinese font terms: 楷体 / 宋体 / 黑体 / 行书 / kai font / serif Chinese are all recognized. Write Headline in 宋体 bold and you actually get 宋体 characters, not the default 黑体.
4. Avoid country-level style keywords — this is the sneakiest trap. Japanese aesthetic sounds safe, but the model reads it as "include Japanese elements" and slips kana like "だ" / "ろ" / "ん" into your Chinese title. Switch to Chinese minimalist or editorial Asian aesthetic, or just describe elements (lighting, palette) and skip country labels entirely.
5. Be careful with complex characters — overly stroke-dense characters ("嬴" / "龘" / "鬱" / "灪") drop to 80% accuracy. If a simpler synonym works, use it. If you must use the complex one: high tier + zoom 200% after generation and verify each character.
Mixed Chinese/English layout example (full prompt)
Poster, 16:9 banner.
Top bold headline (Chinese, large): "AI 视觉创作"
Sub-line (English, smaller): "30-Day GPT-Image-2 Cover Mastery"
Footer date: "2026.05.01"
Style: minimal flat design, dark navy + warm yellow.
Exact text only.
No extra words.
No duplicate text.
No background watermarks.
One generation, three text layers — all positioned right, no bleed-through, nothing missing. Chinese on high tier basically nails it first try, English layer hits 99%+ every time.
Want extra insurance? Add positions after each role hint: (top center) / (below headline, centered) / (bottom right) — that completes Rule 3.
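If you generate through the API rather than ChatGPT, the same prompt plugs straight into the Images endpoint. A hedged sketch, assuming the official `openai` Python SDK and a model id of `gpt-image-2` (this chapter's name for it; swap in whatever id your account exposes). The request payload is built first so you can sanity-check it before spending credits; the actual call is left commented out because it needs an API key.

```python
# Sketch only: assumes the `openai` Python SDK and OPENAI_API_KEY in the env.
# The model id "gpt-image-2" follows this chapter; substitute your actual id.

prompt = """Poster, 16:9 banner.
Top bold headline (Chinese, large): "AI 视觉创作"
Sub-line (English, smaller): "30-Day GPT-Image-2 Cover Mastery"
Footer date: "2026.05.01"
Style: minimal flat design, dark navy + warm yellow.
Exact text only.
No extra words.
No duplicate text.
No background watermarks."""

payload = {
    "model": "gpt-image-2",  # assumption: chapter's model name
    "prompt": prompt,
    "size": "1536x1024",     # wide format; supported sizes vary by model
    "quality": "high",       # Chinese text: always the high tier
    "n": 1,
}

# Uncomment to actually generate (costs credits):
# from openai import OpenAI
# client = OpenAI()
# result = client.images.generate(**payload)
```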
Recovery workflow when things go wrong
Even with careful prompts, you'll occasionally get a bad output. Three recovery methods, ordered cheapest-first:
Case A: One Chinese character has wrong strokes — don't regenerate the whole image. In ChatGPT, tell it: "keep everything the same, only fix the headline character X to Y". Multi-turn editing preserves other elements and only touches the specified character. Doesn't work first time? Try again. Three rounds usually fix it.
Case B: Same text renders inconsistently across multiple images — this is a built-in issue with reference mode. Images 1-3 anchor the style fine, then 4-5 the text drifts. The fix: re-paste the full prompt every 2-3 images. Don't lean on "same style as previous" alone. Same principle covered in the Xiaohongshu carousel chapter.
Case C: Can't recover after multiple tries — don't dig in. Use Photopea / Figma to type text on top of the base image: when generating, prompt with no text, and add the text layer in a real design tool. This is the ultimate fallback, especially good for stroke-dense Chinese + print posters. We did one A3 print poster with "嬴" in the title — gpt-image-2 missed the strokes 12 times. Switching to "AI generates base + Figma adds text" took 5 minutes.
A/B comparison from the field
Same brief: "Make an AI Bootcamp recruitment poster, headline '训练营', subtitle '30 天交付一个 AI 应用'."
Weak version (breaks Rules 1 + 3):
Make a poster about AI bootcamp with a Chinese title at the top
and a subtitle below it. Modern style, blue and red.
8 generations: 4 had text in random positions, 3 had wrong Chinese strokes, 1 had no headline at all. Usable: 0/8.
Strong version (all four rules satisfied):
Wide horizontal banner, 16:9, 1920×1080.
Background: a young diverse team in a modern office,
laptops open, brainstorming session, warm afternoon light.
Headline (top center, huge bold, red #FF5757 with white outline):
"AI 训练营"
Subhead (directly below headline, medium white):
"30 天交付一个 AI 应用"
Footer (bottom center, small dark gray):
"2026.05.20 开班 · 线上直播"
Style: editorial corporate photography, slight film grain.
Exact text only. No extra words. No duplicate text. No watermarks.
8 generations: 7 had text completely correct, 1 had the subtitle slightly off-position (adding one line, "centered horizontally", plus a regen fixed it). Usable: 7/8.
That's the gap — same model, same user, same topic. Different prompt structure, usable rate goes from 0% to 87.5%.
Real failures (4 actual traps)
Failure 1: Japanese aesthetic leaked katakana — wrote Japanese aesthetic, minimal layout for a Chinese AI tools poster. The headline "AI 工具集" came back with "だ" / "ろ" / "ん" mixed in. Reason covered earlier: country-level style keywords pull in that country's writing system. Switching to editorial Asian aesthetic or just minimal layout fixed it.
Failure 2: Subtitle position word too vague — wrote Subhead below the headline. Out of 8 images, 3 had the subtitle drift to the bottom-right, because "below" in layout terms is ambiguous — it can mean "anywhere underneath". Switching to Subhead (directly below headline, centered horizontally, 30px gap) locked it down. The more specific the position word, the more accurate.
Failure 3: Medium tier on complex Chinese — tried medium tier on "匠人学院" to save money. The bottom-right strokes of "匠" came out wrong, and the left-side ear radical of "院" had an extra stroke. Switched to high tier — all four characters correct. This is exactly why Chinese + important context = always high tier.
Failure 4: Dropped one constraint line — to keep the prompt short, cut No duplicate text. Result: the headline area showed two identical "AI 训练营" — one in the proper spot, the other blurred in the background like a watermark. Don't drop any of the four constraint lines.
One-line memory aid
Double quotes + role hints + position/color + constraints
All four done = 99% accuracy. Skip any one and it drops to 75%.
When you finish writing a prompt, audit it against these four. Whichever's missing, add it. You'll rarely have a misfire.
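That audit fits in a few lines of code. A minimal sketch (hypothetical checks, hand-tuned to this chapter's heuristics, not an official tool; the keyword lists are deliberately coarse):

```python
# Hypothetical audit: reports which of the four iron rules a prompt misses.
# Substring checks are coarse heuristics, not a full parser.
ROLE_HINTS = ("headline", "subhead", "body", "caption",
              "footer", "stat card", "sidebar")
POSITION_WORDS = ("top", "bottom", "center", "left",
                  "right", "below", "above")
CONSTRAINTS = ("exact text only", "no extra words",
               "no duplicate text", "no background watermarks")

def audit(prompt: str) -> list[str]:
    p = prompt.lower()
    missing = []
    if prompt.count('"') < 2:                         # Rule 1
        missing.append("Rule 1: no double-quoted literal text")
    if not any(hint in p for hint in ROLE_HINTS):     # Rule 2
        missing.append("Rule 2: no role hint")
    if not any(w in p for w in POSITION_WORDS):       # Rule 3
        missing.append("Rule 3: no explicit position word")
    if not all(c in p for c in CONSTRAINTS):          # Rule 4
        missing.append("Rule 4: constraint block incomplete")
    return missing

weak = "Make a poster with a big Chinese title and a subtitle. Modern style."
print(audit(weak))  # flags all four rules
```

Run it on a draft prompt and fix whatever it flags before generating.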
Real experience at JR Academy
In 4 weeks we made 200+ images with Chinese text — posters, Xiaohongshu covers, course banners, in-article visuals. Chinese text accuracy curve:
| Week | Chinese text accuracy | Main reason |
|---|---|---|
| Week 1 | 78% | Pure intuition prompts, often missed Rules 3 / 4 |
| Week 2 | 86% | Team built a "four iron rules checklist", referenced before each image |
| Week 3 | 92% | Added hex colors + made position words specific |
| Week 4 | 96% | Complex Chinese always on high tier + 200% zoom verification |
That's an 18-point jump, and it wasn't because the model got smarter — gpt-image-2 didn't update during those 4 weeks. It was prompt practice converging on the four iron rules. In other words, going from 78% to 96% is a prompt-craft dividend, not a model dividend.
Translation: you don't have to wait for a model upgrade. Burn the four iron rules into your prompting habit today, and tomorrow's images come out 15-20% more accurate than today's.
Next up
Got text rendering down? The next thing to drill is posters. Ch 07 applies these four text rules to three poster types: event KV / e-commerce hero / course covers. Each comes with full prompt templates and recovery workflows. Read it through and you can ship a production-ready Chinese poster in 6 minutes.
If you want to drill text rendering first:
- Take the full prompt from this chapter's mixed Chinese/English example
- Swap the headline + subtitle + style block
- Generate 8 images, pick 1
- If the text is wrong, run through "4 real failures" line by line
- Once muscle memory kicks in, Rules 1-4 flow as naturally as breathing
📷 Real text rendering case studies
The 3 example sets below come from awesome-gpt-image (CC BY 4.0). Each is a "boundary test" for text rendering — proof the iron rules really do hold 99% accuracy.
Case 1: Tiny text on a single grain of rice (extreme test)
Prompt:
A massive pile of rice, and on one single grain of rice there is tiny text that reads "wOw"
This is the limit test for text rendering: writing "wOw" on a single grain of rice. Not only did the model nail it, the mixed-case w-O-w came out exact, pixel-level text precision that previous-generation models never came close to. "wOw" uses Rule 1 (double quotes around literal text); the prompt has no role hint or position word because the scene is too tiny for layout, a textbook case of a simple prompt pushed to an extreme.
📷 Creator: @adonis_singh · Curated by: awesome-gpt-image
Case 2: Chinese calligraphy practice sheets (4 styles)
Four styles shown: 王羲之体 (Wang Xizhi style) / 草书 (cursive script) / 行书 (semi-cursive script) / 楷书 (regular script)
Prompt:
Generate a calligraphy copybook practice sheet in [script style]
Swap [script style] for a specific script (王羲之体 / 草书 / 行书 / 楷书) and you get four distinct calligraphy styles. The Chinese stroke structure + composition + character traits all match — fine-grained typographic language the model has clearly "absorbed". This is a live demo of the Chinese-specific font vocabulary in §3.
📷 Creator: @MrLarus · Curated by: awesome-gpt-image
Case 3: Text-heavy Chinese layout stress test
Four scenes shown: 校园周报 (campus weekly) / 餐厅菜单 (restaurant menu) / 教科书页 (textbook page) / 老黄历 (traditional almanac)
Prompt:
Generate an image of [scene / content]
Four text-dense Chinese layout scenes: weekly paper / menu / textbook page / almanac. For "the whole image is text" scenarios, you can hit 99% accuracy on usable output. For production use, pass in exact copy (menu items) or a reference image — results get even more stable.
📷 Creator: @MrLarus · Curated by: awesome-gpt-image
❓ FAQ
The most commonly searched questions on this chapter's topic
How do I get Chinese text right with gpt-image-2?
Follow the four iron rules: ① wrap literal text in double quotes ② use role hints to control hierarchy (headline / subhead / footer) ③ spell out position + color + font style ④ end with the four constraint lines (exact text only / no extra words / no duplicate text / no background watermarks).
How do I stabilize complex Chinese characters?
Use the high quality tier ($0.211/image), zoom to 200% after generation and verify character by character, and when necessary overlay a typed text layer in Photopea / Figma. Stroke-dense characters ("嬴" / "鬱" / "龘") drop to 80% accuracy; substitute a simpler synonym whenever one works.
Why shouldn't a Chinese poster prompt say Japanese aesthetic?
Country-level style keywords make the model add that country's writing system. "Japanese aesthetic" triggers kana like "だ" / "ろ" / "ん" mixed into Chinese headlines. Switch to "Chinese minimalist" or "editorial Asian aesthetic" to avoid it.