Vision with OpenAI API

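Vision is both the capability people most often overestimate and one of the easiest to actually put into production. It gets overestimated because many people imagine it as "zero-error OCR". It makes it into production because tasks like screenshot analysis, receipt reading, form pre-screening, and image-based Q&A are naturally suited to letting the model take the first look.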

A more realistic role

Vision works best as the "first intelligent filter", not as the "final financial sign-off".

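It usually fits when:

someone would otherwise have to look at the image and type the data in by hand
the image carries both structure and meaning
the model is allowed to do a first pass, with rules or a human reviewing the result afterwards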
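One way to put "model first, then rules or a human" into practice is a small gate. It checks what the model extracted before anything is accepted. This is a sketch under assumptions. The receipt fields (`date`, `total`, `line_items`), the two rules, and `check_receipt` itself are hypothetical. The `extracted` dict is assumed to come from a separate Vision call.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass
class ReviewResult:
    needs_human: bool
    reasons: list[str] = field(default_factory=list)


def check_receipt(extracted: dict) -> ReviewResult:
    """Hypothetical rule gate applied to fields a Vision model extracted from a receipt."""
    reasons = []
    # Rule 0: every required field must be present before other checks make sense.
    for key in ("date", "total", "line_items"):
        if key not in extracted:
            reasons.append(f"missing field: {key}")
    if reasons:
        return ReviewResult(True, reasons)

    # Rule 1: the date must be an ISO-8601 string such as "2024-05-01".
    try:
        date.fromisoformat(extracted["date"])
    except (TypeError, ValueError):
        reasons.append("date is not ISO formatted")

    # Rule 2: the total must equal the sum of the line items (Decimal avoids float noise).
    try:
        total = Decimal(str(extracted["total"]))
        items_sum = sum((Decimal(str(x)) for x in extracted["line_items"]), Decimal("0"))
        if total != items_sum:
            reasons.append(f"total {total} != sum of line items {items_sum}")
    except (InvalidOperation, TypeError):
        reasons.append("amounts are not numeric")

    # Any failed rule routes the receipt to a human instead of auto-accepting it.
    return ReviewResult(bool(reasons), reasons)
```

Anything that comes back with `needs_human=True` goes to a review queue instead of being booked directly. That keeps Vision in the "first filter" role rather than the final sign-off.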

Basic usage with the Responses API

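```python
from openai import OpenAI

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input=[
        {
            "role": "user",
            # One user message can mix text parts and image parts.
            "content": [
                {"type": "input_text", "text": "What is in this image?"},
                {
                    "type": "input_image",
                    "image_url": "https://example.com/image.jpg"
                }
            ]
        }
    ]
)

# output_text joins all text output from the response into one string.
print(response.output_text)
```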

Ways to pass in an image

According to the official vision guide, you can pass an image to the model as any of:

an image URL
a Base64-encoded data URL
a file ID
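Below is a sketch of the three options. It assumes the `openai` Python SDK and assumes the Files API accepts `purpose="vision"` for image uploads (check the current API reference). The helper names (`data_url_from_bytes`, `data_url_from_path`, `url_part`, `file_part`, `ask_about_local_file`) are hypothetical; only the dictionary shapes follow the API. Plain URLs and data URLs share the `image_url` field, while an uploaded file is referenced through `file_id`.

```python
import base64
import mimetypes


def data_url_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a Base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def data_url_from_path(path: str) -> str:
    """Read a local image; guess its MIME type from the extension (falls back to JPEG)."""
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        return data_url_from_bytes(f.read(), mime_type or "image/jpeg")


def url_part(image_url: str) -> dict:
    """Both https:// URLs and data: URLs go in the image_url field."""
    return {"type": "input_image", "image_url": image_url}


def file_part(file_id: str) -> dict:
    """An image already uploaded through the Files API is referenced by its ID."""
    return {"type": "input_image", "file_id": file_id}


def ask_about_local_file(path: str, prompt: str) -> str:
    """Upload a local image, then ask about it by file ID (needs network and an API key)."""
    from openai import OpenAI  # imported here so the helpers above work without the SDK

    client = OpenAI()
    with open(path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="vision")
    response = client.responses.create(
        model="gpt-5",
        input=[
            {
                "role": "user",
                "content": [{"type": "input_text", "text": prompt}, file_part(uploaded.id)],
            }
        ],
    )
    return response.output_text
```

A data URL suits one-off local images. A file ID avoids re-sending the same bytes when one image is referenced in several requests.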

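When the `detail` parameter is worth setting

input_image accepts detail: "low" | "high" | "auto".

low: costs fewer tokens; good for getting the gist of an image
high: better for recognizing fine detail
auto: lets the model decide

If you only need rough classification or a description, low usually saves cost. Switch to high only when you genuinely need the detail.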
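The sketch below shows where `detail` goes in the request. It adds a per-task policy along the lines just described. The task labels, `COARSE_TASKS`, and the helper names are hypothetical. Only the `input_image` shape with its `detail` field follows the API.

```python
# The three values listed for input_image's detail field.
ALLOWED_DETAIL = ("low", "high", "auto")

# Hypothetical task labels that only need the gist of an image.
COARSE_TASKS = {"classify", "describe"}


def image_part(image_url: str, detail: str = "auto") -> dict:
    """Build one input_image content part with an explicit detail level."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}, got {detail!r}")
    return {"type": "input_image", "image_url": image_url, "detail": detail}


def detail_for_task(task: str) -> str:
    """Hypothetical policy: cheap 'low' for rough tasks, 'high' for everything else."""
    return "low" if task in COARSE_TASKS else "high"


def build_input(prompt: str, image_url: str, task: str) -> list[dict]:
    """Assemble a single user message to pass as responses.create(input=...)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt},
                image_part(image_url, detail_for_task(task)),
            ],
        }
    ]
```

Passing `input=build_input("Is this a receipt?", url, "classify")` to `client.responses.create(model="gpt-5", ...)` sends the image at low detail. An unlisted task falls through to high. Defaulting unknown tasks to high trades cost for accuracy; invert it if cost matters more.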

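Common mistakes

Treating Vision as zero-error OCR
Assuming bigger images are always better
Trusting a single output without any rule-based or human review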
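To see why "bigger is better" does not hold, here is a rough token estimator. It is based on the tile formula the vision guide documented for GPT-4o-class models. Low detail is a flat 85 tokens. High detail first fits the image into a 2048×2048 square, then shrinks the shortest side to 768 px, and charges 85 + 170 per 512 px tile. Treat this as an assumption: other models, including newer ones, use different formulas. The sketch also only ever downscales, which is itself an assumption about images smaller than 768 px. `estimate_image_tokens` is a hypothetical helper, not an SDK function.

```python
BASE_TOKENS = 85       # flat cost; also the whole cost at detail="low" (assumed formula)
TOKENS_PER_TILE = 170  # cost per 512 px tile at detail="high" (assumed formula)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def scaled_size(width: int, height: int) -> tuple[int, int]:
    """Approximate the resizing applied before tiling (downscale only: an assumption)."""
    w, h = width, height
    # Step 1: fit inside a 2048 x 2048 square, keeping the aspect ratio.
    longest = max(w, h)
    if longest > 2048:
        w, h = round(w * 2048 / longest), round(h * 2048 / longest)
    # Step 2: shrink so the shortest side is 768 px.
    shortest = min(w, h)
    if shortest > 768:
        w, h = round(w * 768 / shortest), round(h * 768 / shortest)
    return w, h


def estimate_image_tokens(width: int, height: int, detail: str) -> int:
    """Rough input-token cost of one image under the assumed tile formula."""
    if detail == "low":
        return BASE_TOKENS
    if detail != "high":
        raise ValueError('only "low" and "high" can be estimated; "auto" is decided by the model')
    w, h = scaled_size(width, height)
    tiles = ceil_div(w, 512) * ceil_div(h, 512)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles


for size in [(1024, 1024), (2048, 4096), (4096, 8192)]:
    print(size, "low:", estimate_image_tokens(*size, "low"),
          "high:", estimate_image_tokens(*size, "high"))
```

Under this formula, a 4096×8192 image costs the same 1,105 high-detail tokens as a 2048×4096 one, because both are downscaled to 768×1536 before tiling. The extra pixels only add upload time. At low detail, every size costs 85.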



© 2017-2026 JR Academy Pty Ltd. All rights reserved.