自学编程遇到瓶颈怎么办？

遇到瓶颈是正常的。建议：1. 动手做项目 (Project-based Learning)，不要只看视频；2. 善用 AI 助手 (如 Cursor, ChatGPT) 解释代码和逻辑；3. 加入全球技术社区 (如 Discord, GitHub) 与他人交流；4. 拆解大问题为小模块逐个击破。

如何构建一个具备全球竞争力的开发者作品集 (Portfolio)？

优秀的 Portfolio 不在多而在精。包含 2-3 个完整的、已上线的项目 (Live Demo) 最佳。每个项目应包含：GitHub 源码链接、在线演示地址、以及一份中英文 Readme 文档说明解决了什么问题、使用了什么技术栈。

Multimodal Prompt Design

Advanced prompting with combined text, image, and video inputs

Source: Google Cloud "Prompt Design in Vertex AI" Course Model Focus: Gemini 1.5 Series Estimated Time: 20 mins

What Is Multimodal?

Traditional AI models can only read text. Multimodal AI (like Google's Gemini) can understand and process multiple data types simultaneously: text, images, audio, video, and even code.

Multimodal Input

Why Use Multimodal Prompts?

Some information is nearly impossible to describe in words but instantly conveyed through an image or video. Multimodal design dramatically boosts AI's ability to handle complex tasks:

Image to JSON: Snap a photo of an invoice, have AI extract structured JSON data directly.
Video Analysis: Upload a surveillance clip and ask: "At what point does a blue truck appear in this video?"
UI Debugging: Screenshot a broken UI and ask: "My frontend layout is misaligned — check the CSS for me."

Best Practices for Multimodal Prompt Design

A good multimodal prompt is like writing a high-quality product spec.

1. Specific Instructions

Don't just say "analyze this image." Say "extract all product names and prices from this image and format them as a table."

2. Contextual Padding

Tell the AI when the image was taken, or what the background context is.

"This photo is from our company's annual gala. Please identify the executives in the image and generate a brief intro for each."

3. Task Decomposition

For complex images or videos, ask in steps:

Describe the overall environment in the image.
Locate the key objects.
Perform specific logical reasoning (e.g., count quantities).

Advanced Technique: Image Focus

Guide the AI's attention through your prompt.

"Pay special attention to the fine print in the top-left corner of the image — that's our product key. Transcribe it for me."

Real-World Use Cases

Scenario	Multimodal Input	Expected Output
Retail	Product photo + "describe its style"	Compelling e-commerce copy
Logistics	Warehouse screenshot + "count the boxes"	Automated inventory count
Legal	Scanned PDF	Key clause summary + risk flags

Challenges and Limitations

Gemini is powerful, but multimodal still has its gotchas:

Token consumption: Images and video eat up a massive chunk of the context window.
Resolution sensitivity: Small text in low-resolution images might not be recognized.

Conclusion: Mastering multimodal prompt design means you're not just making AI "listen" — you're making it "see the world." That's a critical step toward becoming an AI architect.

Prompt 大师