Flan

FLAN overview

TL;DR

The core idea behind FLAN: reformulate a large number of tasks as instruction-formatted data and fine-tune on them at scale (instruction tuning), which improves generalization in zero-shot, few-shot, and Chain-of-Thought (CoT) settings.
If you're building LLM applications for real users, prefer instruction-tuned or chat-tuned checkpoints. They generally follow instructions better and produce more stable outputs.
Including CoT data in instruction tuning often significantly improves performance on reasoning tasks, but evaluate to verify that it hasn't introduced verbosity, format drift, or hallucination.
Improvements on multilingual and low-resource-language tasks depend mainly on training data coverage and task diversity; on the prompt side, state the language and output format explicitly.

How to Prompt

Write tasks as "instruction + constraints + output format," with a fallback path for missing info:

Instruction: You are a helpful assistant for <domain>.
Task: <what to do>
Constraints:
- Use the provided context only.
- If key information is missing, ask up to 3 clarifying questions.
Output format:
- Return JSON with fields: answer, assumptions, sources
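
The template above can be filled in and its output checked programmatically. The following is a minimal sketch, not a reference implementation: the names `build_prompt`, `parse_response`, and `REQUIRED_FIELDS` are hypothetical, and the trailing "Context:" section is an assumption added so that the "provided context" constraint has something to refer to.

```python
import json

# Field names taken from the "Output format" line of the template.
REQUIRED_FIELDS = ("answer", "assumptions", "sources")


def build_prompt(domain: str, task: str, context: str) -> str:
    """Fill the template; the Context section is an illustrative addition."""
    return "\n".join([
        f"Instruction: You are a helpful assistant for {domain}.",
        f"Task: {task}",
        "Constraints:",
        "- Use the provided context only.",
        "- If key information is missing, ask up to 3 clarifying questions.",
        "Output format:",
        "- Return JSON with fields: answer, assumptions, sources",
        "Context:",
        context,
    ])


def parse_response(raw: str) -> dict:
    """Parse a model reply and check that the requested fields are present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Rejecting replies that lack a field keeps format drift visible early, before a malformed reply reaches downstream code.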

Self-check Rubric

Is there clear separation between instruction, context, constraints, and output format?
When context is missing or conflicting, does the model ask questions or abstain instead of hallucinating?
Is the output format stable and regression-testable (same input, multiple runs, explainable differences)?
Are you using representative samples for evaluation (including edge cases and multilingual inputs)?
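
The format-stability item in the rubric can be turned into a small regression check. This is a sketch under stated assumptions: `generate` is a hypothetical stand-in for whatever function calls your model and returns its raw text, and "stable" is taken here to mean only that every run yields a JSON object with the same set of keys.

```python
import json
from typing import Callable


def format_is_stable(generate: Callable[[str], str], prompt: str, runs: int = 5) -> bool:
    """Return True if every run yields a JSON object with the same key set."""
    key_sets = set()
    for _ in range(runs):
        try:
            data = json.loads(generate(prompt))
        except json.JSONDecodeError:
            return False  # a single unparsable reply counts as format drift
        if not isinstance(data, dict):
            return False
        key_sets.add(frozenset(data))
    # Exactly one distinct key set means the structure never changed.
    return len(key_sets) == 1
```

A check like this covers structure only; differences in the field values still need to be reviewed and explained separately.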

What's New

FLAN1

Image source: Scaling Instruction-Finetuned Language Models

The paper Scaling Instruction-Finetuned Language Models explores the benefits of scaling instruction tuning and how it improves performance across models (PaLM, T5), prompt settings (zero-shot, few-shot, CoT), and benchmarks (MMLU, TyDiQA). The core variables are the number of tasks (scaled to 1.8K), the model size, and the inclusion of Chain-of-Thought (CoT) data in joint fine-tuning (9 datasets).

Fine-tuning process:

1.8K tasks were formulated as instructions and used to fine-tune the models
Fine-tuning data was formatted both with and without exemplars (few-shot / zero-shot) and both with and without CoT
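
To make the four formatting combinations concrete (with or without exemplars, with or without CoT), here is a minimal sketch. The `Example` class, the `Q:`/`A:` layout, and the "So the answer is" phrasing are illustrative assumptions, not the templates used in the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Example:
    question: str
    answer: str
    rationale: Optional[str] = None  # reasoning chain, if one is available


def render_target(ex: Example, use_cot: bool) -> str:
    """With CoT, the target contains the rationale followed by the answer."""
    if use_cot and ex.rationale:
        return f"{ex.rationale} So the answer is {ex.answer}."
    return ex.answer


def format_instance(
    instruction: str,
    ex: Example,
    exemplars: Sequence[Example] = (),
    use_cot: bool = False,
) -> Dict[str, str]:
    """Build one (input, target) pair; empty `exemplars` means zero-shot."""
    parts = [instruction]
    for shot in exemplars:
        parts.append(f"Q: {shot.question}\nA: {render_target(shot, use_cot)}")
    parts.append(f"Q: {ex.question}\nA:")
    return {"input": "\n\n".join(parts), "target": render_target(ex, use_cot)}
```

Calling `format_instance` with and without `exemplars`, and with `use_cot` set to `True` or `False`, yields the four variants of the same underlying example.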

Fine-tuning tasks and held-out tasks:

FLAN11

Capabilities and Key Results

Instruction tuning scales well with both task count and model size, which suggests that further scaling of both is worthwhile
Adding CoT datasets to instruction tuning yields strong performance on reasoning tasks
Flan-PaLM shows improved multilingual capabilities: 14.9% improvement on one-shot TyDiQA; 8.1% improvement on arithmetic reasoning in underrepresented languages
Flan-PaLM also performs well on open-ended generation, a good indicator of improved usability
Improved performance on Responsible AI (RAI) benchmarks
Flan-T5 models show strong few-shot capabilities and outperform public checkpoints such as T5

Results from scaling fine-tuning task count and model size: scaling both model size and fine-tuning tasks is expected to continue improving performance, though returns from scaling task count are diminishing.

FLAN2

Image source: Scaling Instruction-Finetuned Language Models

Results from fine-tuning on non-CoT and CoT data: jointly fine-tuning on both non-CoT and CoT data improves performance on both evaluations, compared to fine-tuning on just one.

FLAN3

Image source: Scaling Instruction-Finetuned Language Models

Additionally, self-consistency combined with CoT achieves SoTA results on several benchmarks. CoT + self-consistency also significantly improves benchmarks involving math problems (e.g., MGSM, GSM8K).
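
Self-consistency samples several CoT completions for the same question and keeps the most frequent final answer. The sketch below shows only the voting step and assumes the completions have already been sampled; the helper names are hypothetical, and taking the last number in each completion as its final answer is a simplifying assumption that suits arithmetic-style tasks only.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Take the last number in the completion as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def self_consistent_answer(completions: Iterable[str]) -> Optional[str]:
    """Majority vote over the final answers of sampled CoT completions."""
    answers = (extract_answer(c) for c in completions)
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None  # no completion contained a parsable answer
    return votes.most_common(1)[0][0]
```

Ties are not handled specially here; a production version would likely need an explicit tie-breaking rule and a task-specific answer parser.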

FLAN4

Image source: Scaling Instruction-Finetuned Language Models

CoT fine-tuning unlocks zero-shot reasoning on BIG-Bench tasks, activated by the phrase "let's think step by step." In general, zero-shot CoT Flan-PaLM outperforms zero-shot CoT PaLM without fine-tuning.
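
A zero-shot CoT prompt simply appends the trigger phrase to the question, with no exemplars. A minimal sketch, assuming a `Q:`/`A:` layout (the layout and the function name are illustrative, not taken from the paper):

```python
COT_TRIGGER = "Let's think step by step."


def zero_shot_cot_prompt(question: str) -> str:
    """Append the CoT trigger phrase so the model starts with its reasoning."""
    return f"Q: {question.strip()}\nA: {COT_TRIGGER}"
```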

FLAN6

Image source: Scaling Instruction-Finetuned Language Models

Here are some demonstrations of zero-shot CoT from PaLM and Flan-PaLM on unseen tasks.

FLAN5

Image source: Scaling Instruction-Finetuned Language Models

More zero-shot prompt examples below. They show how the PaLM model struggles in zero-shot settings with repetition and failure to follow instructions, while Flan-PaLM handles them well. Few-shot examples can mitigate these errors.

FLAN7

Image source: Scaling Instruction-Finetuned Language Models

Here are examples of the Flan-PaLM model demonstrating more zero-shot capabilities on several challenging open-ended questions:

FLAN8

Image source: Scaling Instruction-Finetuned Language Models

FLAN9

Image source: Scaling Instruction-Finetuned Language Models

FLAN10

Image source: Scaling Instruction-Finetuned Language Models

You can try Flan-T5 models on the Hugging Face Hub.
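
As a starting point, a Flan-T5 checkpoint can be loaded with the `transformers` library. This is a sketch under assumptions: it uses the `google/flan-t5-small` checkpoint, requires `transformers`, `torch`, and `sentencepiece` to be installed, and downloads the weights on first use. The `generate` wrapper is a hypothetical helper, not part of the library.

```python
MODEL_ID = "google/flan-t5-small"  # assumption: the smallest public Flan-T5 checkpoint


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Run one prompt through Flan-T5 and return the decoded text."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the checkpoint the first time it runs):
# print(generate("Translate English to German: How old are you?"))
```

Reloading the model on every call keeps the sketch short; for repeated use, load the tokenizer and model once and reuse them.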
