Evaluate Outputs (Teacher)
Use an LLM to compare and evaluate outputs
TL;DR
- This is a classic LLM-as-a-Judge use case: have a judge model compare two outputs and give feedback, like a teacher grading papers.
- Great for evaluation: A/B testing prompts, comparing different models, or comparing different settings of the same model.
- The risk is judge bias and instability. Pin down the rubric, require evidence citations (quotes from the outputs), and do multi-round / multi-judge cross-validation.
Background
This prompt tests an LLM's ability to evaluate and compare outputs from two different models (or two different prompts), as if it were a teacher.
One workflow:
- Ask two models to write the dialogue with the same prompt
- Ask a judge model to compare the two outputs
Example generation prompt (for the two models):
Plato's Gorgias is a critique of rhetoric and sophistic oratory, where he makes the point that not only is it not a proper form of art, but the use of rhetoric and oratory can often be harmful and malicious. Can you write a dialogue by Plato where instead he criticizes the use of autoregressive language models?
How to Apply
You can break this evaluation workflow into three steps:
- Generate: Use the same generation prompt to produce two outputs (different models or different prompt versions)
- Judge: Use a judge prompt to compare and evaluate
- Decide: Pick the better version based on the rubric, or feed the feedback into the next iteration round
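The three steps above can be sketched as a minimal loop. This is a sketch under assumptions: `generate` and `judge` are hypothetical placeholders for your actual model calls, and the JSON verdict shape (`winner`, `scores`) is one possible convention, not a fixed API.

```python
import json

def generate(model, prompt):
    """Placeholder: call your model of choice and return its text output."""
    raise NotImplementedError

def judge(judge_prompt, output_a, output_b):
    """Placeholder: ask the judge model for a JSON verdict with winner + scores."""
    raise NotImplementedError

def decide(verdict):
    """Step 3: pick the better version from the judge's structured verdict."""
    winner = verdict.get("winner")
    if winner not in ("A", "B"):
        return "tie"  # honor a tie / unsure option instead of forcing a pick
    return winner

# Example verdict the judge might return (shape is an assumption):
verdict = json.loads('{"winner": "A", "scores": {"A": 4, "B": 3}}')
print(decide(verdict))  # "A"
```

Keeping the decision logic separate from the judge call makes it easy to swap in different aggregation rules later (e.g., majority vote across judges).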
In production, make the rubric more specific, for example:
- coherence (logic and structure)
- faithfulness (does it stay on topic)
- style adherence (does it match Plato dialogue style)
- clarity (readability and expression)
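A rubric like the one above can be injected into the judge prompt programmatically, so every evaluation run uses the same dimensions and scale. A minimal sketch; the prompt wording and the 1-5 scale are illustrative choices, not a prescribed format:

```python
RUBRIC = {
    "coherence": "logic and structure",
    "faithfulness": "does it stay on topic",
    "style adherence": "does it match Plato dialogue style",
    "clarity": "readability and expression",
}

def build_judge_prompt(output_1, output_2, rubric=RUBRIC):
    """Render a judge prompt that scores each rubric dimension on 1-5."""
    lines = [
        "Compare the two outputs below as a teacher would.",
        "Score each output 1-5 on every dimension and quote evidence:",
    ]
    lines += [f"- {name}: {desc}" for name, desc in rubric.items()]
    lines += ["", "Output A:", output_1, "", "Output B:", output_2]
    return "\n".join(lines)

prompt = build_judge_prompt("{output 1}", "{output 2}")
```

Because the rubric lives in one dict, adding or removing a dimension changes every judge call consistently.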
How to Iterate
- Have the judge output structured results: Winner + Scores + Evidence + Actionable feedback
- Pin down the comparison dimensions and scoring ranges (e.g., 1-5) to reduce arbitrariness
- Add a "tie / unsure" option to avoid forced picks
- Use multiple judge prompts or multiple judge models for consistency checks (majority vote)
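The majority-vote consistency check above can be implemented as a small aggregation function. A sketch, assuming each judge run yields a single winner label ("A", "B", or "tie"):

```python
from collections import Counter

def majority_vote(verdicts, min_margin=1):
    """Aggregate winners from several judge runs; return 'tie' if no clear majority."""
    counts = Counter(verdicts)
    top_two = counts.most_common(2)
    if len(top_two) == 1:
        return top_two[0][0]  # unanimous
    (first, n1), (_second, n2) = top_two
    return first if n1 - n2 >= min_margin else "tie"

print(majority_vote(["A", "A", "B"]))  # "A"
print(majority_vote(["A", "B"]))       # "tie"
```

Raising `min_margin` makes the vote stricter, so borderline disagreements between judges resolve to "tie" rather than a noisy winner.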
Self-check Rubric
- Does the judge cite specific passages from the outputs (evidence)?
- Do the scores correspond to rubric dimensions, not vague generalizations?
- Can it provide actionable improvement suggestions (how to change the prompt next round)?
- Are results stable across multiple runs (temperature control + multi-round consistency)?
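For the stability check in the last item, a simple agreement rate over repeated judge runs is enough to flag an unstable setup. A minimal sketch; the 0.8 threshold is an illustrative assumption, not a standard:

```python
def agreement_rate(verdicts):
    """Fraction of runs that agree with the modal verdict; 1.0 means fully stable."""
    if not verdicts:
        return 0.0
    modal = max(set(verdicts), key=verdicts.count)
    return verdicts.count(modal) / len(verdicts)

# e.g., five repeated judge runs at low temperature:
runs = ["A", "A", "A", "tie", "A"]
print(agreement_rate(runs))  # 0.8
```

If the rate falls below your threshold (say 0.8), tighten the rubric or lower the judge temperature before trusting the verdict.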
Practice
Exercise: do an A/B prompt test on a writing task you commonly use:
- prompt A: shorter, more open-ended
- prompt B: longer, with constraints and structured output
Then use a judge prompt to compare them and produce an "improvement checklist" for the next prompt iteration round.
Prompt (evaluation)
Can you compare the two outputs below as if you were a teacher?
Output from ChatGPT: {output 1}
Output from GPT-4: {output 2}
Code / API
OpenAI (Python)
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Can you compare the two outputs below as if you were a teacher?\n\nOutput from ChatGPT:\n{output 1}\n\nOutput from GPT-4:\n{output 2}",
        }
    ],
    temperature=1,
    max_tokens=1500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

# The judge's evaluation is in the first choice's message
print(response.choices[0].message.content)
```
Fireworks (Python)
```python
import fireworks.client

fireworks.client.api_key = "<FIREWORKS_API_KEY>"

completion = fireworks.client.ChatCompletion.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Can you compare the two outputs below as if you were a teacher?\n\nOutput from ChatGPT:\n{output 1}\n\nOutput from GPT-4:\n{output 2}",
        }
    ],
    stop=["<|im_start|>", "<|im_end|>", "<|endoftext|>"],
    stream=True,
    n=1,
    top_p=1,
    top_k=40,
    presence_penalty=0,
    frequency_penalty=0,
    prompt_truncate_len=1024,
    context_length_exceeded_behavior="truncate",
    temperature=0.9,
    max_tokens=4000,
)

# With stream=True the call returns a generator of chunks; iterate to read the deltas
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
```