Extract entities
Extract structured information from text.
#TL;DR
- A good fit for strongly constrained information extraction: a JSON array (structured output in the shape `["model_name"]`) serves as the minimal verifiable output.
- Key risk: the model over-extracts (treating method, task, or organization names as model names) or under-extracts.
- Iteration direction: add extraction rules (what counts as a model name), add explicit trigger conditions for returning `["NA"]`, and run small-scale regressions against an evaluation set.
#Background
The following prompt tests an LLM's ability to perform an information extraction task: extracting model names from machine learning paper abstracts.
#How to Apply
To port this template to your own use case, the approach is the same:
- Input: a piece of unstructured text (paper abstract, support conversation, meeting notes, resume, contract)
- Output: a set of structured fields (an array or a JSON object)
To make the extraction more stable:
- Constrain the output to strict JSON (return only the array, with no explanation)
- State explicitly: "if unsure, return `["NA"]`"
- Prepare a small test set (10-50 items) for evaluation
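A minimal sketch of such an evaluation loop in Python — the raw outputs and gold labels below are made-up examples, not real model responses:

```python
import json

def score_one(raw_output: str, gold: set) -> dict:
    """Score one raw model output against a hand-written gold label set."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        parsed = None
    if not isinstance(parsed, list) or not all(isinstance(x, str) for x in parsed):
        # Any non-JSON or non-array output is a hard failure for this task.
        return {"valid": False, "tp": 0, "fp": 0, "fn": len(gold)}
    pred = set(parsed) - {"NA"}  # ["NA"] means "nothing found"
    return {
        "valid": True,
        "tp": len(pred & gold),
        "fp": len(pred - gold),
        "fn": len(gold - pred),
    }

# Tiny made-up regression set: (raw model output, gold labels).
cases = [
    ('["ChatGPT", "GPT-4"]', {"ChatGPT", "GPT-4"}),
    ('["NA"]', set()),
    ('Sure! ["LLaMA"]', {"LLaMA"}),  # extra text makes the JSON invalid
]
results = [score_one(raw, gold) for raw, gold in cases]
```

Aggregating `tp`/`fp`/`fn` over the whole set gives precision and recall for each prompt revision, which is what the regression comparison should track.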
#How to Iterate
- Add rules: what counts as a model name (e.g. names with version numbers, sizes, or family names) and what does not (datasets, tasks, companies)
- Add negative examples: give 1-2 examples of terms that must not be extracted
- Add post-processing: dedupe the output, normalize casing, and filter common false positives
- Add an "evidence" mode: alongside the array, return the source span for each extracted name (but watch for format consistency)
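The post-processing step can be sketched as a small Python helper; the blocklist entries here are hypothetical examples of frequent false positives, to be replaced with whatever your own error analysis turns up:

```python
# Hypothetical blocklist of frequent false positives (generic terms, not model names).
BLOCKLIST = {"llm", "llms", "agi", "transformer", "dataset"}

def postprocess(names: list) -> list:
    """Dedupe case-insensitively, trim whitespace, and drop known false positives."""
    seen, cleaned = set(), []
    for name in names:
        key = name.strip().lower()
        if not key or key in BLOCKLIST or key in seen:
            continue
        seen.add(key)
        cleaned.append(name.strip())  # keep the original casing for display
    return cleaned

print(postprocess([" GPT-4", "gpt-4", "LLM", "Alpaca"]))  # → ['GPT-4', 'Alpaca']
```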
#Self-check rubric
- Is the output a valid JSON array? Does it contain any extra text?
- Were non-model names extracted (false positives)?
- Were obvious model names missed (false negatives)?
- When unsure, does the model correctly return `["NA"]`?
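The format-related items in this rubric can be checked automatically (the false-positive/negative items still need gold labels). A minimal sketch:

```python
import json

def check_output(raw: str) -> list:
    """Return a list of rubric violations for one raw model output."""
    try:
        parsed = json.loads(raw.strip())
    except json.JSONDecodeError:
        return ["not valid JSON (possibly extra text around the array)"]
    if not isinstance(parsed, list):
        return ["top-level value is not an array"]
    problems = []
    if not all(isinstance(x, str) and x for x in parsed):
        problems.append("array contains non-string or empty entries")
    if "NA" in parsed and len(parsed) > 1:
        problems.append('"NA" mixed with real model names')
    return problems

print(check_output('Here you go: ["GPT-4"]'))  # flags the extra text
```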
#Practice
Exercise: prepare 10 passages of text from a domain you know well (it does not have to be ML), and swap the target field for:
- product names
- competitor names
- key requirements
Hand-write a gold label for each passage, then compare the model's output against it and run an error analysis.
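The error analysis can stay very simple; a sketch with made-up document IDs and labels:

```python
def error_analysis(examples):
    """examples: (text_id, predicted set, gold set) triples -> per-example error records."""
    report = []
    for text_id, pred, gold in examples:
        fp, fn = sorted(pred - gold), sorted(gold - pred)
        if fp or fn:
            report.append((text_id, fp, fn))
    return report

report = error_analysis([
    ("doc1", {"GPT-4"}, {"GPT-4", "LLaMA"}),  # missed LLaMA
    ("doc2", {"ImageNet"}, set()),            # dataset mistaken for a model name
    ("doc3", {"Alpaca"}, {"Alpaca"}),         # perfect -> not reported
])
print(report)  # → [('doc1', [], ['LLaMA']), ('doc2', ['ImageNet'], [])]
```

Reading the false positives and false negatives side by side usually suggests the next extraction rule or negative example to add.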
#Prompt
```markdown
Your task is to extract model names from machine learning paper abstracts. Your response is an array of the model names in the format ["model_name"]. If you don't find model names in the abstract or you are not sure, return ["NA"]

Abstract: Large Language Models (LLMs), such as ChatGPT and GPT-4, have revolutionized natural language processing research and demonstrated potential in Artificial General Intelligence (AGI). However, the expensive training and deployment of LLMs present challenges to transparent and open academic research. To address these issues, this project open-sources the Chinese LLaMA and Alpaca…
```
#Prompt template
```markdown
Your task is to extract model names from machine learning paper abstracts. Your response is an array of the model names in the format ["model_name"]. If you don't find model names in the abstract or you are not sure, return ["NA"]

Abstract: {input}
```
#Code / API
#OpenAI (Python)
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Your task is to extract model names from machine learning paper abstracts. Your response is an array of the model names in the format [\"model_name\"]. If you don't find model names in the abstract or you are not sure, return [\"NA\"]\n\nAbstract: Large Language Models (LLMs), such as ChatGPT and GPT-4, have revolutionized natural language processing research and demonstrated potential in Artificial General Intelligence (AGI). However, the expensive training and deployment of LLMs present challenges to transparent and open academic research. To address these issues, this project open-sources the Chinese LLaMA and Alpaca…",
        }
    ],
    temperature=1,
    max_tokens=250,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)
```
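The raw completion still needs parsing on the caller's side. A hedged sketch of that step — the `SimpleNamespace` stub below only stands in for the real response object so the helper can be exercised without an API call:

```python
import json
from types import SimpleNamespace

def parse_model_names(response) -> list:
    """Extract the model-name array from a chat completion, falling back to ["NA"]."""
    content = response.choices[0].message.content.strip()
    try:
        names = json.loads(content)
    except json.JSONDecodeError:
        return ["NA"]  # treat malformed output as "nothing found"
    if isinstance(names, list) and names and all(isinstance(n, str) for n in names):
        return names
    return ["NA"]

# Stub standing in for the object returned by client.chat.completions.create(...).
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content=' ["ChatGPT", "GPT-4"] '))]
)
print(parse_model_names(fake))  # → ['ChatGPT', 'GPT-4']
```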
#Fireworks (Python)
```python
import fireworks.client

fireworks.client.api_key = "<FIREWORKS_API_KEY>"

completion = fireworks.client.ChatCompletion.create(
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Your task is to extract model names from machine learning paper abstracts. Your response is an array of the model names in the format [\"model_name\"]. If you don't find model names in the abstract or you are not sure, return [\"NA\"]\n\nAbstract: Large Language Models (LLMs), such as ChatGPT and GPT-4, have revolutionized natural language processing research and demonstrated potential in Artificial General Intelligence (AGI). However, the expensive training and deployment of LLMs present challenges to transparent and open academic research. To address these issues, this project open-sources the Chinese LLaMA and Alpaca…",
        }
    ],
    stop=["<|im_start|>", "<|im_end|>", "<|endoftext|>"],
    stream=True,
    n=1,
    top_p=1,
    top_k=40,
    presence_penalty=0,
    frequency_penalty=0,
    prompt_truncate_len=1024,
    context_length_exceeded_behavior="truncate",
    temperature=0.9,
    max_tokens=4000,
)
```