Classification Prompts
Basic classification prompts and quick-start guide
This chapter covers common Classification Prompt templates for testing and deploying LLM text classification. Classification is one of the most fundamental and practical NLP tasks -- nail it and you can build all kinds of automation pipelines fast.
What Is Text Classification?
Text Classification assigns a piece of text to one or more predefined categories.
┌─────────────────────────────────────────────────────────────┐
│ Classification Flow │
├─────────────────────────────────────────────────────────────┤
│ │
│ Input Text → LLM Analysis → Output Label│
│ │
│ "This product is great!" Understand semantics positive │
│ "Terrible service" Judge sentiment negative │
│ "It's okay" Match label neutral │
│ │
└─────────────────────────────────────────────────────────────┘
Why Classification Matters
| Use Case | Specific Application | Business Value |
|---|---|---|
| Customer Service | Ticket routing, mood detection, priority | Faster response |
| Content Moderation | Spam, policy violations, sensitive topics | Lower manual costs |
| Email Processing | Spam filtering, email type classification | Better productivity |
| User Feedback | Review sentiment, NPS prediction | User insight |
| Smart Routing | Issue type detection, department dispatch | Optimized workflows |
Common Classification Types
| Type | Label Examples | Typical Scenario |
|---|---|---|
| Sentiment Analysis | positive / negative / neutral | Review analysis, monitoring |
| Intent Detection | inquiry / complaint / refund / buy | Chatbots |
| Topic Classification | tech / finance / sports / entertainment | News categorization |
| Urgency Level | high / medium / low | Ticket systems |
| Language Detection | Chinese / English / Japanese | Multi-language routing |
| Spam Detection | spam / not_spam | Email filtering |
Core Prompt Structure
A good Classification Prompt needs these elements:
┌─────────────────────────────────────────────────────────────┐
│ Classification Prompt Structure │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. Task description - Clearly state what to do │
│ 2. Label space - List all possible categories │
│ 3. Output constraint - Specify format (label only / JSON) │
│ 4. Input data - The text to classify │
│ │
└─────────────────────────────────────────────────────────────┘
General Template
Classify the following text into the specified categories.
Categories: {label_1} / {label_2} / {label_3}
Requirements:
- Output only the category name, no explanation
- If uncertain, output "unknown"
Text: {input_text}
Category:
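In code, this template is usually rendered from a label list and the input text. A minimal sketch (the `build_prompt` helper is illustrative, not from any library):

```python
# Render the general classification template for a given label set and text.
# TEMPLATE mirrors the structure shown above; the helper name is hypothetical.
TEMPLATE = """Classify the following text into the specified categories.

Categories: {labels}

Requirements:
- Output only the category name, no explanation
- If uncertain, output "unknown"

Text: {input_text}

Category:"""

def build_prompt(labels: list[str], input_text: str) -> str:
    """Fill in the label space and the text to classify."""
    return TEMPLATE.format(labels=" / ".join(labels), input_text=input_text)

print(build_prompt(["positive", "negative", "neutral"], "I think the food was okay."))
```

Keeping the template in one place means every call site uses the same label space and output constraint.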
Quick Start: Zero-shot Classification
The simplest approach -- just tell the LLM which labels to use:
Prompt:
Classify the text into neutral, negative, or positive.
Text: I think the food was okay.
Sentiment:
Output:
neutral
This works for:
- Quick prototyping
- Clear-cut label semantics
- Tasks where consistency isn't critical
Few-shot Classification: More Stable Output
When you need more stable, predictable output formatting, use Few-shot (provide 2-5 examples):
Prompt:
Classify the text as positive / negative / neutral.
Examples:
Text: This product is amazing, highly recommend!
Classification: positive
Text: Terrible service, never coming back.
Classification: negative
Text: It's alright, nothing special.
Classification: neutral
Now classify:
Text: Beautiful packaging, but the taste is average.
Classification:
Output:
neutral
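Rather than hard-coding the example strings into the prompt, it helps to keep them as data so they can be swapped or A/B tested. A minimal sketch of assembling the few-shot prompt above (helper name is illustrative):

```python
# Few-shot examples as (text, label) pairs, taken from the prompt above.
EXAMPLES = [
    ("This product is amazing, highly recommend!", "positive"),
    ("Terrible service, never coming back.", "negative"),
    ("It's alright, nothing special.", "neutral"),
]

def few_shot_prompt(text: str) -> str:
    """Assemble the few-shot classification prompt from the example list."""
    lines = ["Classify the text as positive / negative / neutral.", "", "Examples:"]
    for ex_text, ex_label in EXAMPLES:
        lines += [f"Text: {ex_text}", f"Classification: {ex_label}", ""]
    lines += ["Now classify:", f"Text: {text}", "Classification:"]
    return "\n".join(lines)

print(few_shot_prompt("Beautiful packaging, but the taste is average."))
```

With examples in a list, adding a fourth example or rotating examples per domain is a one-line change.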
Example 1: Sentiment Analysis
The most common classification task -- detecting emotional tone in text.
Scenario: E-commerce review analysis
Prompt:
You are a sentiment analysis expert. Analyze the sentiment of the following product review.
Classification criteria:
- positive: Positive evaluation, satisfaction, recommendation
- negative: Negative evaluation, dissatisfaction, complaint
- neutral: Neutral evaluation, stating facts, no clear sentiment
Review: Shipping was fast, but the packaging was slightly damaged. Overall acceptable.
Sentiment:
Output:
neutral
Example 2: Intent Classification
Identify user message intent -- commonly used in customer service bots.
Scenario: Smart customer service
Prompt:
You are a customer service intent classifier. Identify the intent of the user message.
Intent types:
- inquiry: Asking about product info, usage instructions
- complaint: Expressing dissatisfaction, requesting resolution
- refund: Requesting refund or return
- other: Messages that don't fit other categories
User message: The phone I bought last week has a cracked screen. It's only been three days -- how are you going to handle this?
Intent:
Output:
complaint
Example 3: Topic Classification
Categorize text into different subject areas.
Scenario: News content classification
Prompt:
Classify the following news headline into the appropriate topic.
Topic options:
- Technology: Internet, AI, phones, software
- Finance: Stock market, economy, investment, business
- Sports: Events, athletes, match results
- Entertainment: Celebrities, film/TV, variety shows, music
- Society: Public interest, events, policy
Headline: Apple launches iPhone 16 with the new A18 chip
Topic:
Output:
Technology
Example 4: Urgency Classification
Determine ticket or request priority.
Scenario: Ticket system
Prompt:
You are a ticket priority classifier. Judge the urgency based on ticket content.
Urgency criteria:
- High: System outage, data loss, security vulnerability, affecting many users
- Medium: Feature malfunction, performance issues, affecting some users
- Low: UI issues, feature requests, general inquiries
Ticket: Users report the login page loads very slowly, taking over 10 seconds, affecting about 20% of users.
Urgency:
Output:
Medium
Example 5: Multi-label Classification
Sometimes a single text belongs to multiple categories.
Scenario: Content tagging
Prompt:
You are a content tag classifier. Add appropriate tags to the following article.
Available tags:
- Artificial Intelligence
- Software Development
- Career Growth
- Learning Methods
- Tool Recommendations
Requirements:
- Select 1-3 most relevant tags
- Output format: tag1, tag2
Article summary: This article covers how to use ChatGPT to boost programming productivity, including code generation, bug fixing, and code review -- helping developers stay competitive in the AI era.
Tags:
Output:
Artificial Intelligence, Software Development, Tool Recommendations
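Because the output is a comma-separated list, downstream code should split it and validate each tag against the allowed pool, dropping anything the model invented. A minimal parsing sketch (the `parse_tags` helper is illustrative):

```python
# Allowed tag pool, matching the "Available tags" list in the prompt above.
ALLOWED_TAGS = {
    "Artificial Intelligence",
    "Software Development",
    "Career Growth",
    "Learning Methods",
    "Tool Recommendations",
}

def parse_tags(raw: str, max_tags: int = 3) -> list[str]:
    """Split on commas, discard tags outside the pool, cap at max_tags."""
    tags = [t.strip() for t in raw.split(",") if t.strip()]
    valid = [t for t in tags if t in ALLOWED_TAGS]
    return valid[:max_tags]

# "Gardening" is not in the pool, so it is silently dropped.
print(parse_tags("Artificial Intelligence, Software Development, Gardening"))
# ['Artificial Intelligence', 'Software Development']
```

Validating against the pool protects the pipeline when the model hallucinates a tag that was never offered.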
Advanced Tips: Improving Classification Accuracy
1. Define Boundary Conditions
Classification criteria:
- positive: Must have clearly positive words or recommendation intent
- negative: Must have clearly negative words or complaint intent
- neutral: No clear sentiment, or mixed positive/negative
Edge cases:
- "it's fine" "average" "okay" → neutral
- "great but a bit expensive" → judge by overall leaning
2. Add Confidence Scores
Classify and provide a confidence score (0-100).
Output format:
Classification: {label}
Confidence: {score}
If confidence is below 70, explain why.
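Downstream code then has to parse this two-line format. A minimal sketch, assuming the model follows the `Classification:` / `Confidence:` labels exactly (the helper name is hypothetical):

```python
import re

def parse_confidence_output(raw: str) -> tuple[str, int]:
    """Extract (label, score) from the two-line format; fall back to unknown."""
    label = re.search(r"Classification:\s*(\w+)", raw)
    score = re.search(r"Confidence:\s*(\d+)", raw)
    if not label or not score:
        return ("unknown", 0)
    return (label.group(1).lower(), int(score.group(1)))

print(parse_confidence_output("Classification: positive\nConfidence: 92"))
# ('positive', 92)
```

A threshold check on the score (e.g. route anything below 70 to human review) then becomes a simple comparison.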
3. Use JSON Structured Output
Output classification results in JSON format:
```json
{
  "text": "original text",
  "category": "classification",
  "confidence": 0.95,
  "reasoning": "brief rationale"
}
```
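JSON output is only useful if the consumer parses it defensively: the model can emit malformed JSON or a category outside the label space. A minimal consumer sketch (the `parse_result` helper and label set are illustrative):

```python
import json

LABELS = {"positive", "negative", "neutral"}

def parse_result(raw: str) -> dict:
    """Parse the model's JSON output, falling back to 'unknown' on bad data."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"category": "unknown", "confidence": 0.0}
    if data.get("category") not in LABELS:
        data["category"] = "unknown"
    return data

print(parse_result('{"text": "great!", "category": "positive", "confidence": 0.95}')["category"])
# positive
```

The fallback means a single malformed response degrades gracefully instead of crashing the pipeline.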
Common Problems & Solutions
| Problem | Cause | Solution |
|---|---|---|
| Unstable output format | Prompt constraints too vague | Explicitly say "output label only" |
| Inconsistent casing | No example constraints | Provide few-shot examples |
| Wrong edge cases | Vague label definitions | Add detailed criteria |
| Output includes explanation | Didn't forbid it | Add "do not explain" |
| Refuses to classify | Model is uncertain | Add "unknown" fallback option |
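Several of the fixes above can also be enforced in code after the model responds, which catches the casing and stray-punctuation cases even when the prompt is imperfect. A minimal normalizer sketch (the `normalize_label` helper is hypothetical):

```python
LABELS = {"positive", "negative", "neutral"}

def normalize_label(raw: str) -> str:
    """Lowercase, strip whitespace and trailing punctuation, fall back to unknown."""
    label = raw.strip().strip(".!\"'").lower()
    return label if label in LABELS else "unknown"

print(normalize_label("Positive."))  # positive
print(normalize_label("  NEUTRAL "))  # neutral
print(normalize_label("I think it's positive because..."))  # unknown
```

Prompt constraints reduce these failures; the normalizer guarantees downstream code never sees an out-of-vocabulary label.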
Zero-shot vs Few-shot Comparison
| Dimension | Zero-shot | Few-shot |
|---|---|---|
| Prompt length | Short | Long |
| Output stability | Lower | Higher |
| Format consistency | May vary | High |
| Best for | Quick prototyping, simple tasks | Production, high-precision needs |
| Token cost | Low | High |
Recommendation:
- Use Zero-shot during development for quick validation
- Use Few-shot in production for stability
API Examples
Python (OpenAI)
```python
from openai import OpenAI

client = OpenAI()

def classify_sentiment(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a sentiment classifier. Output only one of: positive/negative/neutral."
            },
            {
                "role": "user",
                "content": f"Classify the sentiment of the following text:\n\n{text}"
            }
        ],
        temperature=0,  # Set to 0 for consistency
        max_tokens=10
    )
    return response.choices[0].message.content.strip()

# Usage
result = classify_sentiment("This product is amazing!")
print(result)  # positive
```
Python (Claude)
```python
import anthropic

client = anthropic.Anthropic()

def classify_intent(text: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=50,
        messages=[
            {
                "role": "user",
                "content": f"""Identify the intent of the following user message.

Intent types: inquiry / complaint / refund / other

Output only the intent type, no explanation.

User message: {text}

Intent:"""
            }
        ]
    )
    return message.content[0].text.strip()

# Usage
result = classify_intent("I want to return the shirt I bought last week")
print(result)  # refund
```
Hands-on Exercises
Open ChatGPT or Claude and try these:
Exercise 1: Basic Sentiment Classification
Classify the following reviews as positive / negative / neutral:
1. Super fast shipping, arrived the next day
2. Terrible quality, broke after one week
3. Price is fair, about the same as market rate
4. Customer service was great, patiently answered all my questions
5. Neither good nor bad, just average
Output format:
1. [label]
2. [label]
...
Exercise 2: Design Your Own Classifier
Try designing a Classification Prompt for these scenarios:
- Email classification (work / personal / promotional / spam)
- Bug report classification (UI / functionality / performance / security)
- Social post sentiment (happy / sad / angry / neutral)
Related Reading
Dive deeper into Classification techniques:
- Sentiment Classification (Zero-shot) - Sentiment basics
- Sentiment Classification (Few-shot) - Few-shot sentiment
- Few-shot Prompting - Few-shot technique deep dive
- Zero-shot Prompting - Zero-shot technique deep dive
Takeaways
Classification is one of the most practical prompt skills. Remember these:
- Define the label space: Clearly list all possible categories
- Constrain the output format: Specify "output label only" to avoid extra noise
- Use Few-shot: In production, use examples for stability
- Handle edge cases: Add "unknown" or define clear classification criteria
- Low temperature: Set temperature=0 in API calls for consistency
Master Classification and you can build all kinds of automated classification pipelines fast.
❓ FAQ
The most frequently searched questions on this chapter's topic.
How do I choose between Zero-shot and Few-shot classification?
During development, use Zero-shot to quickly validate that your labels make sense -- the prompt is short and saves tokens. In production, always switch to Few-shot: provide 2-5 stable examples to pin down the output format and casing. Zero-shot's biggest problem isn't accuracy, it's output instability -- one call returns `positive`, the next returns `Positive.`, and your downstream code breaks immediately.
My classification prompt keeps including explanations -- how do I get labels only?
Three steps: add "output only the category name, no explanation" to the instruction; make every Few-shot example show just the label; set `temperature=0` and `max_tokens=10` in the API call. If it's still unstable, enumerate the allowed labels (positive / negative / neutral) and add an `unknown` fallback, so the model has a legal exit instead of improvising an explanation.
How do I write a stable multi-label (multi-label) classification prompt?
Make three things explicit: the label pool (list every legal tag), the minimum/maximum count (e.g. "select 1-3"), and the output separator (`tag1, tag2`). The "content tagging" example in this chapter uses exactly this structure. Adding JSON output plus a confidence field makes it even more robust -- downstream code can filter by threshold, which is far easier to parse than plain text.
Edge cases ("it's fine", "average") keep getting misclassified -- prompt problem or model problem?
Nine times out of ten it's the prompt. Spell out the boundary rules in your classification criteria, e.g. "'it's fine' / 'average' / 'okay' → neutral; 'great but a bit expensive' → judge by overall leaning". If that's not enough, add a `confidence` field and require an explanation whenever confidence < 70 -- forced to justify itself, the model judges more carefully and accuracy recovers.
How should I configure API parameters for classification tasks?
`temperature=0` for consistency; `max_tokens=10-50` (labels are short, extra tokens are waste); use the `system` message to pin down the role and output spec; set `stop` sequences to prevent rambling. In production, also add retry on 429 with backoff, and log the request ID, model version, and input hash for later A/B evaluation.