Classification Prompts
Basic classification prompts and quick-start guide
This chapter covers common Classification Prompt templates for testing and deploying LLM text classification. Classification is one of the most fundamental and practical NLP tasks -- nail it and you can build all kinds of automation pipelines fast.
What Is Text Classification?
Text Classification assigns a piece of text to one or more predefined categories.
┌─────────────────────────────────────────────────────────────┐
│ Classification Flow │
├─────────────────────────────────────────────────────────────┤
│ │
│ Input Text → LLM Analysis → Output Label│
│ │
│ "This product is great!" Understand semantics positive │
│ "Terrible service" Judge sentiment negative │
│ "It's okay" Match label neutral │
│ │
└─────────────────────────────────────────────────────────────┘
Why Classification Matters
| Use Case | Specific Application | Business Value |
|---|---|---|
| Customer Service | Ticket routing, mood detection, priority | Faster response |
| Content Moderation | Spam, policy violations, sensitive topics | Lower manual costs |
| Email Processing | Spam filtering, email type classification | Better productivity |
| User Feedback | Review sentiment, NPS prediction | User insight |
| Smart Routing | Issue type detection, department dispatch | Optimized workflows |
Common Classification Types
| Type | Label Examples | Typical Scenario |
|---|---|---|
| Sentiment Analysis | positive / negative / neutral | Review analysis, monitoring |
| Intent Detection | inquiry / complaint / refund / buy | Chatbots |
| Topic Classification | tech / finance / sports / entertainment | News categorization |
| Urgency Level | high / medium / low | Ticket systems |
| Language Detection | Chinese / English / Japanese | Multi-language routing |
| Spam Detection | spam / not_spam | Email filtering |
Core Prompt Structure
A good Classification Prompt needs these elements:
┌─────────────────────────────────────────────────────────────┐
│ Classification Prompt Structure │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. Task description - Clearly state what to do │
│ 2. Label space - List all possible categories │
│ 3. Output constraint - Specify format (label only / JSON) │
│ 4. Input data - The text to classify │
│ │
└─────────────────────────────────────────────────────────────┘
General Template
Classify the following text into the specified categories.
Categories: {label_1} / {label_2} / {label_3}
Requirements:
- Output only the category name, no explanation
- If uncertain, output "unknown"
Text: {input_text}
Category:
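In code, this template is usually rendered from a label list and the input text. A minimal sketch (the `build_prompt` helper is illustrative, not from any library):

```python
# Render the general classification template for a given label set and text.
# TEMPLATE mirrors the structure shown above; the helper name is hypothetical.
TEMPLATE = """Classify the following text into the specified categories.

Categories: {labels}

Requirements:
- Output only the category name, no explanation
- If uncertain, output "unknown"

Text: {input_text}

Category:"""

def build_prompt(labels: list[str], input_text: str) -> str:
    """Fill in the label space and the text to classify."""
    return TEMPLATE.format(labels=" / ".join(labels), input_text=input_text)

print(build_prompt(["positive", "negative", "neutral"], "I think the food was okay."))
```

Keeping the template in one place means every call site uses the same label space and output constraint.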
Quick Start: Zero-shot Classification
The simplest approach -- just tell the LLM which labels to use:
Prompt:
Classify the text into neutral, negative, or positive.
Text: I think the food was okay.
Sentiment:
Output:
neutral
This works for:
- Quick prototyping
- Clear-cut label semantics
- Tasks where consistency isn't critical
Few-shot Classification: More Stable Output
When you need more stable, predictable output formatting, use Few-shot (provide 2-5 examples):
Prompt:
Classify the text as positive / negative / neutral.
Examples:
Text: This product is amazing, highly recommend!
Classification: positive
Text: Terrible service, never coming back.
Classification: negative
Text: It's alright, nothing special.
Classification: neutral
Now classify:
Text: Beautiful packaging, but the taste is average.
Classification:
Output:
neutral
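Rather than hard-coding the example strings into the prompt, it helps to keep them as data so they can be swapped or A/B tested. A minimal sketch of assembling the few-shot prompt above (helper name is illustrative):

```python
# Few-shot examples as (text, label) pairs, taken from the prompt above.
EXAMPLES = [
    ("This product is amazing, highly recommend!", "positive"),
    ("Terrible service, never coming back.", "negative"),
    ("It's alright, nothing special.", "neutral"),
]

def few_shot_prompt(text: str) -> str:
    """Assemble the few-shot classification prompt from the example list."""
    lines = ["Classify the text as positive / negative / neutral.", "", "Examples:"]
    for ex_text, ex_label in EXAMPLES:
        lines += [f"Text: {ex_text}", f"Classification: {ex_label}", ""]
    lines += ["Now classify:", f"Text: {text}", "Classification:"]
    return "\n".join(lines)

print(few_shot_prompt("Beautiful packaging, but the taste is average."))
```

With examples in a list, adding a fourth example or rotating examples per domain is a one-line change.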
Example 1: Sentiment Analysis
The most common classification task -- detecting emotional tone in text.
Scenario: E-commerce review analysis
Prompt:
You are a sentiment analysis expert. Analyze the sentiment of the following product review.
Classification criteria:
- positive: Positive evaluation, satisfaction, recommendation
- negative: Negative evaluation, dissatisfaction, complaint
- neutral: Neutral evaluation, stating facts, no clear sentiment
Review: Shipping was fast, but the packaging was slightly damaged. Overall acceptable.
Sentiment:
Output:
neutral
Example 2: Intent Classification
Identify user message intent -- commonly used in customer service bots.
Scenario: Smart customer service
Prompt:
You are a customer service intent classifier. Identify the intent of the user message.
Intent types:
- inquiry: Asking about product info, usage instructions
- complaint: Expressing dissatisfaction, requesting resolution
- refund: Requesting refund or return
- other: Messages that don't fit other categories
User message: The phone I bought last week has a cracked screen. It's only been three days -- how are you going to handle this?
Intent:
Output:
complaint
Example 3: Topic Classification
Categorize text into different subject areas.
Scenario: News content classification
Prompt:
Classify the following news headline into the appropriate topic.
Topic options:
- Technology: Internet, AI, phones, software
- Finance: Stock market, economy, investment, business
- Sports: Events, athletes, match results
- Entertainment: Celebrities, film/TV, variety shows, music
- Society: Public interest, events, policy
Headline: Apple launches iPhone 16 with the new A18 chip
Topic:
Output:
Technology
Example 4: Urgency Classification
Determine ticket or request priority.
Scenario: Ticket system
Prompt:
You are a ticket priority classifier. Judge the urgency based on ticket content.
Urgency criteria:
- High: System outage, data loss, security vulnerability, affecting many users
- Medium: Feature malfunction, performance issues, affecting some users
- Low: UI issues, feature requests, general inquiries
Ticket: Users report the login page loads very slowly, taking over 10 seconds, affecting about 20% of users.
Urgency:
Output:
Medium
Example 5: Multi-label Classification
Sometimes a single text belongs to multiple categories.
Scenario: Content tagging
Prompt:
You are a content tag classifier. Add appropriate tags to the following article.
Available tags:
- Artificial Intelligence
- Software Development
- Career Growth
- Learning Methods
- Tool Recommendations
Requirements:
- Select 1-3 most relevant tags
- Output format: tag1, tag2
Article summary: This article covers how to use ChatGPT to boost programming productivity, including code generation, bug fixing, and code review -- helping developers stay competitive in the AI era.
Tags:
Output:
Artificial Intelligence, Software Development, Tool Recommendations
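Because the output is a comma-separated list, downstream code should split it and validate each tag against the allowed pool, dropping anything the model invented. A minimal parsing sketch (the `parse_tags` helper is illustrative):

```python
# Allowed tag pool, matching the "Available tags" list in the prompt above.
ALLOWED_TAGS = {
    "Artificial Intelligence",
    "Software Development",
    "Career Growth",
    "Learning Methods",
    "Tool Recommendations",
}

def parse_tags(raw: str, max_tags: int = 3) -> list[str]:
    """Split on commas, discard tags outside the pool, cap at max_tags."""
    tags = [t.strip() for t in raw.split(",") if t.strip()]
    valid = [t for t in tags if t in ALLOWED_TAGS]
    return valid[:max_tags]

# "Gardening" is not in the pool, so it is silently dropped.
print(parse_tags("Artificial Intelligence, Software Development, Gardening"))
# ['Artificial Intelligence', 'Software Development']
```

Validating against the pool protects the pipeline when the model hallucinates a tag that was never offered.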
Advanced Tips: Improving Classification Accuracy
1. Define Boundary Conditions
Classification criteria:
- positive: Must have clearly positive words or recommendation intent
- negative: Must have clearly negative words or complaint intent
- neutral: No clear sentiment, or mixed positive/negative
Edge cases:
- "it's fine" "average" "okay" → neutral
- "great but a bit expensive" → judge by overall leaning
2. Add Confidence Scores
Classify and provide a confidence score (0-100).
Output format:
Classification: {label}
Confidence: {score}
If confidence is below 70, explain why.
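Downstream code then has to parse this two-line format. A minimal sketch, assuming the model follows the `Classification:` / `Confidence:` labels exactly (the helper name is hypothetical):

```python
import re

def parse_confidence_output(raw: str) -> tuple[str, int]:
    """Extract (label, score) from the two-line format; fall back to unknown."""
    label = re.search(r"Classification:\s*(\w+)", raw)
    score = re.search(r"Confidence:\s*(\d+)", raw)
    if not label or not score:
        return ("unknown", 0)
    return (label.group(1).lower(), int(score.group(1)))

print(parse_confidence_output("Classification: positive\nConfidence: 92"))
# ('positive', 92)
```

A threshold check on the score (e.g. route anything below 70 to human review) then becomes a simple comparison.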
3. Use JSON Structured Output
Output classification results in JSON format:
```json
{
  "text": "original text",
  "category": "classification",
  "confidence": 0.95,
  "reasoning": "brief rationale"
}
```
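JSON output is only useful if the consumer parses it defensively: the model can emit malformed JSON or a category outside the label space. A minimal consumer sketch (the `parse_result` helper and label set are illustrative):

```python
import json

LABELS = {"positive", "negative", "neutral"}

def parse_result(raw: str) -> dict:
    """Parse the model's JSON output, falling back to 'unknown' on bad data."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"category": "unknown", "confidence": 0.0}
    if data.get("category") not in LABELS:
        data["category"] = "unknown"
    return data

print(parse_result('{"text": "great!", "category": "positive", "confidence": 0.95}')["category"])
# positive
```

The fallback means a single malformed response degrades gracefully instead of crashing the pipeline.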
Common Problems & Solutions
| Problem | Cause | Solution |
|---|---|---|
| Unstable output format | Prompt constraints too vague | Explicitly say "output label only" |
| Inconsistent casing | No example constraints | Provide few-shot examples |
| Wrong edge cases | Vague label definitions | Add detailed criteria |
| Output includes explanation | Didn't forbid it | Add "do not explain" |
| Refuses to classify | Model is uncertain | Add "unknown" fallback option |
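Several of the fixes above can also be enforced in code after the model responds, which catches the casing and stray-punctuation cases even when the prompt is imperfect. A minimal normalizer sketch (the `normalize_label` helper is hypothetical):

```python
LABELS = {"positive", "negative", "neutral"}

def normalize_label(raw: str) -> str:
    """Lowercase, strip whitespace and trailing punctuation, fall back to unknown."""
    label = raw.strip().strip(".!\"'").lower()
    return label if label in LABELS else "unknown"

print(normalize_label("Positive."))  # positive
print(normalize_label("  NEUTRAL "))  # neutral
print(normalize_label("I think it's positive because..."))  # unknown
```

Prompt constraints reduce these failures; the normalizer guarantees downstream code never sees an out-of-vocabulary label.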
Zero-shot vs Few-shot Comparison
| Dimension | Zero-shot | Few-shot |
|---|---|---|
| Prompt length | Short | Long |
| Output stability | Lower | Higher |
| Format consistency | May vary | High |
| Best for | Quick prototyping, simple tasks | Production, high-precision needs |
| Token cost | Low | High |
Recommendation:
- Use Zero-shot during development for quick validation
- Use Few-shot in production for stability
API Examples
Python (OpenAI)
```python
from openai import OpenAI

client = OpenAI()

def classify_sentiment(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a sentiment classifier. Output only one of: positive/negative/neutral."
            },
            {
                "role": "user",
                "content": f"Classify the sentiment of the following text:\n\n{text}"
            }
        ],
        temperature=0,  # Set to 0 for consistency
        max_tokens=10
    )
    return response.choices[0].message.content.strip()

# Usage
result = classify_sentiment("This product is amazing!")
print(result)  # positive
```
Python (Claude)
```python
import anthropic

client = anthropic.Anthropic()

def classify_intent(text: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=50,
        messages=[
            {
                "role": "user",
                "content": f"""Identify the intent of the following user message.

Intent types: inquiry / complaint / refund / other

Output only the intent type, no explanation.

User message: {text}

Intent:"""
            }
        ]
    )
    return message.content[0].text.strip()

# Usage
result = classify_intent("I want to return the shirt I bought last week")
print(result)  # refund
```
Hands-on Exercises
Open ChatGPT or Claude and try these:
Exercise 1: Basic Sentiment Classification
Classify the following reviews as positive / negative / neutral:
1. Super fast shipping, arrived the next day
2. Terrible quality, broke after one week
3. Price is fair, about the same as market rate
4. Customer service was great, patiently answered all my questions
5. Neither good nor bad, just average
Output format:
1. [label]
2. [label]
...
Exercise 2: Design Your Own Classifier
Try designing a Classification Prompt for these scenarios:
- Email classification (work / personal / promotional / spam)
- Bug report classification (UI / functionality / performance / security)
- Social post sentiment (happy / sad / angry / neutral)
Related Reading
Dive deeper into Classification techniques:
- Sentiment Classification (Zero-shot) - Sentiment basics
- Sentiment Classification (Few-shot) - Few-shot sentiment
- Few-shot Prompting - Few-shot technique deep dive
- Zero-shot Prompting - Zero-shot technique deep dive
Takeaways
Classification is one of the most practical prompt skills. Remember these:
- Define the label space: Clearly list all possible categories
- Constrain the output format: Specify "output label only" to avoid extra noise
- Use Few-shot: In production, use examples for stability
- Handle edge cases: Add "unknown" or define clear classification criteria
- Low temperature: Set temperature=0 in API calls for consistency
Master Classification and you can build all kinds of automated classification pipelines fast.
❓ FAQ
The most frequently searched questions on this chapter's topic.
How do I choose between Zero-shot and Few-shot classification?
During development, use Zero-shot to quickly validate that your labels make sense -- the prompt is short and saves tokens. In production, always switch to Few-shot: provide 2-5 stable examples to pin down the output format and casing. Zero-shot's biggest problem isn't accuracy, it's output instability -- one call returns `positive`, the next returns `Positive.`, and your downstream code breaks immediately.
My classification prompt keeps including explanations -- how do I get labels only?
Three steps: add "output only the category name, no explanation" to the instruction; make every Few-shot example show just the label; set `temperature=0` and `max_tokens=10` in the API call. If it's still unstable, enumerate the allowed labels (positive / negative / neutral) and add an `unknown` fallback, so the model has a legal exit instead of improvising an explanation.
How do I write a stable multi-label (multi-label) classification prompt?
Make three things explicit: the label pool (list every legal tag), the minimum/maximum count (e.g. "select 1-3"), and the output separator (`tag1, tag2`). The "content tagging" example in this chapter uses exactly this structure. Adding JSON output plus a confidence field makes it even more robust -- downstream code can filter by threshold, which is far easier to parse than plain text.
Edge cases ("it's fine", "average") keep getting misclassified -- prompt problem or model problem?
Nine times out of ten it's the prompt. Spell out the boundary rules in your classification criteria, e.g. "'it's fine' / 'average' / 'okay' → neutral; 'great but a bit expensive' → judge by overall leaning". If that's not enough, add a `confidence` field and require an explanation whenever confidence < 70 -- forced to justify itself, the model judges more carefully and accuracy recovers.
How should I configure API parameters for classification tasks?
`temperature=0` for consistency; `max_tokens=10-50` (labels are short, extra tokens are waste); use the `system` message to pin down the role and output spec; set `stop` sequences to prevent rambling. In production, also add retry on 429 with backoff, and log the request ID, model version, and input hash for later A/B evaluation.