Prompt Master

Master the art of conversing with AI

Model Collection

Model overview and selection guide

TL;DR

  • This page is a foundational LLM index: it helps you quickly build a mental map of "which models exist and what category they fall into," covering GPT-5.1 / GPT-4.1 / o1, Claude 4.5, Gemini 3, Llama 3.1, Grok-2, and more.
  • Don't pick models based on parameters/leaderboards alone: what matters more is latency, cost, context length, tool support (e.g., Tool Calling), and your own evaluation results.
  • In real projects, you'll typically use a "capability model + fast model" combo: a stronger model for complex planning/reasoning, a faster and cheaper model for routine steps and batch processing.
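The "capability model + fast model" combo can be sketched as a simple router. This is an illustrative sketch, not a prescribed implementation: the model names and the complexity heuristic below are assumptions you would replace with your own.

```python
# Sketch of a "capability model + fast model" router.
# Model names and the complexity heuristic are illustrative assumptions.

CAPABILITY_MODEL = "capability-model"  # e.g. a GPT-5.1 / Claude 4.5 Sonnet class model
FAST_MODEL = "fast-model"              # e.g. a GPT-4o-mini / Haiku class model

def pick_model(task: str, *, needs_planning: bool = False) -> str:
    """Route complex planning/reasoning to the capability model and
    routine or batch steps to the fast, cheap model."""
    if needs_planning or len(task) > 2000:  # crude proxy for task complexity
        return CAPABILITY_MODEL
    return FAST_MODEL

print(pick_model("Summarize this ticket in one line"))            # fast tier
print(pick_model("Design a 3-step migration plan", needs_planning=True))  # capability tier
```

In practice the routing signal would come from your task taxonomy or a classifier, not string length; the point is that the choice happens per step, not per project.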

2024-2025 Quick Reference

| Vendor | Capability Tier (multimodal/tools) | Fast/Cheap Tier | Notes |
| --- | --- | --- | --- |
| OpenAI | ChatGPT 5.1 (GPT-5 series) / GPT-4.1 | GPT-4o-mini | Great vision/tools; Mini for batch/automation |
| OpenAI (reasoning) | o1 / o1-mini | -- | Stronger long-chain reasoning/planning, higher cost |
| Anthropic | Claude 4.5 Sonnet | Claude 3.5/4.5 Haiku | Strong long-doc/table capabilities, high safety |
| Google | Gemini 3 Pro | Gemini 3 Flash / Flash-Lite | 1M-token context, multimodal/tools; Flash series is fast |
| Meta (open-source) | Llama 3/3.1 70B | Llama 3.1 8B | Good for private deployment, rich ecosystem |
| Mistral | Mistral Large | Mistral Small | Cost-effective, strong multilingual |
| xAI | Grok-2 | Grok-2 mini | For time-sensitive/web-connected scenarios |

Selection advice: start with "small model to get it working -> big model to improve quality -> eval regression." For multimodal/screenshots, prefer ChatGPT 5.1 / GPT-4o / Gemini 3; for long documents/tables, prefer Claude 4.5 or Gemini 3 Pro.
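One way to operationalize "small model to get it working -> big model to improve quality" is to try the fast model first and re-run on the stronger model only when a cheap check fails. A minimal sketch, assuming a hypothetical `call_model` stub in place of a real API call and placeholder model names:

```python
def call_model(model: str, prompt: str) -> str:
    """Stub standing in for a real chat-completion API call."""
    return f"[{model}] answer to: {prompt}"

def answer(prompt: str, looks_ok=lambda out: len(out) > 0) -> str:
    """Try the cheap model first; escalate to the capability model
    only when the output fails a simple quality check."""
    out = call_model("fast-model", prompt)          # placeholder cheap-tier name
    if looks_ok(out):
        return out
    return call_model("capability-model", prompt)   # placeholder capability-tier name

print(answer("Extract the due date"))  # served by the fast model in this sketch
```

The quality check here is deliberately trivial; in a real pipeline it might be a JSON-parse check, a regex, or a verifier model.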

Last updated: 2025-02

Reading Guide

This list leans toward "foundational + notable" historical context. It doesn't cover every latest model or version. Use it for:

  • Backtracking: when a paper/blog mentions a model, quickly locate its origin and era
  • Selection: build your candidate set before running evaluation

For engineering selection, answer at least these questions:

  1. Is the task more like chat, coding, reasoning, or RAG?
  2. Do you need long context? How long? (context length)
  3. Do you need Tool Calling? Structured output (JSON schema)?
  4. Can you run a small evaluation set for regression (10-50 samples is enough to start)?
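Questions 3 and 4 above can be answered with a tiny harness: a fixed sample set, a per-case check (including a structured-output check), and a pass rate you compare across model swaps. A minimal sketch, where `run_model` is a stand-in for a real API call and the sample cases are made up for illustration:

```python
# Minimal eval-regression harness: run a fixed sample set through a model
# function and track the pass rate. `run_model` is a hypothetical stub.
import json

SAMPLES = [  # 10-50 cases like these are enough to start
    {"prompt": "Return {\"ok\": true} as JSON", "check": "json"},
    {"prompt": "What is 2+2?", "expect": "4"},
]

def run_model(prompt: str) -> str:
    """Stub: replace with a real model call."""
    return '{"ok": true}' if "JSON" in prompt else "4"

def passes(case: dict, output: str) -> bool:
    if case.get("check") == "json":       # structured-output (JSON schema-lite) check
        try:
            json.loads(output)
            return True
        except ValueError:
            return False
    return case["expect"] in output       # simple containment check

def pass_rate(samples=SAMPLES) -> float:
    hits = sum(passes(c, run_model(c["prompt"])) for c in samples)
    return hits / len(samples)

print(f"pass rate: {pass_rate():.0%}")    # compare against the previous run before shipping
```

Rerunning this after every model or prompt change is the "eval regression" step: a drop in pass rate flags the change before users do.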

Data adapted from Papers with Code and Zhao et al. (2023).

Models

| Model | Release Date | Description |
| --- | --- | --- |
| BERT | 2018 | Bidirectional Encoder Representations from Transformers |
| GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
| RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
| GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
| T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | 2019 | Generalized Autoregressive Pretraining for Language Understanding and Generation |
| CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | 2020 | Language Models are Few-Shot Learners |
| LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
| PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer |
| CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| Codex | 2021 | Evaluating Large Language Models Trained on Code |
| ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
| FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| InstructGPT | 2022 | Training language models to follow instructions with human feedback |
| GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
| CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| Chinchilla | 2022 | Shows that for a fixed compute budget, the best performance comes not from the largest models but from smaller models trained on more data |
| Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
| PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
| OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
| BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
| Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
| Galactica | 2022 | Galactica: A Large Language Model for Science |
| OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-4 | 2023 | GPT-4 Technical Report |
| PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
| PaLM 2 | 2023 | A language model with better multilingual and reasoning capabilities, more compute-efficient than its predecessor PaLM |
| Claude 2 | 2023 | Anthropic's second-gen assistant, with improved writing/code quality and safety |
| Llama 2 | 2023 | Open-weight chat models (7B-70B) widely used for private deployment |
| Mixtral 8x7B | 2023 | Sparse Mixture-of-Experts open-source model, great cost-effectiveness |
| Gemini 1.0 | 2023 | Google's multimodal model (Ultra/Pro/Nano), first release of the Gemini series |
| Claude 3 (Opus/Sonnet/Haiku) | 2024 | Next-gen multimodal models, strong long-doc/table extraction and safety |
| Gemini 1.5 Pro | 2024 | Up to 1M+ tokens of long context, multimodal |
| Gemini 1.5 Flash | 2024 | Cheap, fast multimodal model, good for batch and real-time interaction |
| Mistral Large | 2024 | Multilingual large model, supports function calling and long context |
| Grok-1.5 | 2024 | xAI's long-context model, focused on real-time use |
| GPT-4o | 2024 | OpenAI's omni-modal flagship, faster than GPT-4, supports voice/image/video |
| GPT-4o mini | 2024 | Low-cost small model with strong tool capabilities |
| Llama 3 | 2024 | 8B/70B open-source models, strong English and multilingual performance |
| Llama 3.1 | 2024 | 8B/70B/405B, upgraded reasoning and 128k context |
| Claude 3.5 Sonnet | 2024 | Claude 3.5 primary model, strong at code and tool calling |
| Claude 3.5 Haiku | 2024 | Lightweight fast variant, retains high safety and multimodal capabilities |
| o1 | 2024 | OpenAI reasoning model, focused on chain-of-thought and planning |
| o1-mini | 2024 | Cheaper/faster version of o1 |
| GPT-4.1 | 2024 | "All-in-one" model, unifies text/image/voice, upgraded reasoning and tool calling |
| Grok-2 | 2024 | xAI next-gen model, improved code and web-connected QA |
| Gemini 2.0 Flash (Exp) | 2024 | Gemini 2.0 preview for real-time/tool scenarios |
| Gemini 2.0 Pro (Exp) | 2024 | Gemini 2.0 preview flagship, multimodal and long context |
| DeepSeek-R1 | 2025 | Open-source model focused on reasoning efficiency, supports long context and math/code |
| Claude 4.5 Sonnet | 2025 | Anthropic 2025 flagship, long context (200K-1M beta), stronger code and table retrieval |
| ChatGPT 5.1 (GPT-5 series) | 2025 | Latest OpenAI product for the Responses API, controllable reasoning depth and multimodal |
| Gemini 3 Pro | 2025 | Google's ultra-long-context flagship (1,048,576 tokens), multimodal + tool-chain optimization |
| Gemini 3 Flash / Flash-Lite | 2025 | Fast/low-cost multimodal models, good for in-product real-time interaction and batch processing |