# Hermes Agent
Build your own Agent with Nous Hermes open-weight models
Hermes is a family of open-weight models from Nous Research, fine-tuned from Llama base models. Hermes 4, released August 2025, comes in 14B, 70B, and 405B variants. The 405B hits 96.3% on MATH-500 and 81.9% on AIME'24 — frontier-tier numbers. Two things make it stand out: **native hybrid reasoning** (switch between think/fast modes) and **RefusalBench 57.1%** — the highest willingness-to-answer score, i.e. the fewest refusals, of any evaluated model (GPT-4o: 17.67%, Claude Sonnet 4: 17%).
Why learn it specifically? Because every team building Agents eventually hits two walls: **is tool calling actually reliable?** and **will the model refuse my business case?** Hermes 3 started training `<tool_call>` JSON emission into the weights directly — no external parsing hacks needed. Hermes 4 scaled the post-training corpus from 1M samples / 1.2B tokens to ~5M samples / ~60B tokens blended across reasoning and non-reasoning data.
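Because the `<tool_call>` format is baked into the weights, extracting a tool call from Hermes output is a small parsing job rather than a prompt-engineering exercise. A minimal sketch — the tag wrapper is Hermes's documented convention, but the exact JSON schema inside can vary by version, so treat the field names here as illustrative:

```python
import json
import re

# Hermes emits tool calls as JSON wrapped in <tool_call> tags, e.g.:
#   <tool_call>{"name": "get_weather", "arguments": {"city": "Sydney"}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return every parseable tool-call JSON object found in model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed emissions instead of crashing the agent loop
    return calls

reply = 'Let me check.\n<tool_call>{"name": "get_weather", "arguments": {"city": "Sydney"}}</tool_call>'
print(extract_tool_calls(reply))
# → [{'name': 'get_weather', 'arguments': {'city': 'Sydney'}}]
```

Skipping malformed JSON rather than raising keeps one bad emission from killing a multi-step agent run; Chapter 8's Agent build revisits this trade-off.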
💰 Salary reference (2026): Self-hosted LLM + Agent framework roles pay AU $160K-$250K, US $180K-$350K total comp. The demand isn't "can you call an API" — it's deployment, tool use tuning, and guardrails.
🏢 Hiring companies: Nous Research, Together AI, OpenRouter, Replicate; finance/healthcare/defense companies that can't send data to OpenAI; every startup building an "AI Agent platform."
This track assumes you already know how to call an LLM API. If not, start with chapters 01-05 of the AI Engineer track.
## 30-Second Quick Start
Try Hermes in 30 seconds — a local Ollama install is enough.
```shell
# After installing Ollama
ollama pull hermes3:70b   # or hermes3:8b if that's all your machine can run

# Chat from the command line
ollama run hermes3:70b "Write a Python snippet that calls a REST API with requests and retries 3 times"

# Or use the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes3:70b", "messages": [{"role": "user", "content": "Hi"}]}'
```

No GPU locally? Try Hermes 3 405B via OpenRouter's free tier: `nousresearch/hermes-3-llama-3.1-405b:free`. Chapters 6 and 7 cover all three deployment options.
## What You Will Learn
In this tutorial, you will learn:
- ✓ Explain the real differences between Hermes 3 vs Hermes 4 and 14B/70B/405B — know which to pick when
- ✓ Run Hermes both locally and via cloud (OpenRouter / Together), with real cost comparison
- ✓ Master the native `<tool_call>` format — drop the hand-rolled parsers and swap Hermes in for GPT/Claude function calling
- ✓ Build a multi-step autonomous Agent with Hermes + LangGraph, including tool calls, state recovery, and LangSmith tracing
- ✓ Understand uncensored / neutral alignment, its operational risk, and ship basic guardrails before production
## Chapter Overview
Quick preview by section - jump directly to what interests you.
Origins of the Hermes series, Nous Research roadmap, Hermes 3 → 4 evolution
- What is Hermes, Who is Nous Research (20 min)
- Hermes vs Llama / Qwen / DeepSeek / GPT — A Selection Map (25 min)

Parameters, training data, B200 cluster, flex attention, DPO strategy
- Hermes Architecture — Llama Base + Fine-tuning Strategy (30 min)
- What "Neutral Alignment" / "Uncensored" Actually Means (20 min)
- Hybrid Reasoning — think vs fast mode (25 min)

Install Ollama, pull weights, VRAM sizing, OpenAI-compatible API
- Run Hermes Locally — Ollama + hermes3:70b (30 min)
- Run Hermes in the Cloud — OpenRouter / Together / DeepInfra (25 min)
- Structured Output — JSON Schema + `<tool_call>` in Practice (40 min)

State Graph, Tool Node, checkpointer, interrupt — build a working research Agent from scratch
- Build an Autonomous Agent — Hermes 4 + LangGraph (60 min)
- Hermes + RAG — Practical Patterns with Long Context (45 min)

Real numbers: Hermes 4 70B monthly cost / per-token price / GPU amortization
- Deployment Cost — Self-host vs Cloud API vs OpenAI (30 min)
- Before You Ship — Guardrails, Rate Limits, Prompt Injection (35 min)

Distributed training, inference network, datasets from Nous — and Hermes roadmap