Hermes Agent
Build your own Agent with Nous Hermes open-weight models
Hermes is a family of open-weight models from Nous Research, fine-tuned on Llama. Hermes 4, released August 2025, comes in 14B, 70B, and 405B variants. The 405B hits 96.3% on MATH-500 and 81.9% on AIME'24 โ frontier-tier numbers. Two things make it stand out: **native hybrid reasoning** (switch between think/fast modes) and **RefusalBench 57.1%** โ the lowest refusal rate of any evaluated model (GPT-4o: 17.67%, Claude Sonnet 4: 17%).
Why learn it specifically? Because every team building Agents eventually hits two walls: **is tool calling actually reliable?** and **will the model refuse my business case?** Hermes 3 started training `<tool_call>` JSON emission into the weights directly โ no external parsing hacks needed. Hermes 4 scaled the post-training corpus from 1M samples / 1.2B tokens to ~5M samples / ~60B tokens blended across reasoning and non-reasoning data.
๐ฐ Salary reference (2026): Self-hosted LLM + Agent framework roles pay AU $160K-$250K, US $180K-$350K total comp. The demand isn't "can you call an API" โ it's deployment, tool use tuning, and guardrails.
๐ข Hiring companies: Nous Research, Together AI, OpenRouter, Replicate; finance/healthcare/defense companies that can't send data to OpenAI; every startup building an "AI Agent platform."
This track assumes you already know how to call an LLM API. If not, start with chapters 01-05 of the AI Engineer track.
30-Second Quick Start
Try Hermes 4 14B in 30 seconds โ local Ollama is enough.
# ่ฃ
Ollama ๅ
ollama pull hermes3:70b # ๆ hermes3:8b ๆฌๆบ่ฝ่ท
# ๅฝไปค่กๅฏน่ฏ
ollama run hermes3:70b "็ปๆไธๆฎต Python ไปฃ็ ๏ผ็จ requests ่ฐ็จไธไธช REST API ๅนถ้่ฏ 3 ๆฌก"
# ๆ่
็จ OpenAI ๅ
ผๅฎน API
curl http://localhost:11434/v1/chat/completions \
-d '{"model": "hermes3:70b", "messages": [{"role": "user", "content": "Hi"}]}'No GPU locally? Try Hermes 3 405B via OpenRouter free tier: `nousresearch/hermes-3-llama-3.1-405b:free`. Chapters 6 and 7 cover all three deployment options.
What You Will Learn
In this tutorial, you will learn:
- โExplain the real differences between Hermes 3 vs Hermes 4 and 14B/70B/405B โ know which to pick when
- โRun Hermes both locally and via cloud (OpenRouter / Together), with real cost comparison
- โMaster the native `<tool_call>` format โ drop the hand-rolled parsers and swap Hermes in for GPT/Claude function calling
- โBuild a multi-step autonomous Agent with Hermes + LangGraph, including tool calls, state recovery, and LangSmith tracing
- โUnderstand uncensored / neutral alignment, its operational risk, and ship basic guardrails before production
Chapter Overview
Quick preview by section - jump directly to what interests you.
Origins of the Hermes series, Nous Research roadmap, Hermes 3 โ 4 evolution
- What is Hermes, Who is Nous Research20 min
- Hermes vs Llama / Qwen / DeepSeek / GPT โ A Selection Map25 min
Parameters, training data, B200 cluster, flex attention, DPO strategy
- Hermes Architecture โ Llama Base + Fine-tuning Strategy30 min
- What "Neutral Alignment" / "Uncensored" Actually Means20 min
- Hybrid Reasoning โ think vs fast mode25 min
Install Ollama, pull weights, VRAM sizing, OpenAI-compatible API
- Run Hermes Locally โ Ollama + hermes3:70b30 min
- Run Hermes in the Cloud โ OpenRouter / Together / DeepInfra25 min
- Structured Output โ JSON Schema + `<tool_call>` in Practice40 min
State Graph, Tool Node, checkpointer, interrupt โ build a working research Agent from scratch
- Build an Autonomous Agent โ Hermes 4 + LangGraph60 min
- Hermes + RAG โ Practical Patterns with Long Context45 min
Real numbers: Hermes 4 70B monthly cost / per-token price / GPU amortization
- Deployment Cost โ Self-host vs Cloud API vs OpenAI30 min
- Before You Ship โ Guardrails, Rate Limits, Prompt Injection35 min
Distributed training, inference network, datasets from Nous โ and Hermes roadmap