
Hermes Agent

Build your own Agent with Nous Hermes open-weight models

👤 For: AI engineers who want to self-host LLMs / Developers frustrated by GPT/Claude refusal policies / Teams building Agents who need reliable tool calling
⏱️ 3-5 weeks
📊 Intermediate

Hermes is a family of open-weight models from Nous Research, fine-tuned on Llama. Hermes 4, released August 2025, comes in 14B, 70B, and 405B variants. The 405B hits 96.3% on MATH-500 and 81.9% on AIME'24, frontier-tier numbers. Two things make it stand out: **native hybrid reasoning** (switch between think/fast modes) and a **RefusalBench score of 57.1%**, the highest of any evaluated model, i.e. the lowest refusal rate (GPT-4o: 17.67%, Claude Sonnet 4: 17%).

Why learn it specifically? Because every team building Agents eventually hits two walls: **is tool calling actually reliable?** and **will the model refuse my business case?** Hermes 3 started training `<tool_call>` JSON emission directly into the weights, so no external parsing hacks are needed. Hermes 4 scaled the post-training corpus from 1M samples / 1.2B tokens to ~5M samples / ~60B tokens, blended across reasoning and non-reasoning data.
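Because the tool-call syntax is baked into the weights, consuming it on the client side is a small parsing job. A minimal sketch of that consumer, assuming the model wraps each call in `<tool_call>...</tool_call>` tags as in the Hermes function-calling convention (the `get_weather` tool and the helper name are illustrative, not part of any Hermes library):

```python
import json
import re

# Hypothetical raw completion from a Hermes model (illustrative only).
raw = '<tool_call>{"name": "get_weather", "arguments": {"city": "Sydney"}}</tool_call>'

def parse_tool_calls(text: str) -> list[dict]:
    """Extract every <tool_call> JSON payload from a completion."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(match) for match in pattern.findall(text)]

calls = parse_tool_calls(raw)
print(calls[0]["name"], calls[0]["arguments"])  # get_weather {'city': 'Sydney'}
```

Each parsed dict can then be dispatched to the matching function and the result fed back to the model as a tool message.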

💰 Salary reference (2026): Self-hosted LLM + Agent framework roles pay AU $160K-$250K, US $180K-$350K total comp. The demand isn't "can you call an API"; it's deployment, tool-use tuning, and guardrails.

๐Ÿข Hiring companies: Nous Research, Together AI, OpenRouter, Replicate; finance/healthcare/defense companies that can't send data to OpenAI; every startup building an "AI Agent platform."

This track assumes you already know how to call an LLM API. If not, start with chapters 01-05 of the AI Engineer track.


30-Second Quick Start

Try Hermes 3 in 30 seconds; a local Ollama install is enough.

```bash
# After installing Ollama
ollama pull hermes3:70b   # or hermes3:8b, which runs on a typical local machine

# Chat from the command line
ollama run hermes3:70b "Write Python code that calls a REST API with requests and retries 3 times"

# Or use the OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes3:70b", "messages": [{"role": "user", "content": "Hi"}]}'
```

No GPU locally? Try Hermes 3 405B via OpenRouter free tier: `nousresearch/hermes-3-llama-3.1-405b:free`. Chapters 6 and 7 cover all three deployment options.
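For programmatic use, the same endpoint accepts a standard chat-completions body, so any OpenAI-compatible client works against it. A minimal sketch of constructing that request (the helper name is an illustrative assumption; actually sending it requires a running Ollama, or an OpenRouter key with the base URL swapped):

```python
import json

def build_chat_request(model: str, user_message: str) -> str:
    """Build the JSON body for POST /v1/chat/completions on an
    OpenAI-compatible server such as Ollama or OpenRouter."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return json.dumps(body)

# Same payload as the curl example above.
payload = build_chat_request("hermes3:70b", "Hi")
print(payload)
```

Because the wire format is identical, switching from local 70B to the hosted 405B is a one-line change of `model` and base URL.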


What You Will Learn

In this tutorial, you will learn:

  • ✓ Explain the real differences between Hermes 3 vs Hermes 4 and the 14B/70B/405B variants, and know which to pick when
  • ✓ Run Hermes both locally and via cloud (OpenRouter / Together), with a real cost comparison
  • ✓ Master the native `<tool_call>` format: drop the hand-rolled parsers and swap Hermes in for GPT/Claude function calling
  • ✓ Build a multi-step autonomous Agent with Hermes + LangGraph, including tool calls, state recovery, and LangSmith tracing
  • ✓ Understand uncensored / neutral alignment, its operational risk, and ship basic guardrails before production
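On the last point: a low-refusal model will not say no for you, so even a minimal output filter is worth wiring in before anything user-facing. A toy sketch of the idea (the pattern list and function name are illustrative assumptions, not part of any Hermes tooling; production guardrails use classifier models or moderation APIs):

```python
import re

# Hypothetical policy entries for an output-side guardrail (illustrative).
BLOCKED_PATTERNS = [
    r"(?i)\bcredit card number\b",
    r"(?i)\bsocial security number\b",
]

def passes_guardrail(text: str) -> bool:
    """Return True if the model response is safe to forward to the user."""
    return not any(re.search(p, text) for p in BLOCKED_PATTERNS)

print(passes_guardrail("Here is retry logic using requests."))  # True
```

The point is architectural: with a neutrally aligned model, refusal policy moves out of the weights and into a layer you own and can audit.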


Chapter Overview

Quick preview by section - jump directly to what interests you.

Section 01 · What is Hermes
Origins of the Hermes series, the Nous Research roadmap, and the Hermes 3 → 4 evolution
2 lessons · Reading / Visual

Section 02 · Architecture & Core Concepts
Parameters, training data, the B200 cluster, flex attention, and the DPO strategy
3 lessons · Reading / Visual

Section 03 · Running Locally & in the Cloud
Installing Ollama, pulling weights, VRAM sizing, and the OpenAI-compatible API
3 lessons · Reading / Visual

Section 04 · Building Agents with Hermes
StateGraph, ToolNode, checkpointer, interrupt: build a working research Agent from scratch
2 lessons · Reading / Visual

Section 05 · Production & Cost
Real numbers: Hermes 4 70B monthly cost, per-token price, and GPU amortization
2 lessons · Reading / Visual

Section 06 · Ecosystem & Next Steps
Distributed training, the inference network, datasets from Nous, and the Hermes roadmap
1 lesson · Reading / Visual