AI Agent Engineering Handbook

Build production-grade AI agents with MCP, planning, memory, orchestration, and evaluation patterns.

AI Agent 101: From Chat Interface to Autonomous Execution

An agent is not just a better chatbot. A chatbot answers. An agent takes a goal, uses tools, reacts to results, and keeps moving until the task is done.

The difference is simple:

  • a normal LLM waits for your next message
  • an agent can keep working between messages

#What is an AI agent

A practical definition is:

An AI agent is a system that receives a goal, understands its environment, uses external capabilities, and adjusts its actions based on feedback.

The important question is not "can it talk?" It is "can it continue the work?"

Examples:

  • after searching for information, it keeps going and organizes the findings
  • after changing code, it continues by running tests
  • after a tool fails, it decides whether to retry, take another route, or ask for help
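The last example, deciding what to do after a tool fails, can be sketched as a small policy: retry a bounded number of times, then try another route, then ask for help. This is a minimal illustration, not a reference implementation; `StepResult`, `RunLog`, `run_step`, and the two search functions are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepResult:
    ok: bool
    output: str


@dataclass
class RunLog:
    entries: List[str] = field(default_factory=list)


def run_step(
    primary: Callable[[], StepResult],
    fallback: Callable[[], StepResult],
    log: RunLog,
    max_retries: int = 2,
) -> StepResult:
    """Retry the primary tool, then try another route, then ask for help."""
    for attempt in range(1, max_retries + 1):
        result = primary()
        status = "ok" if result.ok else "failed"
        log.entries.append(f"primary attempt {attempt}: {status}")
        if result.ok:
            return result

    result = fallback()
    status = "ok" if result.ok else "failed"
    log.entries.append(f"fallback: {status}")
    if result.ok:
        return result

    log.entries.append("escalate: ask the user for help")
    return StepResult(ok=False, output="needs human input")


# Usage: a primary tool that always fails and a fallback that works.
def flaky_search() -> StepResult:
    return StepResult(ok=False, output="timeout")


def cached_search() -> StepResult:
    return StepResult(ok=True, output="3 cached results")


log = RunLog()
outcome = run_step(flaky_search, cached_search, log)
print(outcome.output)
print(log.entries)
```

The log matters as much as the result: it is what lets the agent, or a human reviewer, see which route actually produced the answer.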

If a normal LLM is a brain that only outputs text, an agent adds a few more parts:

  • eyes: it can inspect files, web pages, logs, or database results
  • hands: it can call tools, run commands, and modify resources
  • notes: it can preserve task state, context, and reusable knowledge
```text
┌─────────────────────────────────────────────────────────────┐
│                 AI Agent Core Architecture                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│      [ Planning ]      <──────────>       [ Memory ]        │
│            ↑                                   ↑            │
│            └────────── [ LLM Brain ] ──────────┘            │
│                              │                              │
│                              ▼                              │
│        [ Tools ]       <──────────>       [ Actions ]       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

#Four core parts of an AI agent

#1. Planning

One of the clearest differences between an agent and a normal chat model is task decomposition.

If you say "figure out why this project starts slowly," a capable agent should not jump to a conclusion. It should inspect config, build scripts, dependencies, and logs, then narrow the issue step by step.
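One way to picture this decomposition is a plan of ordered steps that the agent works through, recording a finding for each. The sketch below hard-codes the four inspection areas from the example above; `PlanStep`, `make_plan`, `next_step`, and `record` are hypothetical names, and a real agent would usually ask the model to propose the steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanStep:
    name: str
    done: bool = False
    finding: str = ""


def make_plan(goal: str) -> List[PlanStep]:
    # Hard-coded decomposition for illustration only. The goal is unused
    # here; a real agent would derive the steps from it.
    areas = ["config", "build scripts", "dependencies", "logs"]
    return [PlanStep(name=f"inspect {area}") for area in areas]


def next_step(plan: List[PlanStep]) -> Optional[PlanStep]:
    """Return the first unfinished step, or None when the plan is complete."""
    return next((step for step in plan if not step.done), None)


def record(step: PlanStep, finding: str) -> None:
    """Mark a step as finished and keep what was learned."""
    step.done = True
    step.finding = finding


plan = make_plan("figure out why this project starts slowly")
step = next_step(plan)
while step is not None:
    # Placeholder finding; a real agent would fill this in from tool output.
    record(step, "no finding yet")
    step = next_step(plan)

print([s.name for s in plan])
```

Keeping the plan as explicit data, rather than leaving it implicit in the conversation, makes it possible to inspect progress and resume after an interruption.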

#2. Tool use

Without tools, the agent is trapped inside text.

Tools may include:

  • web search and scraping
  • local file read and write
  • command execution
  • GitHub, Notion, Slack, and other service integrations

This is the difference between a system that can analyze and a system that can execute.
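A common way to give an agent tools is a registry that maps tool names to ordinary functions, plus a dispatcher that turns failures into text the model can react to. The sketch below assumes that shape; `TOOLS`, `tool`, `call_tool`, and the two example tools are hypothetical names, not the API of any particular framework.

```python
from typing import Callable, Dict

# Registry of tool name -> function. Every tool returns a string so the
# result can be fed back to the model as text.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("read_file")
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as handle:
        return handle.read()


@tool("word_count")
def word_count(text: str) -> str:
    return str(len(text.split()))


def call_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call and report errors as text instead of raising."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](**kwargs)
    except Exception as exc:  # broad on purpose: the agent needs feedback
        return f"error: {exc}"


print(call_tool("word_count", text="agents need tools"))
print(call_tool("delete_everything"))
```

Returning errors as strings is a deliberate choice in this sketch: a failed call becomes one more observation for the agent rather than a crash.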

#3. Memory

Memory keeps the agent from starting over every round.

  • short-term memory keeps the current task coherent
  • long-term memory preserves reusable knowledge and user preferences

Many complaints about "unstable agents" are really memory-design problems.
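The two kinds of memory can be sketched with a bounded buffer for recent events and a key-value store for durable facts. This is a simplified in-memory illustration; `AgentMemory` and its methods are hypothetical names, and a real system would usually persist the long-term part to disk or a database.

```python
from collections import deque
from typing import Deque, Dict


class AgentMemory:
    def __init__(self, short_term_size: int = 5) -> None:
        # Short-term: only the most recent events; older ones fall off.
        self.short_term: Deque[str] = deque(maxlen=short_term_size)
        # Long-term: reusable knowledge and preferences, kept until replaced.
        self.long_term: Dict[str, str] = {}

    def observe(self, event: str) -> None:
        self.short_term.append(event)

    def remember(self, key: str, value: str) -> None:
        self.long_term[key] = value

    def context(self) -> str:
        """Build the text that would be placed in front of the model."""
        facts = [f"{key}: {value}" for key, value in sorted(self.long_term.items())]
        recent = list(self.short_term)
        return "\n".join(["Known facts:"] + facts + ["Recent events:"] + recent)


memory = AgentMemory(short_term_size=2)
memory.remember("test command", "run the unit tests before reporting")
for event in ["opened config", "ran build", "read logs"]:
    memory.observe(event)

print(memory.context())
```

With a window of two, "opened config" has already been dropped from the context while the stored preference survives, which is the trade-off the two bullets describe.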

#4. Perception

Perception defines what the agent can observe, such as:

  • file system changes
  • browser output
  • database results
  • images, audio, or logs

The more grounded the perception is, the less the agent needs to guess.
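For the first item, file system changes, grounded perception can be as simple as comparing snapshots taken before and after an action. The sketch below hashes file contents so the agent observes what changed instead of assuming it; `snapshot` and `diff` are hypothetical helpers, and the temporary directory stands in for a real workspace.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each file under root to a SHA-256 hash of its contents."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in base.rglob("*")
        if path.is_file()
    }


def diff(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Report which files were added, removed, or changed between snapshots."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(name for name in shared if before[name] != after[name]),
    }


with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    (root / "app.log").write_text("starting\n", encoding="utf-8")
    (root / "old.tmp").write_text("scratch\n", encoding="utf-8")
    before = snapshot(workdir)

    # Simulate an agent action that changes the workspace.
    (root / "app.log").write_text("starting\nready\n", encoding="utf-8")
    (root / "old.tmp").unlink()
    (root / "report.md").write_text("# Findings\n", encoding="utf-8")

    changes = diff(before, snapshot(workdir))

print(changes)
```

Hashing contents is slower than comparing timestamps on large trees, but it avoids false negatives when a file is rewritten within the same timestamp tick.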


#Typical use cases

| Scenario | Traditional workflow | Agent workflow | Real value |
| --- | --- | --- | --- |
| Software development | Human reads code, fixes bug, runs tests | Agent locates the issue, edits, verifies, and reports | Reduces repetitive work |
| Market research | Manual search and manual synthesis | Agent gathers, extracts, and summarizes | Cuts analysis time |
| Customer support | Human searches docs before replying | Agent retrieves knowledge and works across systems | Improves response speed |
| Personal workflow | Manual scheduling and follow-up | Agent handles repeatable tasks from rules | Reduces operational overhead |

#Common implementation paths

If you are just getting started, you usually end up in one of these paths:

| Path | Best for | Typical trait |
| --- | --- | --- |
| IDE-based agents | Developers | Works directly inside the coding environment |
| Framework-based orchestration | Engineering teams | More control over complex flows |
| Low-code platforms | Product, ops, and business teams | Fast to validate business automation |

Do not choose based on hype. Start from the actual job: code collaboration, knowledge retrieval, or business automation.


#How to brief your first agent properly

The first rule is simple: do not hand it a vague one-line instruction. State the goal, the scope, and the verification rules clearly.

#Bad prompt

Help me analyze this project.

#Better agent-style prompt

```markdown
# Role
You are a senior Node.js architect focused on performance optimization.

# Context
This is a NestJS backend. The `GET /products` endpoint becomes very slow under concurrency.

# Task
1. Inspect all code under `src/modules/products`.
2. Identify the top three causes of the bottleneck.
3. Fix the most obvious issue and verify the change.
4. Summarize the before-and-after performance impact.

# Constraints
- Only modify files under `src/modules/products`.
- Run the relevant tests after the change.
```

That is the difference between "say something smart" and "do a bounded engineering task."
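A constraint written in a prompt is only a request; the harness around the agent can also enforce it. The sketch below shows one possible way to turn the brief's goal, scope, and verification rules into data and check every proposed edit against the allowed directory. `TaskBrief` and `allows` are hypothetical names, and the file paths other than `src/modules/products` are made up for the example.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Tuple


@dataclass(frozen=True)
class TaskBrief:
    goal: str
    allowed_root: str
    verification: Tuple[str, ...]

    def allows(self, path: str) -> bool:
        """Return True only if path is inside allowed_root."""
        root = PurePosixPath(self.allowed_root)
        target = PurePosixPath(path)
        # Reject ".." outright so a path cannot climb out of the scope.
        if ".." in target.parts:
            return False
        return target == root or root in target.parents


brief = TaskBrief(
    goal="Fix the most obvious bottleneck in GET /products",
    allowed_root="src/modules/products",
    verification=("run the relevant tests after the change",),
)

for candidate in [
    "src/modules/products/products.service.ts",
    "src/modules/products-old/products.service.ts",
    "src/modules/products/../auth/auth.service.ts",
]:
    print(candidate, brief.allows(candidate))
```

Comparing path components rather than string prefixes is the point of the design: a plain `startswith` check would wrongly accept the `products-old` directory.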


#Common failure modes

| Problem | Typical cause | Better response |
| --- | --- | --- |
| Gets stuck in loops | Goal is vague and there is no stop condition | Add an iteration limit and a clearer plan |
| Edits code recklessly | Weak context and weak constraints | Tighten the scope and the verification rules |
| Cost grows too fast | Expensive models are used repeatedly | Use layered model choices and shorter context |
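The first row, an iteration limit as a stop condition, is straightforward to sketch. The loop below stops when the agent reports it is finished, when the step budget runs out, or when the same action keeps repeating. `run_with_limits` and its parameters are hypothetical names, and the callback stands in for whatever decides the agent's next action.

```python
from typing import Callable, Dict, List, Optional, Tuple


def run_with_limits(
    next_action: Callable[[List[str]], Optional[str]],
    max_iterations: int = 10,
    max_repeats: int = 2,
) -> Tuple[str, List[str]]:
    """Run actions until done, out of budget, or stuck repeating itself."""
    counts: Dict[str, int] = {}
    history: List[str] = []
    for _ in range(max_iterations):
        action = next_action(history)
        if action is None:  # the agent signals that the goal is met
            return "done", history
        counts[action] = counts.get(action, 0) + 1
        if counts[action] > max_repeats:
            return "stopped: repeated action", history
        history.append(action)
    return "stopped: iteration limit", history


# A stuck agent that proposes the same action forever.
status, history = run_with_limits(lambda past: "grep logs")
print(status, history)

# A healthy agent that works through three steps and then finishes.
script = ["read config", "run tests", "write report"]
status_ok, history_ok = run_with_limits(
    lambda past: script[len(past)] if len(past) < len(script) else None
)
print(status_ok, history_ok)
```

Counting repeats catches a stuck agent after a few steps instead of waiting for the full iteration budget, which also helps with the cost row of the table.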

#Practice ideas

  1. Beginner: use an IDE agent to refactor all style files in a folder and extract shared variables into theme.css.
  2. Advanced: build a multi-agent workflow where Agent A drafts a blog post, Agent B prepares visuals, and Agent C publishes to a mock API.
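As a starting point for the advanced exercise, the three agents can be modeled as plain functions that pass one shared record along, with an in-memory class standing in for the mock API. Everything here is a hypothetical scaffold: `Post`, `MockPublishAPI`, and the three agent functions are placeholders for real model calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Post:
    topic: str
    draft: str = ""
    visuals: List[str] = field(default_factory=list)
    published_id: Optional[str] = None


class MockPublishAPI:
    """In-memory stand-in for a publishing endpoint."""

    def __init__(self) -> None:
        self.posts: List[Post] = []

    def publish(self, post: Post) -> str:
        self.posts.append(post)
        return f"post-{len(self.posts)}"


def drafting_agent(post: Post) -> Post:
    # Agent A: a real version would call a model to write the draft.
    post.draft = f"Draft about {post.topic}"
    return post


def visuals_agent(post: Post) -> Post:
    # Agent B: a real version would generate or select images.
    post.visuals = [f"cover image for: {post.topic}"]
    return post


def publishing_agent(post: Post, api: MockPublishAPI) -> Post:
    # Agent C: verify the hand-off before acting on it.
    if not post.draft or not post.visuals:
        raise ValueError("post is incomplete; refusing to publish")
    post.published_id = api.publish(post)
    return post


api = MockPublishAPI()
post = publishing_agent(visuals_agent(drafting_agent(Post(topic="AI agents"))), api)
print(post.published_id)
```

The check inside `publishing_agent` is the part worth keeping when the placeholders are replaced: each agent validates what it receives before it acts.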

#Summary

The important shift is to stop treating an agent as a talking machine and start treating it as an execution system.

  1. It should break down tasks, not just answer questions.
  2. It should use tools, not just generate text.
  3. It should continue based on feedback, not stop after one response.
  4. It still needs boundaries and verification, or it will sound smart while behaving unreliably.

Next chapter: the "USB interface" of the AI era, The Ultimate Guide to MCP

FAQ

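Which programming languages do you need to build AI agents?
Python or TypeScript is the first choice. Python is the foundation of the AI ecosystem, while TypeScript is highly productive for building MCP servers and web-facing interactions. AI-native editors such as Cursor have also lowered the barrier to programming considerably.
Which models does MCP currently support?
MCP is an open protocol. At the time of writing, support is most mature in the Claude 3.5 family. Through an MCP proxy, GPT-4o and Gemini can also reach MCP server data sources indirectly.
Will AI agents put programmers out of work?
No, but they will change what programmers do. Developers will shift from "writing code" to "managing a team of agents", with the focus on system architecture design, verification of complex logic, and tuning agent prompts.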