5 Patterns for Building Production AI Agents
The five agentic patterns from Anthropic's 'Building Effective Agents' — prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer — each with one runnable Python example. Start simple, add complexity only when it pays for itself.
What You'll Learn
- ✓ Sequential Steps with a Gate
- ✓ Classify, Then Dispatch
- ✓ Sectioning with Concurrent Calls
- ✓ Dynamic Decomposition
- ✓ The Feedback Loop
- ✓ Start Simple
Guide Curriculum
- Prompt Chaining: Sequential Steps with a Gate
- Routing: Classify, Then Dispatch
- Parallelization: Sectioning with Concurrent Calls
- Orchestrator-Workers: Dynamic Decomposition
- Evaluator-Optimizer: The Feedback Loop
- Choosing the Right Pattern: Start Simple
About the Author
Hiram Clark is the founder of vybecoding.ai and editor of every guide and news article published on the site. He reviews all AI-drafted content for accuracy before publication and is personally accountable for factual errors. He works hands-on with the AI development tools, workflows, and infrastructure covered here.
Full Guide Content
Module 1: Prompt Chaining
1.1 Sequential Steps with a Gate
- Decompose one hard task into a fixed sequence of smaller LLM calls.
- Add a programmatic gate between steps so a bad intermediate result stops the chain early.
Prompt chaining decomposes a task into a sequence of steps where each LLM call processes the output of the one before it. Because the steps are fixed, you can add deterministic checks ("gates") between them. Use it when a task splits cleanly into predictable sub-steps; the classic example is write a draft → translate it, with a length check in between.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def call_llm(prompt: str, system: str = "") -> str:
    msg = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def chain(topic: str) -> str:
    outline = call_llm(f"Write a 3-bullet outline for a blog post about: {topic}")
    # Programmatic gate: bail out if the model ignored the format.
    if outline.count("\n") < 2:
        raise ValueError("Outline gate failed: too few bullets")
    draft = call_llm(f"Expand this outline into 2 paragraphs:\n\n{outline}")
    polished = call_llm(f"Tighten this draft and fix any awkward phrasing:\n\n{draft}")
    return polished

print(chain("why small AI agents beat big ones"))
Each step is simpler and more reliable than asking for the whole thing at once, and the gate catches failures before you spend tokens on later steps.
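A newline count is a loose proxy for "three bullets": prose with blank lines between paragraphs also passes it. As a minimal sketch of a stricter gate, the helper below counts lines that actually look like list items. The names `count_bullets` and `outline_gate` and the set of accepted bullet markers are assumptions for illustration, not part of the original example.

```python
import re

# Assumption: a list item starts with "-", "*", "•", or a number like "1." / "1)".
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+\S")

def count_bullets(outline: str) -> int:
    """Count lines that look like list items."""
    return sum(1 for line in outline.splitlines() if BULLET.match(line))

def outline_gate(outline: str, expected: int = 3) -> str:
    """Return the outline unchanged, or raise if the bullet count is off."""
    found = count_bullets(outline)
    if found != expected:
        raise ValueError(
            f"Outline gate failed: expected {expected} bullets, found {found}"
        )
    return outline
```

Inside `chain`, this could replace the newline check with `outline = outline_gate(outline)`, so a preamble line or stray blank line no longer changes the verdict.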
Module 2: Routing
2.1 Classify, Then Dispatch
- Classify an input, then dispatch it to a prompt specialized for that category.
- Understand why separating classification from handling improves both.
def route(query: str) -> str:
    category = call_llm(
        f"Classify this support query as exactly one word: "
        f"billing, technical, or general:\n\n{query}"
    ).strip().lower()
    handlers = {
        "billing": "You are a precise billing specialist. Cite amounts and dates.",
        "technical": "You are a patient engineer. Give step-by-step fixes.",
        "general": "You are a friendly concierge. Keep it short.",
    }
    system = handlers.get(category, handlers["general"])
    return call_llm(query, system=system)

print(route("My card was charged twice this month"))
The cheap classification call protects each specialist prompt from having to handle every possible input.
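The dispatch only works if the classifier's reply matches a handler key exactly, and a reply such as "Billing." or "Category: technical" would silently fall through to the general handler. One way to tighten this, sketched here under the assumption that the three labels above are the full set, is to normalize the reply before the lookup. `normalize_category` is a hypothetical helper, not something the original code defines.

```python
import re

ALLOWED = ("billing", "technical", "general")

def normalize_category(raw: str, allowed=ALLOWED, default: str = "general") -> str:
    """Map a free-form classifier reply onto one allowed label."""
    words = re.findall(r"[a-z]+", raw.lower())
    matches = [w for w in words if w in allowed]
    # Accept the reply only if it names exactly one known category;
    # ambiguous or unrecognized replies fall back to the default.
    if len(set(matches)) == 1:
        return matches[0]
    return default
```

In `route`, the result of the classification call would pass through `normalize_category` before `handlers.get(...)`. Falling back on ambiguity is a design choice; logging those cases is a cheap way to see how often the classifier drifts from the requested format.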
Module 3: Parallelization
3.1 Sectioning with Concurrent Calls
- Run independent LLM calls at the same time and aggregate their results.
- Distinguish sectioning (split a task) from voting (repeat a task).
from concurrent.futures import ThreadPoolExecutor

def parallel_review(code: str) -> dict:
    aspects = {
        "security": "List only security issues in this code.",
        "performance": "List only performance issues in this code.",
        "style": "List only style and readability issues in this code.",
    }
    with ThreadPoolExecutor() as pool:
        futures = {
            k: pool.submit(call_llm, code, system_prompt)
            for k, system_prompt in aspects.items()
        }
        return {k: f.result() for k, f in futures.items()}

for aspect, notes in parallel_review("def f(x): return eval(x)").items():
    print(f"== {aspect} ==\n{notes}\n")
Three focused reviewers running at once beat one reviewer asked to juggle every concern in a single pass.
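The example above is sectioning: one task split into different sub-prompts. Voting is the other variant, where the same prompt runs several times and the answers are tallied. The sketch below is one possible shape for it; `vote` and its `ask` parameter are illustrative names, and `ask` stands in for any function that takes a prompt and returns a string, such as `call_llm`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def vote(ask: Callable[[str], str], prompt: str, n: int = 5) -> tuple[str, float]:
    """Run the same prompt n times concurrently; return (winner, share of votes)."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        replies = list(pool.map(ask, [prompt] * n))
    # Normalize case and whitespace so "Yes" and "yes " count as the same vote.
    tally = Counter(r.strip().lower() for r in replies)
    winner, count = tally.most_common(1)[0]
    return winner, count / n
```

A call such as `vote(call_llm, "Answer yes or no: is this code safe to run?\n\n" + code)` returns the majority answer plus how strongly the runs agreed, which can serve as a rough confidence signal. Voting suits short, comparable answers; free-form text would need a different aggregation step.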
Module 4: Orchestrator-Workers
4.1 Dynamic Decomposition
- Let a central LLM break a task into subtasks at runtime, not in advance.
- Delegate each subtask to a worker call and synthesize the results.
In the orchestrator-workers pattern, a central LLM dynamically breaks a complex task into subtasks, delegates each to a worker LLM, and synthesizes their outputs. Unlike parallelization, the subtasks aren't fixed in advance — the orchestrator decides them based on the specific input. Use it when you can't predict the breakdown ahead of time, such as a research question that fans out into a different set of sub-questions each time.
import json

def orchestrate(task: str) -> str:
    plan = call_llm(
        f"Break this task into 2-4 independent subtasks. "
        f'Reply as a JSON array of strings only:\n\n{task}'
    )
    subtasks = json.loads(plan)
    results = [call_llm(f"Complete this subtask concisely:\n\n{s}") for s in subtasks]
    return call_llm(
        "Synthesize these findings into one coherent answer:\n\n"
        + "\n\n".join(results)
    )

print(orchestrate("Compare REST and GraphQL for a mobile-first startup"))
The orchestrator owns the what; the workers own the how; a final synthesis call owns the together.
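The plan is the one place where this pattern depends on the model following a format, and `json.loads` on the raw reply fails if the model wraps the array in a code fence or adds a sentence around it. A tolerant parser is one way to harden that step. The sketch below is an assumption about how that might look, with `parse_subtasks` as a hypothetical helper name and the cap of four subtasks taken from the prompt above.

```python
import json

def parse_subtasks(reply: str, max_subtasks: int = 4) -> list[str]:
    """Extract a JSON array of non-empty strings from a model reply."""
    # Take the outermost [...] span so fences or surrounding prose are ignored.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("No JSON array found in orchestrator reply")
    items = json.loads(reply[start : end + 1])
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("Expected a JSON array of strings")
    subtasks = [i.strip() for i in items if i.strip()]
    if not subtasks:
        raise ValueError("Orchestrator returned no subtasks")
    # Enforce the upper bound the prompt asked for.
    return subtasks[:max_subtasks]
```

In `orchestrate`, `subtasks = parse_subtasks(plan)` would replace the bare `json.loads(plan)`. Every failure surfaces as a `ValueError` (`json.JSONDecodeError` is a subclass), so the caller can catch one exception type and retry the planning call.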
Module 5: Evaluator-Optimizer
5.1 The Feedback Loop
- Build a generate → critique → revise loop with two LLM roles.
- Know when iterative refinement is worth the extra calls.
def evaluate_optimize(task: str, max_rounds: int = 3) -> str:
    answer = call_llm(task)
    for _ in range(max_rounds):
        verdict = call_llm(
            f"Task: {task}\n\nDraft:\n{answer}\n\n"
            "If the draft fully meets the task, reply exactly 'PASS'. "
            "Otherwise give one specific improvement."
        )
        if verdict.strip().upper().startswith("PASS"):
            break
        answer = call_llm(f"Revise the draft using this feedback:\n{verdict}\n\nDraft:\n{answer}")
    return answer

print(evaluate_optimize("Write a one-sentence tagline for a privacy-first email app"))
The loop stops as soon as the evaluator is satisfied, so easy tasks finish in one round and hard ones get the refinement they need.
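To see the stopping behavior without spending API calls, the same loop can be written against injected functions and exercised with stubs. This is a sketch only: `refine` and its three callable parameters are hypothetical names, and returning the revision count alongside the answer is an addition for observability, not something the original function does.

```python
from typing import Callable

def refine(
    generate: Callable[[str], str],       # task -> first draft
    evaluate: Callable[[str, str], str],  # (task, draft) -> "PASS" or feedback
    revise: Callable[[str, str], str],    # (draft, feedback) -> new draft
    task: str,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Return (final_answer, revisions_made)."""
    answer = generate(task)
    revisions = 0
    for _ in range(max_rounds):
        verdict = evaluate(task, answer)
        if verdict.strip().upper().startswith("PASS"):
            break
        answer = revise(answer, verdict)
        revisions += 1
    return answer, revisions

# Stubbed run: the evaluator passes once the draft ends with two "!".
final, rounds = refine(
    generate=lambda task: "draft",
    evaluate=lambda task, draft: "PASS" if draft.endswith("!!") else "add emphasis",
    revise=lambda draft, feedback: draft + "!",
    task="demo",
)
print(final, rounds)  # draft!! 2
```

When the budget runs out, the last revision is returned without a final evaluation, which is also true of `evaluate_optimize` above. A revision count equal to `max_rounds` is a useful signal that the answer was never approved.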
Module 6: Choosing the Right Pattern
6.1 Start Simple
- Default to the simplest option and escalate only when it earns its keep.
Anthropic's guidance is to reach for the least complex thing that works. Many production problems are solved by a single augmented LLM call with retrieval and tools, with no orchestration at all. When you do need structure, prefer a workflow (prompt chaining, routing, parallelization) because its predictable code paths are easier to test and debug. Move to orchestrator-workers or a full agent only when the steps genuinely can't be known ahead of time. Every layer of autonomy you add costs predictability, latency, and money, so add it deliberately, measure the result, and remove it if a simpler pattern would have done.
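The escalation order in that paragraph can be written down as a tiny decision function. This is only a restatement of the prose in code form; `choose_pattern` and its two boolean inputs are illustrative, and real decisions will weigh more than two factors.

```python
def choose_pattern(needs_structure: bool, steps_known_in_advance: bool) -> str:
    """Return the simplest option that fits, following the escalation order above."""
    if not needs_structure:
        return "single augmented LLM call"
    if steps_known_in_advance:
        return "workflow (prompt chaining, routing, or parallelization)"
    return "orchestrator-workers or a full agent"
```

The order of the checks is the point: each branch is reached only after the simpler option has been ruled out.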
Resources
- Anthropic, "Building Effective Agents" (primary source for all five patterns)
- Anthropic Python SDK