5 Patterns for Building Production AI Agents

Author: vybecoding

Intermediate · 8m read · Full-stack developers

The five agentic patterns from Anthropic's 'Building Effective Agents' — prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer — each with one runnable Python example. Start simple, add complexity only when it pays for itself.

Primary Focus

ai-and-machine-learning

AI Tools Covered

ai-agents, anthropic, patterns

What You'll Learn

✓ Sequential Steps with a Gate
✓ Classify, Then Dispatch
✓ Sectioning with Concurrent Calls
✓ Dynamic Decomposition
✓ The Feedback Loop
✓ Start Simple

Guide Curriculum

Prompt Chaining (1 lesson)
• Sequential Steps with a Gate (2m)

Routing (1 lesson)
• Classify, Then Dispatch (1m)

Parallelization (1 lesson)
• Sectioning with Concurrent Calls (1m)

Orchestrator-Workers (1 lesson)
• Dynamic Decomposition (1m)

Evaluator-Optimizer (1 lesson)
• The Feedback Loop (1m)

Choosing the Right Pattern (1 lesson)
• Start Simple (1m)

Preview: First Lesson

Prompt Chaining

Sequential Steps with a Gate

Module objectives:

Decompose one hard task into a fixed sequence of smaller LLM calls.
Add a programmatic gate between steps so a bad intermediate result stops the chain early.

Prompt chaining decomposes a task into a sequence of steps where each LLM call processes the output of the one before it. Because the steps are fixed, you can add deterministic checks ("gates") between them. Use it when a task splits cleanly into predictable sub-steps — the classic example is write a draft → translate it, with a length check in between.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def call_llm(prompt: str, system: str = "") -> str:
    msg = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def chain(topic: str) -> str:
    outline = call_llm(f"Write a 3-bullet outline for a blog post about: {topic}")

    # Programmatic gate: bail out if the model ignored the format.
    if outline.count("\n") < 2:
        raise ValueError("Outline gate failed — too few bullets")

    draft = call_llm(f"Expand this outline into 2 paragraphs:\n\n{outline}")
    polished = call_llm(f"Tighten this draft and fix any awkward phrasing:\n\n{draft}")
    return polished

print(chain("why small AI agents beat big ones"))

Each step is simpler and more reliable than asking for the whole thing at once, and the gate catches failures before you spend tokens on later steps.


About the Author

✨ Vibe Coder

@hiram-clark

Hiram Clark is the founder of vybecoding.ai and editor of every guide and news article published on the site. He reviews all AI-drafted content for accuracy before publication and is personally accountable for factual errors. He works hands-on with the AI development tools, workflows, and infrastructure covered here.

Full Guide Content


Module 1: Prompt Chaining

1.1 Sequential Steps with a Gate

Module objectives:

Decompose one hard task into a fixed sequence of smaller LLM calls.
Add a programmatic gate between steps so a bad intermediate result stops the chain early.

Prompt chaining decomposes a task into a sequence of steps where each LLM call processes the output of the one before it. Because the steps are fixed, you can add deterministic checks ("gates") between them. Use it when a task splits cleanly into predictable sub-steps. The classic example is writing a draft and then translating it, with a length check in between.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def call_llm(prompt: str, system: str = "") -> str:
    msg = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def chain(topic: str) -> str:
    outline = call_llm(f"Write a 3-bullet outline for a blog post about: {topic}")

    # Programmatic gate: bail out if the model ignored the format.
    if outline.count("\n") < 2:
        raise ValueError("Outline gate failed — too few bullets")

    draft = call_llm(f"Expand this outline into 2 paragraphs:\n\n{outline}")
    polished = call_llm(f"Tighten this draft and fix any awkward phrasing:\n\n{draft}")
    return polished

print(chain("why small AI agents beat big ones"))

Each step is simpler and more reliable than asking for the whole thing at once, and the gate catches failures before you spend tokens on later steps.
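
The newline count used in the gate is a loose check: a reply with a preamble and only one real bullet can still pass it. One possible tightening, sketched here under the assumption that bullets start with `-`, `*`, `•`, or a number, is to count the lines that actually look like list items. The names `count_bullets` and `outline_gate` are illustrative and not part of the original guide.

```python
import re

# Assumption: a bullet line starts with "-", "*", "•", or "1." / "1)",
# followed by whitespace and some text.
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+\S")


def count_bullets(text: str) -> int:
    """Count the lines that look like list items."""
    return sum(1 for line in text.splitlines() if BULLET.match(line))


def outline_gate(outline: str, expected: int = 3) -> str:
    """Return the outline unchanged, or raise if the bullet count is off."""
    found = count_bullets(outline)
    if found != expected:
        raise ValueError(
            f"Outline gate failed: expected {expected} bullets, found {found}"
        )
    return outline
```

Inside `chain`, the `if` test could then be swapped for `outline = outline_gate(outline)`, which keeps the early exit but rejects replies that merely contain line breaks.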

Module 2: Routing

2.1 Classify, Then Dispatch

Module objectives:

Classify an input, then dispatch it to a prompt specialized for that category.
Understand why separating classification from handling improves both.

Routing classifies an input and directs it to a specialized follow-up task. This gives you separation of concerns: each category gets its own optimized prompt, and the classifier can be tuned independently. Use it when inputs fall into distinct buckets that are best handled differently — for example, routing customer questions to billing, technical, or general handlers.

def route(query: str) -> str:
    category = call_llm(
        f"Classify this support query as exactly one word — "
        f"billing, technical, or general:\n\n{query}"
    ).strip().lower()

    handlers = {
        "billing": "You are a precise billing specialist. Cite amounts and dates.",
        "technical": "You are a patient engineer. Give step-by-step fixes.",
        "general": "You are a friendly concierge. Keep it short.",
    }
    # Any label that is not an exact key (for example "billing.") falls back to general.
    system = handlers.get(category, handlers["general"])
    return call_llm(query, system=system)

print(route("My card was charged twice this month"))

The cheap classification call protects each specialist prompt from having to handle every possible input.
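
Models do not always reply with a bare label. A reply such as "Billing." or "Category: technical" would miss the dictionary lookup and land in the general handler. A minimal sketch of a normalizer that could sit between the classifier call and the lookup follows. `normalize_category` and `CATEGORIES` are hypothetical names introduced for illustration.

```python
import re

# The three labels used by the routing example.
CATEGORIES = ("billing", "technical", "general")


def normalize_category(raw: str, default: str = "general") -> str:
    """Map a free-form classifier reply onto one known category."""
    words = re.findall(r"[a-z]+", raw.lower())
    matches = {w for w in words if w in CATEGORIES}
    # Accept only an unambiguous answer; anything else uses the default.
    if len(matches) == 1:
        return matches.pop()
    return default


print(normalize_category("Billing."))             # billing
print(normalize_category("billing or technical"))  # general
```

Falling back on an ambiguous reply is a design choice: it favors a safe generic answer over a confident wrong specialist.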

Module 3: Parallelization

3.1 Sectioning with Concurrent Calls

Module objectives:

Run independent LLM calls at the same time and aggregate their results.
Distinguish sectioning (split a task) from voting (repeat a task).

Parallelization runs LLM tasks simultaneously and aggregates their outputs. It comes in two flavors: sectioning breaks a task into independent subtasks that run in parallel, and voting runs the same task several times for diverse answers you then combine. Use it when subtasks don't depend on each other — speed and focus both improve.

from concurrent.futures import ThreadPoolExecutor

def parallel_review(code: str) -> dict:
    aspects = {
        "security": "List only security issues in this code.",
        "performance": "List only performance issues in this code.",
        "style": "List only style and readability issues in this code.",
    }
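    # One call per aspect: the code under review is the user prompt and the
    # aspect instruction is the system prompt.
    with ThreadPoolExecutor() as pool:
        futures = {
            k: pool.submit(call_llm, code, system) for k, system in aspects.items()
        }
        # result() blocks until each call finishes and re-raises its exception.
        return {k: f.result() for k, f in futures.items()}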

for aspect, notes in parallel_review("def f(x): return eval(x)").items():
    print(f"== {aspect} ==\n{notes}\n")

Three focused reviewers running at once beat one reviewer asked to juggle every concern in a single pass.
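
The review example shows sectioning. For the voting flavor, a rough sketch is to send the same prompt several times and keep the most common answer. The helper `vote` and its `ask` parameter are assumptions made for this illustration; `ask` can be any function that takes a prompt and returns text, such as `call_llm` from Module 1.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def vote(ask: Callable[[str], str], prompt: str, n: int = 5) -> tuple[str, float]:
    """Run the same prompt n times in parallel.

    Returns the most common normalized answer and its share of the votes.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda _: ask(prompt).strip().lower(), range(n)))
    # On a tie, Counter keeps the answer that appeared first in the list.
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n


# Hypothetical usage with call_llm from Module 1:
# answer, share = vote(call_llm, "Answer yes or no: is eval(x) safe on user input?")
```

Voting only adds information when repeated calls can return different answers, and it works best on short, comparable outputs such as labels or yes/no verdicts. The vote share is a rough confidence signal: a 3-of-5 win may deserve a second look where a 5-of-5 win does not.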

Module 4: Orchestrator-Workers

4.1 Dynamic Decomposition

Module objectives:

Let a central LLM break a task into subtasks at runtime, not in advance.
Delegate each subtask to a worker call and synthesize the results.

In the orchestrator-workers pattern, a central LLM dynamically breaks a complex task into subtasks, delegates each to a worker LLM, and synthesizes their outputs. Unlike parallelization, the subtasks aren't fixed in advance — the orchestrator decides them based on the specific input. Use it when you can't predict the breakdown ahead of time, such as a research question that fans out into a different set of sub-questions each time.

import json

def orchestrate(task: str) -> str:
    plan = call_llm(
        f"Break this task into 2-4 independent subtasks. "
        f'Reply as a JSON array of strings only:\n\n{task}'
    )
    # json.loads raises json.JSONDecodeError if the model wraps the array
    # in prose or a code fence.
    subtasks = json.loads(plan)

    results = [call_llm(f"Complete this subtask concisely:\n\n{s}") for s in subtasks]

    return call_llm(
        "Synthesize these findings into one coherent answer:\n\n"
        + "\n\n".join(results)
    )

print(orchestrate("Compare REST and GraphQL for a mobile-first startup"))

The orchestrator decides what needs doing, the workers decide how to do it, and a final synthesis call combines their results.
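
Calling `json.loads` directly on the plan fails if the orchestrator wraps the array in prose or a Markdown code fence, which models sometimes do even when told not to. A defensive parser might look like the sketch below. `parse_plan` is a hypothetical helper, and the cap of four subtasks simply mirrors the "2-4" in the prompt.

```python
import json


def parse_plan(reply: str, max_subtasks: int = 4) -> list[str]:
    """Extract a JSON array of non-empty strings from a model reply."""
    # Take everything from the first "[" to the last "]" so that a
    # surrounding code fence or a sentence of preamble is ignored.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("No JSON array found in orchestrator reply")
    try:
        items = json.loads(reply[start : end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"Orchestrator reply is not valid JSON: {exc}") from exc
    if not isinstance(items, list) or not items:
        raise ValueError("Plan must be a non-empty JSON array")
    if not all(isinstance(s, str) and s.strip() for s in items):
        raise ValueError("Every subtask must be a non-empty string")
    # Cap the fan-out so a runaway plan cannot trigger unbounded worker calls.
    return [s.strip() for s in items[:max_subtasks]]


print(parse_plan('```json\n["Compare latency", "Compare tooling"]\n```'))
# ['Compare latency', 'Compare tooling']
```

Raising a single `ValueError` for every failure mode gives the caller one place to retry the planning call or fall back to a fixed workflow.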

Module 5: Evaluator-Optimizer

5.1 The Feedback Loop

Module objectives:

Build a generate → critique → revise loop with two LLM roles.
Know when iterative refinement is worth the extra calls.

Evaluator-optimizer pairs two LLM roles: one generates a response while another evaluates it and returns feedback, looping until the work passes. Use it when you have clear evaluation criteria and iteration measurably improves quality — literary translation and complex search are Anthropic's examples. The pattern works best when the feedback is concrete enough to act on.

def evaluate_optimize(task: str, max_rounds: int = 3) -> str:
    answer = call_llm(task)
    for _ in range(max_rounds):
        verdict = call_llm(
            f"Task: {task}\n\nDraft:\n{answer}\n\n"
            "If the draft fully meets the task, reply exactly 'PASS'. "
            "Otherwise give one specific improvement."
        )
        if verdict.strip().upper().startswith("PASS"):
            break
        answer = call_llm(f"Revise the draft using this feedback:\n{verdict}\n\nDraft:\n{answer}")
    # If max_rounds ran out, this is the last revision, which was never evaluated.
    return answer

print(evaluate_optimize("Write a one-sentence tagline for a privacy-first email app"))

The loop stops as soon as the evaluator is satisfied, so easy tasks finish after a single evaluation and hard ones get up to max_rounds revisions.
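
If the evaluator never replies PASS, the loop ends after `max_rounds` and returns the last revision without saying so. A variant that reports the outcome is sketched below. The three roles are passed in as plain functions so the loop can be exercised without an API key; `refine`, `LoopResult`, and the parameter names are illustrative assumptions, not names from Anthropic's article.

```python
from typing import Callable, NamedTuple


class LoopResult(NamedTuple):
    answer: str
    passed: bool  # True only if the evaluator approved this exact answer
    rounds: int   # number of evaluator calls made


def refine(
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], str],
    revise: Callable[[str, str], str],
    task: str,
    max_rounds: int = 3,
) -> LoopResult:
    """Generate, then alternate evaluate/revise until PASS or the budget ends."""
    answer = generate(task)
    for round_no in range(1, max_rounds + 1):
        verdict = evaluate(task, answer)
        if verdict.strip().upper().startswith("PASS"):
            return LoopResult(answer, True, round_no)
        answer = revise(answer, verdict)
    # Budget exhausted: the final revision has not been evaluated.
    return LoopResult(answer, False, max_rounds)


# Stub roles standing in for LLM calls, to show the control flow only.
result = refine(
    generate=lambda task: "v0",
    evaluate=lambda task, draft: "PASS" if draft == "v0++" else "add a plus",
    revise=lambda draft, feedback: draft + "+",
    task="demo",
)
print(result)  # LoopResult(answer='v0++', passed=True, rounds=3)
```

A caller can then log or escalate whenever `passed` is false instead of shipping an unchecked draft.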

Module 6: Choosing the Right Pattern

6.1 Start Simple

Module objectives:

Default to the simplest option and escalate only when it earns its keep.

Anthropic's guidance is to reach for the least complex thing that works. Many production problems are solved by a single augmented LLM call with retrieval and tools, with no orchestration at all. When you do need structure, prefer a workflow (prompt chaining, routing, parallelization) because its predictable code paths are easier to test and debug. Move to orchestrator-workers or a full agent only when the steps genuinely can't be known ahead of time. Every layer of autonomy you add costs predictability and adds latency and expense, so add it deliberately, measure the result, and remove it if a simpler pattern would have done.
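
To make "measure the result" concrete, one option is to wrap the LLM function so that every pattern reports how many calls it made and how long they took. This is a minimal sketch under stated assumptions: `MeteredLLM` is a hypothetical wrapper, it is not thread-safe (so totals may drift under the parallel pattern), and it records call time only, not tokens or price.

```python
import time
from typing import Callable


class MeteredLLM:
    """Wrap an LLM callable and record call count and summed call duration."""

    def __init__(self, llm: Callable[..., str]) -> None:
        self._llm = llm
        self.calls = 0
        self.seconds = 0.0  # sum of per-call durations, not wall-clock time

    def __call__(self, *args, **kwargs) -> str:
        start = time.perf_counter()
        try:
            return self._llm(*args, **kwargs)
        finally:
            # Count the call even if it raised, since it still cost time.
            self.calls += 1
            self.seconds += time.perf_counter() - start


# Stub standing in for call_llm from Module 1.
metered = MeteredLLM(lambda prompt, system="": prompt.upper())
metered("draft an outline")
metered("tighten it", system="You are an editor.")
print(metered.calls)  # 2
```

Running the same task through a single call and through a multi-step pattern, each with its own `MeteredLLM`, gives a like-for-like count of what the extra structure costs.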

Resources

Anthropic — Building Effective Agents (primary source for all five patterns):

Anthropic Python SDK: