Claude Code's /goal Command: Autonomous Coding with Adversarial Verification

Author: vybecoding

Intermediate · 11m read · Full-stack developers

A practical walkthrough of Claude Code's /goal command — how its independent adversarial review loop verifies work, why you plan first and feed tasks two at a time under a ~3,500-character ceiling, and the memory system that keeps long autonomous runs from drifting.

Primary Focus: development

AI Tools Covered: claude-code, goal-command, autonomous-agents

Guide Curriculum

What /goal Is and Why It Matters

3 lessons

• The command in one sentence · 1m
• The adversarial verification loop (the two-Claude review) · 1m
• Why this beats a naive autonomous loop · 1m

The Workflow: From Setup to Execution

4 lessons

• Prerequisites: update and unblock · 1m
• The planning phase (the part that actually matters) · 1m
• Exporting tasks: the two-task cadence and the character ceiling · 1m
• Running the goal · 1m

Best Practices and Failure Modes

4 lessons

• Write completion criteria a machine can check · 1m
• The memory system requirement · 1m
• Failure modes and misuse cases · 1m
• Monitor cost early, then trust the loop · 1m

About the Author

@hiram-clark

Hiram Clark is the founder of vybecoding.ai and editor of every guide and news article published on the site. He reviews all AI-drafted content for accuracy before publication and is personally accountable for factual errors. He works hands-on with the AI development tools, workflows, and infrastructure covered here.

Full Guide Content

Module 1. What /goal Is and Why It Matters

1.1 The command in one sentence

Module objectives

• Define /goal precisely and place it in the Claude Code feature set.
• Understand the adversarial verification loop that makes the command trustworthy.
• See why this beats a naive "keep going until done" loop.

/goal gives Claude Code the ability to operate autonomously toward a defined objective. You state a clear task and a way to verify it is complete; Claude then plans, writes, tests, refactors, verifies, and iterates across multiple turns, sometimes for hours, until the goal condition is satisfied or you stop it.

Claude Code tracks resource usage as it runs: elapsed time, turns, and tokens. /goal is available in interactive mode, programmatically via the -p flag, and through remote control surfaces.

The shape of a good goal is:

/goal [do the work] until [a measurable end state] without [constraints that must hold]

The most important word is measurable. The end state is what the verification loop checks against, so a vague goal ("make the auth flow better") gives the verifier nothing concrete to test, while a measurable goal ("all tests in auth.test.ts pass and npm run typecheck exits 0") gives it a clear yes/no.
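As a rough illustration of that shape, the sketch below assembles a goal string from its three parts. The helper name build_goal and its parameters are hypothetical and are not part of Claude Code; only the "/goal ... until ... without ..." template comes from this guide.

```python
def build_goal(work: str, end_state: str, constraints: str = "") -> str:
    """Assemble a /goal prompt from the work, a measurable end state, and optional constraints.

    Hypothetical helper for illustration; it only formats text.
    """
    if not work.strip() or not end_state.strip():
        raise ValueError("a goal needs both the work and a measurable end state")
    goal = f"/goal {work.strip()} until {end_state.strip()}"
    if constraints.strip():
        goal += f" without {constraints.strip()}"
    return goal


example = build_goal(
    "fix the failing login tests",
    "all tests in auth.test.ts pass and npm run typecheck exits 0",
    "changing the public API",
)
print(example)
```

Keeping the end state as its own argument makes it harder to skip: a goal without one fails early instead of starting a run the verifier cannot judge.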

1.2 The adversarial verification loop (the two-Claude review)

This is the feature that makes /goal worth using.

In an ordinary Claude Code session, the same model that did the work also decides it is finished. That is the root of two well-known problems: the model hallucinates that it completed a task, or it does the minimum and declares success without truly meeting the goal.

/goal breaks that conflict of interest by separating doing from judging:

• Step-level validation. A small, fast validator model runs after the main agent's steps and answers exactly one question: "Has the goal been met?" If the answer is no, the main model keeps working. If yes, the loop can close.
• Independent final review. Before the run is reported complete, a separate, independent Claude session reviews the resulting repository state to confirm the goal was actually achieved, rather than trusting the primary agent's self-report. This independent pass is the "adversarial review": its job is to look for evidence that the work is not done.

The practical effect: the primary Claude cannot simply assert "done" and exit. A second judge has to agree, and that judge did not write the code, so it has no incentive to protect it.
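The control flow described above can be sketched in a few lines. This is a toy model under stated assumptions, not how Claude Code implements the loop internally: run_goal, work_step, goal_met, and final_review are hypothetical names, and plain Python callables stand in for the primary agent, the step-level validator, and the independent reviewer.

```python
from typing import Callable


def run_goal(
    work_step: Callable[[int], None],
    goal_met: Callable[[], bool],
    final_review: Callable[[], bool],
    max_turns: int = 20,
) -> int:
    """Return the turn on which the goal was independently confirmed.

    work_step    -- stand-in for the primary agent doing one step of work
    goal_met     -- stand-in for the fast step-level validator
    final_review -- stand-in for the independent final reviewer
    """
    for turn in range(1, max_turns + 1):
        work_step(turn)
        if not goal_met():
            continue  # validator says "not yet": keep working
        if final_review():
            return turn  # both judges agree, so the run may close
        # reviewer disagreed with the validator: keep working
    raise RuntimeError("goal not confirmed within max_turns")


# Toy state: the "repository" is just a counter of passing tests.
state = {"passing": 0}


def work(turn: int) -> None:
    state["passing"] += 1


confirmed_on = run_goal(
    work_step=work,
    goal_met=lambda: state["passing"] >= 2,      # lenient quick check
    final_review=lambda: state["passing"] >= 3,  # stricter independent check
)
print(confirmed_on)  # 3
```

The worker never returns a "done" value here; only the two checks can end the loop, which is the separation of doing from judging.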

1.3 Why this beats a naive autonomous loop

You can build a crude autonomous loop yourself — a "keep prompting Claude until it says finished" script (the community calls the pattern a "Ralph loop"). The problem is that such loops trust the agent's own claim of completion, so they inherit the exact hallucination-and-laziness problems above.

/goal is that loop with an adversarial verifier wired in. The autonomy gives you unattended progress; the independent review gives you a reason to trust the result. Without the second half, autonomy just lets a confident-but-wrong agent run longer.
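To make the contrast concrete, here is a small simulation, assuming an agent that claims completion before the work is really finished. The function turns_until_stop and its parameters are invented for this illustration; the only thing modeled is when each kind of loop stops.

```python
def turns_until_stop(claims_done_on: int, actually_done_on: int, verify: bool) -> int:
    """Return the turn on which a loop stops.

    claims_done_on   -- turn on which the agent first says "finished"
    actually_done_on -- turn on which the work really meets the goal
    verify           -- False trusts the claim; True also checks the real state
    """
    turn = 0
    while True:
        turn += 1
        claimed = turn >= claims_done_on
        really_done = turn >= actually_done_on
        if claimed and (really_done or not verify):
            return turn


naive = turns_until_stop(claims_done_on=1, actually_done_on=4, verify=False)
verified = turns_until_stop(claims_done_on=1, actually_done_on=4, verify=True)
print(naive, verified)  # 1 4
```

The naive loop exits on turn 1 with unfinished work, while the verified loop keeps going until the real state satisfies the goal.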

Module 2. The Workflow: From Setup to Execution

2.1 Prerequisites: update and unblock

Module objectives

• Get your environment ready so /goal can run without constant interruptions.
• Run the planning phase that the whole workflow depends on.
• Export tasks in the correct shape and feed them to /goal.

Two setup steps, both quick:

• Use a current Claude Code. /goal needs version 2.1.139 or newer. Update the CLI, then fully exit and restart Claude Code so the new command registers. If /goal does not appear as an option, you are on an old build.
• Reduce interruptions. A long autonomous run defeats its purpose if it stops every few seconds to ask permission. Turn on an auto-accept / "accept edits" mode, or run with bypassed permissions, so Claude can keep working. Only do this in a context where you are comfortable with Claude editing files unattended; treat it like any other powerful automation.
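For the version requirement, a numeric comparison is safer than comparing strings, because "2.1.98" sorts after "2.1.139" as text. The sketch below assumes you already have the installed version as a plain dotted string; how you obtain it is left out, and the function names are hypothetical.

```python
MINIMUM = (2, 1, 139)  # minimum version stated in this guide


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '2.1.139' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_goal(version: str) -> bool:
    """True if the given version is at or above the minimum for /goal."""
    return parse_version(version) >= MINIMUM


print(supports_goal("2.1.139"))  # True
print(supports_goal("2.1.98"))   # False
print(supports_goal("2.2.0"))    # True
```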

2.2 The planning phase (the part that actually matters)

Practitioner pattern. This planning ritual is the single highest-leverage habit for /goal, and skipping it is where most people who get poor results go wrong.

Do not hand /goal a raw, half-formed idea. First run a real planning pass with Claude on your project:

• Take your brainstormed ideas and have Claude break them into independent phases and tasks: small, atomic units of work.
• For each task, define how you will know it is done (the verification criteria).
• Save that plan to long-term memory (see Module 3, Lesson 2). The plan has to persist beyond the current chat window, because you will refer back to it across many /goal runs.

If writing exact tasks and verification criteria feels intimidating, that is fine — the point of the planning pass is to have Claude produce that detail with you, before any autonomous work starts. The plan is the contract; /goal executes against it.
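One way to picture the plan as a contract is as structured data: phases containing tasks, each task carrying its own verification criteria. The sketch below is a hypothetical layout (the class names, field names, and JSON format are assumptions, not something /goal requires) that also shows the plan surviving a save and reload.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Task:
    title: str
    verification: list[str]  # objective checks that prove the task is done
    done: bool = False


@dataclass
class Phase:
    name: str
    tasks: list[Task] = field(default_factory=list)


plan = [
    Phase("Auth", [
        Task("Add login endpoint", ["all tests in auth.test.ts pass"]),
        Task("Add typed session model", ["npm run typecheck exits 0"]),
    ]),
]

# Serialize so the plan can be written to a note store and reloaded later.
plan_json = json.dumps([asdict(phase) for phase in plan], indent=2)
restored = [
    Phase(p["name"], [Task(**t) for t in p["tasks"]])
    for p in json.loads(plan_json)
]
print(restored == plan)  # True
```

A plain Markdown checklist would serve equally well; what matters is that every task has verification criteria attached and that the plan lives outside the chat window.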

2.3 Exporting tasks: the two-task cadence and the character ceiling

Once the plan is saved, pull work out of it in small, verifiable slices. The practitioner prompt is:

Give me the next two tasks from our project plan. Include verification
criteria to ensure each task is completed and working, and do not exceed
3500 characters for any single task plus its verification list.

Two numbers in that prompt matter:

• Two tasks at a time (three at most). It is tempting to dump the whole plan in at once, but during real coding the plan changes: Claude hits bugs, blockers, and configuration surprises, and may need to reorder phases. Small batches keep the plan adaptive instead of forcing a stale ordering.
• ~3,500 characters per task + verification. A single /goal input has a hard ceiling (the practitioner figure is 4,000 characters), and models are unreliable at counting their own characters. Asking for 3,500 leaves a ~500-character safety buffer so the exported task reliably fits. The deeper principle holds regardless of the exact cap: keep each goal small enough to fit comfortably inside the input limit, because an overlong goal gets truncated and the verifier ends up checking against an incomplete condition.
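Since models are unreliable at counting their own characters, the count can be checked in code before pasting. The sketch below uses the guide's two figures (3,500 target, 4,000 practitioner ceiling); the helper names and the exact way a task and its checks are joined are assumptions for illustration.

```python
HARD_CEILING = 4000   # practitioner figure for a single /goal input
TARGET = 3500         # what the export prompt asks for
BUFFER = HARD_CEILING - TARGET


def goal_text(task: str, checks: list[str]) -> str:
    """Join a task and its verification list the way it might be pasted."""
    lines = [task, "", "Verification:"] + [f"- {c}" for c in checks]
    return "\n".join(lines)


def fits(text: str, limit: int = TARGET) -> bool:
    """Count characters in code rather than trusting the model's own count."""
    return len(text) <= limit


text = goal_text(
    "Add a POST /payments endpoint",
    ["all tests in payments.test.ts pass", "npm run typecheck exits 0"],
)
print(BUFFER, len(text), fits(text))
```

If fits returns False, the practical fix is to split the task, since trimming the verification list weakens what the verifier can check.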

2.4 Running the goal

Take task one and its verification criteria, run /goal, and paste them in. Claude starts working independently and keeps going until the work is independently verified, not merely claimed done. Repeat for task two, then export the next two tasks. The loop is: export two tasks → run them through /goal → verify → export the next two.

If you need to redirect, you can stop an in-flight goal with /goal clear (aliases include stop, cancel, and reset), or press Ctrl+C in -p mode. Native /goal has no pause/resume: it stays active until the condition is achieved or you clear it.
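The export-run-verify cadence can be sketched as a function that is called fresh each round, so any change to the plan takes effect on the next batch. The names next_batch, plan, and done are hypothetical, and the actual /goal run is only represented by a comment.

```python
def next_batch(plan: list[str], done: set[str], size: int = 2) -> list[str]:
    """Return the next few unfinished tasks, in plan order."""
    if not 1 <= size <= 3:
        raise ValueError("keep batches to three tasks at most")
    return [task for task in plan if task not in done][:size]


plan = ["schema", "endpoint", "client", "docs"]
done: set[str] = set()

first = next_batch(plan, done)      # ['schema', 'endpoint']
done.update(first)                  # pretend both were run through /goal and verified

plan.insert(2, "fix config bug")    # the plan changed during real coding
second = next_batch(plan, done)     # ['fix config bug', 'client']
print(first, second)
```

Because the batch is recomputed from the current plan each time, a task inserted mid-project is picked up without reshuffling anything by hand.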

Module 3. Best Practices and Failure Modes

3.1 Write completion criteria a machine can check

Module objectives

• Write completion criteria the verifier can actually check.
• Stand up the long-term memory the workflow depends on.
• Recognize the ways /goal is misused so you can avoid them.

The verifier is only as good as the end state you give it. A measurable condition is one with an objective yes/no answer:

• Good: "all tests in payments.test.ts pass, npm run typecheck exits 0, and the new endpoint returns 200 for a valid request."
• Weak: "improve the payments code". There is nothing to verify, so the check degrades into a fuzzy semantic guess, and a fuzzy guess is exactly where drift comes from.

Prefer conditions tied to commands and outputs (tests passing, type checks clean, a specific log line, an HTTP status) over subjective quality judgments.
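A condition tied to commands reduces to "did every command exit with status 0", which is easy to express in code. The sketch below is one possible shape, not part of /goal: all_checks_pass is a hypothetical helper, and tiny Python subprocesses stand in for real project commands so the example runs anywhere.

```python
import subprocess
import sys


def all_checks_pass(commands: list[list[str]]) -> bool:
    """Run each command and report True only if every one exits with status 0."""
    for command in commands:
        result = subprocess.run(command, capture_output=True)
        if result.returncode != 0:
            return False
    return True


# Stand-in commands so the sketch runs anywhere; in a real project these might be
# ["npm", "test"] or ["npm", "run", "typecheck"].
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(all_checks_pass([passing, passing]))  # True
print(all_checks_pass([passing, failing]))  # False
```

If you can write your completion criteria as a list like this, they are concrete enough for a verifier; if you cannot, the goal is probably still too vague.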

3.2 The memory system requirement

Practitioner pattern, strongly recommended. The planning-first workflow only works if the plan and project context persist. Long autonomous runs span more context than a single chat window holds.

Give Claude a long-term memory ("second brain"), for example an Obsidian vault or an equivalent persistent note store, and write the plan, the high-level goals, and session notes into it. Without persistent memory:

• Claude will not accurately recall the details of your plan across runs.
• It loses the why behind the project and starts optimizing locally.
• The two-task cadence breaks, because each batch needs the plan to pull from.

The pro habit: every few completed tasks, ask Claude to re-read the plan and recent session notes from memory and confirm you are still on track toward the high-level goals. In practice Claude will often reorder or revise the plan as it learns the real constraints — that is good, because it is now working with better context. Powering blindly through the original plan is what produces bugs you later have to refactor.
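At its simplest, persistent memory is a folder of plain-text notes that outlives any single session. The sketch below appends dated session notes to a Markdown file and reads them back; the file name session-notes.md, the folder layout, and both function names are assumptions for illustration, and a temporary directory stands in for a real vault.

```python
import tempfile
from datetime import date
from pathlib import Path


def append_session_note(vault: Path, note: str) -> Path:
    """Append a dated note to session-notes.md inside the vault folder."""
    vault.mkdir(parents=True, exist_ok=True)
    notes_file = vault / "session-notes.md"
    with notes_file.open("a", encoding="utf-8") as handle:
        handle.write(f"- {date.today().isoformat()}: {note}\n")
    return notes_file


def read_notes(vault: Path) -> list[str]:
    """Return all saved notes, oldest first, or an empty list if none exist."""
    notes_file = vault / "session-notes.md"
    if not notes_file.exists():
        return []
    return notes_file.read_text(encoding="utf-8").splitlines()


# A temporary folder stands in for a real vault directory.
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp) / "vault"
    append_session_note(vault, "Tasks 1-2 verified; reordered phase 2 after config bug")
    append_session_note(vault, "Next: export tasks 3-4")
    print(len(read_notes(vault)))  # 2
```

Append-only notes keep a record of why the plan changed, which is the context a periodic review needs.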

3.3 Failure modes and misuse cases

These are the ways /goal wastes time or money:

• Vague completion conditions. No measurable end state means the verifier cannot give a clean yes/no, so the loop either stops too early or never converges.
• Skipping the planning phase. Handing /goal an unplanned idea produces sprawling, unfocused work; the command amplifies whatever structure you gave it, including none.
• No persistent memory. Without a second brain the agent forgets the plan between runs and drifts off the high-level goal.
• Goals that exceed the input ceiling. An overlong task gets truncated; the verifier then checks against a partial condition and can wrongly report success.
• Dumping the entire plan at once. Large batches lock in a task ordering that real coding will invalidate, so you end up fighting your own stale plan.
• Open-ended, creative, or unbounded goals. Tasks with no objective finish line (e.g. "make the UX delightful") have no end state to verify and can run indefinitely. Set a reasonable scope and monitor early runs to estimate cost before letting one run long.
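Some of these failure modes can be caught mechanically before a run starts. The pre-flight check below is a crude heuristic sketch, not a feature of /goal: the function name, its parameters, and the rule that a measurable goal contains the word "until" are all assumptions made for illustration.

```python
def preflight(goal: str, batch_size: int, plan_saved: bool, limit: int = 4000) -> list[str]:
    """Return warnings for the failure modes that can be checked mechanically."""
    warnings = []
    if " until " not in goal:
        warnings.append("no 'until' clause: the end state may not be measurable")
    if len(goal) > limit:
        warnings.append("goal exceeds the input ceiling and may be truncated")
    if batch_size > 3:
        warnings.append("batch is larger than three tasks")
    if not plan_saved:
        warnings.append("plan is not saved to persistent memory")
    return warnings


print(preflight("/goal fix tests until npm test exits 0", batch_size=2, plan_saved=True))
for warning in preflight("make the UX delightful", batch_size=5, plan_saved=False):
    print("warning:", warning)
```

A check like this cannot judge whether an end state is truly objective; it only flags the obvious omissions so you can fix them before spending tokens.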

3.4 Monitor cost early, then trust the loop

Because a goal can run for a long time, watch the first few runs: keep an eye on the turn count, token usage, and elapsed time the command reports, and use them to estimate what a long run will cost. Once you trust your completion criteria and your batch sizes, you can let /goal work unattended — that unattended-but-verified progress is the entire point of the command.
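A first-pass estimate can be made by extrapolating from the numbers an early run reports. The sketch below assumes token use grows linearly with turns, which is a simplification, and takes the price per million tokens as an input: the 10.0 used here is a made-up placeholder, not a real rate, and both function names are hypothetical.

```python
def projected_tokens(tokens_so_far: int, turns_so_far: int, expected_turns: int) -> int:
    """Linearly extrapolate total tokens from the average per turn seen so far."""
    if turns_so_far <= 0:
        raise ValueError("need at least one observed turn")
    return round(tokens_so_far / turns_so_far * expected_turns)


def projected_cost(tokens: int, price_per_million: float) -> float:
    """Convert a token count to a cost; the price is supplied by the caller."""
    return tokens / 1_000_000 * price_per_million


tokens = projected_tokens(tokens_so_far=100_000, turns_so_far=8, expected_turns=40)
print(tokens)                        # 500000
print(projected_cost(tokens, 10.0))  # 5.0 (placeholder price, not a real rate)
```

Treat the result as a rough order of magnitude and re-run the estimate as later runs report their own turn and token counts.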

Summary

/goal turns Claude Code into an autonomous worker whose results you can trust, because a separate, independent Claude session has to agree the goal was met before the run is reported done. To get the most from it: plan first and save the plan to long-term memory, export work two tasks at a time with verification criteria under a ~3,500-character ceiling, write completion conditions a machine can check, and review the plan periodically so a long run stays aligned with your high-level goals.