I’ve been using Claude Code almost daily for the past year. It’s the best AI coding tool I’ve ever used. But there’s a pattern I kept running into.
I’d ask it to build a feature, and it would jump straight to code. No architecture thinking. No consideration of existing patterns. No real review process. The output was good — sometimes great — but it was unstructured. One long conversation where planning, building, and reviewing all blur together in the same context window.
I wanted something different. I wanted the workflow I’d use if I had a small engineering team: someone to challenge the idea, someone to architect, someone to build, independent reviewers who don’t know each other exist, and a QA engineer who actually runs the code. Except all of them are Claude.
So I built micro-squad.
## What It Is
micro-squad is a drop-in multi-agent sprint system for Claude Code. It’s a set of Markdown skill files that coordinate sub-agents through a structured workflow: think → plan → build → verify → ship.
No binaries. No servers. No external dependencies. Just Markdown files and Claude Code’s native Agent tool.
You type `/squad add dark mode support` and this happens:
```
THINK ────── Forcing questions, explore alternatives
             You pick scope: EXPAND / SELECTIVE / HOLD / REDUCE
  │
PLAN ─────── Architect ──┐  parallel    Unified plan with
             Scout ──────┘              effort estimates
  │
BUILD ────── Builder implements with atomic commits
             Auto-reverts on regression, 3-strike escalation
  │
VERIFY ───── Judge A ───┐
             Judge B ───┤  parallel     Consensus table:
             QA Agent ──┘               FIX / TRIAGE / DISMISS
  │
             Fix Agent → Re-judge (max 2 rounds)
  │
SHIP ─────── Commit + PR with full evidence trail
```
Each phase produces artifacts in a `.squad/` directory. Everything is inspectable, diffable, and git-friendly.
## The Part I’m Most Proud Of: Judgment Day
This is the verify phase, and it’s where micro-squad really earns its keep.
After the builder finishes, three agents launch in parallel:
- Judge A — an adversarial code reviewer
- Judge B — another adversarial code reviewer
- QA Agent — runs tests, checks edge cases, and verifies nothing regressed
Here’s the key: neither judge knows the other exists. They review independently, with no shared context. Then the orchestrator builds a consensus table:
| Finding | Severity | Judge A | Judge B | QA | Verdict |
|---|---|---|---|---|---|
| Null check missing | CRITICAL | YES | YES | — | FIX |
| Race condition | WARNING | YES | NO | FAIL | FIX |
| Naming inconsistency | SUGGESTION | YES | NO | — | TRIAGE |
The verdict rules are simple:
- FIX: Found by 2+ agents, or ANY single CRITICAL finding
- TRIAGE: Found by 1 agent only (non-critical) — you decide
- DISMISS: Contradicted by another agent + SUGGESTION severity only
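The rules are simple enough to model as a tiny function. This is an illustrative Python sketch, not micro-squad's actual mechanism (the real system is Markdown prompts, so `verdict` and its signature are my own invention):

```python
# Hypothetical model of the consensus verdict rules; function name,
# signature, and the "contradicted" flag are illustrative assumptions.

def verdict(severity: str, flagged_by: int, contradicted: bool) -> str:
    """Map one finding to FIX / TRIAGE / DISMISS per the rules above."""
    if severity == "CRITICAL" or flagged_by >= 2:
        return "FIX"      # any single CRITICAL, or confirmed by 2+ agents
    if contradicted and severity == "SUGGESTION":
        return "DISMISS"  # actively contradicted, and cosmetic only
    return "TRIAGE"       # single non-critical finding: the human decides

# The three rows from the example table:
assert verdict("CRITICAL", 2, False) == "FIX"      # null check: both judges
assert verdict("WARNING", 2, False) == "FIX"       # race condition: Judge A + QA
assert verdict("SUGGESTION", 1, False) == "TRIAGE" # naming: Judge A only
```

Note the ordering matters: a CRITICAL finding is never dismissible, even if only one agent saw it.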
If FIX items exist, a Fix Agent patches the confirmed issues, then all three agents re-run for round 2. Max 2 rounds — if it’s still broken after that, it escalates to you with the full history.
The dual blind approach catches things a single reviewer misses. I’ve seen Judge A flag a race condition that Judge B missed, and Judge B catch a security issue Judge A glossed over. The consensus protocol filters false positives while preserving real findings.
## Why Parallel Agents Matter
Context isolation is the main reason this works better than one long conversation.
When Claude builds a feature in a single session, the context window fills with implementation details. By the time it gets to “review your own work,” it’s already anchored to its own decisions. It’s hard to be adversarial about code you just wrote.
micro-squad solves this by giving each agent fresh context. The builder only sees the plan. The judges only see the plan and the diff. The QA agent only sees what should work and what was built. Nobody carries the builder’s assumptions.
The other benefit: real parallelism. Architect + Scout run simultaneously. Judge A + Judge B + QA run simultaneously. This isn’t sequential “first do this, then that” — it’s concurrent work that finishes faster.
## The Forcing Questions
The think phase is inspired by a simple idea: most bad implementations trace back to unclear requirements, not bad code.
Before any code gets written, the system asks you 3 forcing questions from a menu:
- Demand Reality — “What specifically is broken or missing?” (for vague requests)
- Status Quo — “What happens if we do nothing for 3 months?” (for nice-to-haves)
- Narrowest Wedge — “What’s the smallest version that makes one user happy?” (for large scope)
- Future Fit — “If this succeeds at 100x scale, what must it handle?” (for infrastructure)
Then you pick a scope mode: EXPAND, SELECTIVE, HOLD, or REDUCE. This bounds the work before architecture begins.
## The Builder’s Guardrails
The build agent isn’t just “implement the plan.” It has strict self-regulation:
- Atomic commits — one logical change per commit
- Auto-revert on regression — if a change breaks an unrelated test, revert immediately
- Three-strike rule — after 3 failed attempts at the same problem, stop and report
- Blast radius check — if a change touches more than 5 files unexpectedly, pause
- No gold-plating — implement exactly the plan, nothing more
The three-strike rule is important. Without it, AI agents will keep retrying the same broken approach forever, burning tokens and making things worse. Three strikes means it escalates to you with what it learned — which is often more useful than a questionable fix.
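The escalation logic fits in a few lines. This is a hypothetical Python model of the three-strike rule, not code from micro-squad; `run_with_strikes` and its return shape are my own:

```python
# Illustrative sketch of the three-strike rule: retry a fix a bounded
# number of times, then escalate with the accumulated notes instead of
# looping forever. All names here are hypothetical.

def run_with_strikes(attempt_fix, max_strikes=3):
    """attempt_fix(strike) returns (succeeded, note). Stop after max_strikes."""
    notes = []
    for strike in range(1, max_strikes + 1):
        succeeded, note = attempt_fix(strike)
        notes.append(note)
        if succeeded:
            return {"status": "fixed", "attempts": strike}
    # Three failures: hand the human the full history, not a questionable fix.
    return {"status": "escalate", "attempts": max_strikes, "learned": notes}

# A fix that never works stops after 3 attempts instead of burning tokens.
result = run_with_strikes(lambda s: (False, f"attempt {s}: tests still failing"))
```

The key design point is that the failure notes are part of the return value: what the agent learned while failing is the deliverable.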
## Learning Loop
Every time `/verify` runs, it captures 1-3 key findings from the consensus table and appends them to a `learnings.md` file. Next time any agent runs on that project, it reads past findings first.
This means the system gets smarter over sprints. If Judge A flagged a null check pattern in sprint 1, the builder in sprint 3 already knows to handle it. Capped at 50 entries to prevent bloat.
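A minimal sketch of that append-and-cap behavior, assuming one finding per line (the post doesn't specify the actual `learnings.md` format, so `append_learnings` and its details are hypothetical):

```python
# Hedged sketch of the learnings loop: append findings, keep only the
# most recent MAX_ENTRIES lines. File format is an assumption.
from pathlib import Path

MAX_ENTRIES = 50  # the cap mentioned above, to prevent bloat

def append_learnings(path: Path, findings: list[str]) -> list[str]:
    """Append new findings, dropping the oldest lines past the cap."""
    existing = path.read_text().splitlines() if path.exists() else []
    entries = (existing + findings)[-MAX_ENTRIES:]  # keep newest 50
    path.write_text("\n".join(entries) + "\n")
    return entries
```

Capping from the oldest end means recent sprints always win, which matches the "gets smarter over sprints" goal: stale findings age out on their own.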
## The Philosophy Behind It
micro-squad is built on three principles:
Boil the Lake. AI makes completeness nearly free. When doing 100% costs minutes more than doing 90%, always do 100%. The last 10% that teams used to skip — full edge case coverage, complete error paths, comprehensive tests — costs seconds now. Shipping shortcuts is legacy thinking.
Search Before Building. Before designing anything, check what exists. There are three layers of knowledge: tried-and-true patterns (verify them), new-and-popular approaches (scrutinize them), and first-principles reasoning (prize this above everything). The best outcome of searching isn’t finding a solution to copy — it’s discovering why the conventional approach is wrong.
User Sovereignty. AI recommends. You decide. Always. Two models agreeing on a change is a strong signal, not a mandate. The system never skips the verification step because it’s confident.
## What It Looks Like in Practice
Here’s a real sprint I ran to upgrade micro-squad itself:
```
/squad upgrade prompts and add retro skill

THINK  — Asked 3 forcing questions, generated 3 approaches,
         picked "philosophy injection + prompt sharpening"
PLAN   — Architect and Scout ran in parallel, produced a
         unified plan: 10 files, 9 steps
BUILD  — Builder implemented all changes: ETHOS.md, CLAUDE.md,
         retro/SKILL.md, sharpened agent prompts, learnings system
VERIFY — Judge A found 2 warnings, Judge B found 4, QA found 1 failure
         Consensus: 4 FIX items. Fix Agent patched all 4.
         2 TRIAGE items presented — I fixed both.
         Verdict: APPROVED WITH NOTES
SHIP   — Committed and pushed
```
The whole thing took about 20 minutes. The verification phase alone caught issues I wouldn’t have noticed in a manual review — a missing entry in the contract’s phase table, duplicated principles, ambiguous file paths.
## Install
```shell
git clone https://github.com/SebastianPuchet/micro-squad.git ~/.claude/skills/micro-squad
cd ~/.claude/skills/micro-squad && ./setup
```
That’s it. Nine skills get symlinked into `~/.claude/skills/`. Each one works standalone or as part of a full sprint:
| Command | What it does |
|---|---|
| `/squad <task>` | Full sprint — think, plan, build, verify, ship |
| `/think` | Challenge assumptions with forcing questions |
| `/plan` | Parallel architect + scout → unified plan |
| `/build` | Implementation with guardrails |
| `/verify` | Judgment day: dual blind review + QA |
| `/secure` | Infrastructure-first security audit |
| `/investigate <bug>` | 3-strike root cause debugging |
| `/ship` | PR with full evidence trail |
| `/retro` | Sprint retrospective — git stats and patterns |
## What’s Next
micro-squad is what I use daily. It’s opinionated, it’s small, and it works. The entire system is ~2,000 lines of Markdown. No build step, no dependencies, no lock-in.
If you’re using Claude Code and want more structure than a single conversation — or if you just want independent reviewers catching each other’s blind spots — give it a try.
The code is at github.com/SebastianPuchet/micro-squad. MIT licensed.
If you found this useful, I’d appreciate a star on GitHub.