2026-03-31 Tech 7 min

I Built a Multi-Agent Sprint System for Claude Code — Here's How It Works

I’ve been using Claude Code almost daily for the past year. It’s the best AI coding tool I’ve ever used. But there’s a pattern I kept running into.

I’d ask it to build a feature, and it would jump straight to code. No architecture thinking. No consideration of existing patterns. No real review process. The output was good — sometimes great — but it was unstructured. One long conversation where planning, building, and reviewing all blur together in the same context window.

I wanted something different. I wanted the workflow I’d use if I had a small engineering team: someone to challenge the idea, someone to architect, someone to build, independent reviewers who don’t know each other exist, and a QA engineer who actually runs the code. Except all of them are Claude.

So I built micro-squad.


What It Is

micro-squad is a drop-in multi-agent sprint system for Claude Code. It’s a set of Markdown skill files that coordinate sub-agents through a structured workflow: think → plan → build → verify → ship.

No binaries. No servers. No external dependencies. Just Markdown files and Claude Code’s native Agent tool.

You type /squad add dark mode support and this happens:

THINK ──────── Forcing questions, explore alternatives
               You pick scope: EXPAND / SELECTIVE / HOLD / REDUCE

PLAN ───────── Architect ──┐ parallel    Unified plan with
               Scout ──────┘             effort estimates

BUILD ──────── Builder implements with atomic commits
               Auto-reverts on regression, 3-strike escalation

VERIFY ─────── Judge A ───┐
               Judge B ───┤ parallel     Consensus table:
               QA Agent ──┘              FIX / TRIAGE / DISMISS

               Fix Agent → Re-judge (max 2 rounds)

SHIP ───────── Commit + PR with full evidence trail

Each phase produces artifacts in a .squad/ directory. Everything is inspectable, diffable, and git-friendly.


The Part I’m Most Proud Of: Judgment Day

This is the verify phase, and it’s where micro-squad really earns its keep.

After the builder finishes, three agents launch in parallel: Judge A and Judge B review the plan and the diff, and a QA agent actually runs the code.

Here’s the key: neither judge knows the other exists. They review independently, with no shared context. Then the orchestrator builds a consensus table:

Finding               Severity    Judge A  Judge B  QA    Verdict
Null check missing    CRITICAL    YES      YES      -     FIX
Race condition        WARNING     YES      no       FAIL  FIX
Naming inconsistency  SUGGESTION  YES      no       -     TRIAGE
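Read as code, the consensus columns above reduce to a few lines. This is a sketch of my reading of the table, not micro-squad internals; `Finding` and `decide_verdict` are names I made up:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str    # "CRITICAL" | "WARNING" | "SUGGESTION" (carried for the report)
    judge_a: bool    # flagged by Judge A?
    judge_b: bool    # flagged by Judge B?
    qa_failed: bool  # reproduced as a failure by the QA agent?

def decide_verdict(f: Finding) -> str:
    # Confirmed by both blind judges, or proven by a failing QA run: fix it.
    if (f.judge_a and f.judge_b) or f.qa_failed:
        return "FIX"
    # Flagged by a single judge only: surface it to the human.
    if f.judge_a or f.judge_b:
        return "TRIAGE"
    # Nobody confirmed it: treat it as a false positive.
    return "DISMISS"
```

Run against the three rows above, this reproduces FIX, FIX, TRIAGE.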

The verdict rules are simple: a finding confirmed by both judges, or backed by a failing QA run, becomes FIX. A finding flagged by only one judge goes to TRIAGE for my decision. Anything nobody can confirm is DISMISSed as a false positive.
If FIX items exist, a Fix Agent patches the confirmed issues, then all three agents re-run for round 2. Max 2 rounds — if it’s still broken after that, it escalates to you with the full history.
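The round structure can be sketched as a small loop. The function names here are placeholders, assuming `run_judges` re-runs all three agents and returns the remaining FIX items:

```python
def judgment_day(run_judges, run_fix_agent, max_rounds: int = 2):
    # Each round: re-judge, and either approve, patch, or fall through to escalation.
    for round_no in range(1, max_rounds + 1):
        fix_items = run_judges()
        if not fix_items:
            return {"verdict": "APPROVED", "rounds": round_no}
        if round_no < max_rounds:
            run_fix_agent(fix_items)  # patch confirmed issues, then re-judge
    # Still broken after the final round: escalate with the full history.
    return {"verdict": "ESCALATE", "rounds": max_rounds}
```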

The dual blind approach catches things a single reviewer misses. I’ve seen Judge A flag a race condition that Judge B missed, and Judge B catch a security issue Judge A glossed over. The consensus protocol filters false positives while preserving real findings.


Why Parallel Agents Matter

Context isolation is the main reason this works better than one long conversation.

When Claude builds a feature in a single session, the context window fills with implementation details. By the time it gets to “review your own work,” it’s already anchored to its own decisions. It’s hard to be adversarial about code you just wrote.

micro-squad solves this by giving each agent fresh context. The builder only sees the plan. The judges only see the plan and the diff. The QA agent only sees what should work and what was built. Nobody carries the builder’s assumptions.

The other benefit: real parallelism. Architect + Scout run simultaneously. Judge A + Judge B + QA run simultaneously. This isn’t sequential “first do this, then that” — it’s concurrent work that finishes faster.
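A rough sketch of that fan-out, assuming each agent is just a function handed only the artifacts it is allowed to see. `run_agent` and the context shapes are illustrative, not micro-squad internals:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(name: str, context: dict) -> str:
    # In the real system this would spawn a Claude Code sub-agent;
    # here we just record which artifacts each agent was given.
    return f"{name} saw: {sorted(context)}"

plan, diff, spec = "plan.md", "changes.diff", "expected-behaviour.md"

with ThreadPoolExecutor() as pool:
    # Judges see only the plan and the diff; QA sees the spec and the diff.
    futures = [
        pool.submit(run_agent, "judge_a", {"plan": plan, "diff": diff}),
        pool.submit(run_agent, "judge_b", {"plan": plan, "diff": diff}),
        pool.submit(run_agent, "qa", {"spec": spec, "diff": diff}),
    ]
    results = [f.result() for f in futures]
```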


The Forcing Questions

The think phase is inspired by a simple idea: most bad implementations trace back to unclear requirements, not bad code.

Before any code gets written, the system asks you 3 forcing questions from a menu.

Then you pick a scope mode: EXPAND, SELECTIVE, HOLD, or REDUCE. This bounds the work before architecture begins.


The Builder’s Guardrails

The build agent isn’t just “implement the plan.” It has strict self-regulation:

  1. Atomic commits — one logical change per commit
  2. Auto-revert on regression — if a change breaks an unrelated test, revert immediately
  3. Three-strike rule — after 3 failed attempts at the same problem, stop and report
  4. Blast radius check — if a change touches more than 5 files unexpectedly, pause
  5. No gold-plating — implement exactly the plan, nothing more

The three-strike rule is important. Without it, AI agents will keep retrying the same broken approach forever, burning tokens and making things worse. Three strikes means it escalates to you with what it learned — which is often more useful than a questionable fix.
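A minimal sketch of that escalation, with `attempt()` as a placeholder for one build iteration that raises on failure:

```python
def build_with_strikes(attempt, max_strikes: int = 3):
    # Retry the same problem at most max_strikes times, recording each failure.
    learned = []
    for strike in range(1, max_strikes + 1):
        try:
            return {"status": "ok", "result": attempt()}
        except Exception as err:
            learned.append(f"strike {strike}: {err}")
    # Three failures on the same problem: stop and report what was learned.
    return {"status": "escalate", "learned": learned}
```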


Learning Loop

Every time /verify runs, it captures 1-3 key findings from the consensus table and appends them to a learnings.md file. Next time any agent runs on that project, it reads past findings first.

This means the system gets smarter over sprints. If Judge A flagged a null check pattern in sprint 1, the builder in sprint 3 already knows to handle it. Capped at 50 entries to prevent bloat.
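Assuming learnings.md holds one finding per line (the file format is my guess, not documented in micro-squad), the capped append might look like:

```python
from pathlib import Path

MAX_ENTRIES = 50  # cap stated above, to prevent bloat

def append_learnings(path: Path, new_findings: list[str]) -> None:
    # Read existing findings, if the file exists yet.
    lines = path.read_text().splitlines() if path.exists() else []
    lines.extend(new_findings)
    # Keep only the most recent entries, dropping the oldest first.
    path.write_text("\n".join(lines[-MAX_ENTRIES:]) + "\n")
```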


The Philosophy Behind It

micro-squad is built on three principles:

Boil the Lake. AI makes completeness nearly free. When doing 100% costs minutes more than doing 90%, always do 100%. The last 10% that teams used to skip — full edge case coverage, complete error paths, comprehensive tests — costs seconds now. Shipping shortcuts is legacy thinking.

Search Before Building. Before designing anything, check what exists. There are three layers of knowledge: tried-and-true patterns (verify them), new-and-popular approaches (scrutinize them), and first-principles reasoning (prize this above everything). The best outcome of searching isn’t finding a solution to copy — it’s discovering why the conventional approach is wrong.

User Sovereignty. AI recommends. You decide. Always. Two models agreeing on a change is a strong signal, not a mandate. The system never skips the verification step because it’s confident.


What It Looks Like in Practice

Here’s a real sprint I ran to upgrade micro-squad itself:

/squad upgrade prompts and add retro skill

THINK — Asked 3 forcing questions, generated 3 approaches,
        picked "philosophy injection + prompt sharpening"
PLAN  — Architect and Scout ran in parallel, produced a
        unified plan: 10 files, 9 steps
BUILD — Builder implemented all changes: ETHOS.md, CLAUDE.md,
        retro/SKILL.md, sharpened agent prompts, learnings system
VERIFY — Judge A found 2 warnings, Judge B found 4, QA found 1 failure
         Consensus: 4 FIX items. Fix Agent patched all 4.
         2 TRIAGE items presented — I fixed both.
         Verdict: APPROVED WITH NOTES
SHIP  — Committed and pushed

The whole thing took about 20 minutes. The verification phase alone caught issues I wouldn’t have noticed in a manual review — a missing entry in the contract’s phase table, duplicated principles, ambiguous file paths.


Install

git clone https://github.com/SebastianPuchet/micro-squad.git ~/.claude/skills/micro-squad
cd ~/.claude/skills/micro-squad && ./setup

That’s it. 9 skills get symlinked to ~/.claude/skills/. Each one works standalone or as part of a full sprint:

Command             What it does
/squad <task>       Full sprint — think, plan, build, verify, ship
/think              Challenge assumptions with forcing questions
/plan               Parallel architect + scout → unified plan
/build              Implementation with guardrails
/verify             Judgment day: dual blind review + QA
/secure             Infrastructure-first security audit
/investigate <bug>  3-strike root cause debugging
/ship               PR with full evidence trail
/retro              Sprint retrospective — git stats and patterns

What’s Next

micro-squad is what I use daily. It’s opinionated, it’s small, and it works. The entire system is ~2,000 lines of Markdown. No build step, no dependencies, no lock-in.

If you’re using Claude Code and want more structure than a single conversation — or if you just want independent reviewers catching each other’s blind spots — give it a try.

The code is at github.com/SebastianPuchet/micro-squad. MIT licensed.


If you found this useful, I’d appreciate a star on GitHub.