I’ve been using Claude Code almost daily for the past year. It’s the best AI coding tool I’ve ever used. But there’s a pattern I kept running into.
I’d ask it to build a feature, and it would jump straight to code. No architecture thinking. No consideration of existing patterns. No real review process. The output was good — sometimes great — but it was unstructured. One long conversation where planning, building, and reviewing all blur together in the same context window.
I wanted something different. I wanted the workflow I’d use if I had a small engineering team: someone to challenge the idea, someone to architect, someone to build, independent reviewers who don’t know each other exist, and a QA engineer who actually runs the code. Except all of them are Claude.
So I built micro-squad.
## What It Is
micro-squad is a drop-in multi-agent sprint system for Claude Code. It’s a set of Markdown skill files that coordinate sub-agents through a structured workflow: think → plan → build → verify → ship.
No binaries. No servers. No external dependencies. Just Markdown files and Claude Code’s native Agent tool.
You type `/squad add dark mode support` and this happens:
```
THINK ────── Forcing questions, explore alternatives
             You pick scope: EXPAND / SELECTIVE / HOLD / REDUCE
  │
PLAN ─────── Architect ──┐  parallel    Unified plan with
             Scout ──────┘              effort estimates
  │
BUILD ────── Builder implements with atomic commits
             Auto-reverts on regression, 3-strike escalation
  │
VERIFY ───── Judge A ───┐
             Judge B ───┤  parallel     Consensus table:
             QA Agent ──┘               FIX / TRIAGE / DISMISS
  │
             Fix Agent → Re-judge (max 2 rounds)
  │
SHIP ─────── Commit + PR with full evidence trail
```
Each phase produces artifacts in a `.squad/` directory. Everything is inspectable, diffable, and git-friendly.
## The Part I’m Most Proud Of: Judgment Day
This is the verify phase, and it’s where micro-squad really earns its keep.
After the builder finishes, three agents launch in parallel:
- Judge A — an adversarial code reviewer
- Judge B — another adversarial code reviewer
- QA Agent — runs tests, checks edge cases, and verifies nothing regressed
Here’s the key: neither judge knows the other exists. They review independently, with no shared context. Then the orchestrator builds a consensus table:
| Finding | Severity | Judge A | Judge B | QA | Verdict |
|---|---|---|---|---|---|
| Null check missing | CRITICAL | YES | YES | — | FIX |
| Race condition | WARNING | YES | NO | FAIL | FIX |
| Naming inconsistency | SUGGESTION | YES | NO | — | TRIAGE |
The verdict rules are simple:
- FIX: Found by 2+ agents, or ANY single CRITICAL finding
- TRIAGE: Found by 1 agent only (non-critical) — you decide
- DISMISS: Contradicted by another agent + SUGGESTION severity only
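The rules are simple enough to model as a tiny function. This is an illustrative Python sketch, not micro-squad's actual mechanism (the real system is Markdown prompts, so `verdict` and its signature are my own invention):

```python
# Hypothetical model of the consensus verdict rules; function name,
# signature, and the "contradicted" flag are illustrative assumptions.

def verdict(severity: str, flagged_by: int, contradicted: bool) -> str:
    """Map one finding to FIX / TRIAGE / DISMISS per the rules above."""
    if severity == "CRITICAL" or flagged_by >= 2:
        return "FIX"      # any single CRITICAL, or confirmed by 2+ agents
    if contradicted and severity == "SUGGESTION":
        return "DISMISS"  # actively contradicted, and cosmetic only
    return "TRIAGE"       # single non-critical finding: the human decides

# The three rows from the example table:
assert verdict("CRITICAL", 2, False) == "FIX"      # null check: both judges
assert verdict("WARNING", 2, False) == "FIX"       # race condition: Judge A + QA
assert verdict("SUGGESTION", 1, False) == "TRIAGE" # naming: Judge A only
```

Note the ordering matters: a CRITICAL finding is never dismissible, even if only one agent saw it.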
If FIX items exist, a Fix Agent patches the confirmed issues, then all three agents re-run for round 2. Max 2 rounds — if it’s still broken after that, it escalates to you with the full history.
The dual blind approach catches things a single reviewer misses. I’ve seen Judge A flag a race condition that Judge B missed, and Judge B catch a security issue Judge A glossed over. The consensus protocol filters false positives while preserving real findings.
## Why Parallel Agents Matter
Context isolation is the main reason this works better than one long conversation.
When Claude builds a feature in a single session, the context window fills with implementation details. By the time it gets to “review your own work,” it’s already anchored to its own decisions. It’s hard to be adversarial about code you just wrote.
micro-squad solves this by giving each agent fresh context. The builder only sees the plan. The judges only see the plan and the diff. The QA agent only sees what should work and what was built. Nobody carries the builder’s assumptions.
The other benefit: real parallelism. Architect + Scout run simultaneously. Judge A + Judge B + QA run simultaneously. This isn’t sequential “first do this, then that” — it’s concurrent work that finishes faster.
## The Forcing Questions
The think phase is inspired by a simple idea: most bad implementations trace back to unclear requirements, not bad code.
Before any code gets written, the system asks you 3 forcing questions from a menu:
- Demand Reality — “What specifically is broken or missing?” (for vague requests)
- Status Quo — “What happens if we do nothing for 3 months?” (for nice-to-haves)
- Narrowest Wedge — “What’s the smallest version that makes one user happy?” (for large scope)
- Future Fit — “If this succeeds at 100x scale, what must it handle?” (for infrastructure)
Then you pick a scope mode: EXPAND, SELECTIVE, HOLD, or REDUCE. This bounds the work before architecture begins.
## The Builder’s Guardrails
The build agent isn’t just “implement the plan.” It has strict self-regulation:
- Atomic commits — one logical change per commit
- Auto-revert on regression — if a change breaks an unrelated test, revert immediately
- Three-strike rule — after 3 failed attempts at the same problem, stop and report
- Blast radius check — if a change touches more than 5 files unexpectedly, pause
- No gold-plating — implement exactly the plan, nothing more
The three-strike rule is important. Without it, AI agents will keep retrying the same broken approach forever, burning tokens and making things worse. Three strikes means it escalates to you with what it learned — which is often more useful than a questionable fix.
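The escalation logic fits in a few lines. This is a hypothetical Python model of the three-strike rule, not code from micro-squad; `run_with_strikes` and its return shape are my own:

```python
# Illustrative sketch of the three-strike rule: retry a fix a bounded
# number of times, then escalate with the accumulated notes instead of
# looping forever. All names here are hypothetical.

def run_with_strikes(attempt_fix, max_strikes=3):
    """attempt_fix(strike) returns (succeeded, note). Stop after max_strikes."""
    notes = []
    for strike in range(1, max_strikes + 1):
        succeeded, note = attempt_fix(strike)
        notes.append(note)
        if succeeded:
            return {"status": "fixed", "attempts": strike}
    # Three failures: hand the human the full history, not a questionable fix.
    return {"status": "escalate", "attempts": max_strikes, "learned": notes}

# A fix that never works stops after 3 attempts instead of burning tokens.
result = run_with_strikes(lambda s: (False, f"attempt {s}: tests still failing"))
```

The key design point is that the failure notes are part of the return value: what the agent learned while failing is the deliverable.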
## Learning Loop
Every time `/verify` runs, it captures 1-3 key findings from the consensus table and appends them to a `learnings.md` file. Next time any agent runs on that project, it reads past findings first.
This means the system gets smarter over sprints. If Judge A flagged a null check pattern in sprint 1, the builder in sprint 3 already knows to handle it. Capped at 50 entries to prevent bloat.
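A minimal sketch of that append-and-cap behavior, assuming one finding per line (the post doesn't specify the actual `learnings.md` format, so `append_learnings` and its details are hypothetical):

```python
# Hedged sketch of the learnings loop: append findings, keep only the
# most recent MAX_ENTRIES lines. File format is an assumption.
from pathlib import Path

MAX_ENTRIES = 50  # the cap mentioned above, to prevent bloat

def append_learnings(path: Path, findings: list[str]) -> list[str]:
    """Append new findings, dropping the oldest lines past the cap."""
    existing = path.read_text().splitlines() if path.exists() else []
    entries = (existing + findings)[-MAX_ENTRIES:]  # keep newest 50
    path.write_text("\n".join(entries) + "\n")
    return entries
```

Capping from the oldest end means recent sprints always win, which matches the "gets smarter over sprints" goal: stale findings age out on their own.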
## The Philosophy Behind It
micro-squad is built on three principles:
Boil the Lake. AI makes completeness nearly free. When doing 100% costs minutes more than doing 90%, always do 100%. The last 10% that teams used to skip — full edge case coverage, complete error paths, comprehensive tests — costs seconds now. Shipping shortcuts is legacy thinking.
Search Before Building. Before designing anything, check what exists. There are three layers of knowledge: tried-and-true patterns (verify them), new-and-popular approaches (scrutinize them), and first-principles reasoning (prize this above everything). The best outcome of searching isn’t finding a solution to copy — it’s discovering why the conventional approach is wrong.
User Sovereignty. AI recommends. You decide. Always. Two models agreeing on a change is a strong signal, not a mandate. The system never skips the verification step because it’s confident.
## What It Looks Like in Practice
Here’s a real sprint I ran to upgrade micro-squad itself:
```
/squad upgrade prompts and add retro skill

THINK  — Asked 3 forcing questions, generated 3 approaches,
         picked "philosophy injection + prompt sharpening"
PLAN   — Architect and Scout ran in parallel, produced a
         unified plan: 10 files, 9 steps
BUILD  — Builder implemented all changes: ETHOS.md, CLAUDE.md,
         retro/SKILL.md, sharpened agent prompts, learnings system
VERIFY — Judge A found 2 warnings, Judge B found 4, QA found 1 failure
         Consensus: 4 FIX items. Fix Agent patched all 4.
         2 TRIAGE items presented — I fixed both.
         Verdict: APPROVED WITH NOTES
SHIP   — Committed and pushed
```
The whole thing took about 20 minutes. The verification phase alone caught issues I wouldn’t have noticed in a manual review — a missing entry in the contract’s phase table, duplicated principles, ambiguous file paths.
## Install
```shell
git clone https://github.com/SebastianPuchet/micro-squad.git ~/.claude/skills/micro-squad
cd ~/.claude/skills/micro-squad && ./setup
```
That’s it. Nine skills get symlinked into `~/.claude/skills/`. Each one works standalone or as part of a full sprint:
| Command | What it does |
|---|---|
| `/squad <task>` | Full sprint — think, plan, build, verify, ship |
| `/think` | Challenge assumptions with forcing questions |
| `/plan` | Parallel architect + scout → unified plan |
| `/build` | Implementation with guardrails |
| `/verify` | Judgment day: dual blind review + QA |
| `/secure` | Infrastructure-first security audit |
| `/investigate <bug>` | 3-strike root cause debugging |
| `/ship` | PR with full evidence trail |
| `/retro` | Sprint retrospective — git stats and patterns |
## What’s Next
micro-squad is what I use daily. It’s opinionated, it’s small, and it works. The entire system is ~2,000 lines of Markdown. No build step, no dependencies, no lock-in.
If you’re using Claude Code and want more structure than a single conversation — or if you just want independent reviewers catching each other’s blind spots — give it a try.
The code is at github.com/SebastianPuchet/micro-squad. MIT licensed.
If you found this useful, I’d appreciate a star on GitHub.