tdd-flow — test-first development by construction
tdd-flow drives a single change test-first: write pseudocode, derive failing
tests from it, prove they’re red, then implement until green — with a check-in
gate between each step. Coverage is built by construction rather than bolted on
afterwards.
Introduced in #604 (part of the test-driven-development capability #593).
Why it exists
/warp-drive already gates on tests, but after implementation: it runs the
suite in its testing phase and blocks the commit on red. That catches
regressions; it does not give you test-first discipline, and it assumes a
suite already exists. tdd-flow adds the test-first path on top of that gate: the
test is written and proven to fail before the code exists, so the test is known
to actually exercise the new behaviour.
Per the Prime Directive it is built on reuse, not reinvention:
| Concern | Reused from |
|---|---|
| Iterate-until-green loop | loop-primitive (scripts/loop/run.js) |
| “Green” definition | warp-drive’s test-command resolution (one source of truth) |
| Check-in level policy | warp-drive’s L1/L2/L3 automation conventions |
The flow

    pseudocode -> write failing tests -> implement until green -> check-in
        │                  │                       │                 │
    checkin.sh       assert-red.sh          loop-primitive       checkin.sh

1. Pseudocode → check-in
Write the intended behaviour as plain-language pseudocode, capture it as the first inspectable artifact, then gate:

    scripts/tdd-flow/artifact.sh init
    scripts/tdd-flow/artifact.sh capture --step pseudocode --stdin <<<'…pseudocode…'
    scripts/tdd-flow/checkin.sh --gate after-pseudocode   # resolve the level policy
    scripts/tdd-flow/artifact.sh verdict --step pseudocode --decision approve

2. Write failing tests → assert red
Translate the pseudocode into tests in the repo’s native framework, then prove they fail before writing any implementation.
For a non-default language, detect the framework and let the
unit-test-generator agent author idiomatic tests for it (cross-language
authoring, #605):

    scripts/tdd-flow/detect-framework.sh   # {language, framework, runner, test_convention}
    # unit-test-generator authors the failing tests for the detected framework
    scripts/tdd-flow/assert-red.sh         # exit 0 only when the suite is RED
    scripts/tdd-flow/checkin.sh --gate after-tests

detect-framework.sh maps repo markers (Cargo.toml, go.mod, pyproject.toml,
package.json) to the native framework + runner, so authored tests run under the
project’s existing runner with no bespoke harness. The agent emits idiomatic
Rust/Go/Python/JS tests and never writes a vacuous (always-green) test — see the
Rust and Go examples.
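The marker-to-runner mapping can be pictured with a small shell sketch. This is not the actual detect-framework.sh: the function name, the precedence when several markers coexist, and the runner strings are assumptions made for illustration. Only the four marker files come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of marker-based detection; NOT the real detect-framework.sh.
# Marker precedence and the runner strings are assumptions.
detect_runner_sketch() {
  local dir=${1:-.}
  if   [ -f "$dir/Cargo.toml" ];     then echo "rust: cargo test"
  elif [ -f "$dir/go.mod" ];         then echo "go: go test ./..."
  elif [ -f "$dir/pyproject.toml" ]; then echo "python: pytest"
  elif [ -f "$dir/package.json" ];   then echo "javascript: npm test"
  else
    # No known marker found: report it and signal failure to the caller.
    echo "unknown" >&2
    return 1
  fi
}
```

Calling `detect_runner_sketch .` from a repo root prints one `language: runner` pair, or fails when no marker is present.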
assert-red.sh then exits 0 when the tests fail (precondition satisfied), 1
when they already pass (test-first violated — the tests assert nothing new), and
2 when no test command can be resolved. tdd-flow gates on their redness
regardless of who wrote them.
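A calling script might branch on those three exit codes roughly as follows. The wrapper name `require_red` and its messages are hypothetical; only the meaning of exits 0, 1, and 2 is taken from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of tdd-flow): run a red check and branch on
# its exit code. The check command is passed as arguments.
require_red() {
  local status=0
  "$@" || status=$?
  case "$status" in
    0) echo "red: tests fail as required, implementation may start" ;;
    1) echo "already green: tests assert nothing new, revise them" >&2; return 1 ;;
    2) echo "no test command could be resolved" >&2; return 2 ;;
    *) echo "unexpected exit code $status" >&2; return "$status" ;;
  esac
}

# Usage, from the repo root:
#   require_red scripts/tdd-flow/assert-red.sh && echo "start implementing"
```

Passing the command as arguments keeps the wrapper independent of where assert-red.sh lives.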
3. Implement until green
Run the implement loop on the delivered loop-primitive, with no new runner:

    cp ~/.claude/templates/loops/implement-until-green.json /tmp/impl.json
    # set evaluator.command to the project's test command, then:
    node ~/.claude/scripts/loop/run.js /tmp/impl.json --cwd "$(pwd)"

The template forbids editing tests to force green and halts on its guardrails
(max_iterations, cost/time budgets) as a blockable event.
4. Check-in

    scripts/tdd-flow/checkin.sh --gate after-green

Inspectable artifacts & check-in verdicts
Each step lands as a durable, on-disk artifact, and each check-in records an
explicit verdict — so a reviewer can intervene early and audit the run afterwards
rather than relying on terminal scrollback. Both live in artifact.sh, backed by
a per-run store at .tdd-flow/runs/<run-id>/ (one file per step plus a
manifest.json ledger). The store is gitignored working state.

    scripts/tdd-flow/artifact.sh init                          # {run, dir}
    scripts/tdd-flow/artifact.sh capture --step tests --file tests/foo_test.rs --ext rs
    scripts/tdd-flow/artifact.sh verdict --step tests --decision approve
    scripts/tdd-flow/artifact.sh manifest                      # review the whole run

The verdict’s exit code drives early intervention; a rejected step loops back for revision instead of aborting the flow:
| Decision | Exit | Meaning |
|---|---|---|
| approve | 0 | Proceed to the next step. |
| reject | 20 | Loop back: revise this step, re-capture, re-gate. |
| edit | 21 | The reviewer edited the artifact; re-capture it, then proceed. |
The ledger is append-only, so a reject → revise → approve cycle stays visible
after the fact. Re-capturing a step bumps its revision so revisions are
distinguishable from the original.
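The loop-back behaviour can be sketched as a small driver that re-runs a review command until the step is settled. The function name `gate_step` and the `max_rounds` cap are assumptions added for illustration; the exit codes 0, 20, and 21 are the ones listed in the table above.

```shell
#!/usr/bin/env bash
# Hypothetical driver: re-run a step's review command until it is settled.
# max_rounds is an assumption so the sketch cannot loop forever.
gate_step() {
  local max_rounds=$1; shift
  local round=1 status
  while [ "$round" -le "$max_rounds" ]; do
    status=0
    "$@" || status=$?
    case "$status" in
      0)  echo "approved in round $round"; return 0 ;;
      21) echo "edited in round $round: re-capture, then proceed"; return 0 ;;
      20) echo "rejected in round $round: revise and re-capture" ;;
      *)  echo "unexpected verdict exit $status" >&2; return "$status" ;;
    esac
    round=$((round + 1))
  done
  echo "still rejected after $max_rounds rounds" >&2
  return 20
}
```

In use, the command passed to `gate_step` would be a flow-specific function (hypothetical here) that revises and re-captures the artifact, collects the reviewer's decision, and calls `artifact.sh verdict` with it, so each round appends a new entry to the ledger.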
checkin.sh and artifact.sh split the job: checkin.sh resolves how the gate
behaves for the active level (ask / confirm / auto); artifact.sh captures the
artifact and records the decision + loop-back signal. At L1/L2 the calling
flow asks the human and feeds the answer to verdict; at L3 the gate degrades to
auto-proceed-with-record — checkin.sh returns auto-proceed and the flow records
verdict --mode auto (decision defaults to approve, level captured for audit).
Check-in behaviour by automation level
| Level | Behaviour | checkin.sh exit | Recorded verdict |
|---|---|---|---|
| L1 (supervised) | Ask: stop for an explicit human decision | 10 | human approve/reject/edit |
| L2 (trusted dev) | Confirm: proceed on confirmation | 10 | human approve/reject/edit |
| L3 (autonomous) | Auto-proceed-with-record: don’t block; log it | 0 | --mode auto approve |
Level is read from .claude/settings.local.json (_automation.active_level),
defaulting to L1/ask when unset. With RDB enabled the verdict carries
"rdb": true and the L1/L2 ask routes through ask_remote.
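Put together, a calling flow could glue the two scripts along these lines. This is a sketch under assumptions: `run_gate` and `ask_reviewer` are hypothetical names, and the `CHECKIN` / `ARTIFACT` overrides exist only so the sketch can be exercised with stand-ins. The script paths, flags, and the 10/0 exit codes are the ones documented on this page.

```shell
#!/usr/bin/env bash
# Hypothetical glue between checkin.sh and artifact.sh (not shipped with tdd-flow).
CHECKIN=${CHECKIN:-scripts/tdd-flow/checkin.sh}
ARTIFACT=${ARTIFACT:-scripts/tdd-flow/artifact.sh}

# Placeholder prompt: prints approve, reject, or edit on stdout.
ask_reviewer() {
  local answer
  printf 'Verdict for %s (approve/reject/edit): ' "$1" >&2
  read -r answer
  echo "$answer"
}

run_gate() {
  local step=$1 gate=$2 status=0
  "$CHECKIN" --gate "$gate" || status=$?
  case "$status" in
    # L1/L2: a human decides, and the answer is recorded as the verdict.
    10) "$ARTIFACT" verdict --step "$step" --decision "$(ask_reviewer "$step")" ;;
    # L3: auto-proceed, but still record the verdict for audit.
    0)  "$ARTIFACT" verdict --step "$step" --mode auto ;;
    *)  echo "checkin.sh failed with exit $status" >&2; return "$status" ;;
  esac
}
```

For example, `run_gate tests after-tests` returns whatever the recorded verdict exits with, so the caller can apply the same 0/20/21 handling at every automation level.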
Composition with warp-drive
Run tdd-flow within warp-drive’s coding phase as the test-first way to
produce a chunk; warp-drive’s existing testing phase then gates the commit
unchanged. Because assert-red.sh and the loop evaluator resolve the test command
the same way warp-drive does, the commit gate re-confirms the same green — it
does not double-gate, and nothing here overrides test_before_commit.
See also
- loop-primitive reference: the iterate-until-done runner this flow’s implement step rides on
- warp-drive how-to: the post-hoc test gate tdd-flow composes with