
tdd-flow — test-first development by construction

tdd-flow drives a single change test-first: write pseudocode, derive failing tests from it, prove they’re red, then implement until green — with a check-in gate between each step. Coverage is built by construction rather than bolted on afterwards.

Introduced in #604 (part of the test-driven-development capability #593).

/warp-drive already gates on tests, but only after implementation: its testing phase runs the suite and blocks the commit on red. That catches regressions, but it does not give you test-first discipline, and it assumes a suite already exists. tdd-flow adds the test-first path on top of that gate: each test is written and proven to fail before the code exists, so it is known to actually exercise the new behaviour.

Per the Prime Directive, tdd-flow is built on reuse, not reinvention:

Concern | Reused from
--- | ---
Iterate-until-green loop | loop-primitive (scripts/loop/run.js)
“Green” definition | warp-drive’s test-command resolution (one source of truth)
Check-in level policy | warp-drive’s L1/L2/L3 automation conventions

pseudocode -> write failing tests -> implement until green -> check-in
│             │                      │                        │
checkin.sh    assert-red.sh          loop-primitive           checkin.sh

Write the intended behaviour as plain-language pseudocode, capture it as the first inspectable artifact, then gate:

scripts/tdd-flow/artifact.sh init
scripts/tdd-flow/artifact.sh capture --step pseudocode --stdin <<<'…pseudocode…'
scripts/tdd-flow/checkin.sh --gate after-pseudocode # resolve the level policy
scripts/tdd-flow/artifact.sh verdict --step pseudocode --decision approve

Translate the pseudocode into tests in the repo’s native framework, then prove they fail before writing any implementation. For a non-default language, first detect the framework and let the unit-test-generator agent author idiomatic tests for it (cross-language authoring, #605):

scripts/tdd-flow/detect-framework.sh # {language, framework, runner, test_convention}
# unit-test-generator authors the failing tests for the detected framework
scripts/tdd-flow/assert-red.sh # exit 0 only when the suite is RED
scripts/tdd-flow/checkin.sh --gate after-tests

detect-framework.sh maps repo markers (Cargo.toml, go.mod, pyproject.toml, package.json) to the native framework and runner, so authored tests run under the project’s existing runner with no bespoke harness. The agent emits idiomatic Rust, Go, Python, or JS tests and never writes a vacuous (always-green) test; see the Rust and Go examples.
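The marker lookup can be pictured as a first-match scan over the repo root. The sketch below is hypothetical: detect_framework is an illustrative function, and every framework, runner, and test_convention value in it (cargo test, go test, pytest, npm test with jest-style file names) is an assumed default, not detect-framework.sh’s actual table.

```sh
# Hypothetical sketch of detect-framework.sh's marker lookup (not its real source).
# All framework/runner/test_convention values below are assumed defaults.
detect_framework() {
  dir=${1:-.}
  if [ -f "$dir/Cargo.toml" ]; then
    lang=rust fw=libtest runner="cargo test" conv="tests/*.rs"
  elif [ -f "$dir/go.mod" ]; then
    lang=go fw=testing runner="go test ./..." conv="*_test.go"
  elif [ -f "$dir/pyproject.toml" ]; then
    lang=python fw=pytest runner=pytest conv="tests/test_*.py"
  elif [ -f "$dir/package.json" ]; then
    lang=javascript fw=jest runner="npm test" conv="*.test.js"
  else
    echo "detect-framework: no known marker in $dir" >&2
    return 1
  fi
  # Emit the same four fields the real script reports.
  printf '{"language":"%s","framework":"%s","runner":"%s","test_convention":"%s"}\n' \
    "$lang" "$fw" "$runner" "$conv"
}
```

Because the markers are checked in a fixed order, a polyglot repo resolves to whichever marker is found first in this sketch; the real script may weigh markers differently.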

assert-red.sh then exits 0 when the tests fail (precondition satisfied), 1 when they already pass (test-first violated: the tests assert nothing new), and 2 when no test command can be resolved. tdd-flow gates on the tests being red regardless of who wrote them, whether the unit-test-generator agent or a human.
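The exit-code contract is small enough to sketch directly. The function below is a hypothetical stand-in for assert-red.sh, assuming the resolved test command is passed in as a string (empty meaning "unresolved"); the real script resolves that command itself, the same way warp-drive does, and the sketch does not reproduce that step.

```sh
# Hypothetical stand-in for assert-red.sh's exit-code contract.
# $1 is the already-resolved test command; an empty string means "unresolved".
assert_red() {
  test_cmd=$1
  if [ -z "$test_cmd" ]; then
    echo "assert-red: no test command could be resolved" >&2
    return 2
  fi
  if sh -c "$test_cmd" >/dev/null 2>&1; then
    # Green before any implementation: the new tests assert nothing new.
    echo "assert-red: suite is already GREEN (test-first violated)" >&2
    return 1
  fi
  echo "assert-red: suite is RED (precondition satisfied)" >&2
  return 0
}
```

The sketch counts any non-zero exit as red, so a suite that fails to compile would also satisfy the precondition; a stricter gate might want to tell the two apart.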

Run the implement loop on the already-delivered loop-primitive; no new runner is involved:

cp ~/.claude/templates/loops/implement-until-green.json /tmp/impl.json
# set evaluator.command to the project's test command, then:
node ~/.claude/scripts/loop/run.js /tmp/impl.json --cwd "$(pwd)"

The template forbids editing the tests to force green. When one of its guardrails trips (max_iterations, cost or time budgets), the loop halts and surfaces the halt as a blockable event.
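Conceptually, the loop re-runs an implementation pass until the evaluator command exits 0 or a guardrail trips. The sketch below models only the max_iterations guardrail and is not how run.js is implemented: iterate_until_green, the attempt command (standing in for one implementation pass), and exit code 3 for a guardrail halt are all hypothetical.

```sh
# Hypothetical iterate-until-green loop with only a max_iterations guardrail.
# Not loop-primitive's implementation: run.js also enforces cost/time budgets.
iterate_until_green() {
  evaluator=$1        # stands in for evaluator.command (the project's test command)
  attempt=$2          # stands in for one implementation pass
  max_iterations=$3
  i=1
  while [ "$i" -le "$max_iterations" ]; do
    sh -c "$attempt"
    if sh -c "$evaluator" >/dev/null 2>&1; then
      echo "green after $i iteration(s)"
      return 0
    fi
    i=$((i + 1))
  done
  echo "halted: max_iterations=$max_iterations reached" >&2
  return 3            # hypothetical code for a guardrail halt (a blockable event)
}
```

The evaluator stays fixed across iterations and only the attempt step changes, which is what keeps "green" meaningful; editing the evaluator's tests mid-loop would defeat it.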

Once the suite is green, close the run with the final check-in gate:
scripts/tdd-flow/checkin.sh --gate after-green

Each step lands as a durable, on-disk artifact, and each check-in records an explicit verdict, so a reviewer can intervene early and audit the run afterwards instead of relying on terminal scrollback. Both are handled by artifact.sh, backed by a per-run store at .tdd-flow/runs/<run-id>/ (one file per step plus a manifest.json ledger). The store is gitignored working state.

scripts/tdd-flow/artifact.sh init # {run, dir}
scripts/tdd-flow/artifact.sh capture --step tests --file tests/foo_test.rs --ext rs
scripts/tdd-flow/artifact.sh verdict --step tests --decision approve
scripts/tdd-flow/artifact.sh manifest # review the whole run

The verdict’s exit code is what drives early intervention — a rejected step loops back for revision instead of aborting the flow:

Decision | Exit | Meaning
--- | --- | ---
approve | 0 | Proceed to the next step.
reject | 20 | Loop back: revise this step, re-capture, re-gate.
edit | 21 | The reviewer edited the artifact: re-capture it, then proceed.
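A calling flow only needs to branch on the verdict’s exit status. The sketch below maps the three documented codes (0, 20, 21) to next actions; the function name next_action, its output strings, and the fail-safe abort on unknown codes are assumptions for illustration.

```sh
# Hypothetical dispatcher on a verdict's exit status (codes from the decision table).
next_action() {
  case "$1" in
    0)  echo "proceed" ;;      # approve: move to the next step
    20) echo "revise" ;;       # reject: revise, re-capture, re-gate
    21) echo "recapture" ;;    # edit: re-capture the edited artifact, then proceed
    *)  echo "abort" ;;        # unknown code: fail safe (assumption)
  esac
}
```

In a flow it would be fed the real status, e.g. `scripts/tdd-flow/artifact.sh verdict --step tests --decision reject; next_action $?`, which per the table prints revise.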

The ledger is append-only, so a reject → revise → approve cycle stays visible after the fact. Re-capturing a step bumps its revision number, so each revision is distinguishable from the original.
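A toy version shows why append-only plus revision numbers keeps the history auditable. It is hypothetical: the real store keeps a manifest.json per run, whereas this sketch appends one JSON object per line (JSON Lines) to a single file, and ledger_capture and ledger_verdict are illustrative names, not artifact.sh subcommands.

```sh
# Toy append-only ledger (JSON Lines) illustrating revision bumps on re-capture.
# Not artifact.sh's real format: the actual store keeps a manifest.json per run.

# Next revision = number of earlier captures of this step + 1.
next_revision() {
  n=$(grep -cF "\"step\":\"$2\",\"event\":\"capture\"" "$1" 2>/dev/null || true)
  echo $(( ${n:-0} + 1 ))
}

# Append a capture record and print the revision it was given.
ledger_capture() {   # usage: ledger_capture <ledger-file> <step>
  rev=$(next_revision "$1" "$2")
  printf '{"step":"%s","event":"capture","revision":%s}\n' "$2" "$rev" >> "$1"
  echo "$rev"
}

# Append a verdict record; earlier lines are never rewritten.
ledger_verdict() {   # usage: ledger_verdict <ledger-file> <step> <decision>
  printf '{"step":"%s","event":"verdict","decision":"%s"}\n' "$2" "$3" >> "$1"
}
```

After a reject → revise → approve cycle, this ledger holds four lines: capture (revision 1), verdict reject, capture (revision 2), verdict approve. Nothing is overwritten, so the rejected revision stays on record.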

checkin.sh and artifact.sh split the job: checkin.sh resolves how the gate behaves for the active level (ask / confirm / auto), while artifact.sh captures the artifact and records the decision and the loop-back signal. At L1/L2, the calling flow asks the human and feeds the answer to verdict. At L3, the gate degrades to auto-proceed-with-record: checkin.sh returns auto-proceed, and the flow records verdict --mode auto (the decision defaults to approve, and the level is captured for audit).

Level | Behaviour | checkin.sh exit | Recorded verdict
--- | --- | --- | ---
L1 (supervised) | Ask: stop for an explicit human decision | 10 | human approve/reject/edit
L2 (trusted dev) | Confirm: proceed on confirmation | 10 | human approve/reject/edit
L3 (autonomous) | Auto-proceed-with-record: don’t block; log it | 0 | --mode auto approve

Level is read from .claude/settings.local.json (_automation.active_level) and defaults to L1/ask when unset. With RDB enabled, the verdict carries "rdb": true and the L1/L2 ask is routed through ask_remote.
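The level table reduces to a small lookup. In the sketch below, gate_for_level is a hypothetical helper that takes the already-read active_level value (empty when unset), prints the gate behaviour, and returns the checkin.sh exit code from the table. Treating an unrecognised level like L1 is an assumption, chosen because it fails toward the strictest gate.

```sh
# Hypothetical mapping from active level to gate behaviour and checkin.sh exit code.
# $1 is _automation.active_level as already read from .claude/settings.local.json.
gate_for_level() {
  level=${1:-L1}                              # unset defaults to L1/ask
  case "$level" in
    L1) echo "ask";          return 10 ;;     # stop for an explicit human decision
    L2) echo "confirm";      return 10 ;;     # proceed on confirmation
    L3) echo "auto-proceed"; return 0 ;;      # don't block; record verdict --mode auto
    *)  echo "ask";          return 10 ;;     # unknown level: strictest gate (assumption)
  esac
}
```

If jq is available, the value could be read with `jq -r '._automation.active_level // empty' .claude/settings.local.json` and passed straight in.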

Run tdd-flow within warp-drive’s coding phase as the test-first way to produce a chunk; warp-drive’s existing testing phase then gates the commit unchanged. Because assert-red.sh and the loop evaluator resolve the test command the same way warp-drive does, the commit gate re-confirms the same green rather than adding a second, independent gate, and nothing here overrides test_before_commit.