tdd-flow — test-first development by construction

tdd-flow drives a single change test-first: write pseudocode, derive failing tests from it, prove they’re red, then implement until green — with a check-in gate between each step. Coverage is built by construction rather than bolted on afterwards.

Introduced in #604 (part of the test-driven-development capability #593).

Why it exists

/warp-drive already gates on tests — but after implementation: it runs the suite in its testing phase and blocks the commit on red. That catches regressions; it does not give you test-first discipline, and it assumes a suite already exists. tdd-flow adds the test-first path on top of that gate: the test is written and proven to fail before the code exists, so the test is known to actually exercise the new behaviour.

Per the Prime Directive it is built on reuse, not reinvention:

Concern	Reused from
Iterate-until-green loop	`loop-primitive` (`scripts/loop/run.js`)
“Green” definition	warp-drive’s test-command resolution (one source of truth)
Check-in level policy	warp-drive’s L1/L2/L3 automation conventions

The flow

pseudocode  ->  write failing tests  ->  implement until green  ->  check-in
     │                  │                        │                     │
  checkin.sh        assert-red.sh           loop-primitive        checkin.sh

1. Pseudocode → check-in

Write the intended behaviour as plain-language pseudocode, capture it as the first inspectable artifact, then gate:

scripts/tdd-flow/artifact.sh init
scripts/tdd-flow/artifact.sh capture --step pseudocode --stdin <<<'…pseudocode…'
scripts/tdd-flow/checkin.sh --gate after-pseudocode      # resolve the level policy
scripts/tdd-flow/artifact.sh verdict --step pseudocode --decision approve

2. Write failing tests → assert red

Translate the pseudocode into tests in the repo’s native framework, then prove they fail before writing any implementation.

For a non-default language, detect the framework and let the unit-test-generator agent author idiomatic tests for it (cross-language authoring, #605):

scripts/tdd-flow/detect-framework.sh      # {language, framework, runner, test_convention}
# unit-test-generator authors the failing tests for the detected framework
scripts/tdd-flow/assert-red.sh            # exit 0 only when the suite is RED
scripts/tdd-flow/checkin.sh --gate after-tests

detect-framework.sh maps repo markers (Cargo.toml, go.mod, pyproject.toml, package.json) to the native framework + runner, so authored tests run under the project’s existing runner with no bespoke harness. The agent emits idiomatic Rust/Go/Python/JS tests and never writes a vacuous (always-green) test — see the Rust and Go examples.
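
The marker lookup described above can be pictured with a minimal sketch. This is not the real detect-framework.sh: the function name `detect_language`, the precedence order when several markers coexist, and the bare language label it prints are all assumptions made for illustration. The real script reports `{language, framework, runner, test_convention}`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real detect-framework.sh.
# Maps the repo markers named in the text to a language label.
# The precedence order (Rust, Go, Python, JS) is an assumption.
detect_language() {
  local dir="${1:-.}"
  if   [ -f "$dir/Cargo.toml" ];     then echo "rust"
  elif [ -f "$dir/go.mod" ];         then echo "go"
  elif [ -f "$dir/pyproject.toml" ]; then echo "python"
  elif [ -f "$dir/package.json" ];   then echo "javascript"
  else
    # No marker found: report on stderr and signal failure.
    echo "no known marker in $dir" >&2
    return 1
  fi
}
```

Calling `detect_language .` from the repo root prints the label for the first marker it finds; a real implementation would also have to pick the framework and runner, which this sketch leaves out.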

assert-red.sh then exits 0 when the tests fail (precondition satisfied), 1 when they already pass (test-first violated — the tests assert nothing new), and 2 when no test command can be resolved. tdd-flow gates on their redness regardless of who wrote them.
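
The inverted exit-code contract is easy to get backwards, so here is a minimal sketch of it. The function `assert_red` is hypothetical: the real assert-red.sh resolves the test command itself, whereas this version takes it as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the red check, NOT the real assert-red.sh.
# Exit codes follow the text: 0 = red, 1 = already green, 2 = no command.
assert_red() {
  local test_cmd="${1:-}"
  if [ -z "$test_cmd" ]; then
    echo "no test command resolved" >&2
    return 2
  fi
  # Run the suite; a passing suite is the failure case here.
  if bash -c "$test_cmd" >/dev/null 2>&1; then
    echo "tests already pass: test-first violated" >&2
    return 1
  fi
  echo "suite is red: precondition satisfied"
  return 0
}
```

Note that this sketch treats any non-zero exit from the test command as red, including a compile error or a missing binary. Whether the real script distinguishes those cases is not stated here.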

3. Implement until green

Run the implement loop on the existing loop-primitive rather than a new runner:

cp ~/.claude/templates/loops/implement-until-green.json /tmp/impl.json
# set evaluator.command to the project's test command, then:
node ~/.claude/scripts/loop/run.js /tmp/impl.json --cwd "$(pwd)"

The template forbids editing tests to force green. When a guardrail trips (max_iterations, cost/time budgets), the loop halts and reports it as a blockable event.
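
To make the loop shape concrete, the sketch below shows iterate-until-green with a max_iterations guardrail in plain shell. It is not run.js and does not read the JSON template. The function name `until_green`, the `implement_step` argument, and the exit code 3 for a tripped guardrail are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of iterate-until-green, NOT scripts/loop/run.js.
#   $1 max_iterations  guardrail on the number of evaluations
#   $2 evaluator       the project's test command
#   $3 implement_step  placeholder for whatever edits the implementation
until_green() {
  local max_iterations="$1" evaluator="$2" implement_step="$3"
  local i=1
  while [ "$i" -le "$max_iterations" ]; do
    if bash -c "$evaluator" >/dev/null 2>&1; then
      echo "green after $i evaluation(s)"
      return 0
    fi
    # Still red: change the implementation only, never the tests.
    bash -c "$implement_step"
    i=$((i + 1))
  done
  # Guardrail tripped: stop and surface it to the caller (code 3 is assumed).
  echo "guardrail hit: max_iterations=$max_iterations" >&2
  return 3
}
```

The caller decides what to do with the non-zero exit, which is the role the blockable event plays in the real loop.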

4. Check-in

scripts/tdd-flow/checkin.sh --gate after-green

Inspectable artifacts & check-in verdicts

Each step lands as a durable, on-disk artifact, and each check-in records an explicit verdict — so a reviewer can intervene early and audit the run afterwards rather than relying on terminal scrollback. Both live in artifact.sh, backed by a per-run store at .tdd-flow/runs/<run-id>/ (one file per step plus a manifest.json ledger). The store is gitignored working state.

scripts/tdd-flow/artifact.sh init                          # {run, dir}
scripts/tdd-flow/artifact.sh capture --step tests --file tests/foo_test.rs --ext rs
scripts/tdd-flow/artifact.sh verdict --step tests --decision approve
scripts/tdd-flow/artifact.sh manifest                      # review the whole run

The verdict’s exit code is what drives early intervention — a rejected step loops back for revision instead of aborting the flow:

Decision	Exit	Meaning
`approve`	`0`	Proceed to the next step.
`reject`	`20`	Loop back — revise this step, re-capture, re-gate.
`edit`	`21`	The reviewer edited the artifact — re-capture it, then proceed.
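
A calling flow might branch on those exit codes roughly as follows. The exit codes are the ones in the table; the function name `next_action`, the action labels, and the handling of any other code are assumptions.

```shell
#!/usr/bin/env bash
# Maps a verdict exit code to the flow's next move.
# Exit codes are from the table; action names are illustrative only.
next_action() {
  case "$1" in
    0)  echo "proceed" ;;                 # approve
    20) echo "revise" ;;                  # reject: loop back to this step
    21) echo "recapture-then-proceed" ;;  # edit: reviewer changed the artifact
    *)  echo "abort"; return 1 ;;         # anything else: assumed to be an error
  esac
}

# Usage (commented out because it needs the real script):
#   rc=0
#   scripts/tdd-flow/artifact.sh verdict --step tests --decision reject || rc=$?
#   next_action "$rc"
```

Capturing the code with `|| rc=$?` keeps a script running under `set -e` from exiting on a reject, since 20 and 21 are signals rather than failures.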

The ledger is append-only, so a reject → revise → approve cycle stays visible after the fact. Re-capturing a step bumps its revision so revisions are distinguishable from the original.
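
The append-only and revision-bump behaviour can be illustrated with a small stand-in. The real manifest.json schema is not shown in this document, so the sketch uses a tab-separated text file instead. The function `ledger_record` and its column layout (step, event, revision) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical append-only ledger, NOT the real manifest.json format.
# Each line is: <step> TAB <event> TAB <revision>
ledger_record() {
  local ledger="$1" step="$2" event="$3"
  touch "$ledger"
  local revision
  # Current revision = number of captures already recorded for this step.
  revision="$(awk -F '\t' -v s="$step" \
    '$1 == s && $2 == "capture" { n++ } END { print n + 0 }' "$ledger")"
  if [ "$event" = "capture" ]; then
    revision=$((revision + 1))   # re-capturing bumps the revision
  fi
  # Append only: earlier lines are never rewritten.
  printf '%s\t%s\t%s\n' "$step" "$event" "$revision" >> "$ledger"
}
```

Recording capture, reject, capture, approve for the same step leaves four lines in the file, with the second capture and the approve both at revision 2, so the rejected first attempt stays visible.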

checkin.sh and artifact.sh split the job. checkin.sh resolves how the gate behaves for the active level (ask / confirm / auto); artifact.sh captures the artifact and records the decision plus the loop-back signal. At L1/L2 the calling flow asks the human and feeds the answer to verdict. At L3 the gate degrades to auto-proceed-with-record: checkin.sh returns auto-proceed and the flow records verdict --mode auto (the decision defaults to approve, and the level is captured for audit).

Check-in behaviour by automation level

Level	Behaviour	`checkin.sh` exit	Recorded verdict
L1 (supervised)	Ask — stop for an explicit human decision	`10`	human approve/reject/edit
L2 (trusted dev)	Confirm — proceed on confirmation	`10`	human approve/reject/edit
L3 (autonomous)	Auto-proceed-with-record — don’t block; log it	`0`	`--mode auto` approve

Level is read from .claude/settings.local.json (_automation.active_level), defaulting to L1/ask when unset. With RDB enabled the verdict carries "rdb": true and the L1/L2 ask routes through ask_remote.
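
The level table and the L1 default can be read as a simple lookup. The sketch below takes the level as an argument instead of reading .claude/settings.local.json. The function name `gate_for_level`, its output format, and the treatment of unrecognised levels are assumptions.

```shell
#!/usr/bin/env bash
# Level table as a lookup. Prints "<behaviour> <checkin.sh exit>".
# Hypothetical helper; it does not parse settings.local.json.
gate_for_level() {
  local level="${1:-L1}"   # unset or empty level falls back to L1/ask
  case "$level" in
    L1) echo "ask 10" ;;
    L2) echo "confirm 10" ;;
    L3) echo "auto-proceed 0" ;;
    *)  echo "unknown level: $level" >&2; return 1 ;;   # assumed behaviour
  esac
}
```

For example, `gate_for_level L3` prints `auto-proceed 0`, and calling it with no argument prints `ask 10`.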

Composition with warp-drive

Run tdd-flow within warp-drive’s coding phase as the test-first way to produce a chunk; warp-drive’s existing testing phase then gates the commit unchanged. Because assert-red.sh and the loop evaluator resolve the test command the same way warp-drive does, the commit gate re-confirms the same green — it does not double-gate, and nothing here overrides test_before_commit.