Token Monitoring & Budget System

Warp-drive tracks token usage per chunk, per session, and across sessions. It estimates costs, enforces budgets via a circuit breaker, and generates optimization insights. This document covers the full pipeline: capture, storage, enforcement, and reporting.

Architecture

Claude Code session transcript (~/.claude/projects/<slug>/*.jsonl)
       │
       ▼
token-snapshot.js  ── reads transcript, computes token delta ──►  JSON snapshot
       │
       ▼
state-machine.js   ── stores snapshots in .warp-drive-state.json
       │                  chunk_snapshots[] (per chunk)
       │                  session_total    (at session end)
       │
       ├──► checkBudgets()    ── enforces hard limits, triggers circuit breaker
       │
       └──► persistTokenUsage() ── appends records to ~/.claude/token-usage.jsonl
                                        at session completion
       │
       ▼
token-report.js   ── reads JSONL, generates markdown/JSON reports
       │                  /token-report skill wraps this
       ▼
Optimization insights, cost estimates, trend analysis

Key files

File	Purpose
`scripts/warp-drive/token-snapshot.js`	Reads Claude Code transcript, computes token totals or deltas
`scripts/warp-drive/token-report.js`	Aggregates JSONL records into reports with cost estimates
`scripts/warp-drive/state-machine.js`	Orchestrates capture, stores state, enforces budgets, persists data
`registry/skills/token-report/SKILL.md`	User-invocable `/token-report` skill definition
`~/.claude/token-usage.jsonl`	Persistent append-only log of all session and chunk records
`.claude/.warp-drive-state.json`	Per-project state file (transient, deleted at session end)

Data Flow

When tokens are captured

Event	What happens	State field
`chunks_defined`	Mark chunk start timestamp	`token_usage.current_chunk_started_at`
`next_chunk`	Capture delta since chunk started, push snapshot	`token_usage.chunk_snapshots[]`
`requirement_done`	Capture final chunk delta, push snapshot	`token_usage.chunk_snapshots[]`
`session_ended`	Capture full-session total (no `--since` filter)	`token_usage.session_total`
`session_ended`	Persist all records to `~/.claude/token-usage.jsonl`	— (written to disk)

How snapshots work

token-snapshot.js reads the Claude Code session transcript — the .jsonl file that Claude Code writes to ~/.claude/projects/<slug>/. Each line is a JSON record with a message.usage object containing token counts.

Full snapshot (no --since): sums all message.usage records in the transcript.
Delta snapshot (--since <ISO>): sums only records with timestamp >= since. Used for per-chunk deltas.
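The two snapshot modes can be sketched as a single summing function. This is a minimal illustration, not the code in token-snapshot.js. It assumes each transcript record carries a top-level timestamp field, and that message.usage uses the Anthropic API field names cache_read_input_tokens and cache_creation_input_tokens. The function name sumUsage is hypothetical. total_tokens is computed as input plus output because that matches the example records in this document.

```javascript
// Hypothetical sketch: sum message.usage across transcript lines.
// Assumed record shape: { timestamp, message: { usage: { ... } } }.
function sumUsage(jsonlText, sinceIso) {
  const since = sinceIso ? Date.parse(sinceIso) : null;
  const totals = {
    input_tokens: 0,
    output_tokens: 0,
    cache_read_tokens: 0,
    cache_creation_tokens: 0,
    total_tokens: 0,
    message_count: 0,
  };
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let record;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing the snapshot
    }
    const usage = record && record.message && record.message.usage;
    if (!usage) continue; // not every transcript line carries usage
    // Delta mode: keep only records with timestamp >= since.
    // Records without a parseable timestamp are skipped in delta mode.
    if (since !== null && !(Date.parse(record.timestamp) >= since)) continue;
    totals.input_tokens += usage.input_tokens || 0;
    totals.output_tokens += usage.output_tokens || 0;
    totals.cache_read_tokens += usage.cache_read_input_tokens || 0;
    totals.cache_creation_tokens += usage.cache_creation_input_tokens || 0;
    totals.message_count += 1;
  }
  // Assumption: total = input + output, as in the example records.
  totals.total_tokens = totals.input_tokens + totals.output_tokens;
  return totals;
}
```

Calling sumUsage(text) gives the full snapshot; calling sumUsage(text, since) gives the per-chunk delta.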

The state machine calls captureTokenSnapshot(projectRoot, sinceTimestamp), which spawns the script and parses its JSON output. If the script fails or doesn’t exist, captureTokenSnapshot returns null and the session continues without token data.
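The fail-soft behavior described above might look roughly like the following. This is a sketch under stated assumptions, not the state machine's actual code: the third parameter scriptPath is added here for illustration (the documented signature takes only projectRoot and sinceTimestamp), and the script is run synchronously with the current Node binary.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical sketch: run the snapshot script and parse its stdout.
// Any failure (missing script, non-zero exit, invalid JSON) yields null
// so the caller can continue without token data.
function captureTokenSnapshot(projectRoot, sinceTimestamp, scriptPath) {
  const args = [scriptPath, projectRoot];
  if (sinceTimestamp) args.push("--since", sinceTimestamp);
  const result = spawnSync(process.execPath, args, { encoding: "utf8" });
  if (result.error || result.status !== 0) return null;
  try {
    return JSON.parse(result.stdout);
  } catch {
    return null; // script ran but did not print valid JSON
  }
}
```

Returning null rather than throwing keeps token capture optional: a broken or missing script degrades reporting but never blocks a transition.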

Budget System

The budget system prevents runaway sessions. It has seven constraints, two of which (session tokens and session cost) are disabled by default. checkBudgets() checks them before every state transition.

Constraints

Constraint	Config key	Default	Counter	Enforcement
Phase timeout	`max_phase_minutes`	30	Elapsed time in current phase	Configurable: `warn`, `block`, or `abort`
Retry limit	`max_retries_per_chunk`	5	`state.budgets.retry_count`	Hard (enforced)
Coding cycles	`max_coding_cycles`	3	`state.budgets.coding_cycles`	Hard (enforced)
Total chunks	`max_total_chunks`	20	`state.metrics.chunks_completed`	Hard (enforced)
Session duration	`max_session_minutes`	480	Elapsed since `session.started_at_epoch`	Hard (enforced)
Session tokens	`max_session_tokens`	0 (disabled)	`token_usage.session_total` (else chunk-snapshot sum)	Hard (enforced) when > 0
Session cost (optional)	`max_session_usd`	0 (disabled)	estimated via token-report’s cost model	Hard (enforced) when > 0

The cost ceiling (#587) emits a cost_budget_exceeded issue and routes to budget_exceeded like the other hard limits. Its reason reports actual spend vs ceiling (e.g. cost_budget_exceeded: 1500 tokens exceeded 1000 tokens). Both max_session_tokens and max_session_usd default to 0 (disabled). Token count is the reliable signal; max_session_usd is optional and secondary.

Phase timeout enforcement mode is set via phase_timeout_enforcement in config:

warn (default): advisory only, included in response but doesn’t block
block: rejects the transition
abort: auto-transitions to budget_exceeded
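To make the split between advisory and enforced issues concrete, here is a hedged sketch of a checkBudgets()-style function. It is not the real implementation. It takes an already-resolved config, uses the counter names from the constraints table, and covers only the phase timeout and the three counter-based limits; the session duration, token, and cost ceilings are left out for brevity. The issue codes, the mode field, and the strict ">" comparison are all assumptions.

```javascript
// Hypothetical sketch of a checkBudgets()-style function.
// Issue codes, the `mode` field, and ">" comparisons are assumptions.
function checkBudgets(state, cfg, nowMs) {
  const issues = [];
  const add = (code, actual, limit, mode) =>
    issues.push({
      code,
      reason: `${code}: ${actual} exceeded ${limit}`,
      mode, // "warn" | "block" | "abort" | "hard"
      enforced: mode !== "warn", // warn is advisory only
    });

  // Phase timeout: enforcement mode comes from phase_timeout_enforcement.
  const phaseMinutes =
    (nowMs - Date.parse(state.budgets.phase_started_at)) / 60000;
  if (phaseMinutes > cfg.max_phase_minutes) {
    add(
      "phase_timeout",
      Math.floor(phaseMinutes),
      cfg.max_phase_minutes,
      cfg.phase_timeout_enforcement
    );
  }

  // Counter-based limits are always hard.
  const counters = [
    ["retry_limit", state.budgets.retry_count, cfg.max_retries_per_chunk],
    ["coding_cycles", state.budgets.coding_cycles, cfg.max_coding_cycles],
    ["total_chunks", state.metrics.chunks_completed, cfg.max_total_chunks],
  ];
  for (const [code, actual, limit] of counters) {
    if (actual > limit) add(code, actual, limit, "hard");
  }
  return issues;
}
```

With the default warn mode, a timed-out phase produces an issue with enforced set to false, so it can be reported in the response without blocking the transition.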

Circuit breaker

When a hard limit is exceeded, the state machine doesn’t crash or silently continue. It transitions to the budget_exceeded phase — a first-class state in the state machine.

any phase ──[hard limit exceeded]──► budget_exceeded
                                          │
                                  ┌───────┴───────┐
                                  ▼               ▼
                           budget_continue    budget_abort
                                  │               │
                                  ▼               ▼
                              coding          aborted
                          (budgets reset)   (session ends)

The circuit breaker:

Runs checkBudgets() before every transition.
Filters for enforced issues (hard limits, not advisory warnings).
If enforced issues exist and the event is not on the bypass list, transitions to budget_exceeded.
Stores diagnostic info: exceeded_reasons, exceeded_at, exceeded_from_phase.
Presents the user with a choice: continue (extend budget) or abort.

Bypass events — these skip the circuit breaker check to avoid deadlocks: abort, abort_resolved, session_ended, budget_continue, budget_abort
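The steps above, including the bypass list, can be sketched as a pure function over the state. This is an illustration only. The bypass event names and the exceeded_* field names come from this document; the function name, the state.phase field, the issue shape ({ reason, enforced }), and storing exceeded_reasons as an array are assumptions.

```javascript
// Events that skip the circuit breaker to avoid deadlocks.
const BYPASS_EVENTS = new Set([
  "abort",
  "abort_resolved",
  "session_ended",
  "budget_continue",
  "budget_abort",
]);

// Hypothetical sketch: decide whether an event trips the breaker.
// `issues` is assumed to be a list of { reason, enforced } objects.
function applyCircuitBreaker(state, event, issues, nowIso) {
  const enforced = issues.filter((issue) => issue.enforced);
  if (enforced.length === 0 || BYPASS_EVENTS.has(event)) {
    return { tripped: false, state };
  }
  return {
    tripped: true,
    state: {
      ...state,
      phase: "budget_exceeded", // assumed name of the current-phase field
      budgets: {
        ...state.budgets,
        exceeded_reasons: enforced.map((issue) => issue.reason),
        exceeded_at: nowIso,
        exceeded_from_phase: state.phase,
      },
    },
  };
}
```

Keeping budget_continue and budget_abort on the bypass list is what lets the user leave budget_exceeded: without it, the still-exceeded limit would re-trip the breaker on the very event meant to resolve it.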

Human checkpoint — budget_exceeded requires human approval at all automation levels, including Level 3. This is a mandatory checkpoint that cannot be auto-bypassed.

Budget recovery

When the user chooses budget_continue:

retry_count and coding_cycles reset to 0
budget_extensions counter increments (tracks how many times the user extended)
Execution returns to the coding phase

Per-chunk budget resets

After each successful chunk (next_chunk and requirement_done events), retry_count and coding_cycles reset to 0. This means the per-chunk limits (max_retries_per_chunk and max_coding_cycles) apply fresh to each chunk, while max_total_chunks and max_session_minutes apply across the entire session.
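Both reset paths touch the same two counters, which a short sketch makes explicit. The function names and the state.phase field are hypothetical. Whether the real state machine also clears the exceeded_* diagnostics on budget_continue is not stated here, so this sketch leaves them untouched.

```javascript
// Hypothetical sketch: counters reset after each successful chunk
// (next_chunk and requirement_done events).
function resetChunkCounters(budgets) {
  return { ...budgets, retry_count: 0, coding_cycles: 0 };
}

// Hypothetical sketch: the user chose budget_continue.
// Same reset, plus one more recorded extension, then back to coding.
function applyBudgetContinue(state) {
  return {
    ...state,
    phase: "coding", // assumed name of the current-phase field
    budgets: {
      ...resetChunkCounters(state.budgets),
      budget_extensions: (state.budgets.budget_extensions || 0) + 1,
    },
  };
}
```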

Reasoning Budget

The reasoning budget controls Claude’s thinking effort per phase. High-reasoning phases get deeper analysis; standard phases get efficient execution. This is the reasoning sandwich pattern.

Defaults

Phase	Level	Rationale
prerequisites	standard	Mechanical setup
discovering	high	Work discovery needs judgment
planning	high	Architecture decisions need depth
chunking	high	Decomposition affects everything downstream
coding	standard	Implementation follows the plan
updating_docs	standard	Straightforward documentation
testing	high	Test interpretation needs careful analysis
committing	standard	Mechanical commit creation
reporting	standard	Structured output
chunk_complete	standard	Status check
requirement_complete	high	Final verification — last chance to catch issues
merging	standard	Mechanical merge/PR
session_ending	standard	Reporting
budget_exceeded	standard	Decision presentation
aborted	standard	Cleanup

Configuration

Override per-phase reasoning in .claude/settings.local.json:

{
  "_workflow": {
    "reasoning_budget": {
      "coding": "high",
      "testing": "standard"
    }
  }
}

Config overrides take precedence over defaults. The level is injected into the state machine’s systemMessage for each phase transition, where it instructs Claude to adjust its reasoning effort.
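The precedence rule can be sketched as a lookup that consults overrides first and defaults second. The default levels below are copied from the table above. The function name and the fallback to "standard" for a phase missing from both maps are assumptions.

```javascript
// Default reasoning level per phase, copied from the defaults table.
const DEFAULT_REASONING = {
  prerequisites: "standard",
  discovering: "high",
  planning: "high",
  chunking: "high",
  coding: "standard",
  updating_docs: "standard",
  testing: "high",
  committing: "standard",
  reporting: "standard",
  chunk_complete: "standard",
  requirement_complete: "high",
  merging: "standard",
  session_ending: "standard",
  budget_exceeded: "standard",
  aborted: "standard",
};

// Hypothetical sketch: a config override wins over the default.
// `settings` is the parsed content of .claude/settings.local.json.
function reasoningLevel(phase, settings) {
  const overrides =
    (settings && settings._workflow && settings._workflow.reasoning_budget) || {};
  // Falling back to "standard" for unknown phases is an assumption.
  return overrides[phase] || DEFAULT_REASONING[phase] || "standard";
}
```

With the override example above, reasoningLevel("coding", settings) yields "high" and reasoningLevel("testing", settings) yields "standard", while every other phase keeps its default.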

State File Structure

During a session, token and budget data lives in .claude/.warp-drive-state.json:

{
  "token_usage": {
    "session_total": null,
    "chunk_snapshots": [
      {
        "chunk_index": 0,
        "acs": ["AC-01", "AC-02"],
        "input_tokens": 61250,
        "output_tokens": 22250,
        "cache_read_tokens": 18400,
        "cache_creation_tokens": 3200,
        "total_tokens": 83500,
        "message_count": 11,
        "timestamp": "2026-04-12T14:30:00Z"
      }
    ],
    "current_chunk_started_at": "2026-04-12T15:10:00Z"
  },
  "budgets": {
    "phase_started_at": "2026-04-12T15:10:00Z",
    "retry_count": 0,
    "coding_cycles": 0,
    "merge_retries": 0,
    "push_retries": 0,
    "budget_extensions": 0,
    "exceeded_reasons": null,
    "exceeded_at": null,
    "exceeded_from_phase": null,
    "aborted_at": null,
    "aborted_from_phase": null
  },
  "metrics": {
    "commits": 0,
    "reports_filed": 0,
    "tests_run": 0,
    "chunks_completed": 0,
    "session_duration_minutes": 0
  }
}

session_total is null during the session and populated at session end by a full (unfiltered) snapshot. chunk_snapshots accumulates one entry per completed chunk. current_chunk_started_at is the ISO timestamp used as the --since argument for the next chunk delta.
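Because session_total stays null until session end, a mid-session token count has to come from the chunk snapshots. The sketch below follows the counter rule from the constraints table (session_total if present, else the chunk-snapshot sum). The function name is hypothetical, and session_total is assumed to have the same shape as a snapshot, with a total_tokens field.

```javascript
// Hypothetical sketch: cumulative session tokens from the state file.
// Prefers session_total; otherwise sums the per-chunk snapshots.
function sessionTokensSoFar(tokenUsage) {
  if (tokenUsage.session_total) {
    return tokenUsage.session_total.total_tokens;
  }
  return tokenUsage.chunk_snapshots.reduce(
    (sum, snapshot) => sum + (snapshot.total_tokens || 0),
    0
  );
}
```

For the example state above, the result is 83500, the total of the single recorded chunk. Tokens spent in the chunk currently in progress are not included until its snapshot is pushed.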

Persistent Storage

token-usage.jsonl

At session completion, persistTokenUsage() appends records to ~/.claude/token-usage.jsonl. Each line is a self-contained JSON object.

Chunk record

{
  "type": "chunk",
  "session_id": "abc123",
  "project": "paulirv/bodmail",
  "requirement": "#42",
  "branch": "feat/42-email-templates",
  "level": 2,
  "started_at": "2026-04-12T14:00:00Z",
  "chunk_index": 0,
  "acs": ["AC-01", "AC-02"],
  "input_tokens": 61250,
  "output_tokens": 22250,
  "cache_read_tokens": 18400,
  "cache_creation_tokens": 3200,
  "total_tokens": 83500,
  "message_count": 11,
  "timestamp": "2026-04-12T14:30:00Z"
}

Session record

{
  "type": "session",
  "session_id": "abc123",
  "project": "paulirv/bodmail",
  "requirement": "#42",
  "branch": "feat/42-email-templates",
  "level": 2,
  "started_at": "2026-04-12T14:00:00Z",
  "chunks_completed": 3,
  "commits": 3,
  "input_tokens": 245000,
  "output_tokens": 89000,
  "cache_read_tokens": 72000,
  "cache_creation_tokens": 12800,
  "total_tokens": 334000,
  "message_count": 42,
  "timestamp": "2026-04-12T16:45:00Z"
}

Both record types share a base set of fields (session_id, project, requirement, branch, level, started_at) for filtering and grouping.
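The shared base fields suggest a simple way to build both record types from one state object. The following is a sketch of that idea, not persistTokenUsage() itself. The function names and the way the base object is passed in are assumptions; the field names are taken from the example records above.

```javascript
const fs = require("fs");

// Token fields that chunk and session records both carry.
const TOKEN_FIELDS = [
  "input_tokens",
  "output_tokens",
  "cache_read_tokens",
  "cache_creation_tokens",
  "total_tokens",
  "message_count",
  "timestamp",
];

function pickTokenFields(snapshot) {
  const out = {};
  for (const key of TOKEN_FIELDS) out[key] = snapshot[key];
  return out;
}

// Hypothetical sketch: one record per chunk snapshot, plus one session
// record when session_total is available. `base` holds session_id,
// project, requirement, branch, level, and started_at.
function buildUsageRecords(base, tokenUsage, metrics) {
  const records = tokenUsage.chunk_snapshots.map((snapshot) => ({
    type: "chunk",
    ...base,
    chunk_index: snapshot.chunk_index,
    acs: snapshot.acs,
    ...pickTokenFields(snapshot),
  }));
  if (tokenUsage.session_total) {
    records.push({
      type: "session",
      ...base,
      chunks_completed: metrics.chunks_completed,
      commits: metrics.commits,
      ...pickTokenFields(tokenUsage.session_total),
    });
  }
  return records;
}

// Append one JSON object per line; existing content is never rewritten.
function appendUsageRecords(filePath, records) {
  const text = records.map((record) => JSON.stringify(record) + "\n").join("");
  if (text) fs.appendFileSync(filePath, text);
}
```

Writing each record as a complete line keeps the log append-only and lets a reader parse it one line at a time.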

CLI Tools

token-snapshot.js

Reads a Claude Code session transcript and computes token usage.

node ~/.claude/scripts/warp-drive/token-snapshot.js <project-root> [--since <ISO-timestamp>]

Argument	Required	Description
`<project-root>`	Yes	Absolute path to the project directory
`--since <ISO>`	No	Only count tokens from messages at or after this timestamp

Output: JSON to stdout with timestamp, session_id, project, project_root, input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens, total_tokens, message_count.

Exit codes: 0 = success, 1 = missing argument, 2 = no transcript found.

How it finds the transcript: It slugifies the project path (/Users/paul/projects/bodmail becomes -Users-paul-projects-bodmail), then looks in ~/.claude/projects/<slug>/ for the most recent .jsonl file, excluding subagent transcripts.
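The slug rule is small enough to sketch directly. The function names are hypothetical, and the rule used is the one stated in the Troubleshooting section (replace / and . with -). Picking the most recent file and excluding subagent transcripts are left out, since this document does not say how subagent transcripts are identified.

```javascript
const os = require("os");
const path = require("path");

// Hypothetical sketch: replace every "/" and "." with "-".
function slugifyProjectPath(projectRoot) {
  return projectRoot.replace(/[/.]/g, "-");
}

// Directory where transcripts for this project are expected to live.
function transcriptDir(projectRoot, homeDir = os.homedir()) {
  return path.join(homeDir, ".claude", "projects", slugifyProjectPath(projectRoot));
}
```

For /Users/paul/projects/bodmail this produces the slug -Users-paul-projects-bodmail, matching the example above. A dotted path such as /srv/my.app would become -srv-my-app.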

token-report.js

Aggregates ~/.claude/token-usage.jsonl into human-readable reports.

node ~/.claude/scripts/warp-drive/token-report.js [options]

Flag	Default	Description
`--last <N>`	5	Show last N sessions
`--all`	—	Show all sessions
`--project <name>`	—	Filter by project name (partial match)
`--json`	—	Output as JSON instead of markdown
`--insights`	—	Include optimization insights section
`--budget <N>`	5.00	Budget threshold in dollars (used by insights)

Standard output sections:

By Project — aggregated tokens and cost per project
Session History — per-session table: date, project, requirement, chunks, tokens, messages, cost
Totals — sum across displayed sessions
Averages — per-session averages (shown when 2+ sessions)

Insights output (with --insights):

Top Token Consumers — projects ranked by estimated cost
Cost Efficiency per AC — cost and messages per acceptance criterion
Sessions Over Budget — sessions exceeding the threshold, with overage amount
Optimization Suggestions — automated analysis:
- Cache reuse ratio (read/create) — flags low reuse (<5x)
- Messages per AC — flags high back-and-forth (>50)
- Output-to-input ratio — flags verbose sessions
- Cost trend — compares recent 3 sessions against earlier average
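Two of these checks reduce to simple ratios, sketched below. The thresholds (reuse below 5x, most recent 3 sessions) come from the list above. The function names, the null return when a ratio is undefined, and the use of a plain mean for the trend are assumptions about how token-report.js might compute them.

```javascript
// Hypothetical sketch: cache reuse ratio = cache reads / cache creations.
// Returns null when nothing was written to the cache.
function cacheReuseRatio(record) {
  if (!record.cache_creation_tokens) return null;
  return record.cache_read_tokens / record.cache_creation_tokens;
}

// Flags low reuse (< 5x), per the threshold in the list above.
function isLowCacheReuse(record) {
  const ratio = cacheReuseRatio(record);
  return ratio !== null && ratio < 5;
}

// Hypothetical sketch: mean cost of the 3 most recent sessions divided
// by the mean cost of all earlier ones. `costs` is in chronological order.
// Returns null when there is no earlier session to compare against.
function costTrend(costs) {
  if (costs.length < 4) return null;
  const mean = (values) => values.reduce((a, b) => a + b, 0) / values.length;
  const recent = mean(costs.slice(-3));
  const earlier = mean(costs.slice(0, -3));
  return earlier > 0 ? recent / earlier : null;
}
```

A trend value above 1 means recent sessions cost more than earlier ones. For the example session record in this document, the reuse ratio is 72000 / 12800 = 5.625, so it would not be flagged.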

/token-report skill

The /token-report skill (provisioned from registry/skills/token-report/) wraps token-report.js for interactive use. It runs the script and presents the output as formatted markdown.

Cost Estimation

Costs are estimated using Claude Opus API pricing:

Token type	Rate
Input	$15.00 / MTok
Output	$75.00 / MTok
Cache read	$1.50 / MTok
Cache creation	$18.75 / MTok

These rates are defined in token-report.js (line 12). Update them if pricing changes. Cost estimates appear in both the standard report and insights output.
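As a worked example of the rate table, the sketch below prices a record by multiplying each token type by its rate per million tokens. The key names inside PRICING are assumptions (the real object in token-report.js may be shaped differently), and the four token counts are treated as separate, non-overlapping quantities.

```javascript
// USD per million tokens, copied from the rate table above.
// Key names are hypothetical.
const PRICING = {
  input: 15.0,
  output: 75.0,
  cache_read: 1.5,
  cache_creation: 18.75,
};

// Hypothetical sketch: estimated cost in USD for one usage record.
function estimateCostUsd(record) {
  const microDollars =
    (record.input_tokens || 0) * PRICING.input +
    (record.output_tokens || 0) * PRICING.output +
    (record.cache_read_tokens || 0) * PRICING.cache_read +
    (record.cache_creation_tokens || 0) * PRICING.cache_creation;
  return microDollars / 1e6;
}
```

For the example session record (245,000 input, 89,000 output, 72,000 cache read, 12,800 cache creation), the terms are 3.675 + 6.675 + 0.108 + 0.240, about $10.70. Output tokens dominate despite being roughly a third of the input count, because the output rate is five times higher.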

Configuration Reference

All budget and reasoning settings live in .claude/settings.local.json under the _workflow key:

{
  "_workflow": {
    "max_phase_minutes": 30,
    "max_retries_per_chunk": 5,
    "max_coding_cycles": 3,
    "max_total_chunks": 20,
    "max_session_minutes": 480,
    "max_session_tokens": 0,
    "max_session_usd": 0,
    "phase_timeout_enforcement": "warn",
    "reasoning_budget": {
      "discovering": "high",
      "planning": "high",
      "coding": "standard",
      "testing": "high"
    }
  }
}

Key	Type	Default	Description
`max_phase_minutes`	number	30	Minutes before phase timeout triggers
`max_retries_per_chunk`	number	5	Hard limit on retries per chunk
`max_coding_cycles`	number	3	Hard limit on code/test cycles per chunk
`max_total_chunks`	number	20	Hard limit on total chunks per session
`max_session_minutes`	number	480	Hard limit on total session duration (8 hours)
`max_session_tokens`	number	0	Hard limit on cumulative session tokens; `0` disables
`max_session_usd`	number	0	Optional estimated-dollar ceiling; `0` disables
`phase_timeout_enforcement`	string	`"warn"`	`"warn"`, `"block"`, or `"abort"`
`reasoning_budget`	object	`{}`	Per-phase reasoning level overrides
`reasoning_budget.<phase>`	string	varies	`"standard"` or `"high"`
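A consumer of these settings would typically merge the _workflow block over the defaults from the table. The sketch below shows one way to do that. The function and constant names are hypothetical, and the real state machine may resolve its config differently.

```javascript
// Defaults copied from the configuration table above.
const WORKFLOW_DEFAULTS = {
  max_phase_minutes: 30,
  max_retries_per_chunk: 5,
  max_coding_cycles: 3,
  max_total_chunks: 20,
  max_session_minutes: 480,
  max_session_tokens: 0, // 0 disables the token ceiling
  max_session_usd: 0, // 0 disables the cost ceiling
  phase_timeout_enforcement: "warn",
  reasoning_budget: {},
};

// Hypothetical sketch: overlay the _workflow key on the defaults.
// `settings` is the parsed content of .claude/settings.local.json,
// or null/undefined when the file is absent.
function resolveWorkflowConfig(settings) {
  const workflow = (settings && settings._workflow) || {};
  return {
    ...WORKFLOW_DEFAULTS,
    ...workflow,
    // Copy so callers cannot mutate the shared defaults object.
    reasoning_budget: { ...(workflow.reasoning_budget || {}) },
  };
}
```

A settings file that sets only max_total_chunks to 50 would resolve to that value for chunks and to the table defaults for everything else.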

Session Summary Integration

Warp-drive session summaries (filed as GitHub Issues with label session-summary) include a Token Usage section with input, output, cache read, and total token counts read from state.token_usage.session_total. This makes token usage visible in the project’s issue history without needing to run a separate report.

Troubleshooting

No data in token report

Complete at least one warp-drive session. Data is only persisted at session end (session_ended transition).
Check ~/.claude/token-usage.jsonl exists and has content.

Token counts are zero or null

token-snapshot.js couldn’t find the session transcript. Verify ~/.claude/projects/<slug>/ contains .jsonl files.
The slug is computed by replacing / and . with - in the project path.

Budget exceeded unexpectedly

Check state.budgets in .claude/.warp-drive-state.json for current counter values.
The max_total_chunks default is 20 in state-machine.js but is listed as 50 in the warp-drive guide config table. The state machine default applies unless overridden in settings.local.json.
Run node ~/.claude/scripts/warp-drive/state-machine.js status "$(pwd)" to see current budget state.

Cache reuse ratio is low

Short sessions with few chunks create cache entries that are never reused. Longer sessions with more chunks improve the ratio.
Context-busting tool calls (large file reads, many parallel agents) force cache recreation.

Pricing is outdated

Update the PRICING object in token-report.js (line 12) when API pricing changes. The skill doc and this doc quote the same values, so update them together.