Skip to content

Token Monitoring & Budget System

Warp-drive tracks token usage per chunk, per session, and across sessions. It estimates costs, enforces budgets via a circuit breaker, and generates optimization insights. This document covers the full pipeline: capture, storage, enforcement, and reporting.


Claude Code session transcript (~/.claude/projects/<slug>/*.jsonl)
token-snapshot.js ── reads transcript, computes token delta ──► JSON snapshot
state-machine.js ── stores snapshots in .warp-drive-state.json
│ chunk_snapshots[] (per chunk)
│ session_total (at session end)
├──► checkBudgets() ── enforces hard limits, triggers circuit breaker
└──► persistTokenUsage() ── appends records to ~/.claude/token-usage.jsonl
at session completion
token-report.js ── reads JSONL, generates markdown/JSON reports
│ /token-report skill wraps this
Optimization insights, cost estimates, trend analysis
File Purpose
scripts/warp-drive/token-snapshot.js Reads Claude Code transcript, computes token totals or deltas
scripts/warp-drive/token-report.js Aggregates JSONL records into reports with cost estimates
scripts/warp-drive/state-machine.js Orchestrates capture, stores state, enforces budgets, persists data
registry/skills/token-report/SKILL.md User-invocable /token-report skill definition
~/.claude/token-usage.jsonl Persistent append-only log of all session and chunk records
.claude/.warp-drive-state.json Per-project state file (transient, deleted at session end)

Event What happens State field
chunks_defined Mark chunk start timestamp token_usage.current_chunk_started_at
next_chunk Capture delta since chunk started, push snapshot token_usage.chunk_snapshots[]
requirement_done Capture final chunk delta, push snapshot token_usage.chunk_snapshots[]
session_ended Capture full-session total (no --since filter) token_usage.session_total
session_ended Persist all records to ~/.claude/token-usage.jsonl — (written to disk)

token-snapshot.js reads the Claude Code session transcript — the .jsonl file that Claude Code writes to ~/.claude/projects/<slug>/. Each line is a JSON record with a message.usage object containing token counts.

  • Full snapshot (no --since): sums all message.usage records in the transcript.
  • Delta snapshot (--since <ISO>): sums only records with timestamp >= since. Used for per-chunk deltas.

The state machine calls captureTokenSnapshot(projectRoot, sinceTimestamp) which spawns the script and parses its JSON output. If the script fails or doesn’t exist, it returns null and the session continues without token data.


The budget system prevents runaway sessions. It has five constraints, checked before every state transition by checkBudgets().

Constraint Config key Default Counter Enforcement
Phase timeout max_phase_minutes 30 Elapsed time in current phase Configurable: warn, block, or abort
Retry limit max_retries_per_chunk 5 state.budgets.retry_count Hard (enforced)
Coding cycles max_coding_cycles 3 state.budgets.coding_cycles Hard (enforced)
Total chunks max_total_chunks 20 state.metrics.chunks_completed Hard (enforced)
Session duration max_session_minutes 480 Elapsed since session.started_at_epoch Hard (enforced)
Session tokens max_session_tokens 0 (disabled) token_usage.session_total (else chunk-snapshot sum) Hard (enforced) when > 0
Session cost (optional) max_session_usd 0 (disabled) estimated via token-report’s cost model Hard (enforced) when > 0

The cost ceiling (#587) emits a cost_budget_exceeded issue and routes to budget_exceeded like the other hard limits; its reason reports actual spend vs ceiling (e.g. cost_budget_exceeded: 1500 tokens exceeded 1000 tokens). Both ceilings default to 0 (disabled). Token count is the reliable signal; max_session_usd is optional/secondary.

Phase timeout enforcement mode is set via phase_timeout_enforcement in config:

  • warn (default): advisory only, included in response but doesn’t block
  • block: rejects the transition
  • abort: auto-transitions to budget_exceeded

When a hard limit is exceeded, the state machine doesn’t crash or silently continue. It transitions to the budget_exceeded phase — a first-class state in the state machine.

any phase ──[hard limit exceeded]──► budget_exceeded
┌───────┴───────┐
▼ ▼
budget_continue budget_abort
│ │
▼ ▼
coding aborted
(budgets reset) (session ends)

The circuit breaker:

  1. Runs checkBudgets() before every transition.
  2. Filters for enforced issues (hard limits, not advisory warnings).
  3. If enforced issues exist and the event is not on the bypass list, transitions to budget_exceeded.
  4. Stores diagnostic info: exceeded_reasons, exceeded_at, exceeded_from_phase.
  5. Presents the user with a choice: continue (extend budget) or abort.

Bypass events — these skip the circuit breaker check to avoid deadlocks: abort, abort_resolved, session_ended, budget_continue, budget_abort

Human checkpointbudget_exceeded requires human approval at all automation levels, including Level 3. This is a mandatory checkpoint that cannot be auto-bypassed.

When the user chooses budget_continue:

  • retry_count and coding_cycles reset to 0
  • budget_extensions counter increments (tracks how many times the user extended)
  • Execution returns to the coding phase

After each successful chunk (next_chunk and requirement_done events), retry_count and coding_cycles reset to 0. This means per-chunk limits apply fresh to each chunk, while max_total_chunks and max_session_minutes apply across the entire session.


The reasoning budget controls Claude’s thinking effort per phase. High-reasoning phases get deeper analysis; standard phases get efficient execution. This is the reasoning sandwich pattern.

Phase Level Rationale
prerequisites standard Mechanical setup
discovering high Work discovery needs judgment
planning high Architecture decisions need depth
chunking high Decomposition affects everything downstream
coding standard Implementation follows the plan
updating_docs standard Straightforward documentation
testing high Test interpretation needs careful analysis
committing standard Mechanical commit creation
reporting standard Structured output
chunk_complete standard Status check
requirement_complete high Final verification — last chance to catch issues
merging standard Mechanical merge/PR
session_ending standard Reporting
budget_exceeded standard Decision presentation
aborted standard Cleanup

Override per-phase reasoning in .claude/settings.local.json:

{
"_workflow": {
"reasoning_budget": {
"coding": "high",
"testing": "standard"
}
}
}

Config overrides take precedence over defaults. The level is injected into the state machine’s systemMessage for each phase transition, where it instructs Claude to adjust its reasoning effort.


During a session, token and budget data lives in .claude/.warp-drive-state.json:

{
"token_usage": {
"session_total": null,
"chunk_snapshots": [
{
"chunk_index": 0,
"acs": ["AC-01", "AC-02"],
"input_tokens": 61250,
"output_tokens": 22250,
"cache_read_tokens": 18400,
"cache_creation_tokens": 3200,
"total_tokens": 83500,
"message_count": 11,
"timestamp": "2026-04-12T14:30:00Z"
}
],
"current_chunk_started_at": "2026-04-12T15:10:00Z"
},
"budgets": {
"phase_started_at": "2026-04-12T15:10:00Z",
"retry_count": 0,
"coding_cycles": 0,
"merge_retries": 0,
"push_retries": 0,
"budget_extensions": 0,
"exceeded_reasons": null,
"exceeded_at": null,
"exceeded_from_phase": null,
"aborted_at": null,
"aborted_from_phase": null
},
"metrics": {
"commits": 0,
"reports_filed": 0,
"tests_run": 0,
"chunks_completed": 0,
"session_duration_minutes": 0
}
}

session_total is null during the session and populated at session end by a full (unfiltered) snapshot. chunk_snapshots accumulates one entry per completed chunk. current_chunk_started_at is the ISO timestamp used as the --since argument for the next chunk delta.


At session completion, persistTokenUsage() appends records to ~/.claude/token-usage.jsonl. Each line is a self-contained JSON object.

{
"type": "chunk",
"session_id": "abc123",
"project": "paulirv/bodmail",
"requirement": "#42",
"branch": "feat/42-email-templates",
"level": 2,
"started_at": "2026-04-12T14:00:00Z",
"chunk_index": 0,
"acs": ["AC-01", "AC-02"],
"input_tokens": 61250,
"output_tokens": 22250,
"cache_read_tokens": 18400,
"cache_creation_tokens": 3200,
"total_tokens": 83500,
"message_count": 11,
"timestamp": "2026-04-12T14:30:00Z"
}
{
"type": "session",
"session_id": "abc123",
"project": "paulirv/bodmail",
"requirement": "#42",
"branch": "feat/42-email-templates",
"level": 2,
"started_at": "2026-04-12T14:00:00Z",
"chunks_completed": 3,
"commits": 3,
"input_tokens": 245000,
"output_tokens": 89000,
"cache_read_tokens": 72000,
"cache_creation_tokens": 12800,
"total_tokens": 334000,
"message_count": 42,
"timestamp": "2026-04-12T16:45:00Z"
}

Both record types share a base set of fields (session_id, project, requirement, branch, level, started_at) for filtering and grouping.


Reads a Claude Code session transcript and computes token usage.

Terminal window
node ~/.claude/scripts/warp-drive/token-snapshot.js <project-root> [--since <ISO-timestamp>]
Argument Required Description
<project-root> Yes Absolute path to the project directory
--since <ISO> No Only count tokens from messages after this timestamp

Output: JSON to stdout with timestamp, session_id, project, project_root, input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens, total_tokens, message_count.

Exit codes: 0 = success, 1 = missing argument, 2 = no transcript found.

How it finds the transcript: Slugifies the project path (/Users/paul/projects/bodmail becomes -Users-paul-projects-bodmail), looks in ~/.claude/projects/<slug>/ for the most recent .jsonl file (excluding subagent transcripts).

Aggregates ~/.claude/token-usage.jsonl into human-readable reports.

Terminal window
node ~/.claude/scripts/warp-drive/token-report.js [options]
Flag Default Description
--last <N> 5 Show last N sessions
--all Show all sessions
--project <name> Filter by project name (partial match)
--json Output as JSON instead of markdown
--insights Include optimization insights section
--budget <N> 5.00 Budget threshold in dollars (used by insights)

Standard output sections:

  1. By Project — aggregated tokens and cost per project
  2. Session History — per-session table: date, project, requirement, chunks, tokens, messages, cost
  3. Totals — sum across displayed sessions
  4. Averages — per-session averages (shown when 2+ sessions)

Insights output (with --insights):

  1. Top Token Consumers — projects ranked by estimated cost
  2. Cost Efficiency per AC — cost and messages per acceptance criterion
  3. Sessions Over Budget — sessions exceeding the threshold, with overage amount
  4. Optimization Suggestions — automated analysis:
    • Cache reuse ratio (read/create) — flags low reuse (<5x)
    • Messages per AC — flags high back-and-forth (>50)
    • Output-to-input ratio — flags verbose sessions
    • Cost trend — compares recent 3 sessions against earlier average

The /token-report skill (provisioned from registry/skills/token-report/) wraps token-report.js for interactive use. It runs the script and presents the output as formatted markdown.


Costs are estimated using Claude Opus API pricing:

Token type Rate
Input $15.00 / MTok
Output $75.00 / MTok
Cache read $1.50 / MTok
Cache creation $18.75 / MTok

These rates are defined in token-report.js (line 12). Update them if pricing changes. Cost estimates appear in both the standard report and insights output.


All budget and reasoning settings live in .claude/settings.local.json under the _workflow key:

{
"_workflow": {
"max_phase_minutes": 30,
"max_retries_per_chunk": 5,
"max_coding_cycles": 3,
"max_total_chunks": 20,
"max_session_minutes": 480,
"phase_timeout_enforcement": "warn",
"reasoning_budget": {
"discovering": "high",
"planning": "high",
"coding": "standard",
"testing": "high"
}
}
}
Key Type Default Description
max_phase_minutes number 30 Minutes before phase timeout triggers
max_retries_per_chunk number 5 Hard limit on retries per chunk
max_coding_cycles number 3 Hard limit on code/test cycles per chunk
max_total_chunks number 20 Hard limit on total chunks per session
max_session_minutes number 480 Hard limit on total session duration (8 hours)
max_session_tokens number 0 Hard limit on cumulative session tokens; 0 disables
max_session_usd number 0 Optional estimated-dollar ceiling; 0 disables
phase_timeout_enforcement string "warn" "warn", "block", or "abort"
reasoning_budget object {} Per-phase reasoning level overrides
reasoning_budget.<phase> string varies "standard" or "high"

Warp-drive session summaries (filed as GitHub Issues with label session-summary) include a Token Usage section with input, output, cache read, and total token counts read from state.token_usage.session_total. This makes cost visible in the project’s issue history without needing to run a separate report.


No data in token report

  • Complete at least one warp-drive session. Data is only persisted at session end (session_ended transition).
  • Check ~/.claude/token-usage.jsonl exists and has content.

Token counts are zero or null

  • token-snapshot.js couldn’t find the session transcript. Verify ~/.claude/projects/<slug>/ contains .jsonl files.
  • The slug is computed by replacing / and . with - in the project path.

Budget exceeded unexpectedly

  • Check state.budgets in .claude/.warp-drive-state.json for current counter values.
  • max_total_chunks default is 20 in state-machine.js but 50 in the warp-drive guide config table — the state machine default applies unless overridden in settings.local.json.
  • Run node ~/.claude/scripts/warp-drive/state-machine.js status "$(pwd)" to see current budget state.

Cache reuse ratio is low

  • Short sessions with few chunks create cache entries that are never reused. Longer sessions with more chunks per session improve the ratio.
  • Context-busting tool calls (large file reads, many parallel agents) force cache recreation.

Pricing is outdated

  • Update the PRICING object in token-report.js (line 12) when API pricing changes. The skill doc and this doc reference the same values.