Token Monitoring & Budget System
Warp-drive tracks token usage per chunk, per session, and across sessions. It estimates costs, enforces budgets via a circuit breaker, and generates optimization insights. This document covers the full pipeline: capture, storage, enforcement, and reporting.
Architecture
```text
Claude Code session transcript (~/.claude/projects/<slug>/*.jsonl)
        │
        ▼
token-snapshot.js ── reads transcript, computes token delta ──► JSON snapshot
        │
        ▼
state-machine.js ── stores snapshots in .warp-drive-state.json
        │             chunk_snapshots[]  (per chunk)
        │             session_total      (at session end)
        │
        ├──► checkBudgets() ── enforces hard limits, triggers circuit breaker
        │
        └──► persistTokenUsage() ── appends records to ~/.claude/token-usage.jsonl
                    │                at session completion
                    ▼
             token-report.js ── reads JSONL, generates markdown/JSON reports
                    │            /token-report skill wraps this
                    ▼
             Optimization insights, cost estimates, trend analysis
```
Key files
| File | Purpose |
|---|---|
| `scripts/warp-drive/token-snapshot.js` | Reads Claude Code transcript, computes token totals or deltas |
| `scripts/warp-drive/token-report.js` | Aggregates JSONL records into reports with cost estimates |
| `scripts/warp-drive/state-machine.js` | Orchestrates capture, stores state, enforces budgets, persists data |
| `registry/skills/token-report/SKILL.md` | User-invocable `/token-report` skill definition |
| `~/.claude/token-usage.jsonl` | Persistent append-only log of all session and chunk records |
| `.claude/.warp-drive-state.json` | Per-project state file (transient, deleted at session end) |
Data Flow
When tokens are captured
| Event | What happens | State field |
|---|---|---|
| `chunks_defined` | Mark chunk start timestamp | `token_usage.current_chunk_started_at` |
| `next_chunk` | Capture delta since chunk started, push snapshot | `token_usage.chunk_snapshots[]` |
| `requirement_done` | Capture final chunk delta, push snapshot | `token_usage.chunk_snapshots[]` |
| `session_ended` | Capture full-session total (no `--since` filter) | `token_usage.session_total` |
| `session_ended` | Persist all records to `~/.claude/token-usage.jsonl` | None (written to disk) |
How snapshots work
`token-snapshot.js` reads the Claude Code session transcript, the `.jsonl` file that Claude Code writes to `~/.claude/projects/<slug>/`. Each line is a JSON record; records for model responses include a `message.usage` object containing token counts.
- Full snapshot (no `--since`): sums all `message.usage` records in the transcript.
- Delta snapshot (`--since <ISO>`): sums only records with `timestamp >= since`. Used for per-chunk deltas.
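The two modes differ only in a timestamp filter. The following is a minimal sketch, not the actual `token-snapshot.js` source. It assumes each transcript line may carry a top-level `timestamp` and a `message.usage` object using the Anthropic API field names (`input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`), and it computes `total_tokens` as input plus output, which is what the sample records on this page show (61,250 + 22,250 = 83,500). The name `sumUsage` is hypothetical.

```javascript
// Sketch: sum token usage over Claude Code transcript lines (JSONL text).
// ASSUMPTION: usage field names follow the Anthropic API; lines without
// message.usage (e.g. user turns) are skipped.
function sumUsage(jsonlText, since = null) {
  const sinceMs = since ? Date.parse(since) : null;
  const totals = {
    input_tokens: 0,
    output_tokens: 0,
    cache_read_tokens: 0,
    cache_creation_tokens: 0,
    total_tokens: 0,
    message_count: 0,
  };
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let rec;
    try {
      rec = JSON.parse(line);
    } catch {
      continue; // tolerate a partially written final line
    }
    const usage = rec.message && rec.message.usage;
    if (!usage) continue;
    // Delta mode: keep only records with timestamp >= since.
    if (sinceMs !== null && !(Date.parse(rec.timestamp) >= sinceMs)) continue;
    totals.input_tokens += usage.input_tokens || 0;
    totals.output_tokens += usage.output_tokens || 0;
    totals.cache_read_tokens += usage.cache_read_input_tokens || 0;
    totals.cache_creation_tokens += usage.cache_creation_input_tokens || 0;
    totals.message_count += 1;
  }
  // Matches the sample records in this doc: total = input + output.
  totals.total_tokens = totals.input_tokens + totals.output_tokens;
  return totals;
}
```

In practice the text would come from `fs.readFileSync(transcriptPath, "utf8")`; passing `current_chunk_started_at` as `since` yields a chunk delta, and omitting it yields the full-session total.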
The state machine calls `captureTokenSnapshot(projectRoot, sinceTimestamp)`, which spawns the script and parses its JSON output. If the script fails or doesn’t exist, the function returns `null` and the session continues without token data.
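The fail-soft behaviour can be sketched with Node's `child_process.execFileSync`. This is an illustrative version only: the real `captureTokenSnapshot` may spawn the script differently, and the optional `scriptPath` parameter and the 30-second timeout are assumptions added here.

```javascript
const { execFileSync } = require("node:child_process");
const os = require("node:os");
const path = require("node:path");

// Default install location, taken from the CLI section of this doc.
const DEFAULT_SCRIPT = path.join(
  os.homedir(), ".claude", "scripts", "warp-drive", "token-snapshot.js"
);

// Sketch of fail-soft capture: any failure yields null, never a throw.
// ASSUMPTION: scriptPath parameter and timeout are illustrative additions.
function captureTokenSnapshot(projectRoot, sinceTimestamp, scriptPath = DEFAULT_SCRIPT) {
  const args = [scriptPath, projectRoot];
  if (sinceTimestamp) args.push("--since", sinceTimestamp);
  try {
    const stdout = execFileSync(process.execPath, args, {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
      timeout: 30_000,
    });
    return JSON.parse(stdout);
  } catch {
    // Missing script, non-zero exit (e.g. 2 = no transcript found),
    // timeout, or unparseable output: continue without token data.
    return null;
  }
}
```

Catching every error, including a non-zero exit such as code 2, is what lets token tracking fail without taking the session down with it.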
Budget System
The budget system prevents runaway sessions. It enforces the seven constraints below, all checked by `checkBudgets()` before every state transition; the two cost ceilings are disabled by default.
Constraints
| Constraint | Config key | Default | Counter | Enforcement |
|---|---|---|---|---|
| Phase timeout | `max_phase_minutes` | 30 | Elapsed time in current phase | Configurable: `warn`, `block`, or `abort` |
| Retry limit | `max_retries_per_chunk` | 5 | `state.budgets.retry_count` | Hard (enforced) |
| Coding cycles | `max_coding_cycles` | 3 | `state.budgets.coding_cycles` | Hard (enforced) |
| Total chunks | `max_total_chunks` | 20 | `state.metrics.chunks_completed` | Hard (enforced) |
| Session duration | `max_session_minutes` | 480 | Elapsed since `session.started_at_epoch` | Hard (enforced) |
| Session tokens | `max_session_tokens` | 0 (disabled) | `token_usage.session_total` (else chunk-snapshot sum) | Hard (enforced) when > 0 |
| Session cost (optional) | `max_session_usd` | 0 (disabled) | Estimated via token-report’s cost model | Hard (enforced) when > 0 |
The cost ceilings (#587) emit a `cost_budget_exceeded` issue and route to `budget_exceeded` like the other hard limits; the reason reports actual usage against the ceiling (e.g. `cost_budget_exceeded: 1500 tokens exceeded 1000 tokens`). Both ceilings default to 0 (disabled). Token count is the reliable signal; `max_session_usd` is optional and secondary, because it depends on the estimated rates in `token-report.js`.
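A sketch of how the two ceilings could be evaluated, assuming the state shape shown under State File Structure. `checkCostCeilings` and the `estimateUsd` callback are hypothetical names; the token-ceiling reason string copies the example above, while the dollar-ceiling wording is invented for illustration.

```javascript
// Hypothetical cost-ceiling check; field names follow the state file.
function checkCostCeilings(state, config, estimateUsd) {
  const issues = [];
  const usage = state.token_usage || {};
  // Prefer the full-session total; fall back to summing chunk snapshots.
  const tokens = usage.session_total
    ? usage.session_total.total_tokens
    : (usage.chunk_snapshots || []).reduce((sum, s) => sum + s.total_tokens, 0);

  const maxTokens = config.max_session_tokens || 0; // 0 disables
  if (maxTokens > 0 && tokens > maxTokens) {
    issues.push({
      type: "cost_budget_exceeded",
      enforced: true,
      reason: `cost_budget_exceeded: ${tokens} tokens exceeded ${maxTokens} tokens`,
    });
  }

  const maxUsd = config.max_session_usd || 0; // 0 disables
  if (maxUsd > 0) {
    const usd = estimateUsd(usage);
    if (usd > maxUsd) {
      issues.push({
        type: "cost_budget_exceeded",
        enforced: true,
        // Hypothetical wording; only the token form is quoted in this doc.
        reason: `cost_budget_exceeded: $${usd.toFixed(2)} exceeded $${maxUsd.toFixed(2)}`,
      });
    }
  }
  return issues;
}
```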
Phase timeout enforcement mode is set via `phase_timeout_enforcement` in config:
- `warn` (default): advisory only; included in the response but doesn’t block.
- `block`: rejects the transition.
- `abort`: auto-transitions to `budget_exceeded`.
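One way to express the three modes as a pure function. `evaluatePhaseTimeout` and the shape of its return value are hypothetical; the sketch only relies on `budgets.phase_started_at` from the state file and the two config keys documented in this section.

```javascript
// Hypothetical mapping from a phase timeout to an outcome per mode.
function evaluatePhaseTimeout(budgets, config, nowMs = Date.now()) {
  const maxMinutes = config.max_phase_minutes ?? 30;
  const mode = config.phase_timeout_enforcement ?? "warn";
  const elapsedMin = (nowMs - Date.parse(budgets.phase_started_at)) / 60_000;
  if (elapsedMin <= maxMinutes) return { timedOut: false };

  const message = `phase timeout: ${elapsedMin.toFixed(1)} min > ${maxMinutes} min`;
  switch (mode) {
    case "block": // reject this transition, stay in the current phase
      return { timedOut: true, enforced: true, rejectTransition: true, message };
    case "abort": // route straight to the circuit breaker state
      return { timedOut: true, enforced: true, nextPhase: "budget_exceeded", message };
    default: // "warn": advisory only, surfaced in the response
      return { timedOut: true, enforced: false, message };
  }
}
```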
Circuit breaker
When a hard limit is exceeded, the state machine doesn’t crash or silently continue. It transitions to the `budget_exceeded` phase, a first-class state in the state machine.
```text
any phase ──[hard limit exceeded]──► budget_exceeded
                                            │
                                    ┌───────┴───────┐
                                    ▼               ▼
                             budget_continue   budget_abort
                                    │               │
                                    ▼               ▼
                                 coding          aborted
                             (budgets reset)  (session ends)
```
The circuit breaker:
- Runs `checkBudgets()` before every transition.
- Filters for enforced issues (hard limits, not advisory warnings).
- If enforced issues exist and the event is not on the bypass list, transitions to `budget_exceeded`.
- Stores diagnostic info: `exceeded_reasons`, `exceeded_at`, `exceeded_from_phase`.
- Presents the user with a choice: continue (extend budget) or abort.
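Bypass events skip the circuit breaker check to avoid deadlocks:
`abort`, `abort_resolved`, `session_ended`, `budget_continue`, `budget_abort`
Without the bypass, a session that is already over budget could never reach `budget_continue` or `budget_abort`, because the check would send every event back to `budget_exceeded`.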
Human checkpoint — budget_exceeded requires human approval at all automation levels, including Level 3. This is a mandatory checkpoint that cannot be auto-bypassed.
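The circuit-breaker steps, minus the user prompt, can be sketched as a gate that runs before a transition. This assumes each issue from `checkBudgets()` carries an `enforced` flag and a `reason` string and that the current phase lives in `state.phase`; all three are assumptions, as is the function name `applyCircuitBreaker`.

```javascript
// Events that skip the check (from this doc), so an over-budget session
// can still leave budget_exceeded or end.
const BYPASS_EVENTS = new Set([
  "abort", "abort_resolved", "session_ended", "budget_continue", "budget_abort",
]);

// Hypothetical gate; `issues` is whatever checkBudgets() reported.
function applyCircuitBreaker(state, event, issues, nowIso = new Date().toISOString()) {
  if (BYPASS_EVENTS.has(event)) return { tripped: false };
  const enforced = issues.filter((issue) => issue.enforced); // drop advisories
  if (enforced.length === 0) return { tripped: false };

  // Record diagnostics before moving to the human checkpoint.
  state.budgets.exceeded_reasons = enforced.map((issue) => issue.reason);
  state.budgets.exceeded_at = nowIso;
  state.budgets.exceeded_from_phase = state.phase;
  state.phase = "budget_exceeded";
  return { tripped: true, reasons: state.budgets.exceeded_reasons };
}
```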
Budget recovery
When the user chooses `budget_continue`:
- `retry_count` and `coding_cycles` reset to 0.
- The `budget_extensions` counter increments (tracking how many times the user extended).
- Execution returns to the `coding` phase.
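A minimal sketch of the recovery transition, assuming the `state.budgets` fields shown under State File Structure and a hypothetical `state.phase` field. The guard that rejects `budget_continue` outside `budget_exceeded` is an assumption drawn from the circuit-breaker diagram, not documented behaviour.

```javascript
// Hypothetical handler for budget_continue, following the steps above.
function handleBudgetContinue(state) {
  if (state.phase !== "budget_exceeded") {
    throw new Error(`budget_continue is only valid from budget_exceeded, not ${state.phase}`);
  }
  state.budgets.retry_count = 0;
  state.budgets.coding_cycles = 0;
  state.budgets.budget_extensions += 1; // how many times the user extended
  state.phase = "coding";
  return state;
}
```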
Per-chunk budget resets
After each successful chunk (`next_chunk` and `requirement_done` events), `retry_count` and `coding_cycles` reset to 0. Per-chunk limits therefore apply fresh to each chunk, while `max_total_chunks`, `max_session_minutes`, and the optional token and cost ceilings apply across the entire session.
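The effect of the reset shows up clearly in a small simulation. This is an illustration, not warp-drive code, and it assumes a limit trips when the counter goes above the configured maximum: three chunks needing two retries each stay within `max_retries_per_chunk = 5` only because the counter restarts per chunk.

```javascript
// Illustrative simulation: retries per chunk against a per-chunk limit.
function simulateRetries(retriesPerChunk, maxRetries, resetPerChunk) {
  let retryCount = 0;
  for (const retries of retriesPerChunk) {
    for (let i = 0; i < retries; i++) {
      retryCount += 1;
      // ASSUMPTION: the breaker trips once the counter exceeds the limit.
      if (retryCount > maxRetries) return "budget_exceeded";
    }
    if (resetPerChunk) retryCount = 0; // what next_chunk / requirement_done do
  }
  return "ok";
}

console.log(simulateRetries([2, 2, 2], 5, true)); // with per-chunk reset
console.log(simulateRetries([2, 2, 2], 5, false)); // counter never resets
```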
Reasoning Budget
The reasoning budget controls Claude’s thinking effort per phase. High-reasoning phases get deeper analysis; standard phases get efficient execution. This is the reasoning sandwich pattern.
Defaults
| Phase | Level | Rationale |
|---|---|---|
| prerequisites | standard | Mechanical setup |
| discovering | high | Work discovery needs judgment |
| planning | high | Architecture decisions need depth |
| chunking | high | Decomposition affects everything downstream |
| coding | standard | Implementation follows the plan |
| updating_docs | standard | Straightforward documentation |
| testing | high | Test interpretation needs careful analysis |
| committing | standard | Mechanical commit creation |
| reporting | standard | Structured output |
| chunk_complete | standard | Status check |
| requirement_complete | high | Final verification — last chance to catch issues |
| merging | standard | Mechanical merge/PR |
| session_ending | standard | Reporting |
| budget_exceeded | standard | Decision presentation |
| aborted | standard | Cleanup |
Configuration
Override per-phase reasoning in `.claude/settings.local.json`:
```json
{
  "_workflow": {
    "reasoning_budget": {
      "coding": "high",
      "testing": "standard"
    }
  }
}
```
Config overrides take precedence over defaults. The level is injected into the state machine’s `systemMessage` for each phase transition, where it instructs Claude to adjust its reasoning effort.
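Resolving a phase's level amounts to an override lookup with a fallback to the defaults table. A sketch, assuming the settings object has the shape shown above; `reasoningLevel` is a hypothetical name, and ignoring values other than `"standard"` and `"high"` is an assumption about error handling.

```javascript
// Defaults from the table above.
const DEFAULT_REASONING = {
  prerequisites: "standard", discovering: "high", planning: "high",
  chunking: "high", coding: "standard", updating_docs: "standard",
  testing: "high", committing: "standard", reporting: "standard",
  chunk_complete: "standard", requirement_complete: "high",
  merging: "standard", session_ending: "standard",
  budget_exceeded: "standard", aborted: "standard",
};

// Hypothetical resolver: a valid config override wins over the default.
function reasoningLevel(phase, settings) {
  const workflow = settings._workflow || {};
  const override = (workflow.reasoning_budget || {})[phase];
  if (override === "standard" || override === "high") return override;
  return DEFAULT_REASONING[phase] ?? "standard";
}
```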
State File Structure
During a session, token and budget data lives in `.claude/.warp-drive-state.json`:
```json
{
  "token_usage": {
    "session_total": null,
    "chunk_snapshots": [
      {
        "chunk_index": 0,
        "acs": ["AC-01", "AC-02"],
        "input_tokens": 61250,
        "output_tokens": 22250,
        "cache_read_tokens": 18400,
        "cache_creation_tokens": 3200,
        "total_tokens": 83500,
        "message_count": 11,
        "timestamp": "2026-04-12T14:30:00Z"
      }
    ],
    "current_chunk_started_at": "2026-04-12T15:10:00Z"
  },
  "budgets": {
    "phase_started_at": "2026-04-12T15:10:00Z",
    "retry_count": 0,
    "coding_cycles": 0,
    "merge_retries": 0,
    "push_retries": 0,
    "budget_extensions": 0,
    "exceeded_reasons": null,
    "exceeded_at": null,
    "exceeded_from_phase": null,
    "aborted_at": null,
    "aborted_from_phase": null
  },
  "metrics": {
    "commits": 0,
    "reports_filed": 0,
    "tests_run": 0,
    "chunks_completed": 0,
    "session_duration_minutes": 0
  }
}
```
`session_total` is `null` during the session and populated at session end by a full (unfiltered) snapshot. `chunk_snapshots` accumulates one entry per completed chunk. `current_chunk_started_at` is the ISO timestamp used as the `--since` argument for the next chunk delta. In the sample records, `total_tokens` equals `input_tokens + output_tokens` (61,250 + 22,250 = 83,500); cache tokens are reported separately.
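How a completed chunk might move from `current_chunk_started_at` to a new entry in `chunk_snapshots`. `recordChunk` is hypothetical, `capture` stands in for `captureTokenSnapshot`, and resetting the start timestamp to the time the next chunk begins is an assumption about where that reset happens.

```javascript
// Fields copied from a snapshot into chunk_snapshots[] (per the sample).
const SNAPSHOT_FIELDS = [
  "input_tokens", "output_tokens", "cache_read_tokens",
  "cache_creation_tokens", "total_tokens", "message_count", "timestamp",
];

// Hypothetical: record one completed chunk in token_usage.
function recordChunk(tokenUsage, chunkIndex, acs, capture, nextChunkStartIso) {
  const delta = capture(tokenUsage.current_chunk_started_at); // --since value
  if (delta) {
    // A null capture means no token data: skip the snapshot, keep going.
    const snapshot = { chunk_index: chunkIndex, acs };
    for (const field of SNAPSHOT_FIELDS) snapshot[field] = delta[field];
    tokenUsage.chunk_snapshots.push(snapshot);
  }
  // ASSUMPTION: the next chunk's --since starts when that chunk begins.
  tokenUsage.current_chunk_started_at = nextChunkStartIso;
  return tokenUsage;
}
```

Fed the sample snapshot and a next-chunk start of `2026-04-12T15:10:00Z`, this produces the `token_usage` object shown above.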
Persistent Storage
token-usage.jsonl
At session completion, `persistTokenUsage()` appends records to `~/.claude/token-usage.jsonl`. Each line is a self-contained JSON object; the examples below are pretty-printed for readability.
Chunk record
```json
{
  "type": "chunk",
  "session_id": "abc123",
  "project": "paulirv/bodmail",
  "requirement": "#42",
  "branch": "feat/42-email-templates",
  "level": 2,
  "started_at": "2026-04-12T14:00:00Z",
  "chunk_index": 0,
  "acs": ["AC-01", "AC-02"],
  "input_tokens": 61250,
  "output_tokens": 22250,
  "cache_read_tokens": 18400,
  "cache_creation_tokens": 3200,
  "total_tokens": 83500,
  "message_count": 11,
  "timestamp": "2026-04-12T14:30:00Z"
}
```
Session record
```json
{
  "type": "session",
  "session_id": "abc123",
  "project": "paulirv/bodmail",
  "requirement": "#42",
  "branch": "feat/42-email-templates",
  "level": 2,
  "started_at": "2026-04-12T14:00:00Z",
  "chunks_completed": 3,
  "commits": 3,
  "input_tokens": 245000,
  "output_tokens": 89000,
  "cache_read_tokens": 72000,
  "cache_creation_tokens": 12800,
  "total_tokens": 334000,
  "message_count": 42,
  "timestamp": "2026-04-12T16:45:00Z"
}
```
Both record types share a base set of fields (`session_id`, `project`, `requirement`, `branch`, `level`, `started_at`) for filtering and grouping.
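A sketch of the append step, assuming a base object holding the shared fields and a hypothetical `appendTokenRecords` helper (the real `persistTokenUsage()` signature is not documented here). Serializing each record with `JSON.stringify` plus a trailing newline keeps every line independently parseable, and appending never rewrites earlier history.

```javascript
const fs = require("node:fs");

// Hypothetical persistence step: one chunk record per snapshot plus one
// session record, each written as a single JSONL line.
function appendTokenRecords(filePath, base, chunkSnapshots, sessionFields) {
  const records = [
    ...chunkSnapshots.map((snap) => ({ type: "chunk", ...base, ...snap })),
    { type: "session", ...base, ...sessionFields },
  ];
  const lines = records.map((record) => JSON.stringify(record)).join("\n") + "\n";
  // A single append per session leaves existing lines untouched.
  fs.appendFileSync(filePath, lines, "utf8");
  return records.length;
}
```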
CLI Tools
token-snapshot.js
Reads a Claude Code session transcript and computes token usage.
```shell
node ~/.claude/scripts/warp-drive/token-snapshot.js <project-root> [--since <ISO-timestamp>]
```
| Argument | Required | Description |
|---|---|---|
| `<project-root>` | Yes | Absolute path to the project directory |
| `--since <ISO>` | No | Only count tokens from messages at or after this timestamp |

Output: JSON to stdout with `timestamp`, `session_id`, `project`, `project_root`, `input_tokens`, `output_tokens`, `cache_read_tokens`, `cache_creation_tokens`, `total_tokens`, `message_count`.
Exit codes: `0` = success, `1` = missing argument, `2` = no transcript found.
How it finds the transcript: slugifies the project path by replacing `/` and `.` with `-` (`/Users/paul/projects/bodmail` becomes `-Users-paul-projects-bodmail`), then looks in `~/.claude/projects/<slug>/` for the most recent `.jsonl` file, excluding subagent transcripts.
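The lookup can be sketched as follows. The slug rule comes from this page; choosing the newest file by modification time and recognizing subagent transcripts by an `agent-` filename prefix are assumptions, as are the names `projectSlug` and `findTranscript`.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// "/Users/paul/projects/bodmail" -> "-Users-paul-projects-bodmail"
function projectSlug(projectRoot) {
  return projectRoot.replace(/[/.]/g, "-");
}

// Sketch: newest top-level .jsonl file by modification time.
// ASSUMPTION: subagent transcripts are named with an "agent-" prefix.
function findTranscript(projectRoot, claudeDir = path.join(os.homedir(), ".claude")) {
  const dir = path.join(claudeDir, "projects", projectSlug(projectRoot));
  let entries;
  try {
    entries = fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    return null; // token-snapshot.js reports this case with exit code 2
  }
  const candidates = entries
    .filter((e) => e.isFile() && e.name.endsWith(".jsonl") && !e.name.startsWith("agent-"))
    .map((e) => {
      const full = path.join(dir, e.name);
      return { full, mtime: fs.statSync(full).mtimeMs };
    })
    .sort((a, b) => b.mtime - a.mtime); // newest first
  return candidates.length ? candidates[0].full : null;
}
```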
token-report.js
Aggregates `~/.claude/token-usage.jsonl` into human-readable reports.
```shell
node ~/.claude/scripts/warp-drive/token-report.js [options]
```
| Flag | Default | Description |
|---|---|---|
| `--last <N>` | 5 | Show last N sessions |
| `--all` | off | Show all sessions |
| `--project <name>` | none | Filter by project name (partial match) |
| `--json` | off | Output as JSON instead of markdown |
| `--insights` | off | Include optimization insights section |
| `--budget <N>` | 5.00 | Budget threshold in dollars (used by insights) |
Standard output sections:
- By Project — aggregated tokens and cost per project
- Session History — per-session table: date, project, requirement, chunks, tokens, messages, cost
- Totals — sum across displayed sessions
- Averages — per-session averages (shown when 2+ sessions)
Insights output (with --insights):
- Top Token Consumers — projects ranked by estimated cost
- Cost Efficiency per AC — cost and messages per acceptance criterion
- Sessions Over Budget — sessions exceeding the threshold, with overage amount
- Optimization Suggestions — automated analysis:
- Cache reuse ratio (read/create) — flags low reuse (<5x)
- Messages per AC — flags high back-and-forth (>50)
- Output-to-input ratio — flags verbose sessions
- Cost trend — compares recent 3 sessions against earlier average
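Three of the suggestions reduce to simple arithmetic over the persisted records. This sketch uses the thresholds stated above (cache reuse below 5x, more than 50 messages per AC); the function names are hypothetical, and the output-to-input check is omitted because its threshold is not documented here.

```javascript
const CACHE_REUSE_MIN = 5; // flag reuse below 5x
const MESSAGES_PER_AC_MAX = 50; // flag more than 50 messages per AC

// Cache reuse ratio = cache reads / cache creation tokens.
function lowCacheReuse(record) {
  if (!record.cache_creation_tokens) return false; // nothing was cached
  return record.cache_read_tokens / record.cache_creation_tokens < CACHE_REUSE_MIN;
}

// Messages per acceptance criterion, from chunk records (each lists its ACs).
function messagesPerAc(chunkRecords) {
  const acs = new Set(chunkRecords.flatMap((c) => c.acs || []));
  const messages = chunkRecords.reduce((sum, c) => sum + c.message_count, 0);
  return acs.size ? messages / acs.size : 0;
}

function highBackAndForth(chunkRecords) {
  return messagesPerAc(chunkRecords) > MESSAGES_PER_AC_MAX;
}

// Cost trend: mean of the 3 most recent sessions vs mean of earlier ones.
function costTrend(costsOldestFirst) {
  if (costsOldestFirst.length < 4) return null; // no earlier baseline
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const recent = mean(costsOldestFirst.slice(-3));
  const earlier = mean(costsOldestFirst.slice(0, -3));
  return { recent, earlier, change: earlier > 0 ? (recent - earlier) / earlier : null };
}
```

For the sample session record, the cache reuse ratio is 72,000 / 12,800 ≈ 5.6, just above the 5x threshold.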
/token-report skill
The `/token-report` skill (provisioned from `registry/skills/token-report/`) wraps `token-report.js` for interactive use. It runs the script and presents the output as formatted markdown.
Cost Estimation
Costs are estimated using Claude Opus API pricing:
| Token type | Rate |
|---|---|
| Input | $15.00 / MTok |
| Output | $75.00 / MTok |
| Cache read | $1.50 / MTok |
| Cache creation | $18.75 / MTok |
These rates are defined in token-report.js (line 12). Update them if pricing changes. Cost estimates appear in both the standard report and insights output.
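As a worked example, the sample session record costs 245,000 × $15 + 89,000 × $75 + 72,000 × $1.50 + 12,800 × $18.75 per million tokens, i.e. $3.675 + $6.675 + $0.108 + $0.240 = $10.698. The sketch below reproduces that calculation; the key names in `PRICING` are assumptions and may not match the object in `token-report.js`.

```javascript
// USD per million tokens, from the table above.
// ASSUMPTION: key names are illustrative, not token-report.js's own.
const PRICING = {
  input: 15.0,
  output: 75.0,
  cache_read: 1.5,
  cache_creation: 18.75,
};

// Estimated cost of one chunk or session record; missing fields count as 0.
function estimateCostUsd(record) {
  const perToken = (rate) => rate / 1_000_000;
  return (
    (record.input_tokens || 0) * perToken(PRICING.input) +
    (record.output_tokens || 0) * perToken(PRICING.output) +
    (record.cache_read_tokens || 0) * perToken(PRICING.cache_read) +
    (record.cache_creation_tokens || 0) * perToken(PRICING.cache_creation)
  );
}
```

With the default `--budget 5.00`, that session would appear under Sessions Over Budget with an overage of about $5.70.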
Configuration Reference
All budget and reasoning settings live in `.claude/settings.local.json` under the `_workflow` key:
```json
{
  "_workflow": {
    "max_phase_minutes": 30,
    "max_retries_per_chunk": 5,
    "max_coding_cycles": 3,
    "max_total_chunks": 20,
    "max_session_minutes": 480,
    "max_session_tokens": 0,
    "max_session_usd": 0,
    "phase_timeout_enforcement": "warn",
    "reasoning_budget": {
      "discovering": "high",
      "planning": "high",
      "coding": "standard",
      "testing": "high"
    }
  }
}
```
| Key | Type | Default | Description |
|---|---|---|---|
| `max_phase_minutes` | number | 30 | Minutes before phase timeout triggers |
| `max_retries_per_chunk` | number | 5 | Hard limit on retries per chunk |
| `max_coding_cycles` | number | 3 | Hard limit on code/test cycles per chunk |
| `max_total_chunks` | number | 20 | Hard limit on total chunks per session |
| `max_session_minutes` | number | 480 | Hard limit on total session duration (8 hours) |
| `max_session_tokens` | number | 0 | Hard limit on cumulative session tokens; 0 disables |
| `max_session_usd` | number | 0 | Optional estimated-dollar ceiling; 0 disables |
| `phase_timeout_enforcement` | string | `"warn"` | `"warn"`, `"block"`, or `"abort"` |
| `reasoning_budget` | object | `{}` | Per-phase reasoning level overrides |
| `reasoning_budget.<phase>` | string | varies | `"standard"` or `"high"` |
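A sketch of how these keys could be merged with their defaults, assuming `settings.local.json` holds the `_workflow` object shown above. `loadWorkflowConfig` is hypothetical, and falling back to defaults when the file is missing or invalid is an assumption about the state machine's behaviour.

```javascript
const fs = require("node:fs");

// Defaults from the table above.
const WORKFLOW_DEFAULTS = {
  max_phase_minutes: 30,
  max_retries_per_chunk: 5,
  max_coding_cycles: 3,
  max_total_chunks: 20,
  max_session_minutes: 480,
  max_session_tokens: 0,
  max_session_usd: 0,
  phase_timeout_enforcement: "warn",
  reasoning_budget: {},
};

// Hypothetical loader: keys present in _workflow override the defaults.
function loadWorkflowConfig(settingsPath) {
  let workflow = {};
  try {
    const settings = JSON.parse(fs.readFileSync(settingsPath, "utf8"));
    workflow = settings._workflow || {};
  } catch {
    // ASSUMPTION: a missing or invalid file means "use defaults only".
  }
  return {
    ...WORKFLOW_DEFAULTS,
    ...workflow,
    // Merge the nested object so partial overrides keep other phases.
    reasoning_budget: { ...WORKFLOW_DEFAULTS.reasoning_budget, ...(workflow.reasoning_budget || {}) },
  };
}
```

The merge also shows why the `max_total_chunks` default of 20 applies unless the settings file overrides it.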
Session Summary Integration
Warp-drive session summaries (filed as GitHub Issues with the label `session-summary`) include a Token Usage section with input, output, cache read, and total token counts read from `state.token_usage.session_total`. This makes cost visible in the project’s issue history without running a separate report.
Troubleshooting
No data in token report
- Complete at least one warp-drive session. Data is only persisted at session end (the `session_ended` transition).
- Check that `~/.claude/token-usage.jsonl` exists and has content.
Token counts are zero or null
- `token-snapshot.js` couldn’t find the session transcript. Verify that `~/.claude/projects/<slug>/` contains `.jsonl` files.
- The slug is computed by replacing `/` and `.` with `-` in the project path.
Budget exceeded unexpectedly
- Check `state.budgets` in `.claude/.warp-drive-state.json` for current counter values; the chunk count lives in `state.metrics.chunks_completed`.
- The `max_total_chunks` default is 20 in `state-machine.js` but 50 in the warp-drive guide config table; the state machine default applies unless overridden in `settings.local.json`.
- Run `node ~/.claude/scripts/warp-drive/state-machine.js status "$(pwd)"` to see current budget state.
Cache reuse ratio is low
- Short sessions with few chunks create cache entries that are never reused. Longer sessions with more chunks per session improve the ratio.
- Context-busting tool calls (large file reads, many parallel agents) force cache recreation.
Pricing is outdated
- Update the `PRICING` object in `token-report.js` (line 12) when API pricing changes. The skill doc and this doc list the same values, so update them at the same time.