Skip to content

Persist cascading budget counters and add period reset #21

@EightRice

Description

@EightRice

Goal

Cascading per-agent budgets for Claude Max that are: durable across daemon restarts, period-resetting, enforced inside the cognitive loop (including the bridge's native loop), expressed as percentages at the user/agent boundary, and accurate per model class.

Status (2026-04-29)

Foundation landed — persistence, period rollover, mandatory budgets, cascading clamp, inner-loop enforcement for non-bridge providers, and the get_my_budget_status tool. 38 tests green. But this is not yet shippable for Claude Max users:

  • Budgets are token-denominated; user/agent should declare percentages.
  • Estimator blends Haiku/Sonnet/Opus into one ratio — meaningless number.
  • Bridge native loop enforces only post-orchestration; a runaway 50-turn execution can still blow the cap.

Remaining scope

Percent-based budget API at boundaries

  • New schema: {provider: {"pct": N, "of": "parent" | "subscription_5h", "period": "..."}}
  • Internal storage stays in tokens (recorder, counters, cascade math unchanged)
  • Conversion happens at the boundary:
    • pct of parent → tokens at registration (parent_token_cap * pct / 100)
    • pct of subscription_5h → tokens at lookup time, recomputed each refresh
  • Backward compat: int form ({provider: 1000}) and existing dict form ({"limit": N, ...}) still accepted.

Per-model-class estimator (haiku / sonnet / opus)

  • Three EMA buckets in BridgeProvider, one per class
  • Snapshot tuple becomes (tokens_delta_for_class, util_5h_delta, model_class)
  • Tag each turn's tokens with the model that produced them (from response.model)
  • Hard-coded bootstrap multipliers used until 2 snapshots exist for a class: relative cost haiku=1, sonnet=5, opus=25 normalized via Anthropic's headers once available
  • tokens_per_pct[class] is the conversion used when a budget says pct of subscription_5h

Bridge inner-loop enforcement

  • TS bridge emits usage event per assistant turn: {input, output, cache_read, cache_creation, model} — same channel as existing tool_use_start/result/compaction events
  • Python BridgeProvider._stream_events calls the same usage_recorder callback used by the base provider; on (False, blocker) calls self.interrupt() to stop the SDK
  • Engine sees stop_reason="budget_exceeded" (already wired)

Reconciliation pass

  • After each bridge orchestration, refresh subscription headers
  • Compare measured util_5h_delta against sum of per-turn events
  • Big discrepancy → recalibrate the estimator
  • Small → trust it. Safety net for when the estimator drifts.

Acceptance

  • A user/agent can declare {"claude_max": {"pct": 30, "of": "parent"}} or {"pct": 80, "of": "subscription_5h", "period": "weekly"} and registration computes the equivalent token cap.
  • The cascading clamp at registration uses percent terms naturally — child's pct of parent ≤ 100 minus siblings' total.
  • The bridge orchestration aborts mid-loop within ≤1 turn after crossing a cap, with stop_reason="budget_exceeded".
  • The estimator produces three independent tokens_per_pct ratios for haiku/sonnet/opus.
  • Reconciliation diff is logged per orchestration; estimator self-corrects after a configurable number of mismatched cycles.
  • Tests in tests/atn/ covering each.

Out of scope

Metadata

Metadata

Assignees

No one assigned

    Labels

    track:agentATN runtime, providers, orchestrator, bridgestrack:opsCaching, telemetry, cost tracking, observabilitytype:featureNew capability

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions