docs(m6): Add Usage Budget Management section

Add comprehensive usage budget management design to M6 orchestration architecture. FEATURES: - Real-time usage tracking across agents - Budget allocation per task/milestone/project - Usage projection and burn rate calculation - Throttling decisions to prevent budget exhaustion - Model tier optimization (Haiku/Sonnet/Opus) - Pre-commit usage validation DATA MODEL: - usage_budgets table (allocated/consumed/remaining) - agent_usage_logs table (per-agent tracking) - Valkey keys for real-time state BUDGET CHECKPOINTS: 1. Task assignment - can afford this task? 2. Agent spawn - verify budget headroom 3. Checkpoint intervals - periodic compliance 4. Pre-commit validation - usage efficiency PRIORITY: MVP (M6 Phase 3) for basic tracking, Phase 5 for advanced projection and optimization. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 07:50:22 -06:00
parent 65e56cac5e
commit bed440dc36
1 changed files with 337 additions and 0 deletions
--- a/docs/design/agent-orchestration.md
+++ b/docs/design/agent-orchestration.md
@@ -908,6 +908,343 @@ export class CoordinatorService {
 ---
 ## Usage Budget Management
 **Version:** 1.0
 **Date:** 2026-02-04
 **Status:** Required for MVP
 ### Problem Statement
 Autonomous agents using Claude Code can consume significant API tokens without proper governance. Without real-time usage tracking and budgeting, projects risk:
 1. **Cost overruns** — Agents exceed budget before milestone completion
 2. **Service disruption** — Hit API rate limits mid-task
 3. **Unpredictable momentum** — Can't estimate project velocity
 4. **Budget exhaustion** — Agents consume entire monthly budget in days
 ### Requirements
 The orchestration layer must provide:
 - **Real-time usage tracking** — Current token usage across all active agents
 - **Budget allocation** — Pre-allocate budgets per task/milestone/project
 - **Usage projection** — Estimate remaining work vs remaining budget
 - **Throttling decisions** — Pause/slow agents approaching limits
 - **Cost optimization** — Route tasks to appropriate model tiers (Haiku/Sonnet/Opus)
 ### Architecture
 ```
 ┌─────────────────────────────────────────────────────────┐
 │           Usage Budget Manager Service                  │
 │  ┌───────────────────┐  ┌────────────────────────────┐ │
 │  │  Usage Tracker    │  │   Budget Allocator         │ │
 │  │  - Query API      │  │   - Per-task budgets       │ │
 │  │  - Real-time sum  │  │   - Per-milestone budgets  │ │
 │  │  - Agent rollup   │  │   - Global limits          │ │
 │  └────────┬──────────┘  └──────────┬─────────────────┘ │
 │           │                        │                    │
 │  ┌────────▼────────────────────────▼─────────────────┐ │
 │  │         Projection Engine                         │ │
 │  │  - Estimate remaining work                        │ │
 │  │  - Calculate burn rate                            │ │
 │  │  - Predict budget exhaustion                      │ │
 │  │  - Recommend throttle/pause                       │ │
 │  └───────────────────────────┬───────────────────────┘ │
 └────────────────────────────────┼─────────────────────────┘
                                 │
                    ┌────────────┼────────────┐
                    │            │            │
            ┌───────▼──────┐ ┌──▼──────┐ ┌──▼────────────┐
            │Queue Manager │ │Agent Mgr│ │  Coordinator  │
            │- Check budget│ │- Spawn  │ │  - Pre-commit │
            │  before queue│ │  check  │ │    validation │
            └──────────────┘ └─────────┘ └───────────────┘
 ```
 ### Budget Check Points
 **1. Task Assignment** (Queue Manager)
 ```typescript
 async canAffordTask(taskId: string): Promise<BudgetDecision> {
  const task = await getTask(taskId);
  const currentUsage = await usageTracker.getCurrentUsage();
  const budget = await budgetAllocator.getTaskBudget(taskId);
  const projectedCost = estimateTaskCost(task);
  const remaining = budget.limit - currentUsage.total;
  if (projectedCost > remaining) {
    return {
      canProceed: false,
      reason: 'Insufficient budget',
      recommendation: 'Pause or reallocate budget',
    };
  }
  return { canProceed: true };
 }
 ```
 **2. Agent Spawn** (Agent Manager)
 Before spawning a Claude Code agent, verify budget headroom:
 ```typescript
 async spawnAgent(config: AgentConfig): Promise<Agent> {
  const budgetCheck = await usageBudgetManager.canAffordTask(config.taskId);
  if (!budgetCheck.canProceed) {
    throw new InsufficientBudgetError(budgetCheck.reason);
  }
  // Proceed with spawn
  const agent = await claudeCode.spawn(config);
  // Track agent for usage rollup
  await usageTracker.registerAgent(agent.id, config.taskId);
  return agent;
 }
 ```
 **3. Checkpoint Intervals** (Coordinator)
 During task execution, periodically verify budget compliance:
 ```typescript
 async checkpointBudgetCompliance(taskId: string): Promise<void> {
  const usage = await usageTracker.getTaskUsage(taskId);
  const budget = await budgetAllocator.getTaskBudget(taskId);
  const percentUsed = (usage.current / budget.allocated) * 100;
  if (percentUsed > 90) {
    await coordinator.sendWarning(taskId, 'Approaching budget limit');
  }
  if (percentUsed > 100) {
    await coordinator.pauseTask(taskId, 'Budget exceeded');
    await notifyUser(taskId, 'Task paused: budget exhausted');
  }
 }
 ```
 **4. Pre-commit Validation** (Quality Gates)
 Before committing work, verify usage is reasonable:
 ```typescript
 async validateUsageEfficiency(taskId: string): Promise<ValidationResult> {
  const usage = await usageTracker.getTaskUsage(taskId);
  const linesChanged = await git.getChangedLines(taskId);
  const testsAdded = await git.getTestFiles(taskId);
  // Cost per line heuristic (adjust based on learnings)
  const expectedTokensPerLine = 150; // baseline + TDD overhead
  const expectedUsage = linesChanged * expectedTokensPerLine;
  const efficiency = expectedUsage / usage.current;
  if (efficiency < 0.5) {
    return {
      valid: false,
      reason: 'Usage appears inefficient',
      recommendation: 'Review agent logs for token waste',
    };
  }
  return { valid: true };
 }
 ```
 ### Data Model
 #### `usage_budgets` Table
 ```sql
 CREATE TABLE usage_budgets (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  workspace_id UUID NOT NULL REFERENCES workspaces(id),
  -- Scope: 'global', 'project', 'milestone', 'task'
  scope VARCHAR(20) NOT NULL,
  scope_id VARCHAR(100), -- project_id, milestone_id, or task_id
  -- Budget limits (tokens)
  allocated BIGINT NOT NULL,
  consumed BIGINT NOT NULL DEFAULT 0,
  remaining BIGINT GENERATED ALWAYS AS (allocated - consumed) STORED,
  -- Tracking
  period_start TIMESTAMPTZ NOT NULL,
  period_end TIMESTAMPTZ NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
 );
 CREATE INDEX idx_usage_budgets_scope ON usage_budgets(scope, scope_id);
 CREATE INDEX idx_usage_budgets_workspace ON usage_budgets(workspace_id);
 ```
 #### `agent_usage_logs` Table
 ```sql
 CREATE TABLE agent_usage_logs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  workspace_id UUID NOT NULL REFERENCES workspaces(id),
  agent_session_id UUID NOT NULL,
  task_id UUID REFERENCES agent_tasks(id),
  -- Usage details
  input_tokens BIGINT NOT NULL,
  output_tokens BIGINT NOT NULL,
  total_tokens BIGINT NOT NULL,
  model VARCHAR(100) NOT NULL, -- 'claude-sonnet-4', 'claude-haiku-3.5', etc.
  -- Cost tracking
  estimated_cost_usd DECIMAL(10, 6),
  -- Context
  operation VARCHAR(100), -- 'task_execution', 'quality_review', etc.
  logged_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
 );
 CREATE INDEX idx_agent_usage_task ON agent_usage_logs(task_id);
 CREATE INDEX idx_agent_usage_session ON agent_usage_logs(agent_session_id);
 CREATE INDEX idx_agent_usage_workspace ON agent_usage_logs(workspace_id);
 ```
 ### Valkey/Redis Keys
 ```
 # Current usage (real-time)
 usage:current:{workspace_id} -> HASH
  {
    "total_tokens": "1234567",
    "total_cost_usd": "12.34",
    "last_updated": "2026-02-04T10:30:00Z"
  }
 # Per-task usage
 usage:task:{task_id} -> HASH
  {
    "allocated": "50000",
    "consumed": "23456",
    "remaining": "26544",
    "agents": ["agent-1", "agent-2"]
  }
 # Budget alerts
 usage:alerts:{workspace_id} -> LIST
  [
    {
      "task_id": "task-123",
      "level": "warning",
      "percent_used": 92,
      "message": "Task approaching budget limit"
    }
  ]
 ```
 ### Cost Estimation Formulas
 Based on autonomous execution learnings:
 ```typescript
 function estimateTaskCost(task: Task): number {
  const baselineTokens = task.estimatedComplexity * 1000; // tokens per complexity point
  // Overhead factors
  const tddOverhead = 1.2; // +20% for test writing
  const baselineBuffer = 1.3; // +30% general buffer
  const phaseBuffer = 1.15; // +15% phase-specific uncertainty
  const estimated = baselineTokens * tddOverhead * baselineBuffer * phaseBuffer;
  return Math.ceil(estimated);
 }
 ```
 ### Model Tier Optimization
 Route tasks to appropriate model tiers for cost efficiency:
 | Model            | Cost/MTok (input) | Cost/MTok (output) | Use Case                             |
 | ---------------- | ----------------- | ------------------ | ------------------------------------ |
 | Claude Haiku 3.5 | $0.80             | $4.00              | Simple CRUD, boilerplate, linting    |
 | Claude Sonnet 4  | $3.00             | $15.00             | Standard development, refactoring    |
 | Claude Opus 4    | $15.00            | $75.00             | Complex architecture, critical fixes |
 **Routing logic:**
 ```typescript
 function selectModel(task: Task): ModelTier {
  if (task.priority === "critical" || task.complexity > 8) {
    return "opus";
  }
  if (task.type === "boilerplate" || task.estimatedTokens < 10000) {
    return "haiku";
  }
  return "sonnet"; // default
 }
 ```
 ### Projection Algorithm
 Predict budget exhaustion:
 ```typescript
 async function projectBudgetExhaustion(workspaceId: string): Promise<Projection> {
  const usage = await usageTracker.getCurrentUsage(workspaceId);
  const budget = await budgetAllocator.getGlobalBudget(workspaceId);
  const dailyBurnRate = usage.tokensLast24h;
  const daysRemaining = budget.remaining / dailyBurnRate;
  const exhaustionDate = new Date();
  exhaustionDate.setDate(exhaustionDate.getDate() + daysRemaining);
  return {
    remaining_tokens: budget.remaining,
    daily_burn_rate: dailyBurnRate,
    days_until_exhaustion: daysRemaining,
    projected_exhaustion_date: exhaustionDate,
    recommendation: daysRemaining < 7 ? "THROTTLE" : "CONTINUE",
  };
 }
 ```
 ### Implementation Priority
 **MVP (M6 Phase 3):**
 - ✅ Basic usage tracking (log tokens per task)
 - ✅ Simple budget checks (can afford this task?)
 - ✅ Alert on budget exceeded
 **Post-MVP (M6 Phase 5):**
 - Projection engine (when will budget run out?)
 - Model tier optimization (Haiku/Sonnet/Opus routing)
 - Historical analysis (actual vs estimated)
 - Budget reallocation (move budget between projects)
 ### Success Metrics
 - **Budget accuracy**: Estimated vs actual within 20%
 - **Cost optimization**: 40%+ savings from model tier routing
 - **No surprise exhaustion**: Zero instances of unexpected budget depletion
 - **Steady momentum**: Projects maintain velocity without budget interruptions
 ---
 ## Implementation Phases
 ### Phase 1: Foundation (Week 1-2)