Prediction integration for cost estimation #373

Closed
opened 2026-02-15 05:29:05 +00:00 by jason.woltje · 1 comment
Owner

Summary

Integrate the Mosaic Telemetry prediction API to provide pre-task cost and token estimates. Before executing expensive LLM operations or dispatching agent tasks, query predictions to inform budgeting decisions and display estimates to users.

Context

The telemetry system maintains a prediction model trained on historical task completion data. Given a (task_type, model, provider, complexity) tuple, it returns statistical distributions for tokens, cost, and duration. This enables:

  • Budget guards: Warn before expensive operations
  • Model selection: Choose cost-effective models for simple tasks
  • User transparency: Show estimated cost before confirming

Requirements

Prediction Query Flow

```typescript
// Before task execution
const prediction = await telemetry.getPrediction({
  taskType: 'implementation',
  model: 'claude-opus-4-6',
  provider: 'anthropic',
  complexity: 'high',
});

if (prediction) {
  const estimatedCost = prediction.costUsdMicros.median;
  const estimatedTokens = prediction.inputTokens.median + prediction.outputTokens.median;
  // Use for budget checks, display to user, populate estimated_* fields in events
}
```

Prediction Response Fields

  • input_tokens — Distribution (p10, p25, median, p75, p90)
  • output_tokens — Distribution
  • cost_usd_micros — Cost distribution by percentile
  • duration_ms — Duration distribution
  • correction_factors — Input/output multipliers for adjustment
  • quality — Historical gate_pass_rate and success_rate
  • metadata — sample_size, confidence (none/low/medium/high), fallback_level
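The fields above might be modeled as follows. This is a sketch: the interface names and camelCased accessors are assumptions based on the query-flow example, not the actual API types.

```typescript
// Illustrative model of the prediction response. Field names follow the
// camelCased accessors used in the query-flow example; the exact API shape
// is an assumption.
interface Distribution {
  p10: number;
  p25: number;
  median: number;
  p75: number;
  p90: number;
}

interface Prediction {
  inputTokens: Distribution;
  outputTokens: Distribution;
  costUsdMicros: Distribution;               // cost in USD micros, by percentile
  durationMs: Distribution;
  correctionFactors: { input: number; output: number };
  quality: { gatePassRate: number; successRate: number };
  metadata: {
    sampleSize: number;
    confidence: 'none' | 'low' | 'medium' | 'high';
    fallbackLevel: number;                   // -1 when no data is available
  };
}

// Median-based total-token estimate, as computed in the query-flow example.
function estimatedTotalTokens(p: Prediction): number {
  return p.inputTokens.median + p.outputTokens.median;
}
```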

Integration Points

  1. Pre-LLM call estimation — Query prediction before chat/embed calls, populate estimated_* fields
  2. Orchestrator task budgeting — Before dispatching agent task, check if predicted cost exceeds budget
  3. Frontend display — Show cost estimate in UI before user confirms expensive actions
  4. Model selection hints — When multiple models available, use predictions to suggest cheapest viable option
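For integration point 2, the orchestrator's budget gate could look like this minimal sketch. `checkBudget` is a hypothetical helper, not an existing API; it treats a missing prediction as a pass, per the graceful-degradation rules below.

```typescript
// Sketch: decide whether to dispatch a task given the predicted median cost.
// `predictedCostUsdMicros` is null when no prediction is available; in that
// case we never block (graceful degradation).
function checkBudget(
  predictedCostUsdMicros: number | null,
  budgetUsdMicros: number,
): { ok: boolean; reason?: string } {
  if (predictedCostUsdMicros === null) {
    return { ok: true }; // no estimate: proceed without blocking
  }
  if (predictedCostUsdMicros > budgetUsdMicros) {
    return {
      ok: false,
      reason: `predicted cost ${predictedCostUsdMicros} exceeds budget ${budgetUsdMicros}`,
    };
  }
  return { ok: true };
}
```

A stricter variant might compare the p90 cost instead of the median to make the guard more conservative.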

Cache Strategy

  • Predictions cached in-memory (default TTL: 6 hours)
  • Refresh on app startup for common (task_type, model, provider) combinations
  • Refresh periodically in the background
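A minimal sketch of the in-memory TTL cache described above. `PredictionCache` is a hypothetical name and the real implementation may differ; the key is built from the (task_type, model, provider) tuple used for lookups.

```typescript
// Minimal in-memory TTL cache keyed by the prediction lookup tuple.
// Default TTL is 6 hours, per the cache strategy above. `now` is injectable
// for testing; expired entries simply miss and are overwritten on set().
class PredictionCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();

  constructor(private ttlMs: number = 6 * 60 * 60 * 1000) {}

  private key(parts: string[]): string {
    return parts.join('|');
  }

  get(parts: string[], now: number = Date.now()): T | undefined {
    const entry = this.entries.get(this.key(parts));
    if (!entry || entry.expiresAt <= now) return undefined;
    return entry.value;
  }

  set(parts: string[], value: T, now: number = Date.now()): void {
    this.entries.set(this.key(parts), { value, expiresAt: now + this.ttlMs });
  }
}
```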

Graceful Degradation

  • If prediction unavailable (confidence=none, fallback_level=-1): proceed without estimate
  • Never block task execution on prediction failure
  • Log missing predictions for coverage analysis
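The degradation rules above can be sketched as a wrapper around the prediction query. Names here are illustrative: the wrapper treats query errors, `confidence=none`, and `fallback_level=-1` uniformly as "no estimate" so callers never block.

```typescript
// Sketch: wrap the prediction query so failures and no-data results degrade
// to null ("proceed without estimate") instead of blocking task execution.
interface EstimateMeta {
  confidence: 'none' | 'low' | 'medium' | 'high';
  fallbackLevel: number;
}

async function safeGetPrediction<T extends EstimateMeta>(
  query: () => Promise<T | null>,
  log: (msg: string) => void = console.warn,
): Promise<T | null> {
  try {
    const p = await query();
    if (!p || p.confidence === 'none' || p.fallbackLevel === -1) {
      // Log the miss for coverage analysis, then proceed without an estimate.
      log('prediction unavailable; proceeding without estimate');
      return null;
    }
    return p;
  } catch (err) {
    // Never block task execution on a prediction failure.
    log(`prediction query failed: ${err}`);
    return null;
  }
}
```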

Acceptance Criteria

  • Predictions queried before LLM calls (populate estimated_* fields)
  • Orchestrator checks predicted cost against task budget
  • Cache populated on startup for common combinations
  • Graceful handling when no prediction data exists
  • Frontend can display cost estimate (API endpoint or WebSocket event)
  • Unit tests for prediction integration
  • Confidence level exposed (so UI can show "estimate confidence: high/low")
jason.woltje added the ai label 2026-02-15 05:29:05 +00:00
jason.woltje added this to the M10-Telemetry (0.0.10) milestone 2026-02-15 05:31:19 +00:00
jason.woltje (Author, Owner) commented:

Completed in commit d5bf501 on feature/m10-telemetry. PredictionService with 6hr TTL cache, startup refresh, GET /api/telemetry/estimate endpoint. Tests passing.

Reference: mosaic/stack#373