Track LLM task completions via Mosaic Telemetry #371

Closed
opened 2026-02-15 05:28:26 +00:00 by jason.woltje · 1 comment
Summary

Instrument the LLM service layer to emit TaskCompletionEvents through the Mosaic Telemetry client after each LLM interaction completes. This is the primary data source for token usage tracking, cost analysis, and prediction model training.

Context

The telemetry system tracks AI coding task completions with rich metadata. The LLM service (apps/api/src/llm/) is where all provider calls happen — this is the natural integration point.

Note: This is separate from the existing OpenTelemetry (OTEL) instrumentation which handles request tracing/spans. Mosaic Telemetry tracks higher-level task completion metrics for cost forecasting and quality analysis.

Requirements

Event Construction

After each LLM call completes, build a TaskCompletionEvent using EventBuilder:

```typescript
const event = telemetry.eventBuilder.build({
  taskType: 'implementation',
  complexity: 'medium',
  harness: 'api_direct',
  model: 'claude-sonnet-4-5-20250929',
  provider: 'anthropic',
  taskDurationMs: elapsed,
  estimatedInputTokens: promptTokenEstimate,
  estimatedOutputTokens: completionTokenEstimate,
  actualInputTokens: response.usage.input_tokens,
  actualOutputTokens: response.usage.output_tokens,
  estimatedCostUsdMicros: preEstimate,
  actualCostUsdMicros: computedCost,
  qualityGatePassed: true,
  qualityGatesRun: ['build', 'typecheck'],
  qualityGatesFailed: [],
  contextCompactions: 0,
  contextRotations: 0,
  contextUtilizationFinal: 0.0,
  outcome: 'success',
  retryCount: 0,
});
telemetry.trackTaskCompletion(event);
```

Integration Points

  1. LlmService.chat() — Standard chat completions
  2. LlmService.chatStream() — Streaming completions (aggregate tokens after stream ends)
  3. LlmService.embed() — Embedding operations
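A minimal sketch of the non-blocking wrapper these integration points could share. The `TelemetryClient` shape and `withTelemetry` helper are illustrative assumptions, not the real service API; the point is that the telemetry call is never awaited on the response path and its failures never surface to the caller.

```typescript
// Illustrative sketch — not the actual LlmService API.
type TelemetryClient = { trackTaskCompletion(event: unknown): void };

// Runs an LLM call, then queues a telemetry event built from the result.
// Tracking happens after the result is in hand and is wrapped in try/catch,
// so telemetry can never delay or break the response to the user.
async function withTelemetry<T>(
  telemetry: TelemetryClient,
  buildEvent: (result: T, elapsedMs: number) => unknown,
  call: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  const result = await call();
  try {
    telemetry.trackTaskCompletion(buildEvent(result, Date.now() - start));
  } catch {
    // Swallow telemetry errors: tracking is best-effort by design.
  }
  return result;
}
```

For `chatStream()`, the same wrapper would apply with `call` resolving only after the stream ends, once aggregated token counts are available.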

Provider-Specific Token Extraction

Each provider returns usage data differently:

  • Anthropic: response.usage.input_tokens, response.usage.output_tokens
  • OpenAI: response.usage.prompt_tokens, response.usage.completion_tokens
  • Ollama: response.eval_count, response.prompt_eval_count

Normalize all to the common actual_input_tokens / actual_output_tokens fields.
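The normalization step above could look like the following sketch, assuming each provider's usage payload has the field names listed (the `NormalizedUsage` type name is an illustration):

```typescript
type NormalizedUsage = { actualInputTokens: number; actualOutputTokens: number };

// Maps each provider's usage payload to the common event fields.
// Field names follow the per-provider shapes documented above.
function normalizeUsage(provider: string, usage: Record<string, number>): NormalizedUsage {
  switch (provider) {
    case 'anthropic':
      return { actualInputTokens: usage.input_tokens, actualOutputTokens: usage.output_tokens };
    case 'openai':
      return { actualInputTokens: usage.prompt_tokens, actualOutputTokens: usage.completion_tokens };
    case 'ollama':
      // Note the reversed naming: prompt_eval_count is input, eval_count is output.
      return { actualInputTokens: usage.prompt_eval_count, actualOutputTokens: usage.eval_count };
    default:
      throw new Error(`Unknown provider: ${provider}`);
  }
}
```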

Cost Calculation

  • Maintain a cost table (or use predictions) for $/token by model
  • Store in microdollars (USD * 1,000,000) as integers
  • Example: $0.003/1K input tokens = 3000 microdollars per 1K tokens
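A sketch of the microdollar arithmetic, with a made-up model entry using the example rate above ($0.003/1K input = 3,000 microdollars per 1K tokens). The table contents and names here are assumptions for illustration, not real pricing:

```typescript
// Rates in microdollars per 1K tokens. Entries are illustrative only.
const MICROS_PER_1K: Record<string, { input: number; output: number }> = {
  'example-model': { input: 3000, output: 15000 }, // $0.003 in / $0.015 out per 1K tokens
};

// Computes actualCostUsdMicros as an integer, matching the event schema.
function costUsdMicros(model: string, inputTokens: number, outputTokens: number): number {
  const rate = MICROS_PER_1K[model];
  if (!rate) throw new Error(`No pricing entry for model: ${model}`);
  // Divide by 1000 because rates are per 1K tokens; round to keep an integer.
  return Math.round((inputTokens * rate.input + outputTokens * rate.output) / 1000);
}
```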

Task Type Inference

Map the calling context to a TaskType:

  • Chat conversations → implementation or planning (based on system prompt)
  • Brain queries → planning
  • Code generation requests → implementation
  • Review requests → code_review
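The mapping above could be sketched as a single function. The context identifiers and the system-prompt heuristic below are assumptions for illustration; the real calling contexts would come from the service layer:

```typescript
type TaskType = 'implementation' | 'planning' | 'code_review';

// Maps an (assumed) calling-context identifier to a TaskType.
// Chat is disambiguated by a simple system-prompt keyword heuristic.
function inferTaskType(context: string, systemPrompt = ''): TaskType {
  switch (context) {
    case 'brain_query':
      return 'planning';
    case 'code_generation':
      return 'implementation';
    case 'review':
      return 'code_review';
    case 'chat':
      return /\b(plan|design|architect)/i.test(systemPrompt) ? 'planning' : 'implementation';
    default:
      return 'implementation'; // conservative fallback
  }
}
```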

Acceptance Criteria

  • All LLM calls emit TaskCompletionEvents
  • Token usage accurately captured per provider
  • Cost calculated in microdollars
  • Streaming responses aggregate tokens correctly
  • Events queued (non-blocking) — never delays LLM response to user
  • Task type inferred from context
  • Unit tests with mocked telemetry client
  • Integration test verifying event structure matches schema
jason.woltje added the ai label 2026-02-15 05:28:26 +00:00
jason.woltje added this to the M10-Telemetry (0.0.10) milestone 2026-02-15 05:31:19 +00:00

Completed in commit 639881f on feature/m10-telemetry. Created LlmTelemetryTrackerService with fire-and-forget tracking, llm-cost-table.ts with microdollar pricing. Instrumented LlmService chat/chatStream/embed. 69 unit tests.


Reference: mosaic/stack#371