merge: resolve conflicts with develop (telemetry + lockfile)

Keep both Mosaic Telemetry section (from develop) and Matrix Dev
Environment section (from feature branch) in .env.example.
Regenerate pnpm-lock.yaml with both dependency trees merged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 12:12:43 -06:00
42 changed files with 6276 additions and 15 deletions

docs/telemetry.md Normal file

@@ -0,0 +1,735 @@
# Mosaic Telemetry Integration Guide
## 1. Overview
### What is Mosaic Telemetry?
Mosaic Telemetry is a task completion tracking system purpose-built for AI operations within Mosaic Stack. It captures detailed metrics about every AI task execution -- token usage, cost, duration, outcome, and quality gate results -- and submits them to a central telemetry API for aggregation and analysis.
The aggregated data powers a **prediction system** that provides pre-task estimates for cost, token usage, and expected quality, enabling informed decisions before dispatching work to AI agents.
### How It Differs from OpenTelemetry
Mosaic Stack uses **two separate telemetry systems** that serve different purposes:
| Aspect | OpenTelemetry (OTEL) | Mosaic Telemetry |
| --------------------------------- | --------------------------------------------- | -------------------------------------------- |
| **Purpose** | Distributed request tracing and observability | AI task completion metrics and predictions |
| **What it tracks** | HTTP requests, spans, latency, errors | Token counts, costs, outcomes, quality gates |
| **Data destination** | OTEL Collector (Jaeger, Grafana, etc.) | Mosaic Telemetry API (PostgreSQL-backed) |
| **Module location (API)** | `apps/api/src/telemetry/` | `apps/api/src/mosaic-telemetry/` |
| **Module location (Coordinator)** | `apps/coordinator/src/telemetry.py` | `apps/coordinator/src/mosaic_telemetry.py` |
Both systems can run simultaneously. They are completely independent.
### Architecture
```
+------------------+ +------------------+
| Mosaic API | | Coordinator |
| (NestJS) | | (FastAPI) |
+--------+---------+ +--------+---------+
| |
Track events Track events
| |
v v
+------------------------------------------+
| Telemetry Client SDK |
| (JS: @mosaicstack/telemetry-client) |
| (Py: mosaicstack-telemetry) |
| |
| - Event queue (in-memory) |
| - Batch submission (5-min intervals) |
| - Prediction cache (6hr TTL) |
+-------------------+----------------------+
|
HTTP POST /events
HTTP POST /predictions
|
v
+------------------------------------------+
| Mosaic Telemetry API |
| (Separate service) |
| |
| - Event ingestion & validation |
| - Aggregation & statistics |
| - Prediction generation |
+-------------------+----------------------+
|
v
+---------------+
| PostgreSQL |
+---------------+
```
**Data flow:**
1. Application code calls `trackTaskCompletion()` (JS) or `client.track()` (Python)
2. Events are queued in memory (up to 1,000 events)
3. A background timer flushes the queue every 5 minutes in batches of up to 100
4. The telemetry API ingests events, validates them, and stores them in PostgreSQL
5. Prediction queries are served from aggregated data with a 6-hour cache TTL
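The queue-and-flush behavior above can be sketched roughly as follows. This is a simplified, synchronous illustration, not the SDK's actual code; the real client runs the flush on a 5-minute background timer and submits batches over HTTP:

```python
from collections import deque

MAX_QUEUE_SIZE = 1000  # matches maxQueueSize
BATCH_SIZE = 100       # matches batchSize

queue: deque = deque()

def track(event: dict) -> None:
    """Queue an event; drop it silently if the queue is full (never raises)."""
    if len(queue) < MAX_QUEUE_SIZE:
        queue.append(event)

def flush(submit) -> None:
    """Drain the queue in batches of up to BATCH_SIZE, calling submit() per batch."""
    while queue:
        batch = [queue.popleft() for _ in range(min(BATCH_SIZE, len(queue)))]
        submit(batch)

# In the real SDK this runs on a timer; here we call it directly.
for i in range(250):
    track({"event_id": i})

batches = []
flush(batches.append)
# 250 queued events drain as batches of 100, 100, 50
```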
---
## 2. Configuration Guide
### Environment Variables
All configuration is done through environment variables prefixed with `MOSAIC_TELEMETRY_`:
| Variable | Type | Default | Description |
| ------------------------------ | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `MOSAIC_TELEMETRY_ENABLED` | boolean | `true` | Master switch. Set to `false` to completely disable telemetry (no HTTP calls). |
| `MOSAIC_TELEMETRY_SERVER_URL` | string | (none) | URL of the telemetry API server. For Docker Compose: `http://telemetry-api:8000`. For production: `https://tel-api.mosaicstack.dev`. |
| `MOSAIC_TELEMETRY_API_KEY` | string | (none) | API key for authenticating with the telemetry server. Generate with: `openssl rand -hex 32` (64-char hex string). |
| `MOSAIC_TELEMETRY_INSTANCE_ID` | string | (none) | Unique UUID identifying this Mosaic Stack instance. Generate with: `uuidgen` or `python -c "import uuid; print(uuid.uuid4())"`. |
| `MOSAIC_TELEMETRY_DRY_RUN` | boolean | `false` | When `true`, events are logged to console instead of being sent via HTTP. Useful for development. |
### Enabling Telemetry
To enable telemetry, set `MOSAIC_TELEMETRY_ENABLED=true` along with the three required variables in your `.env` file:

```bash
MOSAIC_TELEMETRY_ENABLED=true
MOSAIC_TELEMETRY_SERVER_URL=http://telemetry-api:8000
MOSAIC_TELEMETRY_API_KEY=<your-64-char-hex-api-key>
MOSAIC_TELEMETRY_INSTANCE_ID=<your-uuid>
```
If `MOSAIC_TELEMETRY_ENABLED` is `true` but any of `SERVER_URL`, `API_KEY`, or `INSTANCE_ID` is missing, the service logs a warning and disables telemetry gracefully. This is intentional: telemetry configuration issues never prevent the application from starting.
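The graceful-disable logic amounts to something like the following. This is a hypothetical helper for illustration, not the SDK's actual validation code:

```python
def resolve_telemetry_enabled(env: dict) -> bool:
    """Telemetry runs only if explicitly enabled AND fully configured."""
    if env.get("MOSAIC_TELEMETRY_ENABLED", "true").lower() != "true":
        return False
    required = (
        "MOSAIC_TELEMETRY_SERVER_URL",
        "MOSAIC_TELEMETRY_API_KEY",
        "MOSAIC_TELEMETRY_INSTANCE_ID",
    )
    missing = [k for k in required if not env.get(k)]
    if missing:
        # Warn and disable rather than crash: telemetry configuration
        # issues never prevent the application from starting.
        print(f"Mosaic Telemetry is enabled but missing configuration: {missing}")
        return False
    return True
```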
### Disabling Telemetry
Set `MOSAIC_TELEMETRY_ENABLED=false` in your `.env`. No HTTP calls will be made, and all tracking methods become safe no-ops.
### Dry-Run Mode
For local development and debugging, enable dry-run mode:
```bash
MOSAIC_TELEMETRY_ENABLED=true
MOSAIC_TELEMETRY_DRY_RUN=true
MOSAIC_TELEMETRY_SERVER_URL=http://localhost:8000 # Not actually called
MOSAIC_TELEMETRY_API_KEY=0000000000000000000000000000000000000000000000000000000000000000
MOSAIC_TELEMETRY_INSTANCE_ID=00000000-0000-0000-0000-000000000000
```
In dry-run mode, the SDK logs event payloads to the console instead of submitting them via HTTP. This lets you verify that tracking points are firing correctly without needing a running telemetry API.
### Docker Compose Configuration
Both `docker-compose.yml` (root) and `docker/docker-compose.yml` pass telemetry environment variables to the API service:
```yaml
services:
mosaic-api:
environment:
# Telemetry (task completion tracking & predictions)
MOSAIC_TELEMETRY_ENABLED: ${MOSAIC_TELEMETRY_ENABLED:-false}
MOSAIC_TELEMETRY_SERVER_URL: ${MOSAIC_TELEMETRY_SERVER_URL:-http://telemetry-api:8000}
MOSAIC_TELEMETRY_API_KEY: ${MOSAIC_TELEMETRY_API_KEY:-}
MOSAIC_TELEMETRY_INSTANCE_ID: ${MOSAIC_TELEMETRY_INSTANCE_ID:-}
MOSAIC_TELEMETRY_DRY_RUN: ${MOSAIC_TELEMETRY_DRY_RUN:-false}
```
Note that telemetry defaults to `false` in Docker Compose. Set `MOSAIC_TELEMETRY_ENABLED=true` in your `.env` to activate it.
An optional local telemetry API service is available (commented out in `docker/docker-compose.yml`). Uncomment it to run a self-contained development environment:
```yaml
# Uncomment in docker/docker-compose.yml
telemetry-api:
image: git.mosaicstack.dev/mosaic/telemetry-api:latest
container_name: mosaic-telemetry-api
restart: unless-stopped
environment:
HOST: 0.0.0.0
PORT: 8000
ports:
- "8001:8000"
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s
networks:
- mosaic-network
```
---
## 3. What Gets Tracked
### TaskCompletionEvent Schema
Every tracked event conforms to the `TaskCompletionEvent` interface. This is the core data structure submitted to the telemetry API:
| Field | Type | Description |
| --------------------------- | ------------------- | -------------------------------------------------------------- |
| `instance_id` | `string` | UUID of the Mosaic Stack instance that generated the event |
| `event_id` | `string` | Unique UUID for this event (auto-generated by the SDK) |
| `schema_version` | `string` | Schema version for forward compatibility (auto-set by the SDK) |
| `timestamp` | `string` | ISO 8601 timestamp of event creation (auto-set by the SDK) |
| `task_duration_ms` | `number` | How long the task took in milliseconds |
| `task_type` | `TaskType` | Type of task performed (see enum below) |
| `complexity` | `Complexity` | Complexity level of the task |
| `harness` | `Harness` | The coding harness or tool used |
| `model` | `string` | AI model name (e.g., `"claude-sonnet-4-5"`) |
| `provider` | `Provider` | AI model provider |
| `estimated_input_tokens` | `number` | Pre-task estimated input tokens (from predictions) |
| `estimated_output_tokens` | `number` | Pre-task estimated output tokens (from predictions) |
| `actual_input_tokens` | `number` | Actual input tokens consumed |
| `actual_output_tokens` | `number` | Actual output tokens generated |
| `estimated_cost_usd_micros` | `number` | Pre-task estimated cost in microdollars (USD \* 1,000,000) |
| `actual_cost_usd_micros` | `number` | Actual cost in microdollars |
| `quality_gate_passed` | `boolean` | Whether all quality gates passed |
| `quality_gates_run` | `QualityGate[]` | List of quality gates that were executed |
| `quality_gates_failed` | `QualityGate[]` | List of quality gates that failed |
| `context_compactions` | `number` | Number of context window compactions during the task |
| `context_rotations` | `number` | Number of context window rotations during the task |
| `context_utilization_final` | `number` | Final context window utilization (0.0 to 1.0) |
| `outcome` | `Outcome` | Task outcome |
| `retry_count` | `number` | Number of retries before completion |
| `language` | `string?` | Primary programming language (optional) |
| `repo_size_category` | `RepoSizeCategory?` | Repository size category (optional) |
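Cost fields use integer microdollars (USD * 1,000,000) so that aggregation avoids floating-point drift. Converting is just a scale factor; these helpers are illustrative, not part of the SDK:

```python
MICROS_PER_USD = 1_000_000

def usd_to_micros(usd: float) -> int:
    """$0.0525 -> 52_500 microdollars, rounded to the nearest micro."""
    return round(usd * MICROS_PER_USD)

def micros_to_usd(micros: int) -> float:
    """52_500 microdollars -> $0.0525."""
    return micros / MICROS_PER_USD
```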
### Enum Values
**TaskType:**
`planning`, `implementation`, `code_review`, `testing`, `debugging`, `refactoring`, `documentation`, `configuration`, `security_audit`, `unknown`
**Complexity:**
`low`, `medium`, `high`, `critical`
**Harness:**
`claude_code`, `opencode`, `kilo_code`, `aider`, `api_direct`, `ollama_local`, `custom`, `unknown`
**Provider:**
`anthropic`, `openai`, `openrouter`, `ollama`, `google`, `mistral`, `custom`, `unknown`
**QualityGate:**
`build`, `lint`, `test`, `coverage`, `typecheck`, `security`
**Outcome:**
`success`, `failure`, `partial`, `timeout`
**RepoSizeCategory:**
`tiny`, `small`, `medium`, `large`, `huge`
### API Service: LLM Call Tracking
The NestJS API tracks every LLM service call (chat, streaming chat, and embeddings) via `LlmTelemetryTrackerService` at `apps/api/src/llm/llm-telemetry-tracker.service.ts`.
Tracked operations:
- **`chat`** -- Synchronous chat completions
- **`chatStream`** -- Streaming chat completions
- **`embed`** -- Embedding generation
For each call, the tracker captures:
- Model name and provider type
- Input and output token counts
- Duration in milliseconds
- Success or failure outcome
- Calculated cost from the built-in cost table (`apps/api/src/llm/llm-cost-table.ts`)
- Task type inferred from calling context (e.g., `"brain"` maps to `planning`, `"review"` maps to `code_review`)
The cost table uses longest-prefix matching on model names and covers all major Anthropic and OpenAI models. Ollama/local models are treated as zero-cost.
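Longest-prefix matching on model names can be sketched like this. The table entries below are made-up placeholders, not the real cost table in `llm-cost-table.ts`:

```python
# Hypothetical per-million-token input prices, keyed by model-name prefix.
COST_TABLE = {
    "claude-sonnet": 3.0,
    "claude-sonnet-4-5": 3.0,
    "claude-opus": 15.0,
    "gpt-4o": 2.5,
    "gpt-4o-mini": 0.15,
}

def lookup_price(model: str) -> float:
    """Pick the entry whose key is the longest prefix of the model name."""
    matches = [key for key in COST_TABLE if model.startswith(key)]
    if not matches:
        return 0.0  # Unknown (e.g. Ollama/local) models are treated as zero-cost.
    return COST_TABLE[max(matches, key=len)]
```

The longest-prefix rule lets a dated model name like `gpt-4o-mini-2024-07-18` resolve to the `gpt-4o-mini` entry rather than the broader `gpt-4o` one.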
### Coordinator: Agent Task Dispatch Tracking
The FastAPI coordinator tracks agent task completions in `apps/coordinator/src/mosaic_telemetry.py` and `apps/coordinator/src/coordinator.py`.
After each agent task dispatch (success or failure), the coordinator emits a `TaskCompletionEvent` capturing:
- Task duration from start to finish
- Agent model, provider, and harness (resolved from the `assigned_agent` field)
- Task outcome (`success`, `failure`, `partial`, `timeout`)
- Quality gate results (build, lint, test, etc.)
- Retry count for the issue
- Complexity level from issue metadata
The coordinator uses the `build_task_event()` helper function which provides sensible defaults for the coordinator context (Claude Code harness, Anthropic provider, TypeScript language).
### Event Lifecycle
```
1. Application code calls trackTaskCompletion() or client.track()
|
v
2. Event is added to in-memory queue (max 1,000 events)
|
v
3. Background timer fires every 5 minutes (submitIntervalMs)
|
v
4. Queue is drained in batches of up to 100 events (batchSize)
|
v
5. Each batch is POSTed to the telemetry API
|
v
6. API validates, stores, and acknowledges each event
```
If the telemetry API is unreachable, events remain in the queue and are retried on the next interval (up to 3 retries per submission). Telemetry errors are logged but never propagated to calling code.
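The retry-and-swallow behavior can be pictured as follows. This is a hypothetical sketch of the pattern, not the SDK's actual submission code:

```python
def submit_batch_with_retries(batch: list, post, max_retries: int = 3) -> bool:
    """Attempt a submission up to max_retries times; swallow all errors.

    Returns True on success; False tells the caller to keep the batch
    queued for the next interval. Errors are logged, never raised.
    """
    for attempt in range(max_retries):
        try:
            post(batch)
            return True
        except Exception as err:
            print(f"Telemetry client error (attempt {attempt + 1}): {err}")
    return False
```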
---
## 4. Prediction System
### How Predictions Work
The Mosaic Telemetry API aggregates historical task completion data across all contributing instances. From this data, it generates statistical predictions for new tasks based on their characteristics (task type, model, provider, complexity).
Predictions include percentile distributions (p10, p25, median, p75, p90) for token usage and cost, plus quality metrics (gate pass rate, success rate).
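A percentile summary of historical samples can be computed with standard quantile math; here is a rough sketch using Python's `statistics` module (the real aggregation runs server-side and its interpolation details may differ):

```python
import statistics

def percentile_summary(samples: list[int]) -> dict[str, float]:
    """Summarize token-count samples as the percentiles a prediction reports."""
    # quantiles(n=100) yields the 1st..99th percentile cut points.
    q = statistics.quantiles(samples, n=100)
    return {
        "p10": q[9],
        "p25": q[24],
        "median": statistics.median(samples),
        "p75": q[74],
        "p90": q[89],
    }
```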
### Querying Predictions via API
The API exposes a prediction endpoint at:
```
GET /api/telemetry/estimate?taskType=<taskType>&model=<model>&provider=<provider>&complexity=<complexity>
```
**Authentication:** Requires a valid session (Bearer token via `AuthGuard`).
**Query Parameters (all required):**
| Parameter | Type | Example | Description |
| ------------ | ------------ | ------------------- | --------------------- |
| `taskType` | `TaskType` | `implementation` | Task type to estimate |
| `model` | `string` | `claude-sonnet-4-5` | Model name |
| `provider` | `Provider` | `anthropic` | Provider name |
| `complexity` | `Complexity` | `medium` | Complexity level |
**Example Request:**
```bash
curl -X GET \
'http://localhost:3001/api/telemetry/estimate?taskType=implementation&model=claude-sonnet-4-5&provider=anthropic&complexity=medium' \
-H 'Authorization: Bearer YOUR_SESSION_TOKEN'
```
**Response:**
```json
{
"data": {
"prediction": {
"input_tokens": {
"p10": 500,
"p25": 1200,
"median": 2500,
"p75": 5000,
"p90": 10000
},
"output_tokens": {
"p10": 200,
"p25": 800,
"median": 1500,
"p75": 3000,
"p90": 6000
},
"cost_usd_micros": {
"median": 30000
},
"duration_ms": {
"median": 5000
},
"correction_factors": {
"input": 1.0,
"output": 1.0
},
"quality": {
"gate_pass_rate": 0.85,
"success_rate": 0.92
}
},
"metadata": {
"sample_size": 150,
"fallback_level": 0,
"confidence": "high",
"last_updated": "2026-02-15T10:00:00Z",
"cache_hit": true
}
}
}
```
If no prediction data is available, the response returns `{ "data": null }`.
### Confidence Levels
The prediction system reports a confidence level based on sample size and data freshness:
| Confidence | Meaning |
| ---------- | -------------------------------------------------------------- |
| `high` | Substantial sample size, recent data, all dimensions matched |
| `medium` | Moderate sample, some dimension fallback |
| `low` | Small sample or significant fallback from requested dimensions |
| `none` | No data available for this combination |
### Fallback Behavior
When exact matches are unavailable, the prediction system falls back through progressively broader aggregations:
1. **Exact match** -- task_type + model + provider + complexity
2. **Drop complexity** -- task_type + model + provider
3. **Drop model** -- task_type + provider
4. **Global** -- task_type only
The `fallback_level` field in metadata indicates which level was used (0 = exact match).
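The fallback chain can be pictured as a loop over progressively broader keys. This is a hypothetical client-side illustration; the real service resolves fallbacks server-side:

```python
def resolve_prediction(lookup, task_type, model, provider, complexity):
    """Try progressively broader keys; return (prediction, fallback_level)."""
    keys = [
        (task_type, model, provider, complexity),  # level 0: exact match
        (task_type, model, provider),              # level 1: drop complexity
        (task_type, provider),                     # level 2: drop model
        (task_type,),                              # level 3: global
    ]
    for level, key in enumerate(keys):
        prediction = lookup(key)
        if prediction is not None:
            return prediction, level
    return None, None
```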
### Cache Strategy
Predictions are cached in-memory by the SDK with a **6-hour TTL** (`predictionCacheTtlMs: 21_600_000`). The `PredictionService` pre-fetches common combinations on startup to warm the cache:
- **Models:** claude-sonnet-4-5, claude-opus-4, claude-haiku-4-5, gpt-4o, gpt-4o-mini
- **Task types:** implementation, planning, code_review
- **Complexities:** low, medium
This produces 30 pre-cached queries (5 models x 3 task types x 2 complexities). Subsequent requests for these combinations are served from cache without any HTTP call.
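The 30 warm-cache queries are just the Cartesian product of those three lists; a sketch of the enumeration (not `PredictionService`'s actual code):

```python
from itertools import product

MODELS = ["claude-sonnet-4-5", "claude-opus-4", "claude-haiku-4-5", "gpt-4o", "gpt-4o-mini"]
TASK_TYPES = ["implementation", "planning", "code_review"]
COMPLEXITIES = ["low", "medium"]

# 5 models x 3 task types x 2 complexities = 30 prediction queries to pre-fetch.
warm_queries = [
    {"model": m, "taskType": t, "complexity": c}
    for m, t, c in product(MODELS, TASK_TYPES, COMPLEXITIES)
]
```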
---
## 5. SDK Reference
### JavaScript: @mosaicstack/telemetry-client
**Registry:** Gitea npm registry at `git.mosaicstack.dev`
**Version:** 0.1.0
**Installation:**
```bash
pnpm add @mosaicstack/telemetry-client
```
**Key Exports:**
```typescript
// Client
import { TelemetryClient, EventBuilder, resolveConfig } from "@mosaicstack/telemetry-client";
// Types
import type {
TelemetryConfig,
TaskCompletionEvent,
EventBuilderParams,
PredictionQuery,
PredictionResponse,
PredictionData,
PredictionMetadata,
TokenDistribution,
} from "@mosaicstack/telemetry-client";
// Enums
import {
TaskType,
Complexity,
Harness,
Provider,
QualityGate,
Outcome,
RepoSizeCategory,
} from "@mosaicstack/telemetry-client";
```
**TelemetryClient API:**
| Method | Description |
| ------------------------------------------------------------------- | ------------------------------------------------------------ |
| `constructor(config: TelemetryConfig)` | Create a new client with the given configuration |
| `start(): void` | Start background batch submission (idempotent) |
| `stop(): Promise<void>` | Stop background submission, flush remaining events |
| `track(event: TaskCompletionEvent): void` | Queue an event for batch submission (never throws) |
| `getPrediction(query: PredictionQuery): PredictionResponse \| null` | Get a cached prediction (returns null if not cached/expired) |
| `refreshPredictions(queries: PredictionQuery[]): Promise<void>` | Force-refresh predictions from the server |
| `eventBuilder: EventBuilder` | Get the EventBuilder for constructing events |
| `queueSize: number` | Number of events currently queued |
| `isRunning: boolean` | Whether the client is currently running |
**TelemetryConfig Options:**
| Option | Type | Default | Description |
| ---------------------- | ------------------------ | ------------------- | ---------------------------------- |
| `serverUrl` | `string` | (required) | Base URL of the telemetry server |
| `apiKey` | `string` | (required) | 64-char hex API key |
| `instanceId` | `string` | (required) | UUID for this instance |
| `enabled` | `boolean` | `true` | Enable/disable telemetry |
| `submitIntervalMs` | `number` | `300_000` (5 min) | Interval between batch submissions |
| `maxQueueSize` | `number` | `1000` | Maximum queued events |
| `batchSize` | `number` | `100` | Maximum events per batch |
| `requestTimeoutMs` | `number` | `10_000` (10 sec) | HTTP request timeout |
| `predictionCacheTtlMs` | `number` | `21_600_000` (6 hr) | Prediction cache TTL |
| `dryRun` | `boolean` | `false` | Log events instead of sending |
| `maxRetries` | `number` | `3` | Retries per submission |
| `onError` | `(error: Error) => void` | noop | Error callback |
**EventBuilder Usage:**
```typescript
const event = client.eventBuilder.build({
task_duration_ms: 1500,
task_type: TaskType.IMPLEMENTATION,
complexity: Complexity.LOW,
harness: Harness.API_DIRECT,
model: "claude-sonnet-4-5",
provider: Provider.ANTHROPIC,
estimated_input_tokens: 0,
estimated_output_tokens: 0,
actual_input_tokens: 200,
actual_output_tokens: 500,
estimated_cost_usd_micros: 0,
actual_cost_usd_micros: 8100,
quality_gate_passed: true,
quality_gates_run: [QualityGate.LINT, QualityGate.TEST],
quality_gates_failed: [],
context_compactions: 0,
context_rotations: 0,
context_utilization_final: 0.3,
outcome: Outcome.SUCCESS,
retry_count: 0,
language: "typescript",
});
client.track(event);
```
### Python: mosaicstack-telemetry
**Registry:** Gitea PyPI registry at `git.mosaicstack.dev`
**Version:** 0.1.0
**Installation:**
```bash
pip install mosaicstack-telemetry
```
**Key Imports:**
```python
from mosaicstack_telemetry import (
TelemetryClient,
TelemetryConfig,
EventBuilder,
TaskType,
Complexity,
Harness,
Provider,
QualityGate,
Outcome,
)
```
**Python Client Usage:**
```python
# Create config (reads MOSAIC_TELEMETRY_* env vars automatically)
config = TelemetryConfig()
errors = config.validate()
# Create and start client
client = TelemetryClient(config)
await client.start_async()
# Build and track an event
builder = EventBuilder(instance_id=config.instance_id)
event = (
builder
.task_type(TaskType.IMPLEMENTATION)
.complexity_level(Complexity.MEDIUM)
.harness_type(Harness.CLAUDE_CODE)
.model("claude-sonnet-4-5")
.provider(Provider.ANTHROPIC)
.duration_ms(5000)
.outcome_value(Outcome.SUCCESS)
.tokens(
estimated_in=0,
estimated_out=0,
actual_in=3000,
actual_out=1500,
)
.cost(estimated=0, actual=52500)
.quality(
passed=True,
gates_run=[QualityGate.BUILD, QualityGate.LINT, QualityGate.TEST],
gates_failed=[],
)
.context(compactions=0, rotations=0, utilization=0.4)
.retry_count(0)
.language("typescript")
.build()
)
client.track(event)
# Shutdown (flushes remaining events)
await client.stop_async()
```
---
## 6. Development Guide
### Testing Locally with Dry-Run Mode
The fastest way to develop with telemetry is to use dry-run mode. This logs event payloads to the console without needing a running telemetry API:
```bash
# In your .env
MOSAIC_TELEMETRY_ENABLED=true
MOSAIC_TELEMETRY_DRY_RUN=true
MOSAIC_TELEMETRY_SERVER_URL=http://localhost:8000
MOSAIC_TELEMETRY_API_KEY=0000000000000000000000000000000000000000000000000000000000000000
MOSAIC_TELEMETRY_INSTANCE_ID=00000000-0000-0000-0000-000000000000
```
Start the API server and trigger LLM operations. You will see telemetry event payloads logged in the console output.
### Adding New Tracking Points
To add telemetry tracking to a new service in the NestJS API:
**Step 1:** Inject `MosaicTelemetryService` into your service. Because `MosaicTelemetryModule` is global, no module import is needed:
```typescript
import { Injectable } from "@nestjs/common";
import { MosaicTelemetryService } from "../mosaic-telemetry/mosaic-telemetry.service";
import { TaskType, Complexity, Harness, Provider, Outcome } from "@mosaicstack/telemetry-client";
@Injectable()
export class MyService {
constructor(private readonly telemetry: MosaicTelemetryService) {}
}
```
**Step 2:** Build and track events after task completion:
```typescript
async performTask(): Promise<void> {
const start = Date.now();
// ... perform the task ...
const duration = Date.now() - start;
const builder = this.telemetry.eventBuilder;
if (builder) {
const event = builder.build({
task_duration_ms: duration,
task_type: TaskType.IMPLEMENTATION,
complexity: Complexity.MEDIUM,
harness: Harness.API_DIRECT,
model: "claude-sonnet-4-5",
provider: Provider.ANTHROPIC,
estimated_input_tokens: 0,
estimated_output_tokens: 0,
actual_input_tokens: inputTokens,
actual_output_tokens: outputTokens,
estimated_cost_usd_micros: 0,
actual_cost_usd_micros: costMicros,
quality_gate_passed: true,
quality_gates_run: [],
quality_gates_failed: [],
context_compactions: 0,
context_rotations: 0,
context_utilization_final: 0,
outcome: Outcome.SUCCESS,
retry_count: 0,
});
this.telemetry.trackTaskCompletion(event);
}
}
```
**Step 3:** For LLM-specific tracking, use `LlmTelemetryTrackerService` instead, which handles cost calculation and task type inference automatically:
```typescript
import { LlmTelemetryTrackerService } from "../llm/llm-telemetry-tracker.service";
@Injectable()
export class MyLlmService {
constructor(private readonly telemetryTracker: LlmTelemetryTrackerService) {}
async chat(): Promise<void> {
const start = Date.now();
// ... call LLM ...
this.telemetryTracker.trackLlmCompletion({
model: "claude-sonnet-4-5",
providerType: "claude",
operation: "chat",
durationMs: Date.now() - start,
inputTokens: 150,
outputTokens: 300,
callingContext: "brain", // Used for task type inference
success: true,
});
}
}
```
### Adding Tracking in the Coordinator (Python)
Use the `build_task_event()` helper from `src/mosaic_telemetry.py`:
```python
from src.mosaic_telemetry import build_task_event, get_telemetry_client
client = get_telemetry_client(app)
if client is not None:
event = build_task_event(
instance_id=instance_id,
task_type=TaskType.IMPLEMENTATION,
complexity=Complexity.MEDIUM,
outcome=Outcome.SUCCESS,
duration_ms=5000,
model="claude-sonnet-4-5",
provider=Provider.ANTHROPIC,
harness=Harness.CLAUDE_CODE,
actual_input_tokens=3000,
actual_output_tokens=1500,
actual_cost_micros=52500,
)
client.track(event)
```
### Troubleshooting
**Telemetry events not appearing:**
1. Check that `MOSAIC_TELEMETRY_ENABLED=true` is set
2. Verify all three required variables are set: `SERVER_URL`, `API_KEY`, `INSTANCE_ID`
3. Look for warning logs: `"Mosaic Telemetry is enabled but missing configuration"` indicates a missing variable
4. Try dry-run mode to confirm events are being generated
**Console shows "Mosaic Telemetry is disabled":**
This is the expected message when `MOSAIC_TELEMETRY_ENABLED=false`. If you intended telemetry to be active, set it to `true`.
**Events queuing but not submitting:**
- Check that the telemetry API server at `MOSAIC_TELEMETRY_SERVER_URL` is reachable
- Verify the API key is a valid 64-character hex string
- The default submission interval is 5 minutes; wait at least one interval or call `stop()` to force a flush
**Prediction endpoint returns null:**
- Predictions require sufficient historical data in the telemetry API
- Check the `metadata.confidence` field; `"none"` means no data exists for this combination
- Predictions are cached for 6 hours; new data takes time to appear
- The `PredictionService` logs startup refresh status; check logs for errors
**"Telemetry client error" in logs:**
- These are non-fatal. The SDK never blocks application logic.
- Common causes: network timeout, invalid API key, server-side validation failure
- Check the telemetry API logs for corresponding errors