telemetry-client-js/docs/integration-guide.md
Jason Woltje 20f56edb49
docs(#1): document dev/release package versioning convention
Add versioning table to README and integration guide showing dist-tags,
version formats, and .npmrc registry configuration for the Gitea npm
registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 23:02:46 -06:00


# Integration Guide
This guide covers how to integrate `@mosaicstack/telemetry-client` into your applications. The SDK targets **Mosaic Telemetry API v1** (event schema version `1.0`).
## Prerequisites
- Node.js >= 18 (for native `fetch` and `crypto.randomUUID()`)
- A Mosaic Telemetry API key and instance ID (issued by an administrator via the admin API)
## Installation
Configure the Gitea npm registry in your project's `.npmrc`:
```ini
@mosaicstack:registry=https://git.mosaicstack.dev/api/packages/mosaic/npm/
```
Then install:
```bash
# Latest stable release (from main)
npm install @mosaicstack/telemetry-client
# Latest dev build (from develop)
npm install @mosaicstack/telemetry-client@dev
```
| Branch | Dist-tag | Version format | Example |
|--------|----------|----------------|---------|
| `main` | `latest` | `{version}` | `0.1.0` |
| `develop` | `dev` | `{version}-dev.{YYYYMMDDHHmmss}` | `0.1.0-dev.20260215050000` |

The package ships ESM-only with TypeScript declarations and has zero runtime dependencies.
## Environment Setup
Store your credentials in environment variables — never hardcode them.
```bash
# .env (not committed — add to .gitignore)
TELEMETRY_API_URL=https://tel-api.mosaicstack.dev
TELEMETRY_API_KEY=msk_your_api_key_here
TELEMETRY_INSTANCE_ID=a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d
```
```bash
# .env.example (committed — documents required variables)
TELEMETRY_API_URL=https://tel-api.mosaicstack.dev
TELEMETRY_API_KEY=your-api-key
TELEMETRY_INSTANCE_ID=your-instance-uuid
```
---
## Instrumenting a Next.js App
Next.js server actions and API routes run on Node.js, so the SDK works directly. Create a shared singleton and track events from your server-side code.
### 1. Create a telemetry singleton
```typescript
// lib/telemetry.ts
import {
  TelemetryClient,
  TaskType,
  Complexity,
  Harness,
  Provider,
  Outcome,
  QualityGate,
} from '@mosaicstack/telemetry-client';

let client: TelemetryClient | null = null;

export function getTelemetryClient(): TelemetryClient {
  if (!client) {
    client = new TelemetryClient({
      serverUrl: process.env.TELEMETRY_API_URL!,
      apiKey: process.env.TELEMETRY_API_KEY!,
      instanceId: process.env.TELEMETRY_INSTANCE_ID!,
      enabled: process.env.NODE_ENV === 'production',
      onError: (err) => console.error('[telemetry]', err.message),
    });
    client.start();
  }
  return client;
}

// Re-export enums for convenience
export { TaskType, Complexity, Harness, Provider, Outcome, QualityGate };
```
### 2. Track events from an API route
```typescript
// app/api/task-complete/route.ts
import { NextResponse } from 'next/server';
import { getTelemetryClient, TaskType, Complexity, Harness, Provider, Outcome } from '@/lib/telemetry';
export async function POST(request: Request) {
  const body = await request.json();
  const client = getTelemetryClient();

  const event = client.eventBuilder.build({
    task_duration_ms: body.durationMs,
    task_type: TaskType.IMPLEMENTATION,
    complexity: Complexity.MEDIUM,
    harness: Harness.CLAUDE_CODE,
    model: body.model,
    provider: Provider.ANTHROPIC,
    estimated_input_tokens: body.estimatedInputTokens,
    estimated_output_tokens: body.estimatedOutputTokens,
    actual_input_tokens: body.actualInputTokens,
    actual_output_tokens: body.actualOutputTokens,
    estimated_cost_usd_micros: body.estimatedCostMicros,
    actual_cost_usd_micros: body.actualCostMicros,
    quality_gate_passed: body.qualityGatePassed,
    quality_gates_run: body.qualityGatesRun,
    quality_gates_failed: body.qualityGatesFailed,
    context_compactions: body.contextCompactions,
    context_rotations: body.contextRotations,
    context_utilization_final: body.contextUtilization,
    outcome: Outcome.SUCCESS,
    retry_count: 0,
    language: 'typescript',
  });

  client.track(event);
  return NextResponse.json({ status: 'queued' });
}
```
### 3. Graceful shutdown
Next.js doesn't provide a built-in shutdown hook, but you can handle `SIGTERM`:
```typescript
// instrumentation.ts (Next.js instrumentation file)
export async function register() {
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    const { getTelemetryClient } = await import('./lib/telemetry');

    // Ensure the client starts on server boot
    getTelemetryClient();

    // Flush remaining events on shutdown
    const shutdown = async () => {
      const client = getTelemetryClient();
      await client.stop();
      process.exit(0);
    };
    process.on('SIGTERM', shutdown);
    process.on('SIGINT', shutdown);
  }
}
```
---
## Instrumenting a Node.js Service
The same pattern applies to any standalone Node.js service (Express, Fastify, a plain script, etc.).
### 1. Initialize and start
```typescript
// src/telemetry.ts
import { TelemetryClient } from '@mosaicstack/telemetry-client';
export const telemetry = new TelemetryClient({
  serverUrl: process.env.TELEMETRY_API_URL ?? 'https://tel-api.mosaicstack.dev',
  apiKey: process.env.TELEMETRY_API_KEY!,
  instanceId: process.env.TELEMETRY_INSTANCE_ID!,
  onError: (err) => console.error('[telemetry]', err.message),
});

telemetry.start();
```
### 2. Track events after task completion
```typescript
// src/task-runner.ts
import {
  TaskType,
  Complexity,
  Harness,
  Provider,
  Outcome,
  QualityGate,
} from '@mosaicstack/telemetry-client';
import { telemetry } from './telemetry.js';

async function runTask() {
  const startTime = Date.now();

  // ... run your AI coding task ...

  const durationMs = Date.now() - startTime;

  const event = telemetry.eventBuilder.build({
    task_duration_ms: durationMs,
    task_type: TaskType.IMPLEMENTATION,
    complexity: Complexity.HIGH,
    harness: Harness.CLAUDE_CODE,
    model: 'claude-sonnet-4-5-20250929',
    provider: Provider.ANTHROPIC,
    estimated_input_tokens: 200000,
    estimated_output_tokens: 80000,
    actual_input_tokens: 215000,
    actual_output_tokens: 72000,
    estimated_cost_usd_micros: 1200000,
    actual_cost_usd_micros: 1150000,
    quality_gate_passed: true,
    quality_gates_run: [
      QualityGate.BUILD,
      QualityGate.LINT,
      QualityGate.TEST,
      QualityGate.TYPECHECK,
    ],
    quality_gates_failed: [],
    context_compactions: 3,
    context_rotations: 1,
    context_utilization_final: 0.85,
    outcome: Outcome.SUCCESS,
    retry_count: 0,
    language: 'typescript',
    repo_size_category: 'medium',
  });

  telemetry.track(event);
}
```
### 3. Graceful shutdown
```typescript
// src/main.ts
import { telemetry } from './telemetry.js';
async function main() {
  // ... your application logic ...

  // On shutdown, flush remaining events
  process.on('SIGTERM', async () => {
    await telemetry.stop();
    process.exit(0);
  });
}

main();
```
---
## Using Predictions
The telemetry API provides crowd-sourced predictions for token usage, cost, and duration based on historical data. The SDK caches these predictions locally.
### Pre-populate the cache
Call `refreshPredictions()` at startup with the dimension combinations your application uses:
```typescript
import { TaskType, Provider, Complexity } from '@mosaicstack/telemetry-client';
import { telemetry } from './telemetry.js';
// Fetch predictions for all combinations you'll need
await telemetry.refreshPredictions([
  { task_type: TaskType.IMPLEMENTATION, model: 'claude-sonnet-4-5-20250929', provider: Provider.ANTHROPIC, complexity: Complexity.LOW },
  { task_type: TaskType.IMPLEMENTATION, model: 'claude-sonnet-4-5-20250929', provider: Provider.ANTHROPIC, complexity: Complexity.MEDIUM },
  { task_type: TaskType.IMPLEMENTATION, model: 'claude-sonnet-4-5-20250929', provider: Provider.ANTHROPIC, complexity: Complexity.HIGH },
  { task_type: TaskType.TESTING, model: 'claude-haiku-4-5-20251001', provider: Provider.ANTHROPIC, complexity: Complexity.LOW },
]);
```
### Read cached predictions
```typescript
const prediction = telemetry.getPrediction({
  task_type: TaskType.IMPLEMENTATION,
  model: 'claude-sonnet-4-5-20250929',
  provider: Provider.ANTHROPIC,
  complexity: Complexity.MEDIUM,
});

if (prediction?.prediction) {
  const p = prediction.prediction;

  console.log('Token predictions (median):', {
    inputTokens: p.input_tokens.median,
    outputTokens: p.output_tokens.median,
  });
  console.log('Cost prediction:', `$${(p.cost_usd_micros.median / 1_000_000).toFixed(2)}`);
  console.log('Duration prediction:', `${(p.duration_ms.median / 1000).toFixed(0)}s`);
  console.log('Correction factors:', {
    input: p.correction_factors.input, // >1.0 means estimates tend to be too low
    output: p.correction_factors.output,
  });
  console.log('Quality:', {
    gatePassRate: `${(p.quality.gate_pass_rate * 100).toFixed(0)}%`,
    successRate: `${(p.quality.success_rate * 100).toFixed(0)}%`,
  });

  // Check confidence level
  if (prediction.metadata.confidence === 'low') {
    console.warn('Low confidence — small sample size or fallback was applied');
  }
}
```
### Understand fallback behavior
When the server doesn't have enough data for an exact match, it broadens the query by dropping dimensions (e.g., ignoring complexity). The `metadata` fields tell you what happened:
| `fallback_level` | Meaning |
|-------------------|---------|
| `0` | Exact match on all dimensions |
| `1+` | Some dimensions were dropped to find data |
| `-1` | No prediction data available at any level |
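One way to act on these levels in code. The numeric values follow the table above; `classifyFallback` and the category names are our own illustration, not part of the SDK:

```typescript
// Map a prediction's fallback_level to a coarse trust category.
// Sketch only — the category names are ours, not the SDK's.
type FallbackTrust = 'exact' | 'broadened' | 'unavailable';

function classifyFallback(level: number): FallbackTrust {
  if (level === -1) return 'unavailable'; // no data at any level
  if (level === 0) return 'exact';        // matched all dimensions
  return 'broadened';                     // some dimensions were dropped
}
```

A `'broadened'` result suggests treating the medians as rough guidance rather than a per-complexity estimate.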
---
## Environment-Specific Configuration
### Development
```typescript
const client = new TelemetryClient({
  serverUrl: 'http://localhost:8000', // Local dev server
  apiKey: process.env.TELEMETRY_API_KEY!,
  instanceId: process.env.TELEMETRY_INSTANCE_ID!,
  dryRun: true, // Don't send real data
  submitIntervalMs: 10_000, // Flush more frequently for debugging
  onError: (err) => console.error('[telemetry]', err),
});
```
### Production
```typescript
const client = new TelemetryClient({
  serverUrl: 'https://tel-api.mosaicstack.dev',
  apiKey: process.env.TELEMETRY_API_KEY!,
  instanceId: process.env.TELEMETRY_INSTANCE_ID!,
  submitIntervalMs: 300_000, // 5 min (default)
  maxRetries: 3, // Retry on transient failures
  onError: (err) => {
    // Route to your observability stack
    logger.error('Telemetry submission failed', { error: err.message });
  },
});
```
### Conditional enable/disable
```typescript
const client = new TelemetryClient({
  serverUrl: process.env.TELEMETRY_API_URL!,
  apiKey: process.env.TELEMETRY_API_KEY!,
  instanceId: process.env.TELEMETRY_INSTANCE_ID!,
  enabled: process.env.TELEMETRY_ENABLED !== 'false', // Opt-out via env var
});
```
When `enabled` is `false`, `track()` returns immediately without queuing.
---
## Error Handling
The SDK is designed to never disrupt your application:
- **`track()` never throws.** All errors are caught and routed to the `onError` callback.
- **Failed batches are re-queued.** If a submission fails, events are prepended back to the queue for the next flush cycle.
- **Exponential backoff with jitter.** Retries use 1s base delay, doubling up to 60s, with random jitter to prevent thundering herd.
- **`Retry-After` header support.** On HTTP 429 (rate limited), the SDK respects the server's `Retry-After` header.
- **HTTP 403 is not retried.** An API key / instance ID mismatch is a permanent error.
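The retry delays described above can be sketched roughly as follows. This is illustrative only, not the SDK's actual implementation; in particular, the exact jitter scheme is an assumption (equal jitter shown here):

```typescript
// Sketch of the documented retry policy: 1s base delay, doubling up to
// a 60s cap, with jitter; a Retry-After value (HTTP 429) wins outright.
function backoffDelayMs(attempt: number, retryAfterSeconds?: number): number {
  if (retryAfterSeconds !== undefined) {
    return retryAfterSeconds * 1000; // honor the server's Retry-After header
  }
  const BASE_MS = 1_000; // 1s base delay
  const CAP_MS = 60_000; // doubling is capped at 60s
  const exponential = Math.min(CAP_MS, BASE_MS * 2 ** attempt);
  // Equal jitter: half fixed, half random, to avoid thundering herd.
  return exponential / 2 + Math.random() * (exponential / 2);
}
```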
### Custom error handling
```typescript
const client = new TelemetryClient({
  // ...
  onError: (error) => {
    if (error.message.includes('HTTP 403')) {
      console.error('Telemetry auth failed — check API key and instance ID');
    } else if (error.message.includes('HTTP 429')) {
      console.warn('Telemetry rate limited — events will be retried');
    } else {
      console.error('Telemetry error:', error.message);
    }
  },
});
```
---
## Batch Submission Behavior
The SDK batches events for efficiency:
1. `track(event)` adds the event to an in-memory queue (bounded, FIFO eviction at capacity).
2. Every `submitIntervalMs` (default: 5 minutes), the background timer drains the queue in batches of up to `batchSize` (default/max: 100).
3. Each batch is POSTed to `POST /v1/events/batch` with exponential backoff on failure.
4. Calling `stop()` flushes all remaining events before resolving.
The server accepts up to **100 events per batch** and supports **partial success** — some events may be accepted while others (e.g., duplicates) are rejected.
---
## API Version Compatibility
| SDK Version | API Version | Schema Version |
|-------------|-------------|----------------|
| 0.1.x | v1 (`/v1/` endpoints) | `1.0` |
The `EventBuilder` automatically sets `schema_version: "1.0"` on every event. The SDK submits to `/v1/events/batch` and queries `/v1/predictions/batch`.
When the telemetry API introduces a v2, this SDK will add support in a new major release. The server supports two API versions simultaneously during a 6-month deprecation window.