From 231a799a4622a87047cbacf049eb9dede4493597 Mon Sep 17 00:00:00 2001 From: Jason Woltje Date: Sat, 14 Feb 2026 22:38:19 -0600 Subject: [PATCH] docs(#1): SDK integration guide, API reference, and CI pipeline - Rewrite README with quick start, config table, prediction usage, API version note - Add docs/integration-guide.md with Next.js and Node.js examples, env-specific config, error handling patterns, batch behavior, and API version compatibility - Add docs/api-reference.md with full reference for all exported classes, methods, types, and enums - Add .woodpecker.yml with quality gates (lint, typecheck, format, security audit, test with coverage) and npm publish to Gitea registry - Add AGENTS.md and update CLAUDE.md with project conventions Fixes #1 Co-Authored-By: Claude Opus 4.6 --- .woodpecker.yml | 91 ++++++ AGENTS.md | 72 +++++ CLAUDE.md | 52 ++++ README.md | 133 +++++---- docs/api-reference.md | 602 ++++++++++++++++++++++++++++++++++++++ docs/integration-guide.md | 405 +++++++++++++++++++++++++ 6 files changed, 1303 insertions(+), 52 deletions(-) create mode 100644 .woodpecker.yml create mode 100644 AGENTS.md create mode 100644 docs/api-reference.md create mode 100644 docs/integration-guide.md diff --git a/.woodpecker.yml b/.woodpecker.yml new file mode 100644 index 0000000..50c24fb --- /dev/null +++ b/.woodpecker.yml @@ -0,0 +1,91 @@ +when: + - event: [push, pull_request, manual] + +variables: + - &node_image "node:22-alpine" + - &install_deps | + corepack enable + npm ci + +steps: + install: + image: *node_image + commands: + - *install_deps + + lint: + image: *node_image + commands: + - *install_deps + - npm run lint + depends_on: + - install + + typecheck: + image: *node_image + commands: + - *install_deps + - npm run typecheck + depends_on: + - install + + format-check: + image: *node_image + commands: + - *install_deps + - npm run format:check + depends_on: + - install + + security-audit: + image: *node_image + commands: + - npm audit --audit-level=high 
+ depends_on: + - install + + test: + image: *node_image + commands: + - *install_deps + - npm run test:coverage + depends_on: + - install + + build: + image: *node_image + commands: + - *install_deps + - npm run build + depends_on: + - lint + - typecheck + - format-check + - security-audit + - test + + publish: + image: *node_image + environment: + GITEA_TOKEN: + from_secret: gitea_token + commands: + - *install_deps + - npm run build + - | + echo "//git.mosaicstack.dev/api/packages/mosaic/npm/:_authToken=$$GITEA_TOKEN" > .npmrc + echo "@mosaicstack:registry=https://git.mosaicstack.dev/api/packages/mosaic/npm/" >> .npmrc + - | + CURRENT=$(node -p "require('./package.json').version") + PUBLISHED=$(npm view @mosaicstack/telemetry-client version 2>/dev/null || echo "0.0.0") + if [ "$$CURRENT" = "$$PUBLISHED" ]; then + echo "Version $$CURRENT already published, skipping" + exit 0 + fi + echo "Publishing $$CURRENT (was $$PUBLISHED)" + npm publish --access public + when: + - branch: [main, develop] + event: [push, manual, tag] + depends_on: + - build diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..806ed87 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,72 @@ +# mosaic-telemetry-client-js — Agent Context + +> Patterns, gotchas, and orchestrator integration for AI agents working on this project. +> **Update this file** when you discover reusable patterns or non-obvious requirements. + +## Codebase Patterns + + + + + + + +## Common Gotchas + + + + + + + +## Quality Gates + +**All must pass before any commit:** + +```bash +npm run lint && npm run typecheck && npm test +``` + +## Orchestrator Integration + +### Task Prefix +Use `MOSAIC-TELEMETRY-CLIENT-JS` as the prefix for orchestrated tasks (e.g., `MOSAIC-TELEMETRY-CLIENT-JS-SEC-001`). 
+ +### Package/Directory Names + + +| Directory | Purpose | +|-----------|---------| +| `src/` | Main source code | +| `tests/` | Test files | +| `docs/scratchpads/` | Working documents | + +### Worker Checklist +When completing an orchestrated task: +1. Read the finding details from the report +2. Implement the fix following existing code patterns +3. Run quality gates (ALL must pass) +4. Commit with: `git commit -m "fix({finding_id}): brief description"` +5. Report result as JSON to orchestrator + +### Post-Coding Review +After implementing changes, the orchestrator will run: +1. **Codex code review** — `~/.claude/scripts/codex/codex-code-review.sh --uncommitted` +2. **Codex security review** — `~/.claude/scripts/codex/codex-security-review.sh --uncommitted` +3. If blockers/critical findings: remediation task created +4. If clean: task marked done + +## Directory-Specific Context + + + + + + +## Testing Approaches + + + + + + diff --git a/CLAUDE.md b/CLAUDE.md index cc2c41c..0efaf0f 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -28,3 +28,55 @@ npm run build # Build to dist/ - `track()` never throws — catches everything, routes to `onError` callback - Zero runtime deps: uses native `fetch` (Node 18+), `crypto.randomUUID()`, `setInterval` - All types are standalone — no dependency on the telemetry server package + +## Conditional Documentation Loading + +**Read the relevant guide before starting work:** + +| Task Type | Guide | +|-----------|-------| +| Bootstrapping a new project | `~/.claude/agent-guides/bootstrap.md` | +| Orchestrating autonomous tasks | `~/.claude/agent-guides/orchestrator.md` | +| Ralph autonomous development | `~/.claude/agent-guides/ralph-autonomous.md` | +| Frontend development | `~/.claude/agent-guides/frontend.md` | +| Backend/API development | `~/.claude/agent-guides/backend.md` | +| TypeScript strict typing | `~/.claude/agent-guides/typescript.md` | +| Code review | `~/.claude/agent-guides/code-review.md` | +| Authentication/Authorization | 
`~/.claude/agent-guides/authentication.md` | +| Infrastructure/DevOps | `~/.claude/agent-guides/infrastructure.md` | +| QA/Testing | `~/.claude/agent-guides/qa-testing.md` | +| Secrets management (Vault) | `~/.claude/agent-guides/vault-secrets.md` | + + +## Commits + +``` +<type>(#issue): Brief description + +Detailed explanation if needed. + +Fixes #123 +``` + +Types: `feat`, `fix`, `docs`, `test`, `refactor`, `chore` + + +## Secrets Management + +**NEVER hardcode secrets.** Use `.env` files (gitignored) or a secrets manager. + +```bash +# .env.example is committed (with placeholders) +# .env is NOT committed (contains real values) +``` + +Ensure `.gitignore` includes `.env*` (except `.env.example`). + + +## Multi-Agent Coordination + +When multiple agents work on this project: +1. `git pull --rebase` before editing +2. `git pull --rebase` before pushing +3. If conflicts, **alert the user** — don't auto-resolve data conflicts + diff --git a/README.md b/README.md index b00b0c8..d69fb35 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,9 @@ TypeScript client SDK for [Mosaic Stack Telemetry](https://tel.mosaicstack.dev). Reports task-completion metrics from AI coding harnesses and queries crowd-sourced predictions. -**Zero runtime dependencies** — uses native `fetch`, `crypto.randomUUID()`, and `setInterval`. +**Zero runtime dependencies** — uses native `fetch`, `crypto.randomUUID()`, and `setInterval`. Requires Node.js 18+. + +**Targets Mosaic Telemetry API v1** (`/v1/` endpoints, event schema version `1.0`). ## Installation @@ -13,17 +15,26 @@ npm install @mosaicstack/telemetry-client ## Quick Start ```typescript -import { TelemetryClient, TaskType, Complexity, Harness, Provider, Outcome } from '@mosaicstack/telemetry-client'; +import { + TelemetryClient, + TaskType, + Complexity, + Harness, + Provider, + Outcome, + QualityGate, +} from '@mosaicstack/telemetry-client'; +// 1. 
Create and start the client const client = new TelemetryClient({ - serverUrl: 'https://tel.mosaicstack.dev', - apiKey: 'your-64-char-hex-api-key', - instanceId: 'your-instance-uuid', + serverUrl: 'https://tel-api.mosaicstack.dev', + apiKey: process.env.TELEMETRY_API_KEY!, + instanceId: process.env.TELEMETRY_INSTANCE_ID!, }); -client.start(); +client.start(); // begins background batch submission every 5 minutes -// Build and track an event +// 2. Build and track an event const event = client.eventBuilder.build({ task_duration_ms: 45000, task_type: TaskType.IMPLEMENTATION, @@ -31,83 +42,101 @@ const event = client.eventBuilder.build({ harness: Harness.CLAUDE_CODE, model: 'claude-sonnet-4-5-20250929', provider: Provider.ANTHROPIC, - estimated_input_tokens: 5000, - estimated_output_tokens: 2000, - actual_input_tokens: 5500, - actual_output_tokens: 2200, - estimated_cost_usd_micros: 30000, - actual_cost_usd_micros: 33000, + estimated_input_tokens: 105000, + estimated_output_tokens: 45000, + actual_input_tokens: 112340, + actual_output_tokens: 38760, + estimated_cost_usd_micros: 630000, + actual_cost_usd_micros: 919200, quality_gate_passed: true, - quality_gates_run: [], + quality_gates_run: [QualityGate.BUILD, QualityGate.LINT, QualityGate.TEST], quality_gates_failed: [], - context_compactions: 0, + context_compactions: 2, context_rotations: 0, - context_utilization_final: 0.4, + context_utilization_final: 0.72, outcome: Outcome.SUCCESS, retry_count: 0, + language: 'typescript', + repo_size_category: 'medium', }); -client.track(event); +client.track(event); // queues the event (never throws) -// When shutting down -await client.stop(); -``` - -## Querying Predictions - -```typescript -const query = { +// 3. 
Query predictions +const prediction = client.getPrediction({ task_type: TaskType.IMPLEMENTATION, model: 'claude-sonnet-4-5-20250929', provider: Provider.ANTHROPIC, complexity: Complexity.MEDIUM, -}; +}); -// Fetch from server and cache locally -await client.refreshPredictions([query]); - -// Get cached prediction (returns null if not cached) -const prediction = client.getPrediction(query); -if (prediction?.prediction) { - console.log('Median input tokens:', prediction.prediction.input_tokens.median); - console.log('Median cost (microdollars):', prediction.prediction.cost_usd_micros.median); -} +// 4. Shut down gracefully (flushes remaining events) +await client.stop(); ``` ## Configuration -```typescript -const client = new TelemetryClient({ - serverUrl: 'https://tel.mosaicstack.dev', // Required - apiKey: 'your-api-key', // Required (64-char hex) - instanceId: 'your-uuid', // Required +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `serverUrl` | `string` | **required** | Telemetry API base URL | +| `apiKey` | `string` | **required** | Bearer token for authentication | +| `instanceId` | `string` | **required** | UUID identifying this instance | +| `enabled` | `boolean` | `true` | Set `false` to disable — `track()` becomes a no-op | +| `submitIntervalMs` | `number` | `300_000` | Background flush interval (5 min) | +| `maxQueueSize` | `number` | `1000` | Max queued events before FIFO eviction | +| `batchSize` | `number` | `100` | Events per batch submission (server max: 100) | +| `requestTimeoutMs` | `number` | `10_000` | HTTP request timeout | +| `predictionCacheTtlMs` | `number` | `21_600_000` | Prediction cache TTL (6 hours) | +| `dryRun` | `boolean` | `false` | Log events instead of sending them | +| `maxRetries` | `number` | `3` | Retry attempts with exponential backoff | +| `onError` | `(error: Error) => void` | silent | Error callback | - // Optional - enabled: true, // Set false to disable (track() becomes no-op) - 
submitIntervalMs: 300_000, // Background flush interval (default: 5 min) - maxQueueSize: 1000, // Max queued events (default: 1000, FIFO eviction) - batchSize: 100, // Events per batch (default/max: 100) - requestTimeoutMs: 10_000, // HTTP timeout (default: 10s) - predictionCacheTtlMs: 21_600_000, // Prediction cache TTL (default: 6 hours) - dryRun: false, // Log events instead of sending - maxRetries: 3, // Retry attempts on failure - onError: (err) => console.error(err), // Error callback +## Querying Predictions + +Predictions are crowd-sourced token/cost/duration estimates from the telemetry API. The SDK caches them locally with a configurable TTL. + +```typescript +// Fetch predictions from the server and cache locally +await client.refreshPredictions([ + { task_type: TaskType.IMPLEMENTATION, model: 'claude-sonnet-4-5-20250929', provider: Provider.ANTHROPIC, complexity: Complexity.MEDIUM }, + { task_type: TaskType.TESTING, model: 'claude-haiku-4-5-20251001', provider: Provider.ANTHROPIC, complexity: Complexity.LOW }, +]); + +// Read from cache (returns null if not cached or expired) +const prediction = client.getPrediction({ + task_type: TaskType.IMPLEMENTATION, + model: 'claude-sonnet-4-5-20250929', + provider: Provider.ANTHROPIC, + complexity: Complexity.MEDIUM, }); + +if (prediction?.prediction) { + console.log('Median input tokens:', prediction.prediction.input_tokens.median); + console.log('Median cost ($):', prediction.prediction.cost_usd_micros.median / 1_000_000); + console.log('Confidence:', prediction.metadata.confidence); +} ``` ## Dry-Run Mode -For testing without sending data: +For development and testing without sending data to the server: ```typescript const client = new TelemetryClient({ - serverUrl: 'https://tel.mosaicstack.dev', + serverUrl: 'https://tel-api.mosaicstack.dev', apiKey: 'test-key', instanceId: 'test-uuid', dryRun: true, }); ``` +In dry-run mode, `track()` still queues events and `flush()` still runs, but the `BatchSubmitter` 
returns synthetic `accepted` responses without making HTTP calls. + +## Documentation + +- **[Integration Guide](docs/integration-guide.md)** — Next.js and Node.js examples, environment-specific configuration, error handling patterns +- **[API Reference](docs/api-reference.md)** — Full reference for all exported classes, methods, types, and enums + ## License MPL-2.0 diff --git a/docs/api-reference.md b/docs/api-reference.md new file mode 100644 index 0000000..483c379 --- /dev/null +++ b/docs/api-reference.md @@ -0,0 +1,602 @@ +# API Reference + +Complete reference for all classes, methods, types, and enums exported by `@mosaicstack/telemetry-client`. + +**SDK version:** 0.1.0 +**Targets:** Mosaic Telemetry API v1, event schema version `1.0` + +--- + +## TelemetryClient + +Main entry point. Queues task-completion events for background batch submission and provides access to cached predictions. + +```typescript +import { TelemetryClient } from '@mosaicstack/telemetry-client'; +``` + +### Constructor + +```typescript +new TelemetryClient(config: TelemetryConfig) +``` + +Creates a new client instance. Does **not** start background submission — call `start()` to begin. + +### Properties + +| Property | Type | Description | +|----------|------|-------------| +| `eventBuilder` | `EventBuilder` | Builder for constructing `TaskCompletionEvent` objects | +| `queueSize` | `number` | Number of events currently in the queue | +| `isRunning` | `boolean` | Whether background submission is active | + +### Methods + +#### `start(): void` + +Start background batch submission via `setInterval`. Idempotent — calling `start()` multiple times has no effect. + +#### `stop(): Promise<void>` + +Stop background submission and flush all remaining events. Idempotent. Returns a promise that resolves when the final flush completes. + +#### `track(event: TaskCompletionEvent): void` + +Queue an event for batch submission. **Never throws** — all errors are caught and routed to the `onError` callback. 
+ +When `enabled` is `false`, this method returns immediately without queuing. + +When the queue is at capacity (`maxQueueSize`), the oldest event is evicted to make room. + +#### `getPrediction(query: PredictionQuery): PredictionResponse | null` + +Get a cached prediction for the given query dimensions. Returns `null` if no prediction is cached or the cache entry has expired. + +#### `refreshPredictions(queries: PredictionQuery[]): Promise<void>` + +Fetch predictions from the server via `POST /v1/predictions/batch` and store them in the local cache. The predictions endpoint is public — no authentication required. + +Accepts up to 50 queries per call (server limit). + +--- + +## EventBuilder + +Convenience builder that auto-fills `event_id`, `timestamp`, `instance_id`, and `schema_version`. + +```typescript +import { EventBuilder } from '@mosaicstack/telemetry-client'; +``` + +Access via `client.eventBuilder` — you don't normally construct this directly. + +### Methods + +#### `build(params: EventBuilderParams): TaskCompletionEvent` + +Build a complete `TaskCompletionEvent` from the given parameters. + +Auto-generated fields: +- `event_id` — `crypto.randomUUID()` +- `timestamp` — `new Date().toISOString()` +- `instance_id` — from client config +- `schema_version` — `"1.0"` + +--- + +## EventQueue + +Bounded FIFO queue for telemetry events. Used internally by `TelemetryClient`. + +```typescript +import { EventQueue } from '@mosaicstack/telemetry-client'; +``` + +### Constructor + +```typescript +new EventQueue(maxSize: number) +``` + +### Properties + +| Property | Type | Description | +|----------|------|-------------| +| `size` | `number` | Current number of events in the queue | +| `isEmpty` | `boolean` | Whether the queue is empty | + +### Methods + +#### `enqueue(event: TaskCompletionEvent): void` + +Add an event. Evicts the oldest event if at capacity. 
+ +#### `drain(maxItems: number): TaskCompletionEvent[]` + +Remove and return up to `maxItems` events from the front. + +#### `prepend(events: TaskCompletionEvent[]): void` + +Prepend events back to the front (used for re-enqueue on submission failure). Respects `maxSize` — excess events are dropped. + +--- + +## BatchSubmitter + +Handles HTTP submission of event batches with retry logic. + +```typescript +import { BatchSubmitter } from '@mosaicstack/telemetry-client'; +``` + +### Methods + +#### `submit(events: TaskCompletionEvent[]): Promise<SubmitResult>` + +Submit a batch to `POST /v1/events/batch`. Retries with exponential backoff (1s base, 60s max, with jitter) on transient failures. Respects the server's `Retry-After` header on HTTP 429. + +In dry-run mode, returns a synthetic success response without making HTTP calls. + +--- + +## PredictionCache + +In-memory TTL cache for prediction responses. + +```typescript +import { PredictionCache } from '@mosaicstack/telemetry-client'; +``` + +### Constructor + +```typescript +new PredictionCache(ttlMs: number) +``` + +### Properties + +| Property | Type | Description | +|----------|------|-------------| +| `size` | `number` | Number of entries in cache (may include expired entries) | + +### Methods + +#### `get(query: PredictionQuery): PredictionResponse | null` + +Retrieve a cached prediction. Returns `null` if not cached or expired (expired entries are lazily deleted). + +#### `set(query: PredictionQuery, response: PredictionResponse): void` + +Store a prediction with TTL. + +#### `clear(): void` + +Clear all cached predictions. + +--- + +## Configuration Types + +### TelemetryConfig + +User-facing configuration passed to the `TelemetryClient` constructor. 
+ +```typescript +import type { TelemetryConfig } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Required | Default | Description | +|-------|------|----------|---------|-------------| +| `serverUrl` | `string` | Yes | — | Telemetry API base URL (e.g., `"https://tel-api.mosaicstack.dev"`) | +| `apiKey` | `string` | Yes | — | Bearer token for `POST /v1/events/batch` authentication | +| `instanceId` | `string` | Yes | — | UUID identifying this Mosaic Stack instance | +| `enabled` | `boolean` | No | `true` | When `false`, `track()` is a no-op | +| `submitIntervalMs` | `number` | No | `300_000` | Background flush interval in ms (5 min) | +| `maxQueueSize` | `number` | No | `1000` | Maximum events held in queue before FIFO eviction | +| `batchSize` | `number` | No | `100` | Events per batch (server max: 100) | +| `requestTimeoutMs` | `number` | No | `10_000` | HTTP request timeout in ms | +| `predictionCacheTtlMs` | `number` | No | `21_600_000` | Prediction cache TTL in ms (6 hours) | +| `dryRun` | `boolean` | No | `false` | Simulate submissions without HTTP calls | +| `maxRetries` | `number` | No | `3` | Retry attempts on transient failure | +| `onError` | `(error: Error) => void` | No | silent | Callback invoked on errors | + +### ResolvedConfig + +Internal configuration with all defaults applied. All fields are required (non-optional). + +```typescript +import type { ResolvedConfig } from '@mosaicstack/telemetry-client'; +``` + +### resolveConfig + +```typescript +import { resolveConfig } from '@mosaicstack/telemetry-client'; + +function resolveConfig(config: TelemetryConfig): ResolvedConfig +``` + +Apply defaults to a `TelemetryConfig`, producing a `ResolvedConfig`. Strips trailing slashes from `serverUrl`. + +--- + +## Event Types + +### EventBuilderParams + +Parameters accepted by `EventBuilder.build()`. Excludes auto-generated fields (`event_id`, `timestamp`, `instance_id`, `schema_version`). 
+ +```typescript +import type { EventBuilderParams } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `task_duration_ms` | `number` | Yes | Wall-clock time in ms (0–86,400,000) | +| `task_type` | `TaskType` | Yes | Category of work performed | +| `complexity` | `Complexity` | Yes | Task complexity level | +| `harness` | `Harness` | Yes | Coding tool / execution environment | +| `model` | `string` | Yes | Model identifier (1–100 chars) | +| `provider` | `Provider` | Yes | LLM provider | +| `estimated_input_tokens` | `number` | Yes | Pre-task input token estimate (0–10,000,000) | +| `estimated_output_tokens` | `number` | Yes | Pre-task output token estimate (0–10,000,000) | +| `actual_input_tokens` | `number` | Yes | Actual input tokens consumed (0–10,000,000) | +| `actual_output_tokens` | `number` | Yes | Actual output tokens generated (0–10,000,000) | +| `estimated_cost_usd_micros` | `number` | Yes | Estimated cost in microdollars (0–100,000,000) | +| `actual_cost_usd_micros` | `number` | Yes | Actual cost in microdollars (0–100,000,000) | +| `quality_gate_passed` | `boolean` | Yes | Whether all quality gates passed | +| `quality_gates_run` | `QualityGate[]` | Yes | Gates that were executed | +| `quality_gates_failed` | `QualityGate[]` | Yes | Gates that failed | +| `context_compactions` | `number` | Yes | Context compaction count (0–100) | +| `context_rotations` | `number` | Yes | Context rotation count (0–50) | +| `context_utilization_final` | `number` | Yes | Final context utilization ratio (0.0–1.0) | +| `outcome` | `Outcome` | Yes | Task result | +| `retry_count` | `number` | Yes | Number of retries (0–20) | +| `language` | `string \| null` | No | Primary programming language (max 30 chars) | +| `repo_size_category` | `RepoSizeCategory \| null` | No | Repository size bucket | + +### TaskCompletionEvent + +Full event object submitted to the server. 
Extends `EventBuilderParams` with auto-generated identity fields. + +```typescript +import type { TaskCompletionEvent } from '@mosaicstack/telemetry-client'; +``` + +Additional fields (auto-generated by `EventBuilder`): + +| Field | Type | Description | +|-------|------|-------------| +| `instance_id` | `string` | UUID identifying the submitting instance | +| `event_id` | `string` | Unique UUID for deduplication | +| `schema_version` | `string` | Always `"1.0"` | +| `timestamp` | `string` | ISO 8601 datetime | + +--- + +## Prediction Types + +### PredictionQuery + +Query parameters for fetching a prediction. + +```typescript +import type { PredictionQuery } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `task_type` | `TaskType` | Task type to predict for | +| `model` | `string` | Model identifier | +| `provider` | `Provider` | LLM provider | +| `complexity` | `Complexity` | Complexity level | + +### PredictionResponse + +Response from the predictions endpoint. + +```typescript +import type { PredictionResponse } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `prediction` | `PredictionData \| null` | Prediction data, or `null` if no data available | +| `metadata` | `PredictionMetadata` | Sample size, confidence, fallback info | + +### PredictionData + +Statistical prediction for a dimension combination. 
+ +```typescript +import type { PredictionData } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `input_tokens` | `TokenDistribution` | Input token distribution (p10/p25/median/p75/p90) | +| `output_tokens` | `TokenDistribution` | Output token distribution | +| `cost_usd_micros` | `Record<string, number>` | Cost stats — `{ median: number }` | +| `duration_ms` | `Record<string, number>` | Duration stats — `{ median: number }` | +| `correction_factors` | `CorrectionFactors` | Actual-to-estimated token ratios | +| `quality` | `QualityPrediction` | Quality gate pass rate and success rate | + +### TokenDistribution + +Percentile distribution of token counts. + +```typescript +import type { TokenDistribution } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `p10` | `number` | 10th percentile | +| `p25` | `number` | 25th percentile | +| `median` | `number` | 50th percentile (median) | +| `p75` | `number` | 75th percentile | +| `p90` | `number` | 90th percentile | + +### CorrectionFactors + +Ratio of actual to estimated tokens. Values >1.0 mean estimates tend to be too low. + +```typescript +import type { CorrectionFactors } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `input` | `number` | Actual / estimated input tokens | +| `output` | `number` | Actual / estimated output tokens | + +### QualityPrediction + +Predicted quality gate and success rates. + +```typescript +import type { QualityPrediction } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `gate_pass_rate` | `number` | Fraction of events where all quality gates pass (0.0–1.0) | +| `success_rate` | `number` | Fraction of events with `outcome: "success"` (0.0–1.0) | + +### PredictionMetadata + +Metadata about a prediction response. 
+ +```typescript +import type { PredictionMetadata } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `sample_size` | `number` | Number of events used to compute this prediction | +| `fallback_level` | `number` | 0 = exact match, 1+ = dimensions dropped, -1 = no data | +| `confidence` | `'none' \| 'low' \| 'medium' \| 'high'` | Confidence level | +| `last_updated` | `string \| null` | ISO 8601 timestamp of last computation | +| `dimensions_matched` | `Record<string, string \| null> \| null` | Matched dimensions (`null` values indicate fallback) | +| `fallback_note` | `string \| null` | Human-readable fallback explanation | +| `cache_hit` | `boolean` | Whether served from server-side cache | + +**Confidence level criteria:** + +| Level | Criteria | +|-------|----------| +| `none` | No data available. `prediction` is `null`. | +| `low` | Sample size < 30 or fallback was applied | +| `medium` | Sample size 30–99, exact match | +| `high` | Sample size >= 100, exact match | + +--- + +## Batch Types + +### BatchEventRequest + +Request body for `POST /v1/events/batch`. + +```typescript +import type { BatchEventRequest } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `events` | `TaskCompletionEvent[]` | 1–100 events to submit | + +### BatchEventResponse + +Response from `POST /v1/events/batch`. + +```typescript +import type { BatchEventResponse } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `accepted` | `number` | Count of accepted events | +| `rejected` | `number` | Count of rejected events | +| `results` | `BatchEventResult[]` | Per-event result details | + +### BatchEventResult + +Per-event result within a batch response. 
+ +```typescript +import type { BatchEventResult } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `event_id` | `string` | The event's UUID | +| `status` | `'accepted' \| 'rejected'` | Whether the event was accepted | +| `error` | `string \| null` | Error message if rejected | + +### SubmitResult + +Internal result type from `BatchSubmitter.submit()`. + +```typescript +import type { SubmitResult } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `success` | `boolean` | Whether the submission succeeded | +| `response` | `BatchEventResponse \| undefined` | Server response (on success) | +| `retryAfterMs` | `number \| undefined` | Retry delay from 429 response | +| `error` | `Error \| undefined` | Error details (on failure) | + +### BatchPredictionRequest + +Request body for `POST /v1/predictions/batch`. + +```typescript +import type { BatchPredictionRequest } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `queries` | `PredictionQuery[]` | 1–50 prediction queries | + +### BatchPredictionResponse + +Response from `POST /v1/predictions/batch`. + +```typescript +import type { BatchPredictionResponse } from '@mosaicstack/telemetry-client'; +``` + +| Field | Type | Description | +|-------|------|-------------| +| `results` | `PredictionResponse[]` | One response per query, in request order | + +--- + +## Enums + +All enums use string values matching the server's API contract. 
+ +### TaskType + +```typescript +import { TaskType } from '@mosaicstack/telemetry-client'; +``` + +| Member | Value | Description | +|--------|-------|-------------| +| `PLANNING` | `"planning"` | Architecture design, task breakdown | +| `IMPLEMENTATION` | `"implementation"` | Writing new code | +| `CODE_REVIEW` | `"code_review"` | Reviewing existing code | +| `TESTING` | `"testing"` | Writing or running tests | +| `DEBUGGING` | `"debugging"` | Investigating and fixing bugs | +| `REFACTORING` | `"refactoring"` | Restructuring existing code | +| `DOCUMENTATION` | `"documentation"` | Writing docs, comments, READMEs | +| `CONFIGURATION` | `"configuration"` | Config files, CI/CD, infrastructure | +| `SECURITY_AUDIT` | `"security_audit"` | Security review, vulnerability analysis | +| `UNKNOWN` | `"unknown"` | Unclassified task type (fallback) | + +### Complexity + +```typescript +import { Complexity } from '@mosaicstack/telemetry-client'; +``` + +| Member | Value | Description | Typical Token Budget | +|--------|-------|-------------|---------------------| +| `LOW` | `"low"` | Simple fixes, typos, config changes | 50,000 | +| `MEDIUM` | `"medium"` | Standard features, moderate logic | 150,000 | +| `HIGH` | `"high"` | Complex features, multi-file changes | 350,000 | +| `CRITICAL` | `"critical"` | Major refactoring, architectural changes | 750,000 | + +### Harness + +```typescript +import { Harness } from '@mosaicstack/telemetry-client'; +``` + +| Member | Value | Description | +|--------|-------|-------------| +| `CLAUDE_CODE` | `"claude_code"` | Anthropic Claude Code CLI | +| `OPENCODE` | `"opencode"` | OpenCode CLI | +| `KILO_CODE` | `"kilo_code"` | Kilo Code VS Code extension | +| `AIDER` | `"aider"` | Aider AI pair programming | +| `API_DIRECT` | `"api_direct"` | Direct API calls (no harness) | +| `OLLAMA_LOCAL` | `"ollama_local"` | Ollama local inference | +| `CUSTOM` | `"custom"` | Custom or unrecognized harness | +| `UNKNOWN` | `"unknown"` | Harness not reported 
| + +### Provider + +```typescript +import { Provider } from '@mosaicstack/telemetry-client'; +``` + +| Member | Value | Description | +|--------|-------|-------------| +| `ANTHROPIC` | `"anthropic"` | Anthropic (Claude models) | +| `OPENAI` | `"openai"` | OpenAI (GPT models) | +| `OPENROUTER` | `"openrouter"` | OpenRouter (multi-provider routing) | +| `OLLAMA` | `"ollama"` | Ollama (local/self-hosted) | +| `GOOGLE` | `"google"` | Google (Gemini models) | +| `MISTRAL` | `"mistral"` | Mistral AI | +| `CUSTOM` | `"custom"` | Custom or unrecognized provider | +| `UNKNOWN` | `"unknown"` | Provider not reported | + +### QualityGate + +```typescript +import { QualityGate } from '@mosaicstack/telemetry-client'; +``` + +| Member | Value | Description | +|--------|-------|-------------| +| `BUILD` | `"build"` | Code compiles/builds successfully | +| `LINT` | `"lint"` | Linter passes with no errors | +| `TEST` | `"test"` | Unit/integration tests pass | +| `COVERAGE` | `"coverage"` | Code coverage meets threshold (85%) | +| `TYPECHECK` | `"typecheck"` | Type checker passes | +| `SECURITY` | `"security"` | Security scan passes | + +### Outcome + +```typescript +import { Outcome } from '@mosaicstack/telemetry-client'; +``` + +| Member | Value | Description | +|--------|-------|-------------| +| `SUCCESS` | `"success"` | Task completed, all quality gates passed | +| `FAILURE` | `"failure"` | Task failed after all retries | +| `PARTIAL` | `"partial"` | Task partially completed (some gates passed) | +| `TIMEOUT` | `"timeout"` | Task exceeded time or token budget | + +### RepoSizeCategory + +```typescript +import { RepoSizeCategory } from '@mosaicstack/telemetry-client'; +``` + +| Member | Value | Approximate LOC | Description | +|--------|-------|-----------------|-------------| +| `TINY` | `"tiny"` | < 1,000 | Scripts, single-file projects | +| `SMALL` | `"small"` | 1,000–10,000 | Small libraries, tools | +| `MEDIUM` | `"medium"` | 10,000–100,000 | Standard applications | +| 
`LARGE` | `"large"` | 100,000–1,000,000 | Large applications, monorepos | +| `HUGE` | `"huge"` | > 1,000,000 | Enterprise codebases | + +--- + +## Server API Endpoints Used + +The SDK communicates with these Mosaic Telemetry API v1 endpoints: + +| SDK Method | HTTP Endpoint | Auth Required | +|------------|---------------|---------------| +| `flush()` (internal) | `POST /v1/events/batch` | Yes (Bearer token) | +| `refreshPredictions()` | `POST /v1/predictions/batch` | No (public) | + +For the full server API specification, see the [Mosaic Telemetry API Reference](https://tel-api.mosaicstack.dev/v1/docs). diff --git a/docs/integration-guide.md b/docs/integration-guide.md new file mode 100644 index 0000000..aa5208c --- /dev/null +++ b/docs/integration-guide.md @@ -0,0 +1,405 @@ +# Integration Guide + +This guide covers how to integrate `@mosaicstack/telemetry-client` into your applications. The SDK targets **Mosaic Telemetry API v1** (event schema version `1.0`). + +## Prerequisites + +- Node.js >= 18 (for native `fetch` and `crypto.randomUUID()`) +- A Mosaic Telemetry API key and instance ID (issued by an administrator via the admin API) + +## Installation + +```bash +npm install @mosaicstack/telemetry-client +``` + +The package ships ESM-only with TypeScript declarations. Zero runtime dependencies. + +## Environment Setup + +Store your credentials in environment variables — never hardcode them. + +```bash +# .env (not committed — add to .gitignore) +TELEMETRY_API_URL=https://tel-api.mosaicstack.dev +TELEMETRY_API_KEY=msk_your_api_key_here +TELEMETRY_INSTANCE_ID=a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d +``` + +```bash +# .env.example (committed — documents required variables) +TELEMETRY_API_URL=https://tel-api.mosaicstack.dev +TELEMETRY_API_KEY=your-api-key +TELEMETRY_INSTANCE_ID=your-instance-uuid +``` + +--- + +## Instrumenting a Next.js App + +Next.js server actions and API routes run on Node.js, so the SDK works directly. 
Create a shared singleton and track events from your server-side code. + +### 1. Create a telemetry singleton + +```typescript +// lib/telemetry.ts +import { + TelemetryClient, + TaskType, + Complexity, + Harness, + Provider, + Outcome, + QualityGate, +} from '@mosaicstack/telemetry-client'; + +let client: TelemetryClient | null = null; + +export function getTelemetryClient(): TelemetryClient { + if (!client) { + client = new TelemetryClient({ + serverUrl: process.env.TELEMETRY_API_URL!, + apiKey: process.env.TELEMETRY_API_KEY!, + instanceId: process.env.TELEMETRY_INSTANCE_ID!, + enabled: process.env.NODE_ENV === 'production', + onError: (err) => console.error('[telemetry]', err.message), + }); + client.start(); + } + return client; +} + +// Re-export enums for convenience +export { TaskType, Complexity, Harness, Provider, Outcome, QualityGate }; +``` + +### 2. Track events from an API route + +```typescript +// app/api/task-complete/route.ts +import { NextResponse } from 'next/server'; +import { getTelemetryClient, TaskType, Complexity, Harness, Provider, Outcome } from '@/lib/telemetry'; + +export async function POST(request: Request) { + const body = await request.json(); + + const client = getTelemetryClient(); + const event = client.eventBuilder.build({ + task_duration_ms: body.durationMs, + task_type: TaskType.IMPLEMENTATION, + complexity: Complexity.MEDIUM, + harness: Harness.CLAUDE_CODE, + model: body.model, + provider: Provider.ANTHROPIC, + estimated_input_tokens: body.estimatedInputTokens, + estimated_output_tokens: body.estimatedOutputTokens, + actual_input_tokens: body.actualInputTokens, + actual_output_tokens: body.actualOutputTokens, + estimated_cost_usd_micros: body.estimatedCostMicros, + actual_cost_usd_micros: body.actualCostMicros, + quality_gate_passed: body.qualityGatePassed, + quality_gates_run: body.qualityGatesRun, + quality_gates_failed: body.qualityGatesFailed, + context_compactions: body.contextCompactions, + context_rotations: 
body.contextRotations, + context_utilization_final: body.contextUtilization, + outcome: Outcome.SUCCESS, + retry_count: 0, + language: 'typescript', + }); + + client.track(event); + + return NextResponse.json({ status: 'queued' }); +} +``` + +### 3. Graceful shutdown + +Next.js doesn't provide a built-in shutdown hook, but you can handle `SIGTERM`: + +```typescript +// instrumentation.ts (Next.js instrumentation file) +export async function register() { + if (process.env.NEXT_RUNTIME === 'nodejs') { + const { getTelemetryClient } = await import('./lib/telemetry'); + + // Ensure the client starts on server boot + getTelemetryClient(); + + // Flush remaining events on shutdown + const shutdown = async () => { + const { getTelemetryClient } = await import('./lib/telemetry'); + const client = getTelemetryClient(); + await client.stop(); + process.exit(0); + }; + + process.on('SIGTERM', shutdown); + process.on('SIGINT', shutdown); + } +} +``` + +--- + +## Instrumenting a Node.js Service + +For a standalone Node.js service (Express, Fastify, plain script, etc.). + +### 1. Initialize and start + +```typescript +// src/telemetry.ts +import { TelemetryClient } from '@mosaicstack/telemetry-client'; + +export const telemetry = new TelemetryClient({ + serverUrl: process.env.TELEMETRY_API_URL ?? 'https://tel-api.mosaicstack.dev', + apiKey: process.env.TELEMETRY_API_KEY!, + instanceId: process.env.TELEMETRY_INSTANCE_ID!, + onError: (err) => console.error('[telemetry]', err.message), +}); + +telemetry.start(); +``` + +### 2. Track events after task completion + +```typescript +// src/task-runner.ts +import { + TaskType, + Complexity, + Harness, + Provider, + Outcome, + QualityGate, +} from '@mosaicstack/telemetry-client'; +import { telemetry } from './telemetry.js'; + +async function runTask() { + const startTime = Date.now(); + + // ... run your AI coding task ... 
+ + const durationMs = Date.now() - startTime; + + const event = telemetry.eventBuilder.build({ + task_duration_ms: durationMs, + task_type: TaskType.IMPLEMENTATION, + complexity: Complexity.HIGH, + harness: Harness.CLAUDE_CODE, + model: 'claude-sonnet-4-5-20250929', + provider: Provider.ANTHROPIC, + estimated_input_tokens: 200000, + estimated_output_tokens: 80000, + actual_input_tokens: 215000, + actual_output_tokens: 72000, + estimated_cost_usd_micros: 1200000, + actual_cost_usd_micros: 1150000, + quality_gate_passed: true, + quality_gates_run: [ + QualityGate.BUILD, + QualityGate.LINT, + QualityGate.TEST, + QualityGate.TYPECHECK, + ], + quality_gates_failed: [], + context_compactions: 3, + context_rotations: 1, + context_utilization_final: 0.85, + outcome: Outcome.SUCCESS, + retry_count: 0, + language: 'typescript', + repo_size_category: 'medium', + }); + + telemetry.track(event); +} +``` + +### 3. Graceful shutdown + +```typescript +// src/main.ts +import { telemetry } from './telemetry.js'; + +async function main() { + // ... your application logic ... + + // On shutdown, flush remaining events + process.on('SIGTERM', async () => { + await telemetry.stop(); + process.exit(0); + }); +} + +main(); +``` + +--- + +## Using Predictions + +The telemetry API provides crowd-sourced predictions for token usage, cost, and duration based on historical data. The SDK caches these predictions locally. 
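
As a rough mental model of the local cache, the sketch below keys entries by the four query dimensions used in this guide (`task_type`, `model`, `provider`, `complexity`). `PredictionQuery`, `cacheKey`, and `PredictionCache` are hypothetical names for illustration only; the SDK's internal cache may be structured differently.

```typescript
// Hypothetical sketch: not the SDK's actual implementation.
interface PredictionQuery {
  task_type: string;
  model: string;
  provider: string;
  complexity: string;
}

// Build a stable string key from the four query dimensions.
function cacheKey(q: PredictionQuery): string {
  return [q.task_type, q.model, q.provider, q.complexity].join('|');
}

class PredictionCache<T> {
  private readonly entries = new Map<string, T>();

  // Store a value under the key derived from the query.
  set(query: PredictionQuery, value: T): void {
    this.entries.set(cacheKey(query), value);
  }

  // Return the stored value, or null when this combination was never stored.
  get(query: PredictionQuery): T | null {
    return this.entries.get(cacheKey(query)) ?? null;
  }
}
```

Under this model a lookup only hits when all four dimensions match a stored entry, so each combination has to be fetched before it can be read.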
+ +### Pre-populate the cache + +Call `refreshPredictions()` at startup with the dimension combinations your application uses: + +```typescript +import { TaskType, Provider, Complexity } from '@mosaicstack/telemetry-client'; +import { telemetry } from './telemetry.js'; + +// Fetch predictions for all combinations you'll need +await telemetry.refreshPredictions([ + { task_type: TaskType.IMPLEMENTATION, model: 'claude-sonnet-4-5-20250929', provider: Provider.ANTHROPIC, complexity: Complexity.LOW }, + { task_type: TaskType.IMPLEMENTATION, model: 'claude-sonnet-4-5-20250929', provider: Provider.ANTHROPIC, complexity: Complexity.MEDIUM }, + { task_type: TaskType.IMPLEMENTATION, model: 'claude-sonnet-4-5-20250929', provider: Provider.ANTHROPIC, complexity: Complexity.HIGH }, + { task_type: TaskType.TESTING, model: 'claude-haiku-4-5-20251001', provider: Provider.ANTHROPIC, complexity: Complexity.LOW }, +]); +``` + +### Read cached predictions + +```typescript +const prediction = telemetry.getPrediction({ + task_type: TaskType.IMPLEMENTATION, + model: 'claude-sonnet-4-5-20250929', + provider: Provider.ANTHROPIC, + complexity: Complexity.MEDIUM, +}); + +if (prediction?.prediction) { + const p = prediction.prediction; + console.log('Token predictions (median):', { + inputTokens: p.input_tokens.median, + outputTokens: p.output_tokens.median, + }); + console.log('Cost prediction:', `$${(p.cost_usd_micros.median / 1_000_000).toFixed(2)}`); + console.log('Duration prediction:', `${(p.duration_ms.median / 1000).toFixed(0)}s`); + console.log('Correction factors:', { + input: p.correction_factors.input, // >1.0 means estimates tend to be too low + output: p.correction_factors.output, + }); + console.log('Quality:', { + gatePassRate: `${(p.quality.gate_pass_rate * 100).toFixed(0)}%`, + successRate: `${(p.quality.success_rate * 100).toFixed(0)}%`, + }); + + // Check confidence level + if (prediction.metadata.confidence === 'low') { + console.warn('Low confidence — small sample size 
or fallback was applied'); + } +} +``` + +### Understand fallback behavior + +When the server doesn't have enough data for an exact match, it broadens the query by dropping dimensions (e.g., ignoring complexity). The `metadata` fields tell you what happened: + +| `fallback_level` | Meaning | +|-------------------|---------| +| `0` | Exact match on all dimensions | +| `1+` | Some dimensions were dropped to find data | +| `-1` | No prediction data available at any level | + +--- + +## Environment-Specific Configuration + +### Development + +```typescript +const client = new TelemetryClient({ + serverUrl: 'http://localhost:8000', // Local dev server + apiKey: process.env.TELEMETRY_API_KEY!, + instanceId: process.env.TELEMETRY_INSTANCE_ID!, + dryRun: true, // Don't send real data + submitIntervalMs: 10_000, // Flush more frequently for debugging + onError: (err) => console.error('[telemetry]', err), +}); +``` + +### Production + +```typescript +const client = new TelemetryClient({ + serverUrl: 'https://tel-api.mosaicstack.dev', + apiKey: process.env.TELEMETRY_API_KEY!, + instanceId: process.env.TELEMETRY_INSTANCE_ID!, + submitIntervalMs: 300_000, // 5 min (default) + maxRetries: 3, // Retry on transient failures + onError: (err) => { + // Route to your observability stack + logger.error('Telemetry submission failed', { error: err.message }); + }, +}); +``` + +### Conditional enable/disable + +```typescript +const client = new TelemetryClient({ + serverUrl: process.env.TELEMETRY_API_URL!, + apiKey: process.env.TELEMETRY_API_KEY!, + instanceId: process.env.TELEMETRY_INSTANCE_ID!, + enabled: process.env.TELEMETRY_ENABLED !== 'false', // Opt-out via env var +}); +``` + +When `enabled` is `false`, `track()` returns immediately without queuing. + +--- + +## Error Handling + +The SDK is designed to never disrupt your application: + +- **`track()` never throws.** All errors are caught and routed to the `onError` callback. 
+- **Failed batches are re-queued.** If a submission fails, events are prepended back to the queue for the next flush cycle. +- **Exponential backoff with jitter.** Retries use a 1s base delay, doubling up to 60s, with random jitter to avoid a thundering herd. +- **`Retry-After` header support.** On HTTP 429 (rate limited), the SDK respects the server's `Retry-After` header. +- **HTTP 403 is not retried.** An API key / instance ID mismatch is a permanent error. + +### Custom error handling + +```typescript +const client = new TelemetryClient({ + // ... + onError: (error) => { + if (error.message.includes('HTTP 403')) { + console.error('Telemetry auth failed — check API key and instance ID'); + } else if (error.message.includes('HTTP 429')) { + console.warn('Telemetry rate limited — events will be retried'); + } else { + console.error('Telemetry error:', error.message); + } + }, +}); +``` + +--- + +## Batch Submission Behavior + +The SDK batches events for efficiency: + +1. `track(event)` adds the event to an in-memory queue (bounded, FIFO eviction at capacity). +2. Every `submitIntervalMs` (default: 5 minutes), the background timer drains the queue in batches of up to `batchSize` (default/max: 100). +3. Each batch is sent to `POST /v1/events/batch`, with exponential backoff on failure. +4. Calling `stop()` flushes all remaining events before resolving. + +The server accepts up to **100 events per batch** and supports **partial success**: some events may be accepted while others (e.g., duplicates) are rejected. + +--- + +## API Version Compatibility + +| SDK Version | API Version | Schema Version | +|-------------|-------------|----------------| +| 0.1.x | v1 (`/v1/` endpoints) | `1.0` | + +The `EventBuilder` automatically sets `schema_version: "1.0"` on every event. The SDK submits to `/v1/events/batch` and queries `/v1/predictions/batch`. + +When the telemetry API introduces a v2, this SDK will add support in a new major release. 
The server supports two API versions simultaneously during a 6-month deprecation window.
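
To illustrate how the version prefix fits into request URLs, here is a minimal sketch that joins a server URL, an API version, and an endpoint path. `API_VERSION` and `endpointUrl` are hypothetical helpers for illustration, not part of the SDK's exported API.

```typescript
// Hypothetical helper: not part of @mosaicstack/telemetry-client.
const API_VERSION = 'v1';

// Join the base URL, version prefix, and path, tolerating stray slashes.
function endpointUrl(serverUrl: string, path: string, version: string = API_VERSION): string {
  const base = serverUrl.replace(/\/+$/, ''); // drop trailing slashes
  const suffix = path.replace(/^\/+/, ''); // drop leading slashes
  return `${base}/${version}/${suffix}`;
}

const eventsUrl = endpointUrl('https://tel-api.mosaicstack.dev', 'events/batch');
const predictionsUrl = endpointUrl('https://tel-api.mosaicstack.dev/', '/predictions/batch');
```

Keeping the version in one place like this would make a future move to `/v2/` a single-constant change in client code that builds URLs by hand.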