
# SDK API Reference

Complete reference for all public classes, methods, types, and enums exported by `mosaicstack-telemetry`.

- **SDK version:** 0.1.0
- **Telemetry API version:** v1
- **Event schema version:** 1.0

---

## TelemetryClient

`mosaicstack_telemetry.TelemetryClient`

The main entry point for the SDK. Manages event queuing, background submission, and prediction caching.

### Constructor

```python
TelemetryClient(config: TelemetryConfig)
```

Validates the config on construction. If validation errors are found and telemetry is enabled, warnings are logged (but the client is still created).

### Methods

#### `start() -> None`

Start background event submission using a threading-based loop. Spawns a daemon thread that flushes the queue every `config.submit_interval_seconds`.

No-op if `config.enabled` is `False`.

#### `await start_async() -> None`

Start background event submission using an asyncio task. Creates an `asyncio.Task` that flushes the queue periodically.

No-op if `config.enabled` is `False`.

#### `stop() -> None`

Stop the sync background submitter. Performs a final flush of all remaining queued events before returning. Safe to call if not started.
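
The start/stop lifecycle described above can be pictured with a small standard-library sketch. It is not the SDK's implementation: `PeriodicFlusher`, `flush`, `pending`, and `sent` are hypothetical names, and the only behaviors taken from this reference are the daemon thread, the periodic flush, the final flush on stop, and stop being safe when nothing was started.

```python
import threading


class PeriodicFlusher:
    """Hypothetical stand-in for a thread-based background submitter."""

    def __init__(self, flush, interval_seconds):
        self._flush = flush
        self._interval = interval_seconds
        self._stop_event = threading.Event()
        self._thread = None

    def start(self):
        if self._thread is not None:
            return  # already running
        self._stop_event.clear()
        # Daemon thread: it does not keep the interpreter alive at exit.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() sets it.
        while not self._stop_event.wait(self._interval):
            self._flush()

    def stop(self):
        if self._thread is None:
            return  # safe to call when not started
        self._stop_event.set()
        self._thread.join()
        self._thread = None
        self._flush()  # final flush of whatever is still queued


pending = ["a", "b", "c"]  # stand-in for queued events
sent = []


def flush():
    sent.extend(pending)
    pending.clear()


flusher = PeriodicFlusher(flush, interval_seconds=60.0)
flusher.start()
flusher.stop()  # returns only after the final flush has run
```

Waiting on a `threading.Event` instead of calling `time.sleep()` lets `stop()` interrupt the interval immediately, so shutdown does not have to wait out a full submit interval.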

#### `await stop_async() -> None`

Stop the async background submitter. Performs a final flush of all remaining queued events before returning. Safe to call if not started.

#### `track(event: TaskCompletionEvent) -> None`

Queue an event for submission. **Always synchronous.** **Never blocks.** **Never raises.**

If telemetry is disabled, the event is silently dropped. If the queue is full, the oldest event is evicted. Any unexpected error is caught and logged.

This method is thread-safe and can be called from any thread or coroutine.

#### `get_prediction(query: PredictionQuery) -> PredictionResponse | None`

Return a cached prediction for the given query, or `None` if not cached or expired.

#### `refresh_predictions_sync(queries: list[PredictionQuery]) -> None`

Fetch predictions from the server synchronously using `POST /v1/predictions/batch`. Results are stored in the internal prediction cache.

No-op if `queries` is empty.

#### `await refresh_predictions(queries: list[PredictionQuery]) -> None`

Fetch predictions from the server asynchronously using `POST /v1/predictions/batch`. Results are stored in the internal prediction cache.

No-op if `queries` is empty.

### Properties

#### `queue_size: int`

Number of events currently in the in-memory queue.

#### `is_running: bool`

Whether the background submitter (sync or async) is currently active.

### Context Managers

```python
# Sync: calls start() on entry, stop() on exit
with TelemetryClient(config) as client:
    client.track(event)

# Async: calls start_async() on entry, stop_async() on exit
async with TelemetryClient(config) as client:
    client.track(event)
```
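
For readers unfamiliar with objects that support both `with` and `async with`, the sketch below shows how one class can implement both protocols. `DualClient` is a hypothetical stand-in that only records which lifecycle method ran; it makes no claim about how `TelemetryClient` is written internally.

```python
import asyncio


class DualClient:
    """Hypothetical stand-in that records lifecycle calls in order."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")

    async def start_async(self):
        self.calls.append("start_async")

    async def stop_async(self):
        self.calls.append("stop_async")

    # Sync protocol: `with client:`
    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop()

    # Async protocol: `async with client:`
    async def __aenter__(self):
        await self.start_async()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.stop_async()


client = DualClient()

with client:
    client.calls.append("track")


async def main():
    async with client:
        client.calls.append("track")


asyncio.run(main())
```

Note that `async with` is only valid inside a coroutine, which is why the async half is wrapped in `main()` here.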

---

## TelemetryConfig

`mosaicstack_telemetry.TelemetryConfig`

A dataclass holding all configuration for the telemetry client. Supports environment variable overrides.

### Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `server_url` | `str` | `""` | Telemetry API base URL. Trailing slashes are stripped. |
| `api_key` | `str` | `""` | 64-character hex API key for Bearer token auth. |
| `instance_id` | `str` | `""` | UUID string identifying this Mosaic Stack instance. |
| `enabled` | `bool` | `True` | Master switch for telemetry. When `False`, `track()` is a no-op and no background threads/tasks are created. |
| `submit_interval_seconds` | `float` | `300.0` | Interval between background queue flushes (seconds). |
| `max_queue_size` | `int` | `1000` | Maximum events in the in-memory queue. Older events are evicted when full. |
| `batch_size` | `int` | `100` | Events per HTTP batch request. Server maximum is 100. |
| `request_timeout_seconds` | `float` | `10.0` | HTTP request timeout (seconds). |
| `prediction_cache_ttl_seconds` | `float` | `21600.0` | Prediction cache time-to-live (seconds). Default: 6 hours. |
| `dry_run` | `bool` | `False` | When `True`, batches are logged but not sent to the server. |
| `max_retries` | `int` | `3` | Maximum retry attempts for transient failures. |
| `user_agent` | `str` | `"mosaicstack-telemetry-python/0.1.0"` | User-Agent header sent with all requests. |

### Environment Variables

These are read during `__post_init__`. `MOSAIC_TELEMETRY_ENABLED` always overrides the constructor value; the other variables apply only when the corresponding constructor field is empty/default:

| Env Var | Field | Notes |
|---------|-------|-------|
| `MOSAIC_TELEMETRY_ENABLED` | `enabled` | Accepts `1`, `true`, `yes` (case-insensitive) as truthy. Always overrides the constructor value. |
| `MOSAIC_TELEMETRY_SERVER_URL` | `server_url` | Only used if `server_url` is not set in the constructor. |
| `MOSAIC_TELEMETRY_API_KEY` | `api_key` | Only used if `api_key` is not set in the constructor. |
| `MOSAIC_TELEMETRY_INSTANCE_ID` | `instance_id` | Only used if `instance_id` is not set in the constructor. |
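
The precedence rules in the table can be sketched as follows. This is an illustration, not the SDK's code: `resolve_config` is a hypothetical helper, only two fields are shown, and two details are assumptions not stated in this reference: an unset `MOSAIC_TELEMETRY_ENABLED` leaves the constructor value alone, and any value outside `1`/`true`/`yes` counts as false.

```python
import os

TRUTHY = {"1", "true", "yes"}


def resolve_config(server_url="", enabled=True, environ=None):
    """Hypothetical helper mirroring the documented precedence rules."""
    environ = os.environ if environ is None else environ

    # String fields: the env var only fills in an empty constructor value.
    if not server_url:
        server_url = environ.get("MOSAIC_TELEMETRY_SERVER_URL", "")

    # `enabled`: the env var wins over the constructor value.
    # Assumption: an unset variable leaves the constructor value as-is.
    raw = environ.get("MOSAIC_TELEMETRY_ENABLED")
    if raw is not None:
        enabled = raw.lower() in TRUTHY  # assumption: anything else is False

    # Trailing slashes are stripped from the base URL.
    return {"server_url": server_url.rstrip("/"), "enabled": enabled}


env = {
    "MOSAIC_TELEMETRY_SERVER_URL": "https://env.example/",
    "MOSAIC_TELEMETRY_ENABLED": "No",
}
from_env = resolve_config(environ=env)
from_ctor = resolve_config(server_url="https://ctor.example", enabled=True, environ=env)
```

In this sketch the constructor URL beats the environment, while `enabled=True` is still overridden to `False` by the environment.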

### Methods

#### `validate() -> list[str]`

Validate the configuration and return a list of error messages. Returns an empty list if valid.

Checks performed:
- `server_url` is non-empty and starts with `http://` or `https://`
- `api_key` is a 64-character hex string
- `instance_id` is a valid UUID string
- `submit_interval_seconds` is positive
- `max_queue_size` is positive
- `batch_size` is between 1 and 100
- `request_timeout_seconds` is positive
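
A rough standard-library approximation of these checks is shown below. `validate_config` and its error messages are hypothetical; the real method may word its messages differently and may be stricter or looser about what counts as a valid UUID string.

```python
import re
import uuid

HEX_KEY = re.compile(r"[0-9a-fA-F]{64}")


def validate_config(
    server_url,
    api_key,
    instance_id,
    submit_interval_seconds=300.0,
    max_queue_size=1000,
    batch_size=100,
    request_timeout_seconds=10.0,
):
    """Hypothetical re-creation of the documented checks; returns error strings."""
    errors = []
    if not server_url.startswith(("http://", "https://")):
        errors.append("server_url must start with http:// or https://")
    if not HEX_KEY.fullmatch(api_key):
        errors.append("api_key must be a 64-character hex string")
    try:
        uuid.UUID(instance_id)
    except ValueError:
        errors.append("instance_id must be a valid UUID string")
    if submit_interval_seconds <= 0:
        errors.append("submit_interval_seconds must be positive")
    if max_queue_size <= 0:
        errors.append("max_queue_size must be positive")
    if not 1 <= batch_size <= 100:
        errors.append("batch_size must be between 1 and 100")
    if request_timeout_seconds <= 0:
        errors.append("request_timeout_seconds must be positive")
    return errors


ok = validate_config(
    "https://telemetry.example",
    "ab" * 32,
    "123e4567-e89b-12d3-a456-426614174000",
)
bad = validate_config("ftp://x", "short", "not-a-uuid", batch_size=101)
```

Returning a list instead of raising on the first problem lets the caller report every misconfiguration at once.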

---

## EventBuilder

`mosaicstack_telemetry.EventBuilder`

Fluent builder for constructing `TaskCompletionEvent` instances with a chainable API and sensible defaults.

### Constructor

```python
EventBuilder(instance_id: str | UUID)
```

### Setter Methods

All setter methods return `self` for chaining.

| Method | Parameter Type | Sets Field | Default |
|--------|---------------|-----------|---------|
| `.event_id(value)` | `str \| UUID` | `event_id` | Auto-generated UUIDv4 |
| `.timestamp(value)` | `datetime` | `timestamp` | `datetime.now(UTC)` |
| `.task_type(value)` | `TaskType` | `task_type` | `TaskType.UNKNOWN` |
| `.complexity_level(value)` | `Complexity` | `complexity` | `Complexity.MEDIUM` |
| `.harness_type(value)` | `Harness` | `harness` | `Harness.UNKNOWN` |
| `.model(value)` | `str` | `model` | `"unknown"` |
| `.provider(value)` | `Provider` | `provider` | `Provider.UNKNOWN` |
| `.duration_ms(value)` | `int` | `task_duration_ms` | `0` |
| `.outcome_value(value)` | `Outcome` | `outcome` | `Outcome.FAILURE` |
| `.retry_count(value)` | `int` | `retry_count` | `0` |
| `.language(value)` | `str \| None` | `language` | `None` |
| `.repo_size(value)` | `RepoSizeCategory \| None` | `repo_size_category` | `None` |

#### `.tokens(*, estimated_in, estimated_out, actual_in, actual_out) -> EventBuilder`

Set all four token count fields. All parameters are keyword-only integers (default `0`).

#### `.cost(*, estimated, actual) -> EventBuilder`

Set estimated and actual cost in microdollars. Both are keyword-only integers (default `0`).
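
Since costs are integers in microdollars (one millionth of a US dollar), callers holding dollar amounts need a conversion step. The helpers below are hypothetical and not part of the SDK; they use `Decimal` so that amounts such as `0.0425` convert exactly rather than through binary floating point.

```python
from decimal import Decimal

MICROS_PER_USD = 1_000_000


def usd_to_micros(usd: str) -> int:
    """Convert a decimal dollar string to integer microdollars (hypothetical helper)."""
    return int((Decimal(usd) * MICROS_PER_USD).to_integral_value())


def micros_to_usd(micros: int) -> Decimal:
    """Convert integer microdollars back to dollars (hypothetical helper)."""
    return Decimal(micros) / MICROS_PER_USD


estimated = usd_to_micros("0.0425")
actual = usd_to_micros("0.05")
```

The resulting integers are the kind of values one would pass as `estimated` and `actual`.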

#### `.quality(*, passed, gates_run=None, gates_failed=None) -> EventBuilder`

Set quality gate results. `passed` is required. `gates_run` and `gates_failed` are optional lists of `QualityGate` values.

#### `.context(*, compactions=0, rotations=0, utilization=0.0) -> EventBuilder`

Set context window metrics. All parameters are keyword-only with defaults.

#### `.build() -> TaskCompletionEvent`

Construct and return the `TaskCompletionEvent`. The builder can be reused after calling `.build()`.

---

## TaskCompletionEvent

`mosaicstack_telemetry.TaskCompletionEvent`

Pydantic model representing a single telemetry event. Matches the server's v1 event schema (version 1.0).

### Fields

| Field | Type | Required | Constraints | Description |
|-------|------|----------|-------------|-------------|
| `instance_id` | `UUID` | Yes | Valid UUID | Mosaic Stack installation identifier |
| `event_id` | `UUID` | No | Valid UUID | Unique event identifier (default: auto-generated) |
| `schema_version` | `str` | No | -- | Event schema version (default: `"1.0"`) |
| `timestamp` | `datetime` | No | -- | When the task completed (default: now UTC) |
| `task_duration_ms` | `int` | Yes | 0--86,400,000 | Task wall-clock time in milliseconds |
| `task_type` | `TaskType` | Yes | Enum value | Type of work performed |
| `complexity` | `Complexity` | Yes | Enum value | Task complexity level |
| `harness` | `Harness` | Yes | Enum value | Coding tool / execution environment |
| `model` | `str` | Yes | 1--100 chars | Model identifier |
| `provider` | `Provider` | Yes | Enum value | LLM provider |
| `estimated_input_tokens` | `int` | Yes | 0--10,000,000 | Pre-task input token estimate |
| `estimated_output_tokens` | `int` | Yes | 0--10,000,000 | Pre-task output token estimate |
| `actual_input_tokens` | `int` | Yes | 0--10,000,000 | Actual input tokens consumed |
| `actual_output_tokens` | `int` | Yes | 0--10,000,000 | Actual output tokens generated |
| `estimated_cost_usd_micros` | `int` | Yes | 0--100,000,000 | Estimated cost in microdollars |
| `actual_cost_usd_micros` | `int` | Yes | 0--100,000,000 | Actual cost in microdollars |
| `quality_gate_passed` | `bool` | Yes | -- | Whether all quality gates passed |
| `quality_gates_run` | `list[QualityGate]` | No | -- | Gates executed (default: `[]`) |
| `quality_gates_failed` | `list[QualityGate]` | No | -- | Gates that failed (default: `[]`) |
| `context_compactions` | `int` | Yes | 0--100 | Context compaction events during task |
| `context_rotations` | `int` | Yes | 0--50 | Agent session rotations during task |
| `context_utilization_final` | `float` | Yes | 0.0--1.0 | Final context usage ratio |
| `outcome` | `Outcome` | Yes | Enum value | Final task result |
| `retry_count` | `int` | Yes | 0--20 | Retries before final outcome |
| `language` | `str \| None` | No | Max 30 chars | Primary programming language |
| `repo_size_category` | `RepoSizeCategory \| None` | No | Enum value | Repository size bucket |

---

## Prediction Types

### PredictionQuery

`mosaicstack_telemetry.PredictionQuery`

Query parameters for fetching a prediction.

| Field | Type | Description |
|-------|------|-------------|
| `task_type` | `TaskType` | Task type to predict for |
| `model` | `str` | Model identifier |
| `provider` | `Provider` | LLM provider |
| `complexity` | `Complexity` | Complexity level |

### PredictionResponse

`mosaicstack_telemetry.PredictionResponse`

Response from the prediction endpoint.

| Field | Type | Description |
|-------|------|-------------|
| `prediction` | `PredictionData \| None` | Prediction data, or `None` if no data is available |
| `metadata` | `PredictionMetadata` | Metadata about how the prediction was generated |

### PredictionData

`mosaicstack_telemetry.PredictionData`

| Field | Type | Description |
|-------|------|-------------|
| `input_tokens` | `TokenDistribution` | Input token distribution (p10/p25/median/p75/p90) |
| `output_tokens` | `TokenDistribution` | Output token distribution (p10/p25/median/p75/p90) |
| `cost_usd_micros` | `dict[str, int]` | `{"median": <value>}` -- median cost in microdollars |
| `duration_ms` | `dict[str, int]` | `{"median": <value>}` -- median duration in milliseconds |
| `correction_factors` | `CorrectionFactors` | Actual-to-estimated token ratios |
| `quality` | `QualityPrediction` | Gate pass rate and success rate |

### TokenDistribution

`mosaicstack_telemetry.TokenDistribution`

| Field | Type | Description |
|-------|------|-------------|
| `p10` | `int` | 10th percentile |
| `p25` | `int` | 25th percentile |
| `median` | `int` | 50th percentile (median) |
| `p75` | `int` | 75th percentile |
| `p90` | `int` | 90th percentile |

### CorrectionFactors

`mosaicstack_telemetry.CorrectionFactors`

| Field | Type | Description |
|-------|------|-------------|
| `input` | `float` | Ratio of actual to estimated input tokens (>1.0 = estimates too low) |
| `output` | `float` | Ratio of actual to estimated output tokens |
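
As a worked example of the ratio: if past tasks consumed 1,250 input tokens against an estimate of 1,000, the factor is 1.25. The sketch below assumes the factor is meant to be applied multiplicatively to a new estimate; this reference defines only the ratio, so that usage and both function names are assumptions.

```python
def correction_factor(actual_tokens: int, estimated_tokens: int) -> float:
    """Ratio of actual to estimated tokens; above 1.0 means estimates ran low."""
    return actual_tokens / estimated_tokens


def corrected_estimate(estimated_tokens: int, factor: float) -> int:
    """Scale a fresh estimate by a correction factor (assumed usage)."""
    return round(estimated_tokens * factor)


factor = correction_factor(actual_tokens=1250, estimated_tokens=1000)
adjusted = corrected_estimate(2000, factor)
```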

### QualityPrediction

`mosaicstack_telemetry.QualityPrediction`

| Field | Type | Description |
|-------|------|-------------|
| `gate_pass_rate` | `float` | Fraction of tasks where all quality gates passed |
| `success_rate` | `float` | Fraction of tasks with `outcome: success` |

### PredictionMetadata

`mosaicstack_telemetry.PredictionMetadata`

| Field | Type | Description |
|-------|------|-------------|
| `sample_size` | `int` | Number of events used to compute the prediction |
| `fallback_level` | `int` | `0` = exact match, `1+` = dimensions dropped, `-1` = no data |
| `confidence` | `str` | `"high"`, `"medium"`, `"low"`, or `"none"` |
| `last_updated` | `datetime \| None` | When the prediction was last computed |
| `dimensions_matched` | `dict[str, str \| None] \| None` | Which dimensions matched (`None` values = fallback) |
| `fallback_note` | `str \| None` | Explanation when fallback was used |
| `cache_hit` | `bool` | Whether the server served from its cache |

---

## Batch Types

### BatchEventRequest

`mosaicstack_telemetry.BatchEventRequest`

Request body for `POST /v1/events/batch`. Used internally by the submitter.

| Field | Type | Constraints | Description |
|-------|------|-------------|-------------|
| `events` | `list[TaskCompletionEvent]` | 1--100 items | Events to submit |
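
Because a request carries at most 100 events, a larger backlog has to be split across several requests. One simple way to do that is sketched below; `chunk` is a hypothetical helper, and how the SDK's submitter actually splits its queue is not described here.

```python
def chunk(events, batch_size=100):
    """Split a list into consecutive batches of at most `batch_size` items."""
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]


backlog = list(range(250))  # stand-in for 250 queued events
batches = chunk(backlog)
batch_lengths = [len(batch) for batch in batches]
```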

### BatchEventResponse

`mosaicstack_telemetry.BatchEventResponse`

Response from the batch event endpoint.

| Field | Type | Description |
|-------|------|-------------|
| `accepted` | `int` | Count of accepted events |
| `rejected` | `int` | Count of rejected events |
| `results` | `list[BatchEventResult]` | Per-event results |

### BatchEventResult

`mosaicstack_telemetry.BatchEventResult`

| Field | Type | Description |
|-------|------|-------------|
| `event_id` | `UUID` | The event's unique identifier |
| `status` | `str` | `"accepted"` or `"rejected"` |
| `error` | `str \| None` | Error message if rejected |

---

## Enumerations

All enums use the `str, Enum` mixin (compatible with Python 3.10). Each member's `.value` is the lowercase string sent to the server.
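
The practical effect of the `str, Enum` mixin is shown below with a hypothetical two-member enum, `ExampleTaskType`, that mirrors the shape of the SDK enums without being one of them: members compare equal to plain strings, can be looked up by value, and serialize to JSON without a custom encoder.

```python
import json
from enum import Enum


class ExampleTaskType(str, Enum):
    """Hypothetical enum using the same `str, Enum` mixin pattern."""

    CODE_REVIEW = "code_review"
    UNKNOWN = "unknown"


member = ExampleTaskType.CODE_REVIEW

# Members are real `str` instances and compare equal to their value.
is_string = isinstance(member, str)
equals_plain = member == "code_review"

# Look up a member from the wire value.
parsed = ExampleTaskType("unknown")

# The json module serializes members as their string value.
encoded = json.dumps({"task_type": member})
```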

### TaskType

`mosaicstack_telemetry.TaskType`

| Member | Value | Description |
|--------|-------|-------------|
| `PLANNING` | `"planning"` | Architecture design, task breakdown |
| `IMPLEMENTATION` | `"implementation"` | Writing new code |
| `CODE_REVIEW` | `"code_review"` | Reviewing existing code |
| `TESTING` | `"testing"` | Writing or running tests |
| `DEBUGGING` | `"debugging"` | Investigating and fixing bugs |
| `REFACTORING` | `"refactoring"` | Restructuring existing code |
| `DOCUMENTATION` | `"documentation"` | Writing docs, comments, READMEs |
| `CONFIGURATION` | `"configuration"` | Config files, CI/CD, infrastructure |
| `SECURITY_AUDIT` | `"security_audit"` | Security review, vulnerability analysis |
| `UNKNOWN` | `"unknown"` | Unclassified task type (fallback) |

### Complexity

`mosaicstack_telemetry.Complexity`

| Member | Value | Description |
|--------|-------|-------------|
| `LOW` | `"low"` | Simple fixes, typos, config changes |
| `MEDIUM` | `"medium"` | Standard features, moderate logic |
| `HIGH` | `"high"` | Complex features, multi-file changes |
| `CRITICAL` | `"critical"` | Major refactoring, architectural changes |

### Harness

`mosaicstack_telemetry.Harness`

| Member | Value | Description |
|--------|-------|-------------|
| `CLAUDE_CODE` | `"claude_code"` | Anthropic Claude Code CLI |
| `OPENCODE` | `"opencode"` | OpenCode CLI |
| `KILO_CODE` | `"kilo_code"` | Kilo Code VS Code extension |
| `AIDER` | `"aider"` | Aider AI pair programming |
| `API_DIRECT` | `"api_direct"` | Direct API calls (no harness) |
| `OLLAMA_LOCAL` | `"ollama_local"` | Ollama local inference |
| `CUSTOM` | `"custom"` | Custom or unrecognized harness |
| `UNKNOWN` | `"unknown"` | Harness not reported |

### Provider

`mosaicstack_telemetry.Provider`

| Member | Value | Description |
|--------|-------|-------------|
| `ANTHROPIC` | `"anthropic"` | Anthropic (Claude models) |
| `OPENAI` | `"openai"` | OpenAI (GPT models) |
| `OPENROUTER` | `"openrouter"` | OpenRouter (multi-provider routing) |
| `OLLAMA` | `"ollama"` | Ollama (local/self-hosted) |
| `GOOGLE` | `"google"` | Google (Gemini models) |
| `MISTRAL` | `"mistral"` | Mistral AI |
| `CUSTOM` | `"custom"` | Custom or unrecognized provider |
| `UNKNOWN` | `"unknown"` | Provider not reported |

### Outcome

`mosaicstack_telemetry.Outcome`

| Member | Value | Description |
|--------|-------|-------------|
| `SUCCESS` | `"success"` | Task completed, all quality gates passed |
| `FAILURE` | `"failure"` | Task failed after all retries |
| `PARTIAL` | `"partial"` | Task partially completed |
| `TIMEOUT` | `"timeout"` | Task exceeded time or token budget |

### QualityGate

`mosaicstack_telemetry.QualityGate`

| Member | Value | Description |
|--------|-------|-------------|
| `BUILD` | `"build"` | Code compiles/builds |
| `LINT` | `"lint"` | Linter passes |
| `TEST` | `"test"` | Tests pass |
| `COVERAGE` | `"coverage"` | Coverage meets threshold |
| `TYPECHECK` | `"typecheck"` | Type checker passes |
| `SECURITY` | `"security"` | Security scan passes |

### RepoSizeCategory

`mosaicstack_telemetry.RepoSizeCategory`

| Member | Value | Approximate LOC | Description |
|--------|-------|----------------|-------------|
| `TINY` | `"tiny"` | < 1,000 | Scripts, single-file projects |
| `SMALL` | `"small"` | 1,000--10,000 | Small libraries, tools |
| `MEDIUM` | `"medium"` | 10,000--100,000 | Standard applications |
| `LARGE` | `"large"` | 100,000--1,000,000 | Large applications, monorepos |
| `HUGE` | `"huge"` | > 1,000,000 | Enterprise codebases |

---

## Exceptions

### TelemetryError

`mosaicstack_telemetry.TelemetryError`

Base exception for telemetry client errors. Extends `Exception`. Currently unused by the public API (since `track()` never raises), but available for custom error handling in integrations.

---

## Internal Components

These are exported for advanced use cases but are managed automatically by `TelemetryClient`.

### EventQueue

`mosaicstack_telemetry.EventQueue`

Thread-safe bounded FIFO queue. When full, oldest events are evicted.

- `EventQueue(max_size: int = 1000)`
- `put(event: TaskCompletionEvent) -> None` -- Add event, evict oldest if full
- `drain(max_items: int) -> list[TaskCompletionEvent]` -- Remove and return up to N events
- `put_back(events: list[TaskCompletionEvent]) -> None` -- Re-queue events at the front (for retries)
- `size: int` -- Current queue length
- `is_empty: bool` -- Whether the queue is empty
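
The eviction and re-queue semantics listed above can be illustrated with a small sketch built on `collections.deque`. `BoundedQueue` is a hypothetical stand-in, not the SDK class. What `put_back` does when the returned events no longer fit is not specified in this reference, so the sketch assumes it re-queues only as many as there is room for.

```python
import threading
from collections import deque


class BoundedQueue:
    """Hypothetical thread-safe bounded FIFO with oldest-first eviction."""

    def __init__(self, max_size=1000):
        self._items = deque()
        self._max_size = max_size
        self._lock = threading.Lock()

    def put(self, item):
        with self._lock:
            if len(self._items) >= self._max_size:
                self._items.popleft()  # evict the oldest item
            self._items.append(item)

    def drain(self, max_items):
        with self._lock:
            count = min(max_items, len(self._items))
            return [self._items.popleft() for _ in range(count)]

    def put_back(self, items):
        with self._lock:
            # Assumption: only as many items as currently fit are re-queued.
            free = max(self._max_size - len(self._items), 0)
            for item in reversed(items[:free]):
                self._items.appendleft(item)  # preserves original order

    @property
    def size(self):
        with self._lock:
            return len(self._items)


queue = BoundedQueue(max_size=3)
for n in (1, 2, 3, 4):
    queue.put(n)  # 1 is evicted when 4 arrives

batch = queue.drain(2)  # takes the two oldest remaining items
size_after_drain = queue.size
queue.put_back(batch)  # e.g. after a failed submission
restored = queue.drain(10)
```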

### PredictionCache

`mosaicstack_telemetry.PredictionCache`

Thread-safe dict-based cache with TTL expiration.

- `PredictionCache(ttl_seconds: float = 21600.0)`
- `get(query: PredictionQuery) -> PredictionResponse | None` -- Get cached prediction
- `put(query: PredictionQuery, response: PredictionResponse) -> None` -- Store prediction
- `clear() -> None` -- Invalidate all entries
- `size: int` -- Number of entries, including any that have expired but not yet been removed
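
A TTL cache of this kind can be sketched in a few lines. `TTLCache` below is a hypothetical stand-in that uses an injectable clock so expiry can be demonstrated without waiting. Removing an expired entry lazily on `get` is an assumption; it is one way a size count could include expired entries.

```python
import threading
import time


class TTLCache:
    """Hypothetical thread-safe dict cache with per-entry expiry."""

    def __init__(self, ttl_seconds=21600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if self._clock() >= expires_at:
                del self._entries[key]  # assumption: expired entries go lazily
                return None
            return value

    def clear(self):
        with self._lock:
            self._entries.clear()

    @property
    def size(self):
        with self._lock:
            return len(self._entries)


now = [0.0]  # fake clock, advanced manually
cache = TTLCache(ttl_seconds=10.0, clock=lambda: now[0])

key = ("implementation", "example-model", "anthropic", "medium")
cache.put(key, "prediction")
fresh = cache.get(key)

now[0] = 10.0  # TTL has elapsed
size_before_lookup = cache.size  # expired entry is still counted
stale = cache.get(key)
size_after_lookup = cache.size
```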