docs(#1): SDK integration guide, API reference, and CI pipeline
ci/woodpecker/push/woodpecker Pipeline was successful

- Rewrite README with quick start, FastAPI snippet, async/sync patterns,
  config reference with env vars, and API version targeting (v1, schema 1.0)
- Add docs/integration-guide.md with full FastAPI and generic Python
  integration examples, environment-specific config, prediction queries,
  error handling, and dry-run mode documentation
- Add docs/api-reference.md covering all exported classes, methods, Pydantic
  models, enums (TaskType, Complexity, Harness, Provider, QualityGate,
  Outcome, RepoSizeCategory), and internal components
- Add Woodpecker CI pipeline (.woodpecker.yml) with quality gates: lint,
  format check, typecheck, bandit security scan, pip-audit, and pytest
  with 85% coverage gate
- Add bandit and pip-audit to dev dependencies

Fixes #1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 22:39:19 -06:00
parent f02207e33c
commit 883fd4d60f
7 changed files with 1807 additions and 100 deletions

docs/api-reference.md (new file, 496 lines)
# SDK API Reference
Complete reference for all public classes, methods, types, and enums exported by `mosaicstack-telemetry`.
**SDK version:** 0.1.0
**Telemetry API version:** v1
**Event schema version:** 1.0
---
## TelemetryClient
`mosaicstack_telemetry.TelemetryClient`
The main entry point for the SDK. Manages event queuing, background submission, and prediction caching.
### Constructor
```python
TelemetryClient(config: TelemetryConfig)
```
Validates the config on construction. If validation errors are found and telemetry is enabled, warnings are logged (but the client is still created).
### Methods
#### `start() -> None`
Start background event submission using a threading-based loop. Spawns a daemon thread that flushes the queue every `config.submit_interval_seconds`.
No-op if `config.enabled` is `False`.
#### `await start_async() -> None`
Start background event submission using an asyncio task. Creates an `asyncio.Task` that flushes the queue periodically.
No-op if `config.enabled` is `False`.
#### `stop() -> None`
Stop the sync background submitter. Performs a final flush of all remaining queued events before returning. Safe to call if not started.
#### `await stop_async() -> None`
Stop the async background submitter. Performs a final flush of all remaining queued events before returning. Safe to call if not started.
#### `track(event: TaskCompletionEvent) -> None`
Queue an event for submission. **Always synchronous.** **Never blocks.** **Never throws.**
If telemetry is disabled, the event is silently dropped. If the queue is full, the oldest event is evicted. Any unexpected error is caught and logged.
This method is thread-safe and can be called from any thread or coroutine.
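The `track()` contract above can be sketched in plain Python. This is a hypothetical illustration of the documented semantics (silent drop when disabled, oldest-event eviction when full, swallow-and-log on error), not the SDK's actual implementation:

```python
import logging
from collections import deque

log = logging.getLogger("telemetry")
queue = deque(maxlen=1000)  # bounded: appending past maxlen evicts the oldest entry

def track(event, enabled: bool = True) -> None:
    """Never raises: disabled -> silent drop, full queue -> evict oldest, errors -> log."""
    try:
        if not enabled:
            return
        queue.append(event)
    except Exception:  # deliberately broad: telemetry must never break the host app
        log.exception("telemetry track failed; event dropped")

track({"id": 1})
track({"id": 2}, enabled=False)  # dropped: telemetry disabled
```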
#### `get_prediction(query: PredictionQuery) -> PredictionResponse | None`
Return a cached prediction for the given query, or `None` if not cached or expired.
#### `refresh_predictions_sync(queries: list[PredictionQuery]) -> None`
Fetch predictions from the server synchronously using `POST /v1/predictions/batch`. Results are stored in the internal prediction cache.
No-op if `queries` is empty.
#### `await refresh_predictions(queries: list[PredictionQuery]) -> None`
Fetch predictions from the server asynchronously using `POST /v1/predictions/batch`. Results are stored in the internal prediction cache.
No-op if `queries` is empty.
### Properties
#### `queue_size: int`
Number of events currently in the in-memory queue.
#### `is_running: bool`
Whether the background submitter (sync or async) is currently active.
### Context Managers
```python
# Sync: calls start() on entry, stop() on exit
with TelemetryClient(config) as client:
    client.track(event)

# Async: calls start_async() on entry, stop_async() on exit
async with TelemetryClient(config) as client:
    client.track(event)
```
---
## TelemetryConfig
`mosaicstack_telemetry.TelemetryConfig`
A dataclass holding all configuration for the telemetry client. Supports environment variable overrides.
### Fields
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `server_url` | `str` | `""` | Telemetry API base URL. Trailing slashes are stripped. |
| `api_key` | `str` | `""` | 64-character hex API key for Bearer token auth. |
| `instance_id` | `str` | `""` | UUID string identifying this Mosaic Stack instance. |
| `enabled` | `bool` | `True` | Master switch for telemetry. When `False`, `track()` is a no-op and no background threads/tasks are created. |
| `submit_interval_seconds` | `float` | `300.0` | Interval between background queue flushes (seconds). |
| `max_queue_size` | `int` | `1000` | Maximum events in the in-memory queue. Older events are evicted when full. |
| `batch_size` | `int` | `100` | Events per HTTP batch request. Server maximum is 100. |
| `request_timeout_seconds` | `float` | `10.0` | HTTP request timeout (seconds). |
| `prediction_cache_ttl_seconds` | `float` | `21600.0` | Prediction cache time-to-live (seconds). Default: 6 hours. |
| `dry_run` | `bool` | `False` | When `True`, batches are logged but not sent to the server. |
| `max_retries` | `int` | `3` | Maximum retry attempts for transient failures. |
| `user_agent` | `str` | `"mosaicstack-telemetry-python/0.1.0"` | User-Agent header sent with all requests. |
### Environment Variables
These are read during `__post_init__`. `MOSAIC_TELEMETRY_ENABLED` always takes effect; the remaining variables apply only when the corresponding constructor field is left empty:
| Env Var | Field | Notes |
|---------|-------|-------|
| `MOSAIC_TELEMETRY_ENABLED` | `enabled` | Accepts `1`, `true`, `yes` (case-insensitive) as truthy. Always overrides the constructor value. |
| `MOSAIC_TELEMETRY_SERVER_URL` | `server_url` | Only used if `server_url` is not set in the constructor. |
| `MOSAIC_TELEMETRY_API_KEY` | `api_key` | Only used if `api_key` is not set in the constructor. |
| `MOSAIC_TELEMETRY_INSTANCE_ID` | `instance_id` | Only used if `instance_id` is not set in the constructor. |
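The two precedence rules in the table can be sketched as follows. The helper names (`_env_bool`, `_env_fallback`) are illustrative, not part of the SDK:

```python
import os

def _env_bool(name: str, default: bool) -> bool:
    """Parse a truthy env var: '1', 'true', 'yes' (case-insensitive)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes"}

def _env_fallback(name: str, current: str) -> str:
    """Use the env var only when the constructor value is empty."""
    return current or os.environ.get(name, "")

# MOSAIC_TELEMETRY_ENABLED always wins; the string variables are fallbacks only.
os.environ["MOSAIC_TELEMETRY_ENABLED"] = "Yes"
os.environ["MOSAIC_TELEMETRY_SERVER_URL"] = "https://tel-api.example.dev"

enabled = _env_bool("MOSAIC_TELEMETRY_ENABLED", default=False)       # env wins
server_url = _env_fallback("MOSAIC_TELEMETRY_SERVER_URL", "")        # empty -> env value
pinned_url = _env_fallback("MOSAIC_TELEMETRY_SERVER_URL", "https://pinned.example")  # constructor wins
```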
### Methods
#### `validate() -> list[str]`
Validate the configuration and return a list of error messages. Returns an empty list if valid.
Checks performed:
- `server_url` is non-empty and starts with `http://` or `https://`
- `api_key` is a 64-character hex string
- `instance_id` is a valid UUID string
- `submit_interval_seconds` is positive
- `max_queue_size` is positive
- `batch_size` is between 1 and 100
- `request_timeout_seconds` is positive
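The same checks can be sketched as a standalone function. This is a hypothetical reimplementation of the documented rules (collect messages rather than raise), not the SDK's source:

```python
import re
import uuid

def validate_config(server_url: str, api_key: str, instance_id: str,
                    submit_interval_seconds: float, max_queue_size: int,
                    batch_size: int, request_timeout_seconds: float) -> list[str]:
    """Return a list of error messages; an empty list means the config is valid."""
    errors: list[str] = []
    if not server_url.startswith(("http://", "https://")):
        errors.append("server_url must start with http:// or https://")
    if not re.fullmatch(r"[0-9a-fA-F]{64}", api_key):
        errors.append("api_key must be a 64-character hex string")
    try:
        uuid.UUID(instance_id)
    except ValueError:
        errors.append("instance_id must be a valid UUID")
    if submit_interval_seconds <= 0:
        errors.append("submit_interval_seconds must be positive")
    if max_queue_size <= 0:
        errors.append("max_queue_size must be positive")
    if not 1 <= batch_size <= 100:
        errors.append("batch_size must be between 1 and 100")
    if request_timeout_seconds <= 0:
        errors.append("request_timeout_seconds must be positive")
    return errors

errors = validate_config("ftp://bad", "xyz", "not-a-uuid", 300.0, 1000, 100, 10.0)
ok = validate_config("https://tel-api.example.dev", "a" * 64,
                     "12345678-1234-1234-1234-123456789abc", 300.0, 1000, 100, 10.0)
```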
---
## EventBuilder
`mosaicstack_telemetry.EventBuilder`
Fluent builder for constructing `TaskCompletionEvent` instances with a chainable API and sensible defaults.
### Constructor
```python
EventBuilder(instance_id: str | UUID)
```
### Setter Methods
All setter methods return `self` for chaining.
| Method | Parameter Type | Sets Field | Default |
|--------|---------------|-----------|---------|
| `.event_id(value)` | `str \| UUID` | `event_id` | Auto-generated UUIDv4 |
| `.timestamp(value)` | `datetime` | `timestamp` | `datetime.now(UTC)` |
| `.task_type(value)` | `TaskType` | `task_type` | `TaskType.UNKNOWN` |
| `.complexity_level(value)` | `Complexity` | `complexity` | `Complexity.MEDIUM` |
| `.harness_type(value)` | `Harness` | `harness` | `Harness.UNKNOWN` |
| `.model(value)` | `str` | `model` | `"unknown"` |
| `.provider(value)` | `Provider` | `provider` | `Provider.UNKNOWN` |
| `.duration_ms(value)` | `int` | `task_duration_ms` | `0` |
| `.outcome_value(value)` | `Outcome` | `outcome` | `Outcome.FAILURE` |
| `.retry_count(value)` | `int` | `retry_count` | `0` |
| `.language(value)` | `str \| None` | `language` | `None` |
| `.repo_size(value)` | `RepoSizeCategory \| None` | `repo_size_category` | `None` |
#### `.tokens(*, estimated_in, estimated_out, actual_in, actual_out) -> EventBuilder`
Set all four token count fields. All parameters are keyword-only integers (default `0`).
#### `.cost(*, estimated, actual) -> EventBuilder`
Set estimated and actual cost in microdollars. Both are keyword-only integers (default `0`).
#### `.quality(*, passed, gates_run=None, gates_failed=None) -> EventBuilder`
Set quality gate results. `passed` is required. `gates_run` and `gates_failed` are optional lists of `QualityGate` values.
#### `.context(*, compactions=0, rotations=0, utilization=0.0) -> EventBuilder`
Set context window metrics. All parameters are keyword-only with defaults.
#### `.build() -> TaskCompletionEvent`
Construct and return the `TaskCompletionEvent`. The builder can be reused after calling `.build()`.
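The chaining and reuse behavior can be illustrated with a minimal generic builder. This is a sketch of the pattern only, with made-up field names, not the `EventBuilder` source:

```python
from dataclasses import dataclass

@dataclass
class Event:
    task_type: str = "unknown"
    model: str = "unknown"
    duration_ms: int = 0

class Builder:
    """Each setter returns self so calls chain; build() leaves builder state intact."""
    def __init__(self) -> None:
        self._task_type = "unknown"
        self._model = "unknown"
        self._duration_ms = 0

    def task_type(self, value: str) -> "Builder":
        self._task_type = value
        return self

    def model(self, value: str) -> "Builder":
        self._model = value
        return self

    def duration_ms(self, value: int) -> "Builder":
        self._duration_ms = value
        return self

    def build(self) -> Event:
        return Event(self._task_type, self._model, self._duration_ms)

b = Builder().task_type("testing").model("m1")
first = b.duration_ms(100).build()
second = b.duration_ms(250).build()  # builder is reusable; other fields persist
```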
---
## TaskCompletionEvent
`mosaicstack_telemetry.TaskCompletionEvent`
Pydantic model representing a single telemetry event. Matches the server's v1 event schema (version 1.0).
### Fields
| Field | Type | Required | Constraints | Description |
|-------|------|----------|-------------|-------------|
| `instance_id` | `UUID` | Yes | Valid UUID | Mosaic Stack installation identifier |
| `event_id` | `UUID` | No | Valid UUID | Unique event identifier (default: auto-generated) |
| `schema_version` | `str` | No | -- | Event schema version (default: `"1.0"`) |
| `timestamp` | `datetime` | No | -- | When the task completed (default: now UTC) |
| `task_duration_ms` | `int` | Yes | 0--86,400,000 | Task wall-clock time in milliseconds |
| `task_type` | `TaskType` | Yes | Enum value | Type of work performed |
| `complexity` | `Complexity` | Yes | Enum value | Task complexity level |
| `harness` | `Harness` | Yes | Enum value | Coding tool / execution environment |
| `model` | `str` | Yes | 1--100 chars | Model identifier |
| `provider` | `Provider` | Yes | Enum value | LLM provider |
| `estimated_input_tokens` | `int` | Yes | 0--10,000,000 | Pre-task input token estimate |
| `estimated_output_tokens` | `int` | Yes | 0--10,000,000 | Pre-task output token estimate |
| `actual_input_tokens` | `int` | Yes | 0--10,000,000 | Actual input tokens consumed |
| `actual_output_tokens` | `int` | Yes | 0--10,000,000 | Actual output tokens generated |
| `estimated_cost_usd_micros` | `int` | Yes | 0--100,000,000 | Estimated cost in microdollars |
| `actual_cost_usd_micros` | `int` | Yes | 0--100,000,000 | Actual cost in microdollars |
| `quality_gate_passed` | `bool` | Yes | -- | Whether all quality gates passed |
| `quality_gates_run` | `list[QualityGate]` | No | -- | Gates executed (default: `[]`) |
| `quality_gates_failed` | `list[QualityGate]` | No | -- | Gates that failed (default: `[]`) |
| `context_compactions` | `int` | Yes | 0--100 | Context compaction events during task |
| `context_rotations` | `int` | Yes | 0--50 | Agent session rotations during task |
| `context_utilization_final` | `float` | Yes | 0.0--1.0 | Final context usage ratio |
| `outcome` | `Outcome` | Yes | Enum value | Final task result |
| `retry_count` | `int` | Yes | 0--20 | Retries before final outcome |
| `language` | `str \| None` | No | Max 30 chars | Primary programming language |
| `repo_size_category` | `RepoSizeCategory \| None` | No | Enum value | Repository size bucket |
---
## Prediction Types
### PredictionQuery
`mosaicstack_telemetry.PredictionQuery`
Query parameters for fetching a prediction.
| Field | Type | Description |
|-------|------|-------------|
| `task_type` | `TaskType` | Task type to predict for |
| `model` | `str` | Model identifier |
| `provider` | `Provider` | LLM provider |
| `complexity` | `Complexity` | Complexity level |
### PredictionResponse
`mosaicstack_telemetry.PredictionResponse`
Response from the prediction endpoint.
| Field | Type | Description |
|-------|------|-------------|
| `prediction` | `PredictionData \| None` | Prediction data, or `None` if no data is available |
| `metadata` | `PredictionMetadata` | Metadata about how the prediction was generated |
### PredictionData
`mosaicstack_telemetry.PredictionData`
| Field | Type | Description |
|-------|------|-------------|
| `input_tokens` | `TokenDistribution` | Input token distribution (p10/p25/median/p75/p90) |
| `output_tokens` | `TokenDistribution` | Output token distribution (p10/p25/median/p75/p90) |
| `cost_usd_micros` | `dict[str, int]` | `{"median": <value>}` -- median cost in microdollars |
| `duration_ms` | `dict[str, int]` | `{"median": <value>}` -- median duration in milliseconds |
| `correction_factors` | `CorrectionFactors` | Actual-to-estimated token ratios |
| `quality` | `QualityPrediction` | Gate pass rate and success rate |
### TokenDistribution
`mosaicstack_telemetry.TokenDistribution`
| Field | Type | Description |
|-------|------|-------------|
| `p10` | `int` | 10th percentile |
| `p25` | `int` | 25th percentile |
| `median` | `int` | 50th percentile (median) |
| `p75` | `int` | 75th percentile |
| `p90` | `int` | 90th percentile |
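One way to consume a distribution is to plan around the median and budget for the tail. The numbers below are illustrative, not real crowd data:

```python
from dataclasses import dataclass

@dataclass
class TokenDistribution:
    p10: int
    p25: int
    median: int
    p75: int
    p90: int

# Hypothetical prediction for one (task_type, model, provider, complexity) cell.
dist = TokenDistribution(p10=20_000, p25=45_000, median=90_000, p75=140_000, p90=210_000)

planning_estimate = dist.median   # typical case
token_budget = dist.p90           # roughly 9 of 10 tasks fit under this
spread = dist.p75 - dist.p25      # interquartile range as a volatility signal
```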
### CorrectionFactors
`mosaicstack_telemetry.CorrectionFactors`
| Field | Type | Description |
|-------|------|-------------|
| `input` | `float` | Ratio of actual to estimated input tokens (>1.0 = estimates too low) |
| `output` | `float` | Ratio of actual to estimated output tokens |
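Applying a correction factor is a single multiplication; the factor value here is hypothetical:

```python
# factor = actual / estimated, so multiplying a future estimate by it
# compensates for systematic under- or over-estimation.
estimated_input = 100_000
input_factor = 1.12  # hypothetical: past estimates ran ~12% low
corrected_input = round(estimated_input * input_factor)
```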
### QualityPrediction
`mosaicstack_telemetry.QualityPrediction`
| Field | Type | Description |
|-------|------|-------------|
| `gate_pass_rate` | `float` | Fraction of tasks where all quality gates passed |
| `success_rate` | `float` | Fraction of tasks with `outcome: success` |
### PredictionMetadata
`mosaicstack_telemetry.PredictionMetadata`
| Field | Type | Description |
|-------|------|-------------|
| `sample_size` | `int` | Number of events used to compute the prediction |
| `fallback_level` | `int` | `0` = exact match, `1+` = dimensions dropped, `-1` = no data |
| `confidence` | `str` | `"high"`, `"medium"`, `"low"`, or `"none"` |
| `last_updated` | `datetime \| None` | When the prediction was last computed |
| `dimensions_matched` | `dict[str, str \| None] \| None` | Which dimensions matched (`None` values = fallback) |
| `fallback_note` | `str \| None` | Explanation when fallback was used |
| `cache_hit` | `bool` | Whether the server served from its cache |
---
## Batch Types
### BatchEventRequest
`mosaicstack_telemetry.BatchEventRequest`
Request body for `POST /v1/events/batch`. Used internally by the submitter.
| Field | Type | Constraints | Description |
|-------|------|-------------|-------------|
| `events` | `list[TaskCompletionEvent]` | 1--100 items | Events to submit |
### BatchEventResponse
`mosaicstack_telemetry.BatchEventResponse`
Response from the batch event endpoint.
| Field | Type | Description |
|-------|------|-------------|
| `accepted` | `int` | Count of accepted events |
| `rejected` | `int` | Count of rejected events |
| `results` | `list[BatchEventResult]` | Per-event results |
### BatchEventResult
`mosaicstack_telemetry.BatchEventResult`
| Field | Type | Description |
|-------|------|-------------|
| `event_id` | `UUID` | The event's unique identifier |
| `status` | `str` | `"accepted"` or `"rejected"` |
| `error` | `str \| None` | Error message if rejected |
---
## Enumerations
All enums subclass `str, Enum` (compatible with Python 3.10, which predates `enum.StrEnum`). Each member's `.value` is the lowercase string sent to the server.
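The mixin matters for serialization: members behave as plain strings. A minimal enum mirroring `Outcome` (two members shown for brevity):

```python
import json
from enum import Enum

class Outcome(str, Enum):
    """str mixin: members are str instances and JSON-serialize as their value."""
    SUCCESS = "success"
    FAILURE = "failure"

o = Outcome.SUCCESS
is_str = isinstance(o, str)         # usable anywhere a str is expected
wire = json.dumps({"outcome": o})   # serializes as the lowercase string
```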
### TaskType
`mosaicstack_telemetry.TaskType`
| Member | Value | Description |
|--------|-------|-------------|
| `PLANNING` | `"planning"` | Architecture design, task breakdown |
| `IMPLEMENTATION` | `"implementation"` | Writing new code |
| `CODE_REVIEW` | `"code_review"` | Reviewing existing code |
| `TESTING` | `"testing"` | Writing or running tests |
| `DEBUGGING` | `"debugging"` | Investigating and fixing bugs |
| `REFACTORING` | `"refactoring"` | Restructuring existing code |
| `DOCUMENTATION` | `"documentation"` | Writing docs, comments, READMEs |
| `CONFIGURATION` | `"configuration"` | Config files, CI/CD, infrastructure |
| `SECURITY_AUDIT` | `"security_audit"` | Security review, vulnerability analysis |
| `UNKNOWN` | `"unknown"` | Unclassified task type (fallback) |
### Complexity
`mosaicstack_telemetry.Complexity`
| Member | Value | Description |
|--------|-------|-------------|
| `LOW` | `"low"` | Simple fixes, typos, config changes |
| `MEDIUM` | `"medium"` | Standard features, moderate logic |
| `HIGH` | `"high"` | Complex features, multi-file changes |
| `CRITICAL` | `"critical"` | Major refactoring, architectural changes |
### Harness
`mosaicstack_telemetry.Harness`
| Member | Value | Description |
|--------|-------|-------------|
| `CLAUDE_CODE` | `"claude_code"` | Anthropic Claude Code CLI |
| `OPENCODE` | `"opencode"` | OpenCode CLI |
| `KILO_CODE` | `"kilo_code"` | Kilo Code VS Code extension |
| `AIDER` | `"aider"` | Aider AI pair programming |
| `API_DIRECT` | `"api_direct"` | Direct API calls (no harness) |
| `OLLAMA_LOCAL` | `"ollama_local"` | Ollama local inference |
| `CUSTOM` | `"custom"` | Custom or unrecognized harness |
| `UNKNOWN` | `"unknown"` | Harness not reported |
### Provider
`mosaicstack_telemetry.Provider`
| Member | Value | Description |
|--------|-------|-------------|
| `ANTHROPIC` | `"anthropic"` | Anthropic (Claude models) |
| `OPENAI` | `"openai"` | OpenAI (GPT models) |
| `OPENROUTER` | `"openrouter"` | OpenRouter (multi-provider routing) |
| `OLLAMA` | `"ollama"` | Ollama (local/self-hosted) |
| `GOOGLE` | `"google"` | Google (Gemini models) |
| `MISTRAL` | `"mistral"` | Mistral AI |
| `CUSTOM` | `"custom"` | Custom or unrecognized provider |
| `UNKNOWN` | `"unknown"` | Provider not reported |
### Outcome
`mosaicstack_telemetry.Outcome`
| Member | Value | Description |
|--------|-------|-------------|
| `SUCCESS` | `"success"` | Task completed, all quality gates passed |
| `FAILURE` | `"failure"` | Task failed after all retries |
| `PARTIAL` | `"partial"` | Task partially completed |
| `TIMEOUT` | `"timeout"` | Task exceeded time or token budget |
### QualityGate
`mosaicstack_telemetry.QualityGate`
| Member | Value | Description |
|--------|-------|-------------|
| `BUILD` | `"build"` | Code compiles/builds |
| `LINT` | `"lint"` | Linter passes |
| `TEST` | `"test"` | Tests pass |
| `COVERAGE` | `"coverage"` | Coverage meets threshold |
| `TYPECHECK` | `"typecheck"` | Type checker passes |
| `SECURITY` | `"security"` | Security scan passes |
### RepoSizeCategory
`mosaicstack_telemetry.RepoSizeCategory`
| Member | Value | Approximate LOC | Description |
|--------|-------|----------------|-------------|
| `TINY` | `"tiny"` | < 1,000 | Scripts, single-file projects |
| `SMALL` | `"small"` | 1,000--10,000 | Small libraries, tools |
| `MEDIUM` | `"medium"` | 10,000--100,000 | Standard applications |
| `LARGE` | `"large"` | 100,000--1,000,000 | Large applications, monorepos |
| `HUGE` | `"huge"` | > 1,000,000 | Enterprise codebases |
---
## Exceptions
### TelemetryError
`mosaicstack_telemetry.TelemetryError`
Base exception for telemetry client errors. Extends `Exception`. Currently unused by the public API (since `track()` never throws), but available for custom error handling in integrations.
---
## Internal Components
These are exported for advanced use cases but are managed automatically by `TelemetryClient`.
### EventQueue
`mosaicstack_telemetry.EventQueue`
Thread-safe bounded FIFO queue. When full, oldest events are evicted.
- `EventQueue(max_size: int = 1000)`
- `put(event: TaskCompletionEvent) -> None` -- Add event, evict oldest if full
- `drain(max_items: int) -> list[TaskCompletionEvent]` -- Remove and return up to N events
- `put_back(events: list[TaskCompletionEvent]) -> None` -- Re-queue events at the front (for retries)
- `size: int` -- Current queue length
- `is_empty: bool` -- Whether the queue is empty
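The queue's documented semantics can be sketched with `collections.deque`, whose `maxlen` gives oldest-first eviction for free. This is an illustrative reimplementation, not the SDK's `EventQueue`:

```python
from collections import deque
from threading import Lock

class BoundedQueue:
    """Thread-safe bounded FIFO: put() evicts the oldest entry when full."""
    def __init__(self, max_size: int = 1000) -> None:
        self._items: deque = deque(maxlen=max_size)  # drops from the left when full
        self._lock = Lock()

    def put(self, item) -> None:
        with self._lock:
            self._items.append(item)

    def drain(self, max_items: int) -> list:
        with self._lock:
            n = min(max_items, len(self._items))
            return [self._items.popleft() for _ in range(n)]

    def put_back(self, items: list) -> None:
        with self._lock:
            self._items.extendleft(reversed(items))  # restore original order at the front

    @property
    def size(self) -> int:
        return len(self._items)

q = BoundedQueue(max_size=3)
for i in range(5):
    q.put(i)            # 0 and 1 evicted once the queue is full
batch = q.drain(2)      # [2, 3]
q.put_back(batch)       # re-queued at the front for retry
```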
### PredictionCache
`mosaicstack_telemetry.PredictionCache`
Thread-safe dict-based cache with TTL expiration.
- `PredictionCache(ttl_seconds: float = 21600.0)`
- `get(query: PredictionQuery) -> PredictionResponse | None` -- Get cached prediction
- `put(query: PredictionQuery, response: PredictionResponse) -> None` -- Store prediction
- `clear() -> None` -- Invalidate all entries
- `size: int` -- Number of entries (including possibly expired)
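The TTL behavior (expired entries treated as missing by `get()` but still counted by `size`) can be sketched as follows; an illustrative reimplementation, not the SDK's `PredictionCache`:

```python
import time
from threading import Lock

class TTLCache:
    """Entries expire ttl_seconds after insertion; get() treats expired as missing."""
    def __init__(self, ttl_seconds: float = 21600.0) -> None:
        self._ttl = ttl_seconds
        self._data: dict = {}
        self._lock = Lock()

    def put(self, key, value) -> None:
        with self._lock:
            self._data[key] = (time.monotonic(), value)

    def get(self, key):
        with self._lock:
            entry = self._data.get(key)
            if entry is None:
                return None
            inserted_at, value = entry
            if time.monotonic() - inserted_at > self._ttl:
                return None  # expired but left in place, matching the size semantics
            return value

cache = TTLCache(ttl_seconds=0.05)
cache.put("q1", {"median": 90000})
fresh = cache.get("q1")
time.sleep(0.1)
stale = cache.get("q1")  # None after the TTL elapses
```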

docs/integration-guide.md (new file, 610 lines)
# Integration Guide
This guide covers installing and integrating `mosaicstack-telemetry` into Python applications. The SDK reports AI coding task-completion telemetry to a [Mosaic Stack Telemetry](https://github.com/mosaicstack/telemetry) server and queries crowd-sourced predictions.
**Telemetry API version:** This SDK targets the Mosaic Telemetry API **v1** with event schema version **1.0**.
## Installation
```bash
pip install mosaicstack-telemetry
```
Or with [uv](https://docs.astral.sh/uv/):
```bash
uv add mosaicstack-telemetry
```
**Requirements:** Python 3.10+. Runtime dependencies: `httpx` and `pydantic`.
## Configuration
### Constructor Parameters
```python
from mosaicstack_telemetry import TelemetryConfig

config = TelemetryConfig(
    server_url="https://tel-api.mosaicstack.dev",
    api_key="your-64-char-hex-api-key-here...",
    instance_id="a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
)
```
All three fields (`server_url`, `api_key`, `instance_id`) are required when telemetry is enabled. The `api_key` must be a 64-character hexadecimal string issued by a Mosaic Telemetry administrator. The `instance_id` is a UUID that identifies your Mosaic Stack installation and must match the instance associated with your API key on the server.
### Environment Variables
Instead of passing values to the constructor, set environment variables:
```bash
export MOSAIC_TELEMETRY_ENABLED=true
export MOSAIC_TELEMETRY_SERVER_URL=https://tel-api.mosaicstack.dev
export MOSAIC_TELEMETRY_API_KEY=your-64-char-hex-api-key
export MOSAIC_TELEMETRY_INSTANCE_ID=a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d
```
Then create the config with no arguments:
```python
config = TelemetryConfig() # Reads from environment
```
Constructor values take priority over environment variables, with one exception: `MOSAIC_TELEMETRY_ENABLED` always overrides the constructor value.
### Full Configuration Reference
| Parameter | Type | Default | Env Var | Description |
|-----------|------|---------|---------|-------------|
| `server_url` | `str` | `""` (required) | `MOSAIC_TELEMETRY_SERVER_URL` | Telemetry API base URL (no trailing slash) |
| `api_key` | `str` | `""` (required) | `MOSAIC_TELEMETRY_API_KEY` | 64-character hex API key |
| `instance_id` | `str` | `""` (required) | `MOSAIC_TELEMETRY_INSTANCE_ID` | UUID identifying this Mosaic Stack instance |
| `enabled` | `bool` | `True` | `MOSAIC_TELEMETRY_ENABLED` | Enable/disable telemetry entirely |
| `submit_interval_seconds` | `float` | `300.0` | -- | How often the background submitter flushes queued events (seconds) |
| `max_queue_size` | `int` | `1000` | -- | Maximum events held in the in-memory queue |
| `batch_size` | `int` | `100` | -- | Events per batch (server maximum is 100) |
| `request_timeout_seconds` | `float` | `10.0` | -- | HTTP request timeout for API calls |
| `prediction_cache_ttl_seconds` | `float` | `21600.0` | -- | Prediction cache TTL (default 6 hours) |
| `dry_run` | `bool` | `False` | -- | Log batches but don't send to server |
| `max_retries` | `int` | `3` | -- | Retries on transient failures (429, timeouts, network errors) |
### Environment-Specific Configuration
**Development:**
```python
config = TelemetryConfig(
    server_url="http://localhost:8000",
    api_key="a" * 64,
    instance_id="12345678-1234-1234-1234-123456789abc",
    dry_run=True,                  # Log but don't send
    submit_interval_seconds=10.0,  # Flush quickly for testing
)
```
**Production:**
```python
import os

config = TelemetryConfig(
    server_url="https://tel-api.mosaicstack.dev",
    api_key=os.environ["MOSAIC_TELEMETRY_API_KEY"],
    instance_id=os.environ["MOSAIC_TELEMETRY_INSTANCE_ID"],
    submit_interval_seconds=300.0,  # Default: flush every 5 minutes
    max_retries=3,                  # Retry transient failures
)
```
---
## Sync Usage (Threading)
Best for scripts, CLI tools, aider integrations, and non-async contexts. The SDK spawns a daemon thread that periodically flushes queued events.
```python
from mosaicstack_telemetry import (
    TelemetryClient,
    TelemetryConfig,
    EventBuilder,
    TaskType,
    Provider,
    Harness,
    Complexity,
    Outcome,
    QualityGate,
)

config = TelemetryConfig(
    server_url="https://tel-api.mosaicstack.dev",
    api_key="your-64-char-hex-api-key-here...",
    instance_id="a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
)

client = TelemetryClient(config)
client.start()  # Starts background daemon thread

event = (
    EventBuilder(instance_id=config.instance_id)
    .task_type(TaskType.IMPLEMENTATION)
    .model("claude-sonnet-4-5-20250929")
    .provider(Provider.ANTHROPIC)
    .harness_type(Harness.CLAUDE_CODE)
    .complexity_level(Complexity.MEDIUM)
    .outcome_value(Outcome.SUCCESS)
    .duration_ms(45000)
    .tokens(estimated_in=105000, estimated_out=45000, actual_in=112340, actual_out=38760)
    .cost(estimated=630000, actual=919200)
    .quality(
        passed=True,
        gates_run=[QualityGate.BUILD, QualityGate.LINT, QualityGate.TEST, QualityGate.TYPECHECK],
    )
    .context(compactions=2, rotations=0, utilization=0.72)
    .language("typescript")
    .build()
)

client.track(event)  # Non-blocking, thread-safe
client.stop()        # Flushes remaining events and stops the thread
```
### Sync Context Manager
The context manager calls `start()` on entry and `stop()` (with flush) on exit:
```python
with TelemetryClient(config) as client:
    client.track(event)

# Automatically flushed and stopped here
```
---
## Async Usage (asyncio)
For asyncio-based applications (FastAPI, aiohttp, etc.). The SDK creates an asyncio task that periodically flushes events.
```python
import asyncio
from mosaicstack_telemetry import TelemetryClient, TelemetryConfig

async def main():
    config = TelemetryConfig(
        server_url="https://tel-api.mosaicstack.dev",
        api_key="your-64-char-hex-api-key-here...",
        instance_id="a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
    )
    client = TelemetryClient(config)
    await client.start_async()  # Starts asyncio background task

    # track() is always synchronous and non-blocking
    client.track(event)

    await client.stop_async()  # Flushes remaining events

asyncio.run(main())
```
### Async Context Manager
```python
async with TelemetryClient(config) as client:
    client.track(event)

# Automatically flushed and stopped here
```
### Key Difference: Sync vs Async
| Aspect | Sync | Async |
|--------|------|-------|
| Start | `client.start()` | `await client.start_async()` |
| Stop | `client.stop()` | `await client.stop_async()` |
| Context manager | `with TelemetryClient(config)` | `async with TelemetryClient(config)` |
| Background mechanism | `threading.Timer` (daemon thread) | `asyncio.Task` |
| `track()` | Always synchronous | Always synchronous |
| `refresh_predictions` | `refresh_predictions_sync(queries)` | `await refresh_predictions(queries)` |
The `track()` method is always synchronous regardless of which mode you use. It simply appends to a thread-safe in-memory queue and returns immediately. The background submitter handles batching and sending.
---
## Integration Examples
### Example 1: Instrumenting a FastAPI Service
```python
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI

from mosaicstack_telemetry import (
    Complexity,
    EventBuilder,
    Harness,
    Outcome,
    Provider,
    QualityGate,
    TaskType,
    TelemetryClient,
    TelemetryConfig,
)

# Initialize telemetry once at startup
config = TelemetryConfig(
    server_url=os.environ.get("MOSAIC_TELEMETRY_SERVER_URL", "https://tel-api.mosaicstack.dev"),
    api_key=os.environ["MOSAIC_TELEMETRY_API_KEY"],
    instance_id=os.environ["MOSAIC_TELEMETRY_INSTANCE_ID"],
)
telemetry = TelemetryClient(config)

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Start telemetry on app startup, flush on shutdown."""
    await telemetry.start_async()
    yield
    await telemetry.stop_async()

app = FastAPI(lifespan=lifespan)

@app.post("/tasks/complete")
async def complete_task(
    task_type: str,
    model: str,
    provider: str,
    complexity: str,
    actual_input_tokens: int,
    actual_output_tokens: int,
    actual_cost_usd_micros: int,
    duration_ms: int,
    outcome: str,
):
    """Record a completed AI coding task."""
    event = (
        EventBuilder(instance_id=config.instance_id)
        .task_type(TaskType(task_type))
        .model(model)
        .provider(Provider(provider))
        .harness_type(Harness.CLAUDE_CODE)
        .complexity_level(Complexity(complexity))
        .outcome_value(Outcome(outcome))
        .duration_ms(duration_ms)
        .tokens(
            estimated_in=0,
            estimated_out=0,
            actual_in=actual_input_tokens,
            actual_out=actual_output_tokens,
        )
        .cost(estimated=0, actual=actual_cost_usd_micros)
        .quality(passed=outcome == "success", gates_run=[QualityGate.BUILD, QualityGate.TEST])
        .context(compactions=0, rotations=0, utilization=0.0)
        .build()
    )
    telemetry.track(event)  # Non-blocking, never throws
    return {"status": "tracked"}
```
### Example 2: Instrumenting a Generic Python App
```python
"""
Generic Python script that tracks AI coding tasks.
Suitable for CLI tools, batch processors, or any non-async application.
"""
import os
import time
from mosaicstack_telemetry import (
Complexity,
EventBuilder,
Harness,
Outcome,
Provider,
QualityGate,
RepoSizeCategory,
TaskType,
TelemetryClient,
TelemetryConfig,
)
def run_coding_task() -> dict:
"""Simulate an AI coding task. Returns task metrics."""
start = time.monotonic()
# ... your AI coding logic here ...
elapsed_ms = int((time.monotonic() - start) * 1000)
return {
"duration_ms": elapsed_ms,
"actual_input_tokens": 4500,
"actual_output_tokens": 1800,
"actual_cost_usd_micros": 48000,
"outcome": "success",
"quality_gates_passed": True,
}
def main():
config = TelemetryConfig() # Reads from MOSAIC_TELEMETRY_* env vars
with TelemetryClient(config) as client:
result = run_coding_task()
event = (
EventBuilder(instance_id=config.instance_id)
.task_type(TaskType.IMPLEMENTATION)
.model("claude-sonnet-4-5-20250929")
.provider(Provider.ANTHROPIC)
.harness_type(Harness.AIDER)
.complexity_level(Complexity.MEDIUM)
.outcome_value(Outcome(result["outcome"]))
.duration_ms(result["duration_ms"])
.tokens(
estimated_in=5000,
estimated_out=2000,
actual_in=result["actual_input_tokens"],
actual_out=result["actual_output_tokens"],
)
.cost(estimated=50000, actual=result["actual_cost_usd_micros"])
.quality(
passed=result["quality_gates_passed"],
gates_run=[QualityGate.BUILD, QualityGate.LINT, QualityGate.TEST],
)
.context(compactions=0, rotations=0, utilization=0.35)
.language("python")
.repo_size(RepoSizeCategory.MEDIUM)
.build()
)
client.track(event)
print(f"Tracked task: {event.event_id}")
# Client is automatically flushed and stopped after the `with` block
if __name__ == "__main__":
main()
```
---
## Building Events
The `EventBuilder` provides a fluent API for constructing `TaskCompletionEvent` objects with sensible defaults. All setter methods return the builder instance for chaining.
### Required Fields
Every event requires these fields to be set (either via builder methods or from defaults):
| Builder Method | Sets Field | Default |
|----------------|-----------|---------|
| `EventBuilder(instance_id=...)` | `instance_id` | (required) |
| `.task_type(TaskType.X)` | `task_type` | `unknown` |
| `.model("model-name")` | `model` | `"unknown"` |
| `.provider(Provider.X)` | `provider` | `unknown` |
| `.harness_type(Harness.X)` | `harness` | `unknown` |
| `.complexity_level(Complexity.X)` | `complexity` | `medium` |
| `.outcome_value(Outcome.X)` | `outcome` | `failure` |
| `.duration_ms(N)` | `task_duration_ms` | `0` |
| `.tokens(...)` | token fields | all `0` |
| `.cost(...)` | cost fields | all `0` |
| `.quality(...)` | quality fields | `passed=False, gates_run=[], gates_failed=[]` |
| `.context(...)` | context fields | all `0` / `0.0` |
### Auto-Generated Fields
| Field | Auto-generated Value |
|-------|---------------------|
| `event_id` | Random UUID (override with `.event_id(uuid)`) |
| `timestamp` | Current UTC time (override with `.timestamp(dt)`) |
| `schema_version` | `"1.0"` (set automatically by `TaskCompletionEvent`) |
### Optional Fields
| Builder Method | Sets Field | Default |
|----------------|-----------|---------|
| `.language("python")` | `language` | `None` |
| `.repo_size(RepoSizeCategory.MEDIUM)` | `repo_size_category` | `None` |
| `.retry_count(N)` | `retry_count` | `0` |
### Token and Cost Values
Costs are expressed in **microdollars** (1 USD = 1,000,000 microdollars). This avoids floating-point precision issues.
```python
event = (
EventBuilder(instance_id=config.instance_id)
# ... other fields ...
.tokens(
estimated_in=105000, # Pre-task estimate: input tokens
estimated_out=45000, # Pre-task estimate: output tokens
actual_in=112340, # Actual input tokens consumed
actual_out=38760, # Actual output tokens generated
)
.cost(
estimated=630000, # $0.63 in microdollars
actual=919200, # $0.92 in microdollars
)
.build()
)
```
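Conversion to and from microdollars is a fixed scale; two hypothetical helpers (not part of the SDK) for building and displaying these values:

```python
MICROS_PER_USD = 1_000_000

def usd_to_micros(usd: float) -> int:
    """Convert a USD amount to integer microdollars (rounded)."""
    return round(usd * MICROS_PER_USD)

def micros_to_usd(micros: int) -> float:
    """Convert integer microdollars back to USD for display."""
    return micros / MICROS_PER_USD
```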
### Quality Gates
Record which quality gates were executed and their results:
```python
event = (
EventBuilder(instance_id=config.instance_id)
# ... other fields ...
.quality(
passed=False,
gates_run=[QualityGate.BUILD, QualityGate.LINT, QualityGate.TEST, QualityGate.COVERAGE],
gates_failed=[QualityGate.COVERAGE],
)
.build()
)
```
Available gates: `BUILD`, `LINT`, `TEST`, `COVERAGE`, `TYPECHECK`, `SECURITY`.
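If your pipeline already produces a per-gate pass/fail map, the `.quality(...)` arguments can be derived from it. A small sketch; the keys here are plain strings, which you would map to the corresponding `QualityGate` members before calling `.quality(...)`:

```python
def quality_args(results: dict) -> dict:
    """Derive .quality(...) keyword arguments from a gate-name -> passed map.

    Map the string keys to QualityGate members before passing the
    result to the builder.
    """
    gates_failed = [gate for gate, ok in results.items() if not ok]
    return {
        "passed": not gates_failed,
        "gates_run": list(results),
        "gates_failed": gates_failed,
    }
```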
---
## Querying Predictions
The SDK can query crowd-sourced predictions from the telemetry server. Predictions provide percentile distributions for token usage, cost, duration, and quality metrics based on aggregated data from all participating instances.
### Fetching Predictions
```python
from mosaicstack_telemetry import PredictionQuery, TaskType, Provider, Complexity
queries = [
PredictionQuery(
task_type=TaskType.IMPLEMENTATION,
model="claude-sonnet-4-5-20250929",
provider=Provider.ANTHROPIC,
complexity=Complexity.MEDIUM,
),
PredictionQuery(
task_type=TaskType.TESTING,
model="claude-haiku-4-5-20251001",
provider=Provider.ANTHROPIC,
complexity=Complexity.LOW,
),
]
# Async
await client.refresh_predictions(queries)
# Sync
client.refresh_predictions_sync(queries)
```
### Reading from Cache
Predictions are stored in a TTL-based in-memory cache (default: 6 hours):
```python
prediction = client.get_prediction(queries[0])
if prediction and prediction.prediction:
data = prediction.prediction
print(f"Input tokens (median): {data.input_tokens.median}")
print(f"Input tokens (p90): {data.input_tokens.p90}")
print(f"Output tokens (median): {data.output_tokens.median}")
print(f"Cost (median): ${data.cost_usd_micros['median'] / 1_000_000:.4f}")
print(f"Duration (median): {data.duration_ms['median'] / 1000:.1f}s")
print(f"Correction factor (input): {data.correction_factors.input:.2f}")
print(f"Quality gate pass rate: {data.quality.gate_pass_rate:.0%}")
print(f"Success rate: {data.quality.success_rate:.0%}")
meta = prediction.metadata
print(f"Sample size: {meta.sample_size}")
print(f"Confidence: {meta.confidence}")
if meta.fallback_note:
print(f"Note: {meta.fallback_note}")
else:
print("No prediction data available for this combination")
```
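A typical use of the correction factor is to scale a locally computed estimate before a task starts. The sketch below is one possible budgeting policy, not SDK behavior; the `pad` safety factor is an assumption:

```python
def adjust_estimate(local_estimate: int, correction_factor: float,
                    pad: float = 1.0) -> int:
    """Scale a local token estimate by the crowd-sourced correction
    factor, with an optional safety pad for budgeting."""
    return int(local_estimate * correction_factor * pad)
```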
### Prediction Confidence Levels
| Level | Meaning |
|-------|---------|
| `high` | 100+ samples, exact dimension match |
| `medium` | 30-99 samples, exact dimension match |
| `low` | <30 samples or fallback was used |
| `none` | No data available; `prediction` is `None` |
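The table's rules can be sketched as a function, with the boundaries taken from the table; `exact_match` here stands for "no fallback was used":

```python
def confidence_level(sample_size: int, exact_match: bool) -> str:
    """Classify prediction confidence per the levels table."""
    if sample_size == 0:
        return "none"
    if not exact_match or sample_size < 30:
        return "low"
    if sample_size < 100:
        return "medium"
    return "high"
```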
---
## Error Handling
### The `track()` Contract
**`track()` never raises and never blocks the caller.** If anything goes wrong (queue full, telemetry disabled, unexpected error), the event is silently dropped and the error is logged. This ensures telemetry instrumentation never affects your application's behavior.
```python
# This is always safe, even if telemetry is misconfigured
client.track(event)
```
### Queue Overflow
When the in-memory queue reaches `max_queue_size` (default 1000), the oldest events are evicted to make room for new ones. A warning is logged when this happens.
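The eviction behavior matches a bounded deque: appending to a full `collections.deque(maxlen=...)` silently discards the item at the opposite end. This illustrates the semantics only, not the SDK's internal data structure:

```python
from collections import deque

# A tiny queue (maxlen=3 instead of the default 1000) to show eviction.
queue = deque(maxlen=3)
for event_id in ["e1", "e2", "e3", "e4"]:
    queue.append(event_id)  # appending "e4" evicts "e1", the oldest

print(list(queue))  # ['e2', 'e3', 'e4']
```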
### Submission Retries
The background submitter retries transient failures with exponential backoff and jitter:
- **429 Too Many Requests**: Honors the server's `Retry-After` header
- **Timeouts**: Retried with backoff
- **Network errors**: Retried with backoff
- **403 Forbidden**: Not retried (configuration error)
Failed batches are re-queued for the next submission cycle (up to queue capacity).
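The retry timing described above can be sketched as capped exponential backoff with full jitter, where a `Retry-After` value takes precedence. The base, cap, and jitter range below are illustrative assumptions, not the SDK's actual parameters:

```python
import random
from typing import Optional

def retry_delay(attempt: int, retry_after: Optional[float] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry `attempt` (0-based).

    A server-provided Retry-After value wins; otherwise use capped
    exponential backoff with full jitter.
    """
    if retry_after is not None:
        return retry_after
    return random.uniform(0, min(cap, base * 2 ** attempt))
```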
### Logging
All SDK logging uses the `mosaicstack_telemetry` logger. Enable it to see submission activity:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
# Or target the SDK logger specifically:
logging.getLogger("mosaicstack_telemetry").setLevel(logging.DEBUG)
```
---
## Dry-Run Mode
Test your integration without sending data to the server:
```python
config = TelemetryConfig(
server_url="https://tel-api.mosaicstack.dev",
api_key="a" * 64,
instance_id="12345678-1234-1234-1234-123456789abc",
dry_run=True,
)
with TelemetryClient(config) as client:
client.track(event)
# Logs: "[DRY RUN] Would submit batch of 1 events to ..."
```
---
## Disabling Telemetry
Set `enabled=False` or the environment variable `MOSAIC_TELEMETRY_ENABLED=false`:
```python
config = TelemetryConfig(enabled=False)
with TelemetryClient(config) as client:
client.track(event) # Silently dropped, no background thread started
```
---
## API Compatibility
| SDK Version | Telemetry API | Event Schema | Notes |
|-------------|---------------|--------------|-------|
| 0.1.x | v1 (`/v1/*`) | 1.0 | Current release |
The SDK submits events to `POST /v1/events/batch` and queries predictions from `POST /v1/predictions/batch`. These are the only two server endpoints the SDK communicates with.
For the full server API documentation, see the [Mosaic Telemetry API Reference](https://github.com/mosaicstack/telemetry).