docs(#1): SDK integration guide, API reference, and CI pipeline
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Rewrite README with quick start, FastAPI snippet, async/sync patterns, config reference with env vars, and API version targeting (v1, schema 1.0)
- Add docs/integration-guide.md with full FastAPI and generic Python integration examples, environment-specific config, prediction queries, error handling, and dry-run mode documentation
- Add docs/api-reference.md covering all exported classes, methods, Pydantic models, enums (TaskType, Complexity, Harness, Provider, QualityGate, Outcome, RepoSizeCategory), and internal components
- Add Woodpecker CI pipeline (.woodpecker.yml) with quality gates: lint, format check, typecheck, bandit security scan, pip-audit, and pytest with 85% coverage gate
- Add bandit and pip-audit to dev dependencies

Fixes #1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
496
docs/api-reference.md
Normal file
@@ -0,0 +1,496 @@
|
||||
# SDK API Reference
|
||||
|
||||
Complete reference for all public classes, methods, types, and enums exported by `mosaicstack-telemetry`.
|
||||
|
||||
**SDK version:** 0.1.0
|
||||
**Telemetry API version:** v1
|
||||
**Event schema version:** 1.0
|
||||
|
||||
---
|
||||
|
||||
## TelemetryClient
|
||||
|
||||
`mosaicstack_telemetry.TelemetryClient`
|
||||
|
||||
The main entry point for the SDK. Manages event queuing, background submission, and prediction caching.
|
||||
|
||||
### Constructor
|
||||
|
||||
```python
|
||||
TelemetryClient(config: TelemetryConfig)
|
||||
```
|
||||
|
||||
Validates the config on construction. If validation errors are found and telemetry is enabled, warnings are logged (but the client is still created).
|
||||
|
||||
### Methods
|
||||
|
||||
#### `start() -> None`
|
||||
|
||||
Start background event submission using a threading-based loop. Spawns a daemon thread that flushes the queue every `config.submit_interval_seconds`.
|
||||
|
||||
No-op if `config.enabled` is `False`.
|
||||
|
||||
#### `await start_async() -> None`
|
||||
|
||||
Start background event submission using an asyncio task. Creates an `asyncio.Task` that flushes the queue periodically.
|
||||
|
||||
No-op if `config.enabled` is `False`.
|
||||
|
||||
#### `stop() -> None`
|
||||
|
||||
Stop the sync background submitter. Performs a final flush of all remaining queued events before returning. Safe to call if not started.
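
The `start()`/`stop()` behaviour described above (a daemon thread that flushes on an interval, a final flush on stop, and a safe no-op when never started) can be sketched with the standard library. This is an illustration only, not the SDK's implementation: `PeriodicFlusher` and its `flush` callback are hypothetical names, and a `threading.Event`-driven loop is one assumed way to build it.

```python
import threading
from typing import Callable, Optional


class PeriodicFlusher:
    """Hypothetical helper: call `flush` every `interval` seconds on a daemon thread."""

    def __init__(self, flush: Callable[[], None], interval: float) -> None:
        self._flush = flush
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        if self._thread is not None:
            return  # already running
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Event.wait() returns False on timeout (time to flush)
        # and True once stop() has set the event.
        while not self._stop_event.wait(self._interval):
            self._flush()

    def stop(self) -> None:
        if self._thread is None:
            return  # safe to call if never started
        self._stop_event.set()
        self._thread.join()
        self._thread = None
        self._flush()  # final flush of whatever is still queued
```

Waiting on an event rather than sleeping lets `stop()` interrupt the wait immediately instead of blocking for up to a full interval.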
|
||||
|
||||
#### `await stop_async() -> None`
|
||||
|
||||
Stop the async background submitter. Performs a final flush of all remaining queued events before returning. Safe to call if not started.
|
||||
|
||||
#### `track(event: TaskCompletionEvent) -> None`
|
||||
|
||||
Queue an event for submission. **Always synchronous.** **Never blocks.** **Never raises.**
|
||||
|
||||
If telemetry is disabled, the event is silently dropped. If the queue is full, the oldest event is evicted. Any unexpected error is caught and logged.
|
||||
|
||||
This method is thread-safe and can be called from any thread or coroutine.
|
||||
|
||||
#### `get_prediction(query: PredictionQuery) -> PredictionResponse | None`
|
||||
|
||||
Return a cached prediction for the given query, or `None` if not cached or expired.
|
||||
|
||||
#### `refresh_predictions_sync(queries: list[PredictionQuery]) -> None`
|
||||
|
||||
Fetch predictions from the server synchronously using `POST /v1/predictions/batch`. Results are stored in the internal prediction cache.
|
||||
|
||||
No-op if `queries` is empty.
|
||||
|
||||
#### `await refresh_predictions(queries: list[PredictionQuery]) -> None`
|
||||
|
||||
Fetch predictions from the server asynchronously using `POST /v1/predictions/batch`. Results are stored in the internal prediction cache.
|
||||
|
||||
No-op if `queries` is empty.
|
||||
|
||||
### Properties
|
||||
|
||||
#### `queue_size: int`
|
||||
|
||||
Number of events currently in the in-memory queue.
|
||||
|
||||
#### `is_running: bool`
|
||||
|
||||
Whether the background submitter (sync or async) is currently active.
|
||||
|
||||
### Context Managers
|
||||
|
||||
```python
# Sync: calls start() on entry, stop() on exit
with TelemetryClient(config) as client:
    client.track(event)

# Async: calls start_async() on entry, stop_async() on exit
async with TelemetryClient(config) as client:
    client.track(event)
```
|
||||
|
||||
---
|
||||
|
||||
## TelemetryConfig
|
||||
|
||||
`mosaicstack_telemetry.TelemetryConfig`
|
||||
|
||||
A dataclass holding all configuration for the telemetry client. Supports environment variable overrides.
|
||||
|
||||
### Fields
|
||||
|
||||
| Field | Type | Default | Description |
|
||||
|-------|------|---------|-------------|
|
||||
| `server_url` | `str` | `""` | Telemetry API base URL. Trailing slashes are stripped. |
|
||||
| `api_key` | `str` | `""` | 64-character hex API key for Bearer token auth. |
|
||||
| `instance_id` | `str` | `""` | UUID string identifying this Mosaic Stack instance. |
|
||||
| `enabled` | `bool` | `True` | Master switch for telemetry. When `False`, `track()` is a no-op and no background threads/tasks are created. |
|
||||
| `submit_interval_seconds` | `float` | `300.0` | Interval between background queue flushes (seconds). |
|
||||
| `max_queue_size` | `int` | `1000` | Maximum events in the in-memory queue. Older events are evicted when full. |
|
||||
| `batch_size` | `int` | `100` | Events per HTTP batch request. Server maximum is 100. |
|
||||
| `request_timeout_seconds` | `float` | `10.0` | HTTP request timeout (seconds). |
|
||||
| `prediction_cache_ttl_seconds` | `float` | `21600.0` | Prediction cache time-to-live (seconds). Default: 6 hours. |
|
||||
| `dry_run` | `bool` | `False` | When `True`, batches are logged but not sent to the server. |
|
||||
| `max_retries` | `int` | `3` | Maximum retry attempts for transient failures. |
|
||||
| `user_agent` | `str` | `"mosaicstack-telemetry-python/0.1.0"` | User-Agent header sent with all requests. |
|
||||
|
||||
### Environment Variables
|
||||
|
||||
These are read during `__post_init__`. `MOSAIC_TELEMETRY_ENABLED` always overrides the constructor value; the other variables apply only when the corresponding constructor field is empty:
|
||||
|
||||
| Env Var | Field | Notes |
|
||||
|---------|-------|-------|
|
||||
| `MOSAIC_TELEMETRY_ENABLED` | `enabled` | Accepts `1`, `true`, `yes` (case-insensitive) as truthy. Always overrides the constructor value. |
|
||||
| `MOSAIC_TELEMETRY_SERVER_URL` | `server_url` | Only used if `server_url` is not set in the constructor. |
|
||||
| `MOSAIC_TELEMETRY_API_KEY` | `api_key` | Only used if `api_key` is not set in the constructor. |
|
||||
| `MOSAIC_TELEMETRY_INSTANCE_ID` | `instance_id` | Only used if `instance_id` is not set in the constructor. |
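
The precedence rules in the table can be sketched as two small helpers. These are hypothetical stand-ins, not the SDK's `__post_init__` code; in particular, treating an unset `MOSAIC_TELEMETRY_ENABLED` as "keep the constructor value" is an assumption.

```python
import os

_TRUTHY = {"1", "true", "yes"}


def env_enabled(default: bool = True) -> bool:
    """Let MOSAIC_TELEMETRY_ENABLED override `default` whenever it is set."""
    raw = os.environ.get("MOSAIC_TELEMETRY_ENABLED")
    if raw is None:
        return default  # assumption: an unset variable changes nothing
    return raw.lower() in _TRUTHY  # case-insensitive match on 1/true/yes


def env_fallback(value: str, var_name: str) -> str:
    """Use the environment variable only when the constructor value is empty."""
    return value or os.environ.get(var_name, "")
```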
|
||||
|
||||
### Methods
|
||||
|
||||
#### `validate() -> list[str]`
|
||||
|
||||
Validate the configuration and return a list of error messages. Returns an empty list if valid.
|
||||
|
||||
Checks performed:
|
||||
- `server_url` is non-empty and starts with `http://` or `https://`
|
||||
- `api_key` is a 64-character hex string
|
||||
- `instance_id` is a valid UUID string
|
||||
- `submit_interval_seconds` is positive
|
||||
- `max_queue_size` is positive
|
||||
- `batch_size` is between 1 and 100
|
||||
- `request_timeout_seconds` is positive
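
The checks listed above can be approximated with the standard library. The function below is a standalone sketch (`validate_config` is a hypothetical name, and the error messages are invented for illustration); it only mirrors the documented rules and is not the SDK's `validate()` method.

```python
import re
from uuid import UUID

_HEX64 = re.compile(r"[0-9a-fA-F]{64}")


def validate_config(
    server_url: str,
    api_key: str,
    instance_id: str,
    submit_interval_seconds: float = 300.0,
    max_queue_size: int = 1000,
    batch_size: int = 100,
    request_timeout_seconds: float = 10.0,
) -> list[str]:
    """Return a list of error messages; an empty list means the config is valid."""
    errors: list[str] = []
    # An empty URL also fails this prefix check.
    if not server_url.startswith(("http://", "https://")):
        errors.append("server_url must start with http:// or https://")
    if not _HEX64.fullmatch(api_key):
        errors.append("api_key must be a 64-character hex string")
    try:
        UUID(instance_id)
    except ValueError:
        errors.append("instance_id must be a valid UUID string")
    if submit_interval_seconds <= 0:
        errors.append("submit_interval_seconds must be positive")
    if max_queue_size <= 0:
        errors.append("max_queue_size must be positive")
    if not 1 <= batch_size <= 100:
        errors.append("batch_size must be between 1 and 100")
    if request_timeout_seconds <= 0:
        errors.append("request_timeout_seconds must be positive")
    return errors
```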
|
||||
|
||||
---
|
||||
|
||||
## EventBuilder
|
||||
|
||||
`mosaicstack_telemetry.EventBuilder`
|
||||
|
||||
Fluent builder for constructing `TaskCompletionEvent` instances with a chainable API and sensible defaults.
|
||||
|
||||
### Constructor
|
||||
|
||||
```python
|
||||
EventBuilder(instance_id: str | UUID)
|
||||
```
|
||||
|
||||
### Setter Methods
|
||||
|
||||
All setter methods return `self` for chaining.
|
||||
|
||||
| Method | Parameter Type | Sets Field | Default |
|
||||
|--------|---------------|-----------|---------|
|
||||
| `.event_id(value)` | `str \| UUID` | `event_id` | Auto-generated UUIDv4 |
|
||||
| `.timestamp(value)` | `datetime` | `timestamp` | `datetime.now(UTC)` |
|
||||
| `.task_type(value)` | `TaskType` | `task_type` | `TaskType.UNKNOWN` |
|
||||
| `.complexity_level(value)` | `Complexity` | `complexity` | `Complexity.MEDIUM` |
|
||||
| `.harness_type(value)` | `Harness` | `harness` | `Harness.UNKNOWN` |
|
||||
| `.model(value)` | `str` | `model` | `"unknown"` |
|
||||
| `.provider(value)` | `Provider` | `provider` | `Provider.UNKNOWN` |
|
||||
| `.duration_ms(value)` | `int` | `task_duration_ms` | `0` |
|
||||
| `.outcome_value(value)` | `Outcome` | `outcome` | `Outcome.FAILURE` |
|
||||
| `.retry_count(value)` | `int` | `retry_count` | `0` |
|
||||
| `.language(value)` | `str \| None` | `language` | `None` |
|
||||
| `.repo_size(value)` | `RepoSizeCategory \| None` | `repo_size_category` | `None` |
|
||||
|
||||
#### `.tokens(*, estimated_in, estimated_out, actual_in, actual_out) -> EventBuilder`
|
||||
|
||||
Set all four token count fields. All parameters are keyword-only integers (default `0`).
|
||||
|
||||
#### `.cost(*, estimated, actual) -> EventBuilder`
|
||||
|
||||
Set estimated and actual cost in microdollars. Both are keyword-only integers (default `0`).
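
Costs are integers in microdollars (one millionth of a US dollar), so callers typically convert before calling `.cost()`. A minimal conversion sketch follows; the helper names are hypothetical, and `Decimal` is used here to avoid float rounding.

```python
from decimal import Decimal

MICROS_PER_USD = 1_000_000


def usd_to_micros(amount_usd: str) -> int:
    """Convert a dollar amount given as a decimal string to integer microdollars."""
    return int((Decimal(amount_usd) * MICROS_PER_USD).to_integral_value())


def micros_to_usd(micros: int) -> Decimal:
    """Convert integer microdollars back to dollars."""
    return Decimal(micros) / MICROS_PER_USD
```

For instance, `usd_to_micros("0.9192")` yields `919200`, a value that could be passed as `actual` to `.cost()`.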
|
||||
|
||||
#### `.quality(*, passed, gates_run=None, gates_failed=None) -> EventBuilder`
|
||||
|
||||
Set quality gate results. `passed` is required. `gates_run` and `gates_failed` are optional lists of `QualityGate` values.
|
||||
|
||||
#### `.context(*, compactions=0, rotations=0, utilization=0.0) -> EventBuilder`
|
||||
|
||||
Set context window metrics. All parameters are keyword-only with defaults.
|
||||
|
||||
#### `.build() -> TaskCompletionEvent`
|
||||
|
||||
Construct and return the `TaskCompletionEvent`. The builder can be reused after calling `.build()`.
|
||||
|
||||
---
|
||||
|
||||
## TaskCompletionEvent
|
||||
|
||||
`mosaicstack_telemetry.TaskCompletionEvent`
|
||||
|
||||
Pydantic model representing a single telemetry event. Matches the server's v1 event schema (version 1.0).
|
||||
|
||||
### Fields
|
||||
|
||||
| Field | Type | Required | Constraints | Description |
|
||||
|-------|------|----------|-------------|-------------|
|
||||
| `instance_id` | `UUID` | Yes | Valid UUID | Mosaic Stack installation identifier |
|
||||
| `event_id` | `UUID` | No | Valid UUID | Unique event identifier (default: auto-generated) |
|
||||
| `schema_version` | `str` | No | -- | Event schema version (default: `"1.0"`) |
|
||||
| `timestamp` | `datetime` | No | -- | When the task completed (default: now UTC) |
|
||||
| `task_duration_ms` | `int` | Yes | 0--86,400,000 | Task wall-clock time in milliseconds |
|
||||
| `task_type` | `TaskType` | Yes | Enum value | Type of work performed |
|
||||
| `complexity` | `Complexity` | Yes | Enum value | Task complexity level |
|
||||
| `harness` | `Harness` | Yes | Enum value | Coding tool / execution environment |
|
||||
| `model` | `str` | Yes | 1--100 chars | Model identifier |
|
||||
| `provider` | `Provider` | Yes | Enum value | LLM provider |
|
||||
| `estimated_input_tokens` | `int` | Yes | 0--10,000,000 | Pre-task input token estimate |
|
||||
| `estimated_output_tokens` | `int` | Yes | 0--10,000,000 | Pre-task output token estimate |
|
||||
| `actual_input_tokens` | `int` | Yes | 0--10,000,000 | Actual input tokens consumed |
|
||||
| `actual_output_tokens` | `int` | Yes | 0--10,000,000 | Actual output tokens generated |
|
||||
| `estimated_cost_usd_micros` | `int` | Yes | 0--100,000,000 | Estimated cost in microdollars |
|
||||
| `actual_cost_usd_micros` | `int` | Yes | 0--100,000,000 | Actual cost in microdollars |
|
||||
| `quality_gate_passed` | `bool` | Yes | -- | Whether all quality gates passed |
|
||||
| `quality_gates_run` | `list[QualityGate]` | No | -- | Gates executed (default: `[]`) |
|
||||
| `quality_gates_failed` | `list[QualityGate]` | No | -- | Gates that failed (default: `[]`) |
|
||||
| `context_compactions` | `int` | Yes | 0--100 | Context compaction events during task |
|
||||
| `context_rotations` | `int` | Yes | 0--50 | Agent session rotations during task |
|
||||
| `context_utilization_final` | `float` | Yes | 0.0--1.0 | Final context usage ratio |
|
||||
| `outcome` | `Outcome` | Yes | Enum value | Final task result |
|
||||
| `retry_count` | `int` | Yes | 0--20 | Retries before final outcome |
|
||||
| `language` | `str \| None` | No | Max 30 chars | Primary programming language |
|
||||
| `repo_size_category` | `RepoSizeCategory \| None` | No | Enum value | Repository size bucket |
|
||||
|
||||
---
|
||||
|
||||
## Prediction Types
|
||||
|
||||
### PredictionQuery
|
||||
|
||||
`mosaicstack_telemetry.PredictionQuery`
|
||||
|
||||
Query parameters for fetching a prediction.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `task_type` | `TaskType` | Task type to predict for |
|
||||
| `model` | `str` | Model identifier |
|
||||
| `provider` | `Provider` | LLM provider |
|
||||
| `complexity` | `Complexity` | Complexity level |
|
||||
|
||||
### PredictionResponse
|
||||
|
||||
`mosaicstack_telemetry.PredictionResponse`
|
||||
|
||||
Response from the prediction endpoint.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `prediction` | `PredictionData \| None` | Prediction data, or `None` if no data is available |
|
||||
| `metadata` | `PredictionMetadata` | Metadata about how the prediction was generated |
|
||||
|
||||
### PredictionData
|
||||
|
||||
`mosaicstack_telemetry.PredictionData`
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `input_tokens` | `TokenDistribution` | Input token distribution (p10/p25/median/p75/p90) |
|
||||
| `output_tokens` | `TokenDistribution` | Output token distribution (p10/p25/median/p75/p90) |
|
||||
| `cost_usd_micros` | `dict[str, int]` | `{"median": <value>}` -- median cost in microdollars |
|
||||
| `duration_ms` | `dict[str, int]` | `{"median": <value>}` -- median duration in milliseconds |
|
||||
| `correction_factors` | `CorrectionFactors` | Actual-to-estimated token ratios |
|
||||
| `quality` | `QualityPrediction` | Gate pass rate and success rate |
|
||||
|
||||
### TokenDistribution
|
||||
|
||||
`mosaicstack_telemetry.TokenDistribution`
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `p10` | `int` | 10th percentile |
|
||||
| `p25` | `int` | 25th percentile |
|
||||
| `median` | `int` | 50th percentile (median) |
|
||||
| `p75` | `int` | 75th percentile |
|
||||
| `p90` | `int` | 90th percentile |
|
||||
|
||||
### CorrectionFactors
|
||||
|
||||
`mosaicstack_telemetry.CorrectionFactors`
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `input` | `float` | Ratio of actual to estimated input tokens (>1.0 = estimates too low) |
|
||||
| `output` | `float` | Ratio of actual to estimated output tokens |
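
Because each factor is the ratio of actual to estimated tokens, multiplying a fresh estimate by it gives a corrected estimate. The sketch below shows that arithmetic; both helper names are hypothetical, and rounding to the nearest integer is an assumption.

```python
def correction_factor(actual_tokens: int, estimated_tokens: int) -> float:
    """Ratio of actual to estimated tokens; above 1.0 means the estimate was too low."""
    if estimated_tokens <= 0:
        raise ValueError("estimated_tokens must be positive")
    return actual_tokens / estimated_tokens


def apply_correction(estimated_tokens: int, factor: float) -> int:
    """Scale an estimate by a correction factor, rounded to whole tokens."""
    return round(estimated_tokens * factor)
```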
|
||||
|
||||
### QualityPrediction
|
||||
|
||||
`mosaicstack_telemetry.QualityPrediction`
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `gate_pass_rate` | `float` | Fraction of tasks where all quality gates passed |
|
||||
| `success_rate` | `float` | Fraction of tasks with `outcome: success` |
|
||||
|
||||
### PredictionMetadata
|
||||
|
||||
`mosaicstack_telemetry.PredictionMetadata`
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `sample_size` | `int` | Number of events used to compute the prediction |
|
||||
| `fallback_level` | `int` | `0` = exact match, `1+` = dimensions dropped, `-1` = no data |
|
||||
| `confidence` | `str` | `"high"`, `"medium"`, `"low"`, or `"none"` |
|
||||
| `last_updated` | `datetime \| None` | When the prediction was last computed |
|
||||
| `dimensions_matched` | `dict[str, str \| None] \| None` | Which dimensions matched (`None` values = fallback) |
|
||||
| `fallback_note` | `str \| None` | Explanation when fallback was used |
|
||||
| `cache_hit` | `bool` | Whether the server served from its cache |
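
A caller might interpret `fallback_level` and `dimensions_matched` as follows. This is a sketch with hypothetical helper names that work on plain values rather than on the SDK's `PredictionMetadata` model, and the returned label strings are illustrative.

```python
from typing import Optional


def describe_fallback(fallback_level: int) -> str:
    """Map the documented fallback_level codes to a readable label."""
    if fallback_level == -1:
        return "no data"
    if fallback_level == 0:
        return "exact match"
    if fallback_level >= 1:
        return f"fallback level {fallback_level}"
    raise ValueError(f"unexpected fallback_level: {fallback_level}")


def dropped_dimensions(dimensions_matched: Optional[dict[str, Optional[str]]]) -> list[str]:
    """List the dimensions whose value is None, i.e. those that fell back."""
    if not dimensions_matched:
        return []
    return sorted(name for name, value in dimensions_matched.items() if value is None)
```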
|
||||
|
||||
---
|
||||
|
||||
## Batch Types
|
||||
|
||||
### BatchEventRequest
|
||||
|
||||
`mosaicstack_telemetry.BatchEventRequest`
|
||||
|
||||
Request body for `POST /v1/events/batch`. Used internally by the submitter.
|
||||
|
||||
| Field | Type | Constraints | Description |
|
||||
|-------|------|-------------|-------------|
|
||||
| `events` | `list[TaskCompletionEvent]` | 1--100 items | Events to submit |
|
||||
|
||||
### BatchEventResponse
|
||||
|
||||
`mosaicstack_telemetry.BatchEventResponse`
|
||||
|
||||
Response from the batch event endpoint.
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `accepted` | `int` | Count of accepted events |
|
||||
| `rejected` | `int` | Count of rejected events |
|
||||
| `results` | `list[BatchEventResult]` | Per-event results |
|
||||
|
||||
### BatchEventResult
|
||||
|
||||
`mosaicstack_telemetry.BatchEventResult`
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `event_id` | `UUID` | The event's unique identifier |
|
||||
| `status` | `str` | `"accepted"` or `"rejected"` |
|
||||
| `error` | `str \| None` | Error message if rejected |
|
||||
|
||||
---
|
||||
|
||||
## Enumerations
|
||||
|
||||
All enums use `str, Enum` mixin (Python 3.10 compatible). Their `.value` is the lowercase string sent to the server.
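
The `str, Enum` mixin means members compare equal to their string values and can be constructed from them. The class below is a local stand-in that mirrors the documented `Outcome` members so the snippet runs without the SDK installed.

```python
from enum import Enum


class Outcome(str, Enum):  # local stand-in mirroring the documented members
    SUCCESS = "success"
    FAILURE = "failure"
    PARTIAL = "partial"
    TIMEOUT = "timeout"


# Look up a member from the raw string received from a caller or the server.
parsed = Outcome("success")

# Members are also plain strings, so direct comparison works.
is_success = parsed == "success"

# Unknown strings raise ValueError, which callers may want to catch.
try:
    Outcome("bogus")
    rejected = False
except ValueError:
    rejected = True
```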
|
||||
|
||||
### TaskType
|
||||
|
||||
`mosaicstack_telemetry.TaskType`
|
||||
|
||||
| Member | Value | Description |
|
||||
|--------|-------|-------------|
|
||||
| `PLANNING` | `"planning"` | Architecture design, task breakdown |
|
||||
| `IMPLEMENTATION` | `"implementation"` | Writing new code |
|
||||
| `CODE_REVIEW` | `"code_review"` | Reviewing existing code |
|
||||
| `TESTING` | `"testing"` | Writing or running tests |
|
||||
| `DEBUGGING` | `"debugging"` | Investigating and fixing bugs |
|
||||
| `REFACTORING` | `"refactoring"` | Restructuring existing code |
|
||||
| `DOCUMENTATION` | `"documentation"` | Writing docs, comments, READMEs |
|
||||
| `CONFIGURATION` | `"configuration"` | Config files, CI/CD, infrastructure |
|
||||
| `SECURITY_AUDIT` | `"security_audit"` | Security review, vulnerability analysis |
|
||||
| `UNKNOWN` | `"unknown"` | Unclassified task type (fallback) |
|
||||
|
||||
### Complexity
|
||||
|
||||
`mosaicstack_telemetry.Complexity`
|
||||
|
||||
| Member | Value | Description |
|
||||
|--------|-------|-------------|
|
||||
| `LOW` | `"low"` | Simple fixes, typos, config changes |
|
||||
| `MEDIUM` | `"medium"` | Standard features, moderate logic |
|
||||
| `HIGH` | `"high"` | Complex features, multi-file changes |
|
||||
| `CRITICAL` | `"critical"` | Major refactoring, architectural changes |
|
||||
|
||||
### Harness
|
||||
|
||||
`mosaicstack_telemetry.Harness`
|
||||
|
||||
| Member | Value | Description |
|
||||
|--------|-------|-------------|
|
||||
| `CLAUDE_CODE` | `"claude_code"` | Anthropic Claude Code CLI |
|
||||
| `OPENCODE` | `"opencode"` | OpenCode CLI |
|
||||
| `KILO_CODE` | `"kilo_code"` | Kilo Code VS Code extension |
|
||||
| `AIDER` | `"aider"` | Aider AI pair programming |
|
||||
| `API_DIRECT` | `"api_direct"` | Direct API calls (no harness) |
|
||||
| `OLLAMA_LOCAL` | `"ollama_local"` | Ollama local inference |
|
||||
| `CUSTOM` | `"custom"` | Custom or unrecognized harness |
|
||||
| `UNKNOWN` | `"unknown"` | Harness not reported |
|
||||
|
||||
### Provider
|
||||
|
||||
`mosaicstack_telemetry.Provider`
|
||||
|
||||
| Member | Value | Description |
|
||||
|--------|-------|-------------|
|
||||
| `ANTHROPIC` | `"anthropic"` | Anthropic (Claude models) |
|
||||
| `OPENAI` | `"openai"` | OpenAI (GPT models) |
|
||||
| `OPENROUTER` | `"openrouter"` | OpenRouter (multi-provider routing) |
|
||||
| `OLLAMA` | `"ollama"` | Ollama (local/self-hosted) |
|
||||
| `GOOGLE` | `"google"` | Google (Gemini models) |
|
||||
| `MISTRAL` | `"mistral"` | Mistral AI |
|
||||
| `CUSTOM` | `"custom"` | Custom or unrecognized provider |
|
||||
| `UNKNOWN` | `"unknown"` | Provider not reported |
|
||||
|
||||
### Outcome
|
||||
|
||||
`mosaicstack_telemetry.Outcome`
|
||||
|
||||
| Member | Value | Description |
|
||||
|--------|-------|-------------|
|
||||
| `SUCCESS` | `"success"` | Task completed, all quality gates passed |
|
||||
| `FAILURE` | `"failure"` | Task failed after all retries |
|
||||
| `PARTIAL` | `"partial"` | Task partially completed |
|
||||
| `TIMEOUT` | `"timeout"` | Task exceeded time or token budget |
|
||||
|
||||
### QualityGate
|
||||
|
||||
`mosaicstack_telemetry.QualityGate`
|
||||
|
||||
| Member | Value | Description |
|
||||
|--------|-------|-------------|
|
||||
| `BUILD` | `"build"` | Code compiles/builds |
|
||||
| `LINT` | `"lint"` | Linter passes |
|
||||
| `TEST` | `"test"` | Tests pass |
|
||||
| `COVERAGE` | `"coverage"` | Coverage meets threshold |
|
||||
| `TYPECHECK` | `"typecheck"` | Type checker passes |
|
||||
| `SECURITY` | `"security"` | Security scan passes |
|
||||
|
||||
### RepoSizeCategory
|
||||
|
||||
`mosaicstack_telemetry.RepoSizeCategory`
|
||||
|
||||
| Member | Value | Approximate LOC | Description |
|
||||
|--------|-------|----------------|-------------|
|
||||
| `TINY` | `"tiny"` | < 1,000 | Scripts, single-file projects |
|
||||
| `SMALL` | `"small"` | 1,000--10,000 | Small libraries, tools |
|
||||
| `MEDIUM` | `"medium"` | 10,000--100,000 | Standard applications |
|
||||
| `LARGE` | `"large"` | 100,000--1,000,000 | Large applications, monorepos |
|
||||
| `HUGE` | `"huge"` | > 1,000,000 | Enterprise codebases |
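
A line count can be mapped to one of these buckets with a simple threshold chain. The table's ranges share their endpoints, so this sketch assumes each lower bound is inclusive and that exactly 1,000,000 lines still counts as `large`; `repo_size_bucket` is a hypothetical helper that returns the enum's string value.

```python
def repo_size_bucket(loc: int) -> str:
    """Return the RepoSizeCategory value for an approximate line count."""
    if loc < 0:
        raise ValueError("loc must be non-negative")
    if loc < 1_000:
        return "tiny"
    if loc < 10_000:
        return "small"
    if loc < 100_000:
        return "medium"
    if loc <= 1_000_000:
        return "large"
    return "huge"
```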
|
||||
|
||||
---
|
||||
|
||||
## Exceptions
|
||||
|
||||
### TelemetryError
|
||||
|
||||
`mosaicstack_telemetry.TelemetryError`
|
||||
|
||||
Base exception for telemetry client errors. Extends `Exception`. Currently unused by the public API (since `track()` never raises), but available for custom error handling in integrations.
|
||||
|
||||
---
|
||||
|
||||
## Internal Components
|
||||
|
||||
These are exported for advanced use cases but are managed automatically by `TelemetryClient`.
|
||||
|
||||
### EventQueue
|
||||
|
||||
`mosaicstack_telemetry.EventQueue`
|
||||
|
||||
Thread-safe bounded FIFO queue. When full, oldest events are evicted.
|
||||
|
||||
- `EventQueue(max_size: int = 1000)`
|
||||
- `put(event: TaskCompletionEvent) -> None` -- Add event, evict oldest if full
|
||||
- `drain(max_items: int) -> list[TaskCompletionEvent]` -- Remove and return up to N events
|
||||
- `put_back(events: list[TaskCompletionEvent]) -> None` -- Re-queue events at the front (for retries)
|
||||
- `size: int` -- Current queue length
|
||||
- `is_empty: bool` -- Whether the queue is empty
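
The queue semantics above (bounded, FIFO, evict oldest, re-queue at the front) can be illustrated with `collections.deque`. `BoundedQueue` is a hypothetical stand-in, not the SDK's `EventQueue`; how `put_back` behaves when the queue has little room left is an assumption here (the oldest re-queued items are dropped).

```python
import threading
from collections import deque


class BoundedQueue:
    """Thread-safe bounded FIFO that evicts the oldest item when full."""

    def __init__(self, max_size: int = 1000) -> None:
        self._max_size = max_size
        self._items = deque(maxlen=max_size)
        self._lock = threading.Lock()

    def put(self, item) -> None:
        with self._lock:
            # deque(maxlen=...) silently drops the oldest item when full.
            self._items.append(item)

    def drain(self, max_items: int) -> list:
        with self._lock:
            count = min(max_items, len(self._items))
            return [self._items.popleft() for _ in range(count)]

    def put_back(self, items: list) -> None:
        with self._lock:
            free = self._max_size - len(self._items)
            # Assumption: keep only the newest re-queued items that still fit.
            keep = items[-free:] if free > 0 else []
            # extendleft reverses its input, so reverse first to keep order.
            self._items.extendleft(reversed(keep))

    @property
    def size(self) -> int:
        with self._lock:
            return len(self._items)

    @property
    def is_empty(self) -> bool:
        return self.size == 0
```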
|
||||
|
||||
### PredictionCache
|
||||
|
||||
`mosaicstack_telemetry.PredictionCache`
|
||||
|
||||
Thread-safe dict-based cache with TTL expiration.
|
||||
|
||||
- `PredictionCache(ttl_seconds: float = 21600.0)`
|
||||
- `get(query: PredictionQuery) -> PredictionResponse | None` -- Get cached prediction
|
||||
- `put(query: PredictionQuery, response: PredictionResponse) -> None` -- Store prediction
|
||||
- `clear() -> None` -- Invalidate all entries
|
||||
- `size: int` -- Number of entries (including possibly expired)
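
A TTL cache with these semantics can be sketched as a locked dict of `(expiry, value)` pairs. `TTLCache` is a hypothetical stand-in rather than the SDK's `PredictionCache`; the injectable `clock` parameter and removing expired entries lazily on `get` are assumptions, the latter chosen to match the note that `size` may include expired entries.

```python
import threading
import time
from typing import Callable, Hashable, Optional


class TTLCache:
    """Thread-safe dict-based cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self,
        ttl_seconds: float = 21600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: dict = {}

    def get(self, key: Hashable) -> Optional[object]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if self._clock() >= expires_at:
                del self._entries[key]  # expired entries are removed lazily
                return None
            return value

    def put(self, key: Hashable, value: object) -> None:
        with self._lock:
            self._entries[key] = (self._clock() + self._ttl, value)

    def clear(self) -> None:
        with self._lock:
            self._entries.clear()

    @property
    def size(self) -> int:
        with self._lock:
            return len(self._entries)  # may include expired entries
```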
|
||||
610
docs/integration-guide.md
Normal file
@@ -0,0 +1,610 @@
|
||||
# Integration Guide
|
||||
|
||||
This guide covers installing and integrating `mosaicstack-telemetry` into Python applications. The SDK reports AI coding task-completion telemetry to a [Mosaic Stack Telemetry](https://github.com/mosaicstack/telemetry) server and queries crowd-sourced predictions.
|
||||
|
||||
**Telemetry API version:** This SDK targets the Mosaic Telemetry API **v1** with event schema version **1.0**.
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
pip install mosaicstack-telemetry
|
||||
```
|
||||
|
||||
Or with [uv](https://docs.astral.sh/uv/):
|
||||
|
||||
```bash
|
||||
uv add mosaicstack-telemetry
|
||||
```
|
||||
|
||||
**Requirements:** Python 3.10+. Runtime dependencies: `httpx` and `pydantic`.
|
||||
|
||||
## Configuration
|
||||
|
||||
### Constructor Parameters
|
||||
|
||||
```python
from mosaicstack_telemetry import TelemetryConfig

config = TelemetryConfig(
    server_url="https://tel-api.mosaicstack.dev",
    api_key="your-64-char-hex-api-key-here...",
    instance_id="a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
)
```
|
||||
|
||||
All three fields (`server_url`, `api_key`, `instance_id`) are required when telemetry is enabled. The `api_key` must be a 64-character hexadecimal string issued by a Mosaic Telemetry administrator. The `instance_id` is a UUID that identifies your Mosaic Stack installation and must match the instance associated with your API key on the server.
|
||||
|
||||
### Environment Variables
|
||||
|
||||
Instead of passing values to the constructor, set environment variables:
|
||||
|
||||
```bash
|
||||
export MOSAIC_TELEMETRY_ENABLED=true
|
||||
export MOSAIC_TELEMETRY_SERVER_URL=https://tel-api.mosaicstack.dev
|
||||
export MOSAIC_TELEMETRY_API_KEY=your-64-char-hex-api-key
|
||||
export MOSAIC_TELEMETRY_INSTANCE_ID=a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d
|
||||
```
|
||||
|
||||
Then create the config with no arguments:
|
||||
|
||||
```python
|
||||
config = TelemetryConfig() # Reads from environment
|
||||
```
|
||||
|
||||
Constructor values take priority over environment variables, except for `MOSAIC_TELEMETRY_ENABLED`, which always overrides the constructor value.
|
||||
|
||||
### Full Configuration Reference
|
||||
|
||||
| Parameter | Type | Default | Env Var | Description |
|
||||
|-----------|------|---------|---------|-------------|
|
||||
| `server_url` | `str` | `""` (required) | `MOSAIC_TELEMETRY_SERVER_URL` | Telemetry API base URL (no trailing slash) |
|
||||
| `api_key` | `str` | `""` (required) | `MOSAIC_TELEMETRY_API_KEY` | 64-character hex API key |
|
||||
| `instance_id` | `str` | `""` (required) | `MOSAIC_TELEMETRY_INSTANCE_ID` | UUID identifying this Mosaic Stack instance |
|
||||
| `enabled` | `bool` | `True` | `MOSAIC_TELEMETRY_ENABLED` | Enable/disable telemetry entirely |
|
||||
| `submit_interval_seconds` | `float` | `300.0` | -- | How often the background submitter flushes queued events (seconds) |
|
||||
| `max_queue_size` | `int` | `1000` | -- | Maximum events held in the in-memory queue |
|
||||
| `batch_size` | `int` | `100` | -- | Events per batch (server maximum is 100) |
|
||||
| `request_timeout_seconds` | `float` | `10.0` | -- | HTTP request timeout for API calls |
|
||||
| `prediction_cache_ttl_seconds` | `float` | `21600.0` | -- | Prediction cache TTL (default 6 hours) |
|
||||
| `dry_run` | `bool` | `False` | -- | Log batches but don't send to server |
|
||||
| `max_retries` | `int` | `3` | -- | Retries on transient failures (429, timeouts, network errors) |
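
The `max_retries` setting bounds how often a transient failure is retried. The source does not describe the delay policy, so the sketch below assumes exponential backoff; `TransientError`, `call_with_retries`, and `base_delay` are all hypothetical names used only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (429, timeout, network error)."""


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying up to `max_retries` times on TransientError."""
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted; let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # assumed backoff: 1s, 2s, 4s, ...
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real delays.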
|
||||
|
||||
### Environment-Specific Configuration
|
||||
|
||||
**Development:**
|
||||
|
||||
```python
config = TelemetryConfig(
    server_url="http://localhost:8000",
    api_key="a" * 64,
    instance_id="12345678-1234-1234-1234-123456789abc",
    dry_run=True,  # Log but don't send
    submit_interval_seconds=10.0,  # Flush quickly for testing
)
```
|
||||
|
||||
**Production:**
|
||||
|
||||
```python
import os

config = TelemetryConfig(
    server_url="https://tel-api.mosaicstack.dev",
    api_key=os.environ["MOSAIC_TELEMETRY_API_KEY"],
    instance_id=os.environ["MOSAIC_TELEMETRY_INSTANCE_ID"],
    submit_interval_seconds=300.0,  # Default: flush every 5 minutes
    max_retries=3,  # Retry transient failures
)
```
|
||||
|
||||
---
|
||||
|
||||
## Sync Usage (Threading)
|
||||
|
||||
Best for scripts, CLI tools, aider integrations, and non-async contexts. The SDK spawns a daemon thread that periodically flushes queued events.
|
||||
|
||||
```python
from mosaicstack_telemetry import (
    TelemetryClient,
    TelemetryConfig,
    EventBuilder,
    TaskType,
    Provider,
    Harness,
    Complexity,
    Outcome,
    QualityGate,
)

config = TelemetryConfig(
    server_url="https://tel-api.mosaicstack.dev",
    api_key="your-64-char-hex-api-key-here...",
    instance_id="a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
)

client = TelemetryClient(config)
client.start()  # Starts background daemon thread

event = (
    EventBuilder(instance_id=config.instance_id)
    .task_type(TaskType.IMPLEMENTATION)
    .model("claude-sonnet-4-5-20250929")
    .provider(Provider.ANTHROPIC)
    .harness_type(Harness.CLAUDE_CODE)
    .complexity_level(Complexity.MEDIUM)
    .outcome_value(Outcome.SUCCESS)
    .duration_ms(45000)
    .tokens(estimated_in=105000, estimated_out=45000, actual_in=112340, actual_out=38760)
    .cost(estimated=630000, actual=919200)
    .quality(
        passed=True,
        gates_run=[QualityGate.BUILD, QualityGate.LINT, QualityGate.TEST, QualityGate.TYPECHECK],
    )
    .context(compactions=2, rotations=0, utilization=0.72)
    .language("typescript")
    .build()
)

client.track(event)  # Non-blocking, thread-safe

client.stop()  # Flushes remaining events and stops the thread
```
|
||||
|
||||
### Sync Context Manager
|
||||
|
||||
The context manager calls `start()` on entry and `stop()` (with flush) on exit:
|
||||
|
||||
```python
with TelemetryClient(config) as client:
    client.track(event)
# Automatically flushed and stopped here
```
|
||||
|
||||
---
|
||||
|
||||
## Async Usage (asyncio)
|
||||
|
||||
For asyncio-based applications (FastAPI, aiohttp, etc.). The SDK creates an asyncio task that periodically flushes events.
|
||||
|
||||
```python
import asyncio

from mosaicstack_telemetry import EventBuilder, TelemetryClient, TelemetryConfig


async def main():
    config = TelemetryConfig(
        server_url="https://tel-api.mosaicstack.dev",
        api_key="your-64-char-hex-api-key-here...",
        instance_id="a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
    )

    client = TelemetryClient(config)
    await client.start_async()  # Starts asyncio background task

    # Minimal event: every field not set here uses the builder's default
    event = EventBuilder(instance_id=config.instance_id).build()

    # track() is always synchronous and non-blocking
    client.track(event)

    await client.stop_async()  # Flushes remaining events


asyncio.run(main())
```
|
||||
|
||||
### Async Context Manager
|
||||
|
||||
```python
async with TelemetryClient(config) as client:
    client.track(event)
# Automatically flushed and stopped here
```
|
||||
|
||||
### Key Difference: Sync vs Async
|
||||
|
||||
| Aspect | Sync | Async |
|--------|------|-------|
| Start | `client.start()` | `await client.start_async()` |
| Stop | `client.stop()` | `await client.stop_async()` |
| Context manager | `with TelemetryClient(config)` | `async with TelemetryClient(config)` |
| Background mechanism | `threading.Timer` (daemon thread) | `asyncio.Task` |
| `track()` | Always synchronous | Always synchronous |
| `refresh_predictions` | `refresh_predictions_sync(queries)` | `await refresh_predictions(queries)` |
|
||||
|
||||
The `track()` method is always synchronous regardless of which mode you use. It simply appends to a thread-safe in-memory queue and returns immediately. The background submitter handles batching and sending.
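
The batching step performed by the background submitter can be pictured as slicing the drained events into chunks no larger than the server limit of 100. `batches` is a hypothetical helper operating on any sequence, shown only to illustrate the idea.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(events: Sequence[T], batch_size: int = 100) -> Iterator[list[T]]:
    """Yield consecutive slices of `events`, each at most `batch_size` long."""
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    for start in range(0, len(events), batch_size):
        yield list(events[start:start + batch_size])
```

With 250 queued events and the default `batch_size`, this produces three requests of 100, 100, and 50 events.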
|
||||
|

---

## Integration Examples

### Example 1: Instrumenting a FastAPI Service

```python
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI

from mosaicstack_telemetry import (
    Complexity,
    EventBuilder,
    Harness,
    Outcome,
    Provider,
    QualityGate,
    TaskType,
    TelemetryClient,
    TelemetryConfig,
)

# Initialize telemetry once at startup
config = TelemetryConfig(
    server_url=os.environ.get("MOSAIC_TELEMETRY_SERVER_URL", "https://tel-api.mosaicstack.dev"),
    api_key=os.environ["MOSAIC_TELEMETRY_API_KEY"],
    instance_id=os.environ["MOSAIC_TELEMETRY_INSTANCE_ID"],
)

telemetry = TelemetryClient(config)


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Start telemetry on app startup, flush on shutdown."""
    await telemetry.start_async()
    yield
    await telemetry.stop_async()


app = FastAPI(lifespan=lifespan)


@app.post("/tasks/complete")
async def complete_task(
    task_type: str,
    model: str,
    provider: str,
    complexity: str,
    actual_input_tokens: int,
    actual_output_tokens: int,
    actual_cost_usd_micros: int,
    duration_ms: int,
    outcome: str,
):
    """Record a completed AI coding task."""
    event = (
        EventBuilder(instance_id=config.instance_id)
        .task_type(TaskType(task_type))
        .model(model)
        .provider(Provider(provider))
        .harness_type(Harness.CLAUDE_CODE)
        .complexity_level(Complexity(complexity))
        .outcome_value(Outcome(outcome))
        .duration_ms(duration_ms)
        .tokens(
            estimated_in=0,
            estimated_out=0,
            actual_in=actual_input_tokens,
            actual_out=actual_output_tokens,
        )
        .cost(estimated=0, actual=actual_cost_usd_micros)
        .quality(passed=outcome == "success", gates_run=[QualityGate.BUILD, QualityGate.TEST])
        .context(compactions=0, rotations=0, utilization=0.0)
        .build()
    )

    telemetry.track(event)  # Non-blocking, never raises
    return {"status": "tracked"}
```

### Example 2: Instrumenting a Generic Python App

```python
"""
Generic Python script that tracks AI coding tasks.
Suitable for CLI tools, batch processors, or any non-async application.
"""

import time

from mosaicstack_telemetry import (
    Complexity,
    EventBuilder,
    Harness,
    Outcome,
    Provider,
    QualityGate,
    RepoSizeCategory,
    TaskType,
    TelemetryClient,
    TelemetryConfig,
)


def run_coding_task() -> dict:
    """Simulate an AI coding task. Returns task metrics."""
    start = time.monotonic()

    # ... your AI coding logic here ...

    elapsed_ms = int((time.monotonic() - start) * 1000)
    return {
        "duration_ms": elapsed_ms,
        "actual_input_tokens": 4500,
        "actual_output_tokens": 1800,
        "actual_cost_usd_micros": 48000,
        "outcome": "success",
        "quality_gates_passed": True,
    }


def main():
    config = TelemetryConfig()  # Reads from MOSAIC_TELEMETRY_* env vars

    with TelemetryClient(config) as client:
        result = run_coding_task()

        event = (
            EventBuilder(instance_id=config.instance_id)
            .task_type(TaskType.IMPLEMENTATION)
            .model("claude-sonnet-4-5-20250929")
            .provider(Provider.ANTHROPIC)
            .harness_type(Harness.AIDER)
            .complexity_level(Complexity.MEDIUM)
            .outcome_value(Outcome(result["outcome"]))
            .duration_ms(result["duration_ms"])
            .tokens(
                estimated_in=5000,
                estimated_out=2000,
                actual_in=result["actual_input_tokens"],
                actual_out=result["actual_output_tokens"],
            )
            .cost(estimated=50000, actual=result["actual_cost_usd_micros"])
            .quality(
                passed=result["quality_gates_passed"],
                gates_run=[QualityGate.BUILD, QualityGate.LINT, QualityGate.TEST],
            )
            .context(compactions=0, rotations=0, utilization=0.35)
            .language("python")
            .repo_size(RepoSizeCategory.MEDIUM)
            .build()
        )

        client.track(event)
        print(f"Tracked task: {event.event_id}")

    # Client is automatically flushed and stopped after the `with` block


if __name__ == "__main__":
    main()
```


---

## Building Events

The `EventBuilder` provides a fluent API for constructing `TaskCompletionEvent` objects with sensible defaults. All setter methods return the builder instance for chaining.

### Required Fields

Every event requires these fields to be set (either via builder methods or from defaults):

| Builder Method | Sets Field | Default |
|----------------|-----------|---------|
| `EventBuilder(instance_id=...)` | `instance_id` | (required) |
| `.task_type(TaskType.X)` | `task_type` | `unknown` |
| `.model("model-name")` | `model` | `"unknown"` |
| `.provider(Provider.X)` | `provider` | `unknown` |
| `.harness_type(Harness.X)` | `harness` | `unknown` |
| `.complexity_level(Complexity.X)` | `complexity` | `medium` |
| `.outcome_value(Outcome.X)` | `outcome` | `failure` |
| `.duration_ms(N)` | `task_duration_ms` | `0` |
| `.tokens(...)` | token fields | all `0` |
| `.cost(...)` | cost fields | all `0` |
| `.quality(...)` | quality fields | `passed=False, gates_run=[], gates_failed=[]` |
| `.context(...)` | context fields | all `0` / `0.0` |

### Auto-Generated Fields

| Field | Auto-generated Value |
|-------|---------------------|
| `event_id` | Random UUID (override with `.event_id(uuid)`) |
| `timestamp` | Current UTC time (override with `.timestamp(dt)`) |
| `schema_version` | `"1.0"` (set automatically by `TaskCompletionEvent`) |

### Optional Fields

| Builder Method | Sets Field | Default |
|----------------|-----------|---------|
| `.language("python")` | `language` | `None` |
| `.repo_size(RepoSizeCategory.MEDIUM)` | `repo_size_category` | `None` |
| `.retry_count(N)` | `retry_count` | `0` |

### Token and Cost Values

Costs are expressed in **microdollars** (1 USD = 1,000,000 microdollars). This avoids floating-point precision issues.

```python
event = (
    EventBuilder(instance_id=config.instance_id)
    # ... other fields ...
    .tokens(
        estimated_in=105000,   # Pre-task estimate: input tokens
        estimated_out=45000,   # Pre-task estimate: output tokens
        actual_in=112340,      # Actual input tokens consumed
        actual_out=38760,      # Actual output tokens generated
    )
    .cost(
        estimated=630000,  # $0.63 in microdollars
        actual=919200,     # $0.92 in microdollars
    )
    .build()
)
```

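If your application tracks prices in dollars, a small conversion helper keeps the microdollar arithmetic exact. The following is a sketch using only the standard library; `usd_to_micros` and `micros_to_usd` are hypothetical helper names, not part of the SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

MICROS_PER_USD = 1_000_000


def usd_to_micros(amount_usd: str) -> int:
    """Convert a decimal dollar string such as "0.63" to integer microdollars."""
    micros = Decimal(amount_usd) * MICROS_PER_USD
    # Round to the nearest whole microdollar before converting to int.
    return int(micros.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def micros_to_usd(micros: int) -> Decimal:
    """Convert integer microdollars back to an exact Decimal dollar amount."""
    return Decimal(micros) / MICROS_PER_USD


estimated = usd_to_micros("0.63")   # matches the estimated cost used above
actual = usd_to_micros("0.9192")    # matches the actual cost used above
```

Passing the amount as a string rather than a float keeps binary floating-point error out of the conversion.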
### Quality Gates

Record which quality gates were executed and their results:

```python
event = (
    EventBuilder(instance_id=config.instance_id)
    # ... other fields ...
    .quality(
        passed=False,
        gates_run=[QualityGate.BUILD, QualityGate.LINT, QualityGate.TEST, QualityGate.COVERAGE],
        gates_failed=[QualityGate.COVERAGE],
    )
    .build()
)
```

Available gates: `BUILD`, `LINT`, `TEST`, `COVERAGE`, `TYPECHECK`, `SECURITY`.

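When gate results come from a CI run as a mapping of gate to pass/fail, the three `.quality()` arguments can be derived in one place. This is a sketch under stated assumptions: `Gate` is a local stand-in for `QualityGate` with made-up string values, `summarize_gates` is a hypothetical helper, and treating `passed` as "no gate failed" is one reasonable convention rather than something the SDK enforces.

```python
from enum import Enum


class Gate(Enum):
    """Stand-in for the SDK's QualityGate enum (illustration only)."""

    BUILD = "build"
    LINT = "lint"
    TEST = "test"
    COVERAGE = "coverage"


def summarize_gates(results: dict) -> dict:
    """Turn {gate: passed?} results into keyword arguments for .quality()."""
    gates_failed = [gate for gate, ok in results.items() if not ok]
    return {
        "passed": not gates_failed,   # assumption: passed means nothing failed
        "gates_run": list(results),   # every gate that reported a result
        "gates_failed": gates_failed,
    }


summary = summarize_gates(
    {Gate.BUILD: True, Gate.LINT: True, Gate.TEST: True, Gate.COVERAGE: False}
)
```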

---

## Querying Predictions

The SDK can query crowd-sourced predictions from the telemetry server. Predictions provide percentile distributions for token usage, cost, duration, and quality metrics based on aggregated data from all participating instances.

### Fetching Predictions

```python
from mosaicstack_telemetry import PredictionQuery, TaskType, Provider, Complexity

queries = [
    PredictionQuery(
        task_type=TaskType.IMPLEMENTATION,
        model="claude-sonnet-4-5-20250929",
        provider=Provider.ANTHROPIC,
        complexity=Complexity.MEDIUM,
    ),
    PredictionQuery(
        task_type=TaskType.TESTING,
        model="claude-haiku-4-5-20251001",
        provider=Provider.ANTHROPIC,
        complexity=Complexity.LOW,
    ),
]

# Async (call from inside a coroutine)
await client.refresh_predictions(queries)

# Sync
client.refresh_predictions_sync(queries)
```

### Reading from Cache

Predictions are stored in a TTL-based in-memory cache (default: 6 hours):

```python
prediction = client.get_prediction(queries[0])

if prediction and prediction.prediction:
    data = prediction.prediction
    print(f"Input tokens (median): {data.input_tokens.median}")
    print(f"Input tokens (p90):    {data.input_tokens.p90}")
    print(f"Output tokens (median): {data.output_tokens.median}")
    print(f"Cost (median):         ${data.cost_usd_micros['median'] / 1_000_000:.4f}")
    print(f"Duration (median):     {data.duration_ms['median'] / 1000:.1f}s")
    print(f"Correction factor (input): {data.correction_factors.input:.2f}")
    print(f"Quality gate pass rate: {data.quality.gate_pass_rate:.0%}")
    print(f"Success rate: {data.quality.success_rate:.0%}")

    meta = prediction.metadata
    print(f"Sample size: {meta.sample_size}")
    print(f"Confidence: {meta.confidence}")
    if meta.fallback_note:
        print(f"Note: {meta.fallback_note}")
else:
    print("No prediction data available for this combination")
```

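To make "TTL-based" concrete, here is a minimal sketch of a time-to-live cache. It illustrates the general technique only: the `TTLCache` class, its method names, and the string key are hypothetical, and the SDK's actual cache may differ. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny in-memory cache whose entries expire ttl_seconds after insertion."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict = {}  # key -> (expires_at, value)

    def put(self, key: Any, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: Any) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry on access
            return None
        return value


SIX_HOURS = 6 * 60 * 60
now = [0.0]  # fake clock, advanced manually below
cache = TTLCache(ttl_seconds=SIX_HOURS, clock=lambda: now[0])

cache.put("implementation/medium", {"median": 4500})
fresh = cache.get("implementation/medium")    # still within the TTL
now[0] += SIX_HOURS                           # six hours pass
expired = cache.get("implementation/medium")  # entry has expired
```

A lookup that returns nothing after expiry is the signal to refresh the predictions again.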
### Prediction Confidence Levels

| Level | Meaning |
|-------|---------|
| `high` | 100+ samples, exact dimension match |
| `medium` | 30-99 samples, exact dimension match |
| `low` | <30 samples or fallback was used |
| `none` | No data available; `prediction` is `None` |

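
The thresholds in the table can be restated as a small function. This is only a paraphrase of the table for readers who prefer code: the server assigns the confidence level, and `confidence_level` and its parameters are hypothetical names that do not exist in the SDK.

```python
def confidence_level(sample_size: int, exact_match: bool) -> str:
    """Map a sample count and match type to the confidence labels above."""
    if sample_size <= 0:
        return "none"    # no data available
    if not exact_match or sample_size < 30:
        return "low"     # fallback was used, or fewer than 30 samples
    if sample_size < 100:
        return "medium"  # 30-99 samples with an exact dimension match
    return "high"        # 100+ samples with an exact dimension match


levels = [
    confidence_level(150, exact_match=True),
    confidence_level(45, exact_match=True),
    confidence_level(500, exact_match=False),
    confidence_level(0, exact_match=True),
]
```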

---

## Error Handling

### The `track()` Contract

**`track()` never raises and never blocks the caller.** If anything goes wrong (queue full, telemetry disabled, unexpected error), the event is silently dropped and the error is logged. This ensures telemetry instrumentation never affects your application's behavior.

```python
# This is always safe, even if telemetry is misconfigured
client.track(event)
```

### Queue Overflow

When the in-memory queue reaches `max_queue_size` (default 1000), the oldest events are evicted to make room for new ones. A warning is logged when this happens.

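The oldest-first eviction policy behaves like a bounded `collections.deque`. The sketch below demonstrates that policy with a deliberately tiny limit; it is not the SDK's queue implementation, and unlike the SDK it does not log a warning when an entry is dropped.

```python
from collections import deque

max_queue_size = 3  # tiny limit for the demo; the documented default is 1000
pending = deque(maxlen=max_queue_size)

for name in ("e1", "e2", "e3", "e4", "e5"):
    # When the deque is full, append() discards the oldest entry automatically.
    pending.append(name)

remaining = list(pending)  # only the three newest entries survive
```

If losing old events during a long outage is a concern, raising `max_queue_size` trades memory for retention.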
### Submission Retries

The background submitter retries transient failures with exponential backoff and jitter:

- **429 Too Many Requests**: Honors the server's `Retry-After` header
- **Timeouts**: Retried with backoff
- **Network errors**: Retried with backoff
- **403 Forbidden**: Not retried (configuration error)

Failed batches are re-queued for the next submission cycle (up to queue capacity).

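As an illustration of the retry timing, the sketch below computes a delay using exponential backoff with "full jitter" and lets a server-supplied `Retry-After` value take priority. The function name, the 1-second base, and the 60-second cap are assumptions made for the example; this guide does not state the SDK's actual constants or jitter strategy. The sketch also assumes `Retry-After` has already been parsed into seconds.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return the seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # A server-supplied Retry-After value takes priority over local backoff.
        return max(0.0, retry_after)
    rng = rng if rng is not None else random.Random()
    # Exponential growth, capped so long outages do not produce huge waits.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random point between zero and the ceiling.
    return rng.uniform(0.0, ceiling)


delays = [retry_delay(n, rng=random.Random(n)) for n in range(4)]
honored = retry_delay(2, retry_after=7.5)
```

Jitter spreads retries from many clients over time, so they do not all hit the server at the same instant after an outage.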
### Logging

All SDK logging uses the `mosaicstack_telemetry` logger. Enable it to see submission activity:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
# Or target the SDK logger specifically:
logging.getLogger("mosaicstack_telemetry").setLevel(logging.DEBUG)
```

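To send SDK records to their own destination without changing the root logger, you can attach a dedicated handler using the standard `logging` API. In this sketch the `StringIO` stream is a stand-in for a real destination such as a file, and the child logger name `mosaicstack_telemetry.submitter` is hypothetical, used only to emit a demo record.

```python
import io
import logging

sdk_logger = logging.getLogger("mosaicstack_telemetry")
sdk_logger.setLevel(logging.DEBUG)

stream = io.StringIO()  # stand-in destination; a real app might use a FileHandler
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
sdk_logger.addHandler(handler)
sdk_logger.propagate = False  # keep SDK records out of the root logger's handlers

# Child loggers inherit the level and handler configured on the parent.
logging.getLogger("mosaicstack_telemetry.submitter").debug("demo record")
captured = stream.getvalue()
```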

---

## Dry-Run Mode

Test your integration without sending data to the server:

```python
config = TelemetryConfig(
    server_url="https://tel-api.mosaicstack.dev",
    api_key="a" * 64,
    instance_id="12345678-1234-1234-1234-123456789abc",
    dry_run=True,
)

with TelemetryClient(config) as client:
    client.track(event)
    # Logs: "[DRY RUN] Would submit batch of 1 events to ..."
```

## Disabling Telemetry

Set `enabled=False` or the environment variable `MOSAIC_TELEMETRY_ENABLED=false`:

```python
config = TelemetryConfig(enabled=False)

with TelemetryClient(config) as client:
    client.track(event)  # Silently dropped, no background thread started
```

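If your own application code needs to know whether telemetry is switched off (for example, to skip building events), it can read the same environment variable. This is an application-side sketch, not the SDK's parser: the set of accepted "false" spellings and the default of enabled are assumptions, and `telemetry_enabled` is a hypothetical helper.

```python
import os
from typing import Mapping

# Assumed spellings of "off"; the SDK may accept a different set.
_FALSE_VALUES = {"0", "false", "no", "off"}


def telemetry_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return False only when MOSAIC_TELEMETRY_ENABLED is set to a false-like value."""
    raw = environ.get("MOSAIC_TELEMETRY_ENABLED", "true")  # assumed default: enabled
    return raw.strip().lower() not in _FALSE_VALUES


disabled = telemetry_enabled({"MOSAIC_TELEMETRY_ENABLED": "false"})
default = telemetry_enabled({})
```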

---

## API Compatibility

| SDK Version | Telemetry API | Event Schema | Notes |
|-------------|---------------|--------------|-------|
| 0.1.x | v1 (`/v1/*`) | 1.0 | Current release |

The SDK submits events to `POST /v1/events/batch` and queries predictions from `POST /v1/predictions/batch`. These are the only two server endpoints the SDK communicates with.

For the full server API documentation, see the [Mosaic Telemetry API Reference](https://github.com/mosaicstack/telemetry).