merge: resolve conflicts with develop (M10-Telemetry + M12-MatrixBridge)

All checks were successful

Merge origin/develop into feature/m13-speech-services to incorporate
M10-Telemetry and M12-MatrixBridge changes. Resolved 4 conflicts:

- .env.example: Added speech config alongside telemetry + matrix config
- Makefile: Added speech targets alongside matrix targets
- app.module.ts: Import both MosaicTelemetryModule and SpeechModule
- docs/tasks.md: Combined all milestone task tracking sections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs/MATRIX-BRIDGE.md (new file, 537 lines)

@@ -0,0 +1,537 @@
# Matrix Bridge

Integration between Mosaic Stack and the Matrix protocol, enabling workspace management
and job orchestration through Matrix chat rooms.

## Overview

The Matrix bridge connects Mosaic Stack to any Matrix homeserver (Synapse, Dendrite, Conduit,
etc.), allowing users to interact with the platform through Matrix clients like Element,
FluffyChat, or any other Matrix-compatible application.

Key capabilities:

- **Command interface** -- Issue bot commands (`@mosaic fix #42`) from any mapped Matrix room
- **Workspace-room mapping** -- Each Mosaic workspace can be linked to a Matrix room
- **Threaded job updates** -- Job progress is posted to MSC3440 threads, keeping rooms clean
- **Streaming AI responses** -- LLM output streams to Matrix via rate-limited message edits
- **Multi-provider broadcasting** -- HeraldService broadcasts status updates to all active
  chat providers (Discord and Matrix can run simultaneously)

### Architecture

```
Matrix Client (Element, FluffyChat, etc.)
        |
        v
Synapse Homeserver
        |
  matrix-bot-sdk
        |
        v
+------------------+        +-------------------------+
|  MatrixService   |<------>|  CommandParserService   |
| (IChatProvider)  |        | (shared, all platforms) |
+------------------+        +-------------------------+
        |                                |
        |                                v
        |                     +--------------------+
        |                     | MatrixRoomService  |  workspace <-> room mapping
        |                     +--------------------+
        |                                |
        v                                v
+------------------+        +----------------+
| StitcherService  |        | PrismaService  |
|  (job dispatch)  |        |   (database)   |
+------------------+        +----------------+
        |
        v
+------------------+
|  HeraldService   |  broadcasts to CHAT_PROVIDERS[]
+------------------+
        |
        v
+---------------------------+
|  MatrixStreamingService   |  streaming AI responses
| (m.replace edits, typing) |
+---------------------------+
```
## Quick Start

### 1. Start the dev environment

The Matrix dev environment uses a Docker Compose overlay that adds Synapse and Element Web
alongside the existing Mosaic Stack services.

```bash
# Using Makefile (recommended)
make matrix-up

# Or manually
docker compose -f docker/docker-compose.yml -f docker/docker-compose.matrix.yml up -d
```

This starts:

| Service     | URL                   | Purpose                 |
| ----------- | --------------------- | ----------------------- |
| Synapse     | http://localhost:8008 | Matrix homeserver       |
| Element Web | http://localhost:8501 | Web-based Matrix client |

Both services share the existing Mosaic PostgreSQL instance. A `synapse-db-init` container
runs once to create the `synapse` database and user, then exits.

### 2. Create the bot account

After Synapse is healthy, run the setup script to create admin and bot accounts:

```bash
make matrix-setup-bot

# Or directly
docker/matrix/scripts/setup-bot.sh
```

The script:

1. Registers an admin account (`admin` / `admin-dev-password`)
2. Obtains an admin access token
3. Creates the bot account (`mosaic-bot` / `mosaic-bot-dev-password`)
4. Retrieves the bot access token
5. Prints the environment variables to add to `.env`

Custom credentials can be passed:

```bash
docker/matrix/scripts/setup-bot.sh \
  --username custom-bot \
  --password custom-pass \
  --admin-username myadmin \
  --admin-password myadmin-pass
```

### 3. Configure environment variables

Copy the output from the setup script into your `.env` file:

```bash
# Matrix Bridge Configuration
MATRIX_HOMESERVER_URL=http://localhost:8008
MATRIX_ACCESS_TOKEN=<token from setup-bot.sh>
MATRIX_BOT_USER_ID=@mosaic-bot:localhost
MATRIX_CONTROL_ROOM_ID=!roomid:localhost
MATRIX_WORKSPACE_ID=<your-workspace-uuid>
```

If running the API inside the Docker Compose network, use the internal hostname:

```bash
MATRIX_HOMESERVER_URL=http://synapse:8008
```

### 4. Restart the API

```bash
pnpm dev:api
# or
make docker-restart
```

The BridgeModule will detect `MATRIX_ACCESS_TOKEN` and enable the Matrix bridge
automatically.

### 5. Test in Element Web

1. Open http://localhost:8501
2. Register or log in with any account
3. Create a room and invite `@mosaic-bot:localhost`
4. Send `@mosaic help` or `!mosaic help`
## Configuration

### Environment Variables

| Variable                 | Description                                   | Example                       |
| ------------------------ | --------------------------------------------- | ----------------------------- |
| `MATRIX_HOMESERVER_URL`  | Matrix server URL                             | `http://localhost:8008`       |
| `MATRIX_ACCESS_TOKEN`    | Bot access token (from setup script or login) | `syt_bW9z...`                 |
| `MATRIX_BOT_USER_ID`     | Bot's full Matrix user ID                     | `@mosaic-bot:localhost`       |
| `MATRIX_CONTROL_ROOM_ID` | Default room for status broadcasts            | `!abcdef:localhost`           |
| `MATRIX_WORKSPACE_ID`    | Default workspace UUID for the control room   | `550e8400-e29b-41d4-a716-...` |

All variables are read from `process.env` at service construction time. The bridge activates
only when `MATRIX_ACCESS_TOKEN` is set.

### Dev Environment Variables (docker-compose.matrix.yml)

These configure the local Synapse and Element Web instances:

| Variable                    | Default                | Purpose                   |
| --------------------------- | ---------------------- | ------------------------- |
| `SYNAPSE_POSTGRES_DB`       | `synapse`              | Synapse database name     |
| `SYNAPSE_POSTGRES_USER`     | `synapse`              | Synapse database user     |
| `SYNAPSE_POSTGRES_PASSWORD` | `synapse_dev_password` | Synapse database password |
| `SYNAPSE_CLIENT_PORT`       | `8008`                 | Synapse client API port   |
| `SYNAPSE_FEDERATION_PORT`   | `8448`                 | Synapse federation port   |
| `ELEMENT_PORT`              | `8501`                 | Element Web port          |
## Architecture

### Service Responsibilities

**MatrixService** (`apps/api/src/bridge/matrix/matrix.service.ts`)

The primary Matrix integration. Implements the `IChatProvider` interface.

- Connects to the homeserver using `matrix-bot-sdk`
- Listens for `room.message` events in all joined rooms
- Resolves workspace context via MatrixRoomService (or falls back to the control room)
- Normalizes the `!mosaic` prefix to `@mosaic` for the shared CommandParserService
- Dispatches parsed commands to StitcherService for job execution
- Creates MSC3440 threads for job updates
- Auto-joins rooms when invited (`AutojoinRoomsMixin`)

**MatrixRoomService** (`apps/api/src/bridge/matrix/matrix-room.service.ts`)

Manages the mapping between Mosaic workspaces and Matrix rooms.

- **Provision**: Creates a private Matrix room named `Mosaic: {workspace_name}` with alias
  `#mosaic-{slug}:{server}`
- **Link/Unlink**: Maps existing rooms to workspaces via `workspace.matrixRoomId`
- **Lookup**: Forward lookup (workspace -> room) and reverse lookup (room -> workspace)
- Room mappings are stored in the `workspace` table's `matrixRoomId` column

**MatrixStreamingService** (`apps/api/src/bridge/matrix/matrix-streaming.service.ts`)

Streams AI responses to Matrix rooms using incremental message edits.

- Sends an initial "Thinking..." placeholder message
- Activates the typing indicator during generation
- Buffers incoming tokens and edits the message every 500ms (rate-limited)
- On completion, sends a final clean edit with optional token usage stats
- On error, edits the message with an error notice
- Supports threaded responses via MSC3440

**CommandParserService** (`apps/api/src/bridge/parser/command-parser.service.ts`)

Shared, platform-agnostic command parser used by both the Discord and Matrix bridges.

- Parses `@mosaic <action> [args]` commands
- Supports issue references in multiple formats: `#42`, `owner/repo#42`, full URL
- Returns typed `ParsedCommand` objects or structured parse errors with help text

**BridgeModule** (`apps/api/src/bridge/bridge.module.ts`)

Conditional module loader. Inspects environment variables at startup:

- If `DISCORD_BOT_TOKEN` is set, the Discord bridge is added to `CHAT_PROVIDERS`
- If `MATRIX_ACCESS_TOKEN` is set, the Matrix bridge is added to `CHAT_PROVIDERS`
- Both can run simultaneously; neither is a dependency of the other
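The selection logic above can be sketched as a pure function. This is a hypothetical illustration, not the actual BridgeModule factory -- the real module wires NestJS providers rather than returning strings:

```typescript
// Sketch of the conditional provider selection described above.
// `activeProviders` is a hypothetical helper for illustration only.
type Env = Record<string, string | undefined>;

function activeProviders(env: Env): string[] {
  const providers: string[] = [];
  // Each bridge is gated only by its own token; neither depends on the other.
  if (env.DISCORD_BOT_TOKEN) providers.push("discord");
  if (env.MATRIX_ACCESS_TOKEN) providers.push("matrix");
  return providers;
}

// Both bridges can be active simultaneously:
console.log(activeProviders({ DISCORD_BOT_TOKEN: "x", MATRIX_ACCESS_TOKEN: "y" }));
// -> [ 'discord', 'matrix' ]
```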
**HeraldService** (`apps/api/src/herald/herald.service.ts`)

Status broadcaster that sends job event updates to all active chat providers.

- Iterates over the `CHAT_PROVIDERS` injection token
- Sends thread messages for job lifecycle events (created, started, completed, failed, etc.)
- Uses PDA-friendly language (no "OVERDUE", "URGENT", etc.)
- If one provider fails, the others still receive the broadcast

### Data Flow

```
1. User sends "@mosaic fix #42" in a Matrix room
2. MatrixService receives the room.message event
3. MatrixRoomService resolves the room -> workspace mapping
4. CommandParserService parses the command (action=FIX, issue=#42)
5. MatrixService creates a thread (MSC3440) for job updates
6. StitcherService dispatches the job with workspace context
7. HeraldService receives job events and broadcasts to all CHAT_PROVIDERS
8. Thread messages appear in the Matrix room thread
```

### Thread Model (MSC3440)

Matrix threads are implemented per [MSC3440](https://github.com/matrix-org/matrix-spec-proposals/pull/3440):

- A **thread root** is created by sending a regular `m.room.message` event
- Subsequent messages reference the root via `m.relates_to` with `rel_type: "m.thread"`
- The `is_falling_back: true` flag and `m.in_reply_to` provide compatibility with clients
  that do not support threads
- Thread root event IDs are stored in job metadata so HeraldService can post updates
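A sketch of the relation payload those rules imply. `buildThreadReply` is a hypothetical helper, not code from the bridge; the field names follow MSC3440:

```typescript
// Builds the m.relates_to block for a message posted into an MSC3440 thread.
interface ThreadRelation {
  "m.relates_to": {
    rel_type: "m.thread";
    event_id: string;                        // thread root event ID
    is_falling_back: true;                   // render as a reply in non-thread clients
    "m.in_reply_to": { event_id: string };   // fallback reply target
  };
}

function buildThreadReply(rootEventId: string, lastEventId: string): ThreadRelation {
  return {
    "m.relates_to": {
      rel_type: "m.thread",
      event_id: rootEventId,
      is_falling_back: true,
      // The fallback reply typically targets the latest event in the thread.
      "m.in_reply_to": { event_id: lastEventId },
    },
  };
}
```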
## Commands

All commands accept either the `@mosaic` or `!mosaic` prefix. The `!mosaic` form is
normalized to `@mosaic` internally before parsing.

| Command                    | Description                   | Example                      |
| -------------------------- | ----------------------------- | ---------------------------- |
| `@mosaic fix <issue>`      | Start a job for an issue      | `@mosaic fix #42`            |
| `@mosaic status <job-id>`  | Check job status              | `@mosaic status job-abc123`  |
| `@mosaic cancel <job-id>`  | Cancel a running job          | `@mosaic cancel job-abc123`  |
| `@mosaic retry <job-id>`   | Retry a failed job            | `@mosaic retry job-abc123`   |
| `@mosaic verbose <job-id>` | Stream full logs to thread    | `@mosaic verbose job-abc123` |
| `@mosaic quiet`            | Reduce notification verbosity | `@mosaic quiet`              |
| `@mosaic help`             | Show available commands       | `@mosaic help`               |

### Issue Reference Formats

The `fix` command accepts issue references in multiple formats:

```
@mosaic fix #42                                   # Current repo
@mosaic fix owner/repo#42                         # Cross-repo
@mosaic fix https://git.example.com/o/r/issues/42 # Full URL
```
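A minimal sketch of parsing these three formats, assuming a simplified `IssueRef` shape -- the real CommandParserService types and patterns may differ:

```typescript
// Parses the three issue-reference formats accepted by the fix command.
interface IssueRef {
  owner?: string;   // absent for current-repo references like "#42"
  repo?: string;
  issue: number;
}

function parseIssueRef(input: string): IssueRef | null {
  let m = input.match(/^#(\d+)$/);                         // #42 (current repo)
  if (m) return { issue: Number(m[1]) };
  m = input.match(/^([\w.-]+)\/([\w.-]+)#(\d+)$/);         // owner/repo#42
  if (m) return { owner: m[1], repo: m[2], issue: Number(m[3]) };
  m = input.match(/^https?:\/\/[^/]+\/([\w.-]+)\/([\w.-]+)\/issues\/(\d+)$/); // full URL
  if (m) return { owner: m[1], repo: m[2], issue: Number(m[3]) };
  return null;                                             // structured parse error upstream
}
```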
### Noise Management

Job updates are scoped to threads to keep main rooms clean:

- **Main room**: Low verbosity -- milestone completions only
- **Job threads**: Medium verbosity -- step completions and status changes
- **DMs**: Configurable per user (planned)
## Workspace-Room Mapping

Each Mosaic workspace can be associated with one Matrix room. The mapping is stored in the
`workspace` table's `matrixRoomId` column.

### Automatic Provisioning

When a workspace needs a Matrix room, MatrixRoomService provisions one:

```
Room name:  "Mosaic: My Workspace"
Room alias: #mosaic-my-workspace:localhost
Visibility: private
```

The room ID is then stored in `workspace.matrixRoomId`.

### Manual Linking

Existing rooms can be linked to workspaces:

```typescript
await matrixRoomService.linkWorkspaceToRoom(workspaceId, "!roomid:localhost");
```

And unlinked:

```typescript
await matrixRoomService.unlinkWorkspace(workspaceId);
```

### Message Routing

When a message arrives in a room:

1. MatrixRoomService performs a reverse lookup: room ID -> workspace ID
2. If no mapping is found, the service checks whether the room is the configured control room
   (`MATRIX_CONTROL_ROOM_ID`) and uses `MATRIX_WORKSPACE_ID` as a fallback
3. If still unmapped, the message is ignored

This ensures commands only execute within a valid workspace context.
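The routing steps above can be sketched as a pure function, with the database lookup replaced by an in-memory map -- hypothetical names throughout, not the actual MatrixRoomService API:

```typescript
// Resolves a room ID to a workspace ID per the three routing rules.
interface RoutingEnv {
  controlRoomId?: string;        // MATRIX_CONTROL_ROOM_ID
  controlWorkspaceId?: string;   // MATRIX_WORKSPACE_ID
}

function resolveWorkspace(
  roomId: string,
  mappings: Map<string, string>, // roomId -> workspaceId (workspace.matrixRoomId)
  env: RoutingEnv,
): string | null {
  const mapped = mappings.get(roomId);        // 1. reverse lookup
  if (mapped) return mapped;
  if (roomId === env.controlRoomId && env.controlWorkspaceId) {
    return env.controlWorkspaceId;            // 2. control-room fallback
  }
  return null;                                // 3. unmapped -> message is ignored
}
```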
## Streaming Responses

MatrixStreamingService enables real-time AI response streaming in Matrix rooms.

### How It Works

1. An initial placeholder message ("Thinking...") is sent to the room
2. The bot's typing indicator is activated
3. Tokens from the LLM arrive via an `AsyncIterable<string>`
4. Tokens are buffered and the message is edited via `m.replace` events
5. Edits are rate-limited to a maximum of once every **500ms** to avoid flooding the
   homeserver
6. When streaming completes, a final clean edit is sent and the typing indicator clears
7. On error, the message is edited to include an error notice

### Message Edit Format (m.replace)

```json
{
  "m.new_content": {
    "msgtype": "m.text",
    "body": "Updated response text"
  },
  "m.relates_to": {
    "rel_type": "m.replace",
    "event_id": "$original_event_id"
  },
  "msgtype": "m.text",
  "body": "* Updated response text"
}
```

The top-level `body` prefixed with `*` serves as a fallback for clients that do not
support message edits.

### Thread Support

Streaming responses can target a specific thread by passing `threadId` in the options.
The initial message and all edits will include the `m.thread` relation.
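This payload can be assembled with a small helper. `buildEditEvent` is a hypothetical sketch for illustration, not the actual MatrixStreamingService code:

```typescript
// Assembles an m.replace edit payload for an existing message event.
function buildEditEvent(originalEventId: string, text: string) {
  return {
    msgtype: "m.text",
    body: `* ${text}`,                       // fallback for non-edit-aware clients
    "m.new_content": { msgtype: "m.text", body: text },
    "m.relates_to": { rel_type: "m.replace", event_id: originalEventId },
  };
}
```

During streaming, the service would send this content (throttled to one edit per 500ms) via the SDK's event-sending call, always targeting the original placeholder's event ID.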
## Development

### Running Tests

```bash
# All bridge tests
pnpm test -- --filter @mosaic/api -- matrix

# Individual service tests
pnpm test -- --filter @mosaic/api -- matrix.service
pnpm test -- --filter @mosaic/api -- matrix-room.service
pnpm test -- --filter @mosaic/api -- matrix-streaming.service
pnpm test -- --filter @mosaic/api -- command-parser
pnpm test -- --filter @mosaic/api -- bridge.module
```

### Adding a New Command

1. Add the action to the `CommandAction` enum in
   `apps/api/src/bridge/parser/command.interface.ts`
2. Add parsing logic in `CommandParserService.parseActionArguments()`
   (`apps/api/src/bridge/parser/command-parser.service.ts`)
3. Add the handler case in `MatrixService.handleParsedCommand()`
   (`apps/api/src/bridge/matrix/matrix.service.ts`)
4. Implement the handler method (e.g., `handleNewCommand()`)
5. Update the help text in `MatrixService.handleHelpCommand()`
6. Add tests for the new command in both the parser and service spec files
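As an illustration of steps 1-2, here is a sketch for an imagined `pause` command. `CommandAction` and `ParsedCommand` are simplified stand-ins for the real types in `command.interface.ts`:

```typescript
// Simplified stand-ins for the real command types (illustration only).
enum CommandAction {
  Fix = "fix",
  Status = "status",
  Pause = "pause", // step 1: the new action added to the enum
}

interface ParsedCommand {
  action: CommandAction;
  args: string[];
}

// Step 2: parsing logic for the new action's arguments.
function parseActionArguments(action: string, args: string[]): ParsedCommand | null {
  switch (action) {
    case "pause":
      // e.g. "@mosaic pause job-abc123" -> exactly one job-id argument
      return args.length === 1 ? { action: CommandAction.Pause, args } : null;
    default:
      return null; // unknown action -> structured parse error upstream
  }
}
```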
### Extending the Bridge

The `IChatProvider` interface (`apps/api/src/bridge/interfaces/chat-provider.interface.ts`)
defines the contract all chat bridges implement:

```typescript
interface IChatProvider {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  isConnected(): boolean;
  sendMessage(channelId: string, content: string): Promise<void>;
  createThread(options: ThreadCreateOptions): Promise<string>;
  sendThreadMessage(options: ThreadMessageOptions): Promise<void>;
  parseCommand(message: ChatMessage): ChatCommand | null;
  editMessage?(channelId: string, messageId: string, content: string): Promise<void>;
}
```

To add a new chat platform:

1. Create a new service implementing `IChatProvider`
2. Register it in `BridgeModule` with a conditional check on its environment variable
3. Add it to the `CHAT_PROVIDERS` factory
4. HeraldService will automatically broadcast to it with no further changes
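A minimal provider skeleton against this contract, with the platform SDK stubbed out and the option types simplified (the real `ThreadCreateOptions`/`ThreadMessageOptions` are richer) -- a sketch, not a complete bridge:

```typescript
// Minimal provider skeleton; the platform SDK calls are stubbed out.
class ExampleChatProvider {
  private connected = false;

  async connect(): Promise<void> {
    // Real implementation: initialize the platform SDK client here.
    this.connected = true;
  }

  async disconnect(): Promise<void> {
    this.connected = false;
  }

  isConnected(): boolean {
    return this.connected;
  }

  async sendMessage(channelId: string, content: string): Promise<void> {
    // Real implementation: platform SDK send call.
  }

  async createThread(options: { channelId: string; title: string }): Promise<string> {
    // Return the platform's thread identifier.
    return "example-thread-id";
  }

  async sendThreadMessage(options: { threadId: string; content: string }): Promise<void> {
    // Real implementation: post into the thread.
  }

  parseCommand(message: { body: string }): { action: string } | null {
    // Real implementations delegate to the shared CommandParserService.
    return message.body.startsWith("@mosaic") ? { action: "help" } : null;
  }
}
```

Because HeraldService iterates `CHAT_PROVIDERS`, registering this class in BridgeModule is all that is needed for it to receive broadcasts.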
### File Layout

```
apps/api/src/
  bridge/
    bridge.module.ts                   # Conditional module loader
    bridge.constants.ts                # CHAT_PROVIDERS injection token
    interfaces/
      chat-provider.interface.ts       # IChatProvider contract
      index.ts
    parser/
      command-parser.service.ts        # Shared command parser
      command-parser.spec.ts
      command.interface.ts             # Command types and enums
    matrix/
      matrix.service.ts                # Core Matrix integration
      matrix.service.spec.ts
      matrix-room.service.ts           # Workspace-room mapping
      matrix-room.service.spec.ts
      matrix-streaming.service.ts      # Streaming AI responses
      matrix-streaming.service.spec.ts
    discord/
      discord.service.ts               # Discord integration (parallel)
  herald/
    herald.module.ts
    herald.service.ts                  # Status broadcasting
    herald.service.spec.ts

docker/
  docker-compose.matrix.yml            # Dev overlay (Synapse + Element)
  docker-compose.sample.matrix.yml     # Production sample (Swarm)
  matrix/
    synapse/
      homeserver.yaml                  # Dev Synapse config
    element/
      config.json                      # Dev Element Web config
    scripts/
      setup-bot.sh                     # Bot account setup
```
## Deployment

### Production Considerations

The dev environment uses relaxed settings that are not suitable for production.
Review and address the following before deploying:

**Synapse Configuration**

- Set a proper `server_name` (this is permanent and cannot change after first run)
- Disable open registration (`enable_registration: false`)
- Replace dev secrets (`macaroon_secret_key`, `form_secret`) with strong random values
- Configure proper rate limiting (the dev config allows 100 msg/sec)
- Set up TLS termination (via a reverse proxy or Synapse directly)
- Consider a dedicated PostgreSQL instance rather than the shared Mosaic database

**Bot Security**

- Generate a strong bot password (not the dev default)
- Store the access token securely (use a secrets manager or encrypted `.env`)
- The bot auto-joins rooms when invited -- consider restricting this in production
  by removing `AutojoinRoomsMixin` and implementing allow-list logic

**Environment Variables**

- `MATRIX_WORKSPACE_ID` should be a valid workspace UUID from your database; all
  commands from the control room execute within this workspace context

**Network**

- If Synapse runs on a separate host, ensure `MATRIX_HOMESERVER_URL` points to the
  correct endpoint
- For federation, configure DNS SRV records and `.well-known` delegation

### Sample Production Stack

A production-ready Docker Swarm compose file is provided at
`docker/docker-compose.sample.matrix.yml`. It includes:

- Synapse with Traefik labels for automatic TLS
- Element Web with its own domain
- A dedicated PostgreSQL instance for Synapse
- Optional coturn (TURN/STUN) for voice/video

Deploy via Portainer or the Docker Swarm CLI:

```bash
docker stack deploy -c docker/docker-compose.sample.matrix.yml matrix
```

After deploying, follow the post-deploy steps in the compose file comments to create
accounts and configure the Mosaic Stack connection.

### Makefile Targets

| Target                  | Description                               |
| ----------------------- | ----------------------------------------- |
| `make matrix-up`        | Start Synapse + Element Web (dev overlay) |
| `make matrix-down`      | Stop Matrix services                      |
| `make matrix-logs`      | Follow Synapse and Element logs           |
| `make matrix-setup-bot` | Run bot account setup script              |
docs/tasks.md (114 lines)

@@ -1,4 +1,102 @@
# Tasks — M13-SpeechServices (0.0.13)
# Tasks

## M10-Telemetry (0.0.10) — Telemetry Integration

**Orchestrator:** Claude Code
**Started:** 2026-02-15
**Branch:** feature/m10-telemetry
**Milestone:** M10-Telemetry (0.0.10)

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TEL-001 | done | Install @mosaicstack/telemetry-client in API + NestJS module | #369 | api | feature/m10-telemetry | | TEL-004,TEL-006,TEL-007 | w-1 | 2026-02-15T10:00Z | 2026-02-15T10:37Z | 20K | 25K |
| TEL-002 | done | Install mosaicstack-telemetry in Coordinator | #370 | coordinator | feature/m10-telemetry | | TEL-005,TEL-006 | w-2 | 2026-02-15T10:00Z | 2026-02-15T10:34Z | 15K | 20K |
| TEL-003 | done | Add telemetry config to docker-compose and .env | #374 | devops | feature/m10-telemetry | | | w-3 | 2026-02-15T10:38Z | 2026-02-15T10:40Z | 8K | 10K |
| TEL-004 | done | Track LLM task completions via Mosaic Telemetry | #371 | api | feature/m10-telemetry | TEL-001 | TEL-007 | w-4 | 2026-02-15T10:38Z | 2026-02-15T10:44Z | 25K | 30K |
| TEL-005 | done | Track orchestrator agent task completions | #372 | coordinator | feature/m10-telemetry | TEL-002 | | w-5 | 2026-02-15T10:45Z | 2026-02-15T10:52Z | 20K | 25K |
| TEL-006 | done | Prediction integration for cost estimation | #373 | api | feature/m10-telemetry | TEL-001,TEL-002 | TEL-007 | w-6 | 2026-02-15T10:45Z | 2026-02-15T10:51Z | 20K | 25K |
| TEL-007 | done | Frontend: Token usage and cost dashboard | #375 | web | feature/m10-telemetry | TEL-004,TEL-006 | TEL-008 | w-7 | 2026-02-15T10:53Z | 2026-02-15T11:03Z | 30K | 115K |
| TEL-008 | done | Documentation: Telemetry integration guide | #376 | docs | feature/m10-telemetry | TEL-007 | | w-8 | 2026-02-15T10:53Z | 2026-02-15T10:58Z | 15K | 75K |

---
## M11-CIPipeline (0.0.11) — CI Pipeline #360 Remediation

**Orchestrator:** Claude Code
**Started:** 2026-02-12
**Branch:** fix/ci-\*
**Epic:** #360

### CI Fix Round 6

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CI-FIX6-001 | done | Add @mosaic/ui build to web.yml build-shared step (fixes 10 test suites + 20 typecheck errs) | | ci | fix/ci-366 | | CI-FIX6-003 | w-14 | 2026-02-12T21:00Z | 2026-02-12T21:01Z | 3K | 3K |
| CI-FIX6-002 | done | Move spec file removal to builder stage (layer-aware); add tar CVEs to .trivyignore | | orchestrator | fix/ci-366 | | CI-FIX6-004 | w-15 | 2026-02-12T21:00Z | 2026-02-12T21:15Z | 3K | 5K |
| CI-FIX6-003 | done | Add React.ChangeEvent types to ~10 web files with untyped event handlers (49 lint + 19 TS) | | web | fix/ci-366 | CI-FIX6-001 | CI-FIX6-004 | w-16 | 2026-02-12T21:02Z | 2026-02-12T21:08Z | 12K | 8K |
| CI-FIX6-004 | done | Verification: pnpm lint && pnpm typecheck && pnpm test on web; Dockerfile find validation | | all | fix/ci-366 | CI-FIX6-002,CI-FIX6-003 | | orch | 2026-02-12T21:08Z | 2026-02-12T21:10Z | 5K | 2K |

---
## M12-MatrixBridge (0.0.12) — Matrix/Element Bridge Integration

**Orchestrator:** Claude Code
**Started:** 2026-02-15
**Branch:** feature/m12-matrix-bridge
**Epic:** #377

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MB-001 | done | Install matrix-bot-sdk and create MatrixService skeleton | #378 | api | feature/m12-matrix-bridge | | MB-003,MB-004,MB-005,MB-006,MB-007,MB-008 | worker-1 | 2026-02-15T10:00Z | 2026-02-15T10:20Z | 20K | 15K |
| MB-002 | done | Add Synapse + Element Web to docker-compose for dev | #384 | docker | feature/m12-matrix-bridge | | | worker-2 | 2026-02-15T10:00Z | 2026-02-15T10:15Z | 15K | 5K |
| MB-003 | done | Register MatrixService in BridgeModule with conditional loading | #379 | api | feature/m12-matrix-bridge | MB-001 | MB-008 | worker-3 | 2026-02-15T10:25Z | 2026-02-15T10:35Z | 12K | 20K |
| MB-004 | done | Workspace-to-Matrix-Room mapping and provisioning | #380 | api | feature/m12-matrix-bridge | MB-001 | MB-005,MB-006,MB-008 | worker-4 | 2026-02-15T10:25Z | 2026-02-15T10:35Z | 20K | 39K |
| MB-005 | done | Matrix command handling — receive and dispatch commands | #381 | api | feature/m12-matrix-bridge | MB-001,MB-004 | MB-007,MB-008 | worker-5 | 2026-02-15T10:40Z | 2026-02-15T14:27Z | 20K | 27K |
| MB-006 | done | Herald Service: Add Matrix output adapter | #382 | api | feature/m12-matrix-bridge | MB-001,MB-004 | MB-008 | worker-6 | 2026-02-15T10:40Z | 2026-02-15T14:25Z | 18K | 109K |
| MB-007 | done | Streaming AI responses via Matrix message edits | #383 | api | feature/m12-matrix-bridge | MB-001,MB-005 | MB-008 | worker-7 | 2026-02-15T14:30Z | 2026-02-15T14:35Z | 20K | 28K |
| MB-008 | done | Matrix bridge E2E integration tests | #385 | api | feature/m12-matrix-bridge | MB-001,MB-003,MB-004,MB-005,MB-006,MB-007 | MB-009 | worker-8 | 2026-02-15T14:38Z | 2026-02-15T14:40Z | 25K | 35K |
| MB-009 | done | Documentation: Matrix bridge setup and architecture | #386 | docs | feature/m12-matrix-bridge | MB-008 | | worker-9 | 2026-02-15T14:38Z | 2026-02-15T14:39Z | 10K | 12K |
| MB-010 | done | Sample Matrix swarm deployment compose file | #387 | docker | feature/m12-matrix-bridge | | | | | 2026-02-15 | 0 | 0 |
| MB-011 | done | Remediate code review and security review findings | #377 | api | feature/m12-matrix-bridge | MB-001..MB-010 | | worker-10 | 2026-02-15T15:00Z | 2026-02-15T15:10Z | 30K | 145K |

### Phase Summary

| Phase | Tasks | Description |
| --- | --- | --- |
| 1 - Foundation | MB-001, MB-002 | SDK install, dev infrastructure |
| 2 - Module Integration | MB-003, MB-004 | Module registration, DB mapping |
| 3 - Core Features | MB-005, MB-006 | Command handling, Herald adapter |
| 4 - Advanced Features | MB-007 | Streaming responses |
| 5 - Testing | MB-008 | E2E integration tests |
| 6 - Documentation | MB-009 | Setup guide, architecture docs |
| 7 - Review Remediation | MB-011 | Fix all code review + security findings |

### Review Findings Resolved (MB-011)

| # | Severity | Finding | Fix |
| --- | --- | --- | --- |
| 1 | CRITICAL | sendThreadMessage hardcodes controlRoomId — wrong room | Added channelId to ThreadMessageOptions, use options.channelId |
| 2 | CRITICAL | void handleRoomMessage swallows ALL errors | Added .catch() with logger.error |
| 3 | CRITICAL | handleFixCommand: dead thread on dispatch failure | Wrapped dispatch in try-catch with user-visible error |
| 4 | CRITICAL | provisionRoom: orphaned Matrix room on DB failure | try-catch around DB update with logged warning |
| 5 | HIGH | Missing MATRIX_BOT_USER_ID validation (infinite loop risk) | Added throw in connect() if missing |
| 6 | HIGH | streamResponse finally block can throw/mask errors | Wrapped setTypingIndicator in nested try-catch |
| 7 | HIGH | streamResponse catch editMessage can throw/mask | Wrapped editMessage in nested try-catch |
| 8 | HIGH | HeraldService error log missing provider identity | Added provider.constructor.name to error log |
| 9 | HIGH | MatrixRoomService uses unsafe type assertion | Replaced with public getClient() method |
| 10 | HIGH | BridgeModule factory incomplete env var validation | Added warnings for missing vars when token set |
| 11 | MEDIUM | setup-bot.sh JSON injection via shell variables | Replaced with jq -n for safe JSON construction |

### Notes

- #387 already completed in commit 6e20fc5
- #377 is the EPIC issue — closed after all reviews remediated
- 187 tests passing after remediation (41 matrix, 20 streaming, 10 room, 26 integration, 27 herald, 25 discord, + others)

---
## M13-SpeechServices (0.0.13) — TTS & STT Integration
|
||||
|
||||
**Orchestrator:** Claude Code
|
||||
**Started:** 2026-02-15
|
||||
@@ -6,14 +104,14 @@

**Milestone:** M13-SpeechServices (0.0.13)
**Epic:** #388

### Phase 1: Foundation (Config + Module + Providers)

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used | notes |
| ---------- | ------ | ------------------------------------------------------------------------ | ----- | ---- | --------------------------- | ---------- | -------------------------------- | -------- | ----------------- | ----------------- | -------- | ---- | ----------------- |
| SP-CFG-001 | done | #401: Speech services environment variables and ConfigModule integration | #401 | api | feature/m13-speech-services | | SP-MOD-001,SP-DOC-001 | worker-1 | 2026-02-15T06:00Z | 2026-02-15T06:07Z | 15K | 15K | 51 tests, 4cc43be |
| SP-MOD-001 | done | #389: Create SpeechModule with provider abstraction layer | #389 | api | feature/m13-speech-services | SP-CFG-001 | SP-STT-001,SP-TTS-001,SP-MID-001 | worker-2 | 2026-02-15T06:08Z | 2026-02-15T06:14Z | 25K | 25K | 27 tests, c40373f |
### Phase 2: Providers (STT + TTS)

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used | notes |
| ---------- | ------ | ---------------------------------------------------------------------- | ----- | ---- | --------------------------- | ---------- | ------------------------------------------ | -------- | ----------------- | ----------------- | -------- | ---- | ----------------- |
@@ -23,7 +121,7 @@
| SP-TTS-003 | done | #394: Implement Chatterbox TTS provider (premium tier, voice cloning) | #394 | api | feature/m13-speech-services | SP-TTS-001 | SP-EP-002 | worker-7 | 2026-02-15T06:26Z | 2026-02-15T06:34Z | 15K | 25K | 26 tests, d37c78f |
| SP-TTS-004 | done | #395: Implement Piper TTS provider via OpenedAI Speech (fallback tier) | #395 | api | feature/m13-speech-services | SP-TTS-001 | SP-EP-002 | worker-8 | 2026-02-15T06:35Z | 2026-02-15T06:44Z | 12K | 15K | 37 tests, 6c46556 |
### Phase 3: Middleware + REST Endpoints

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used | notes |
| ---------- | ------ | ---------------------------------------------------------- | ----- | ---- | --------------------------- | ------------------------------------------- | ------------------- | --------- | ----------------- | ----------------- | -------- | ---- | ----------------- |
@@ -31,20 +129,20 @@
| SP-EP-001 | done | #392: Create /api/speech/transcribe REST endpoint | #392 | api | feature/m13-speech-services | SP-STT-001,SP-MID-001 | SP-WS-001,SP-FE-001 | worker-10 | 2026-02-15T06:45Z | 2026-02-15T06:52Z | 20K | 25K | 10 tests, 527262a |
| SP-EP-002 | done | #396: Create /api/speech/synthesize REST endpoint | #396 | api | feature/m13-speech-services | SP-TTS-002,SP-TTS-003,SP-TTS-004,SP-MID-001 | SP-FE-002 | worker-11 | 2026-02-15T06:45Z | 2026-02-15T06:53Z | 20K | 35K | 17 tests, 527262a |
### Phase 4: WebSocket Streaming

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used | notes |
| --------- | ------ | ---------------------------------------------------------- | ----- | ---- | --------------------------- | -------------------- | --------- | --------- | ----------------- | ----------------- | -------- | ---- | ----------------- |
| SP-WS-001 | done | #397: Implement WebSocket streaming transcription endpoint | #397 | api | feature/m13-speech-services | SP-STT-001,SP-EP-001 | SP-FE-001 | worker-12 | 2026-02-15T06:54Z | 2026-02-15T07:00Z | 20K | 30K | 29 tests, 28c9e6f |
### Phase 5: Docker/DevOps

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used | notes |
| ---------- | ------ | -------------------------------------------------------------- | ----- | ------ | --------------------------- | ---------- | ---------- | --------- | ----------------- | ----------------- | -------- | ---- | ------- |
| SP-DOC-001 | done | #399: Docker Compose dev overlay for speech services | #399 | devops | feature/m13-speech-services | SP-CFG-001 | SP-DOC-002 | worker-3 | 2026-02-15T06:08Z | 2026-02-15T06:10Z | 10K | 15K | 52553c8 |
| SP-DOC-002 | done | #400: Docker Compose swarm/prod deployment for speech services | #400 | devops | feature/m13-speech-services | SP-DOC-001 | | worker-13 | 2026-02-15T06:54Z | 2026-02-15T06:56Z | 10K | 8K | b3d6d73 |
### Phase 6: Frontend

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used | notes |
| --------- | ------ | ------------------------------------------------------------------------- | ----- | ---- | --------------------------- | ------------------- | ---------- | --------- | ----------------- | ----------------- | -------- | ---- | ----------------- |
@@ -52,7 +150,7 @@
| SP-FE-002 | done | #403: Frontend audio playback component for TTS output | #403 | web | feature/m13-speech-services | SP-EP-002 | SP-FE-003 | worker-15 | 2026-02-15T07:01Z | 2026-02-15T07:11Z | 20K | 50K | 32 tests, 74d6c10 |
| SP-FE-003 | done | #404: Frontend speech settings page (provider selection, voice config) | #404 | web | feature/m13-speech-services | SP-FE-001,SP-FE-002 | SP-E2E-001 | worker-16 | 2026-02-15T07:13Z | 2026-02-15T07:22Z | 20K | 35K | 30 tests, bc86947 |
### Phase 7: Testing + Documentation

| id | status | description | issue | repo | branch | depends_on | blocks | agent | started_at | completed_at | estimate | used | notes |
| ----------- | ------ | ----------------------------------------------------------------------- | ----- | ---- | --------------------------- | --------------------------------------- | ----------- | --------- | ----------------- | ----------------- | -------- | ---- | ----------------- |
735
docs/telemetry.md
Normal file
@@ -0,0 +1,735 @@
# Mosaic Telemetry Integration Guide

## 1. Overview

### What is Mosaic Telemetry?

Mosaic Telemetry is a task completion tracking system purpose-built for AI operations within Mosaic Stack. It captures detailed metrics about every AI task execution -- token usage, cost, duration, outcome, and quality gate results -- and submits them to a central telemetry API for aggregation and analysis.

The aggregated data powers a **prediction system** that provides pre-task estimates for cost, token usage, and expected quality, enabling informed decisions before dispatching work to AI agents.

### How It Differs from OpenTelemetry

Mosaic Stack uses **two separate telemetry systems** that serve different purposes:

| Aspect | OpenTelemetry (OTEL) | Mosaic Telemetry |
| --------------------------------- | --------------------------------------------- | -------------------------------------------- |
| **Purpose** | Distributed request tracing and observability | AI task completion metrics and predictions |
| **What it tracks** | HTTP requests, spans, latency, errors | Token counts, costs, outcomes, quality gates |
| **Data destination** | OTEL Collector (Jaeger, Grafana, etc.) | Mosaic Telemetry API (PostgreSQL-backed) |
| **Module location (API)** | `apps/api/src/telemetry/` | `apps/api/src/mosaic-telemetry/` |
| **Module location (Coordinator)** | `apps/coordinator/src/telemetry.py` | `apps/coordinator/src/mosaic_telemetry.py` |

Both systems can run simultaneously. They are completely independent.
### Architecture

```
+------------------+        +------------------+
|    Mosaic API    |        |   Coordinator    |
|     (NestJS)     |        |    (FastAPI)     |
+--------+---------+        +--------+---------+
         |                           |
    Track events                Track events
         |                           |
         v                           v
+------------------------------------------+
|           Telemetry Client SDK           |
|   (JS: @mosaicstack/telemetry-client)    |
|       (Py: mosaicstack-telemetry)        |
|                                          |
|  - Event queue (in-memory)               |
|  - Batch submission (5-min intervals)    |
|  - Prediction cache (6hr TTL)            |
+-------------------+----------------------+
                    |
            HTTP POST /events
            HTTP POST /predictions
                    |
                    v
+------------------------------------------+
|           Mosaic Telemetry API           |
|           (Separate service)             |
|                                          |
|  - Event ingestion & validation          |
|  - Aggregation & statistics              |
|  - Prediction generation                 |
+-------------------+----------------------+
                    |
                    v
           +---------------+
           |  PostgreSQL   |
           +---------------+
```
**Data flow:**

1. Application code calls `trackTaskCompletion()` (JS) or `client.track()` (Python)
2. Events are queued in memory (up to 1,000 events)
3. A background timer flushes the queue every 5 minutes in batches of up to 100
4. The telemetry API ingests events, validates them, and stores them in PostgreSQL
5. Prediction queries are served from aggregated data with a 6-hour cache TTL
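The queue-and-flush behavior in steps 2-4 can be sketched as a small class. This is an illustration only; `BatchQueue` and its methods are hypothetical stand-ins, not the SDK's real API:

```typescript
// Sketch of the queue-and-batch-flush pattern described above.
// All names here are illustrative, not the SDK's real API.
type TelemetryEvent = Record<string, unknown>;

class BatchQueue {
  private queue: TelemetryEvent[] = [];

  constructor(
    private readonly maxQueueSize = 1000,
    private readonly batchSize = 100,
  ) {}

  track(event: TelemetryEvent): void {
    // Tracking never throws; events beyond the cap are dropped.
    if (this.queue.length < this.maxQueueSize) this.queue.push(event);
  }

  // Drain the queue in batches of up to `batchSize`, as the 5-minute
  // timer would do on each tick.
  flush(send: (batch: TelemetryEvent[]) => void): void {
    while (this.queue.length > 0) {
      send(this.queue.splice(0, this.batchSize));
    }
  }

  get size(): number {
    return this.queue.length;
  }
}
```

With a batch size of 3, tracking 7 events and flushing produces batches of 3, 3, and 1 events.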
---
## 2. Configuration Guide

### Environment Variables

All configuration is done through environment variables prefixed with `MOSAIC_TELEMETRY_`:

| Variable | Type | Default | Description |
| ------------------------------ | ------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `MOSAIC_TELEMETRY_ENABLED` | boolean | `true` | Master switch. Set to `false` to completely disable telemetry (no HTTP calls). |
| `MOSAIC_TELEMETRY_SERVER_URL` | string | (none) | URL of the telemetry API server. For Docker Compose: `http://telemetry-api:8000`. For production: `https://tel-api.mosaicstack.dev`. |
| `MOSAIC_TELEMETRY_API_KEY` | string | (none) | API key for authenticating with the telemetry server. Generate with: `openssl rand -hex 32` (64-char hex string). |
| `MOSAIC_TELEMETRY_INSTANCE_ID` | string | (none) | Unique UUID identifying this Mosaic Stack instance. Generate with: `uuidgen` or `python -c "import uuid; print(uuid.uuid4())"`. |
| `MOSAIC_TELEMETRY_DRY_RUN` | boolean | `false` | When `true`, events are logged to console instead of being sent via HTTP. Useful for development. |
### Enabling Telemetry

To enable telemetry, set `MOSAIC_TELEMETRY_ENABLED=true` along with the three required variables in your `.env` file:

```bash
MOSAIC_TELEMETRY_ENABLED=true
MOSAIC_TELEMETRY_SERVER_URL=http://telemetry-api:8000
MOSAIC_TELEMETRY_API_KEY=<your-64-char-hex-api-key>
MOSAIC_TELEMETRY_INSTANCE_ID=<your-uuid>
```

If `MOSAIC_TELEMETRY_ENABLED` is `true` but any of `SERVER_URL`, `API_KEY`, or `INSTANCE_ID` is missing, the service logs a warning and disables telemetry gracefully. This is intentional: telemetry configuration issues never prevent the application from starting.

### Disabling Telemetry

Set `MOSAIC_TELEMETRY_ENABLED=false` in your `.env`. No HTTP calls will be made, and all tracking methods become safe no-ops.

### Dry-Run Mode

For local development and debugging, enable dry-run mode:

```bash
MOSAIC_TELEMETRY_ENABLED=true
MOSAIC_TELEMETRY_DRY_RUN=true
MOSAIC_TELEMETRY_SERVER_URL=http://localhost:8000  # Not actually called
MOSAIC_TELEMETRY_API_KEY=0000000000000000000000000000000000000000000000000000000000000000
MOSAIC_TELEMETRY_INSTANCE_ID=00000000-0000-0000-0000-000000000000
```

In dry-run mode, the SDK logs event payloads to the console instead of submitting them via HTTP. This lets you verify that tracking points are firing correctly without needing a running telemetry API.
### Docker Compose Configuration

Both `docker-compose.yml` (root) and `docker/docker-compose.yml` pass telemetry environment variables to the API service:

```yaml
services:
  mosaic-api:
    environment:
      # Telemetry (task completion tracking & predictions)
      MOSAIC_TELEMETRY_ENABLED: ${MOSAIC_TELEMETRY_ENABLED:-false}
      MOSAIC_TELEMETRY_SERVER_URL: ${MOSAIC_TELEMETRY_SERVER_URL:-http://telemetry-api:8000}
      MOSAIC_TELEMETRY_API_KEY: ${MOSAIC_TELEMETRY_API_KEY:-}
      MOSAIC_TELEMETRY_INSTANCE_ID: ${MOSAIC_TELEMETRY_INSTANCE_ID:-}
      MOSAIC_TELEMETRY_DRY_RUN: ${MOSAIC_TELEMETRY_DRY_RUN:-false}
```

Note that telemetry defaults to `false` in Docker Compose. Set `MOSAIC_TELEMETRY_ENABLED=true` in your `.env` to activate it.

An optional local telemetry API service is available (commented out in `docker/docker-compose.yml`). Uncomment it to run a self-contained development environment:

```yaml
# Uncomment in docker/docker-compose.yml
telemetry-api:
  image: git.mosaicstack.dev/mosaic/telemetry-api:latest
  container_name: mosaic-telemetry-api
  restart: unless-stopped
  environment:
    HOST: 0.0.0.0
    PORT: 8000
  ports:
    - "8001:8000"
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 10s
  networks:
    - mosaic-network
```

---
## 3. What Gets Tracked

### TaskCompletionEvent Schema

Every tracked event conforms to the `TaskCompletionEvent` interface. This is the core data structure submitted to the telemetry API:

| Field | Type | Description |
| --------------------------- | ------------------- | -------------------------------------------------------------- |
| `instance_id` | `string` | UUID of the Mosaic Stack instance that generated the event |
| `event_id` | `string` | Unique UUID for this event (auto-generated by the SDK) |
| `schema_version` | `string` | Schema version for forward compatibility (auto-set by the SDK) |
| `timestamp` | `string` | ISO 8601 timestamp of event creation (auto-set by the SDK) |
| `task_duration_ms` | `number` | How long the task took in milliseconds |
| `task_type` | `TaskType` | Type of task performed (see enum below) |
| `complexity` | `Complexity` | Complexity level of the task |
| `harness` | `Harness` | The coding harness or tool used |
| `model` | `string` | AI model name (e.g., `"claude-sonnet-4-5"`) |
| `provider` | `Provider` | AI model provider |
| `estimated_input_tokens` | `number` | Pre-task estimated input tokens (from predictions) |
| `estimated_output_tokens` | `number` | Pre-task estimated output tokens (from predictions) |
| `actual_input_tokens` | `number` | Actual input tokens consumed |
| `actual_output_tokens` | `number` | Actual output tokens generated |
| `estimated_cost_usd_micros` | `number` | Pre-task estimated cost in microdollars (USD \* 1,000,000) |
| `actual_cost_usd_micros` | `number` | Actual cost in microdollars |
| `quality_gate_passed` | `boolean` | Whether all quality gates passed |
| `quality_gates_run` | `QualityGate[]` | List of quality gates that were executed |
| `quality_gates_failed` | `QualityGate[]` | List of quality gates that failed |
| `context_compactions` | `number` | Number of context window compactions during the task |
| `context_rotations` | `number` | Number of context window rotations during the task |
| `context_utilization_final` | `number` | Final context window utilization (0.0 to 1.0) |
| `outcome` | `Outcome` | Task outcome |
| `retry_count` | `number` | Number of retries before completion |
| `language` | `string?` | Primary programming language (optional) |
| `repo_size_category` | `RepoSizeCategory?` | Repository size category (optional) |
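The microdollar fields store currency as integers to avoid floating-point drift. A quick conversion sketch (the helper names are hypothetical, not part of the schema or SDK):

```typescript
// Convert between USD and microdollars (USD * 1,000,000), the unit used
// by estimated_cost_usd_micros / actual_cost_usd_micros. Helper names
// are illustrative, not part of the SDK.
function usdToMicros(usd: number): number {
  // Round so float noise (e.g. 0.0525 * 1e6 = 52500.000000000007)
  // never produces a fractional microdollar value.
  return Math.round(usd * 1_000_000);
}

function microsToUsd(micros: number): number {
  return micros / 1_000_000;
}
```

For example, a $0.0525 task is stored as `52500` microdollars.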
### Enum Values

**TaskType:**
`planning`, `implementation`, `code_review`, `testing`, `debugging`, `refactoring`, `documentation`, `configuration`, `security_audit`, `unknown`

**Complexity:**
`low`, `medium`, `high`, `critical`

**Harness:**
`claude_code`, `opencode`, `kilo_code`, `aider`, `api_direct`, `ollama_local`, `custom`, `unknown`

**Provider:**
`anthropic`, `openai`, `openrouter`, `ollama`, `google`, `mistral`, `custom`, `unknown`

**QualityGate:**
`build`, `lint`, `test`, `coverage`, `typecheck`, `security`

**Outcome:**
`success`, `failure`, `partial`, `timeout`

**RepoSizeCategory:**
`tiny`, `small`, `medium`, `large`, `huge`
### API Service: LLM Call Tracking

The NestJS API tracks every LLM service call (chat, streaming chat, and embeddings) via `LlmTelemetryTrackerService` at `apps/api/src/llm/llm-telemetry-tracker.service.ts`.

Tracked operations:

- **`chat`** -- Synchronous chat completions
- **`chatStream`** -- Streaming chat completions
- **`embed`** -- Embedding generation

For each call, the tracker captures:

- Model name and provider type
- Input and output token counts
- Duration in milliseconds
- Success or failure outcome
- Calculated cost from the built-in cost table (`apps/api/src/llm/llm-cost-table.ts`)
- Task type inferred from calling context (e.g., `"brain"` maps to `planning`, `"review"` maps to `code_review`)

The cost table uses longest-prefix matching on model names and covers all major Anthropic and OpenAI models. Ollama/local models are treated as zero-cost.
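A minimal sketch of longest-prefix matching over such a cost table follows. The entries and prices below are invented for illustration; the real values live in `llm-cost-table.ts`:

```typescript
// Longest-prefix lookup over a model-name cost table. Entries and
// prices here are made up for illustration only.
interface CostEntry {
  inputPerMTokUsd: number;
  outputPerMTokUsd: number;
}

const COST_TABLE: Record<string, CostEntry> = {
  "claude-sonnet": { inputPerMTokUsd: 3, outputPerMTokUsd: 15 },
  "claude-sonnet-4-5": { inputPerMTokUsd: 3, outputPerMTokUsd: 15 },
  "ollama": { inputPerMTokUsd: 0, outputPerMTokUsd: 0 },
};

function lookupCost(
  model: string,
): { prefix: string; entry: CostEntry } | undefined {
  // Among all table keys that prefix the model name, take the longest,
  // so "claude-sonnet-4-5-..." matches "claude-sonnet-4-5" rather than
  // the shorter "claude-sonnet".
  const prefix = Object.keys(COST_TABLE)
    .filter((key) => model.startsWith(key))
    .sort((a, b) => b.length - a.length)[0];
  return prefix ? { prefix, entry: COST_TABLE[prefix] } : undefined;
}
```

Unknown model names simply return `undefined`, which a caller can treat as zero-cost or log for table maintenance.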
### Coordinator: Agent Task Dispatch Tracking

The FastAPI coordinator tracks agent task completions in `apps/coordinator/src/mosaic_telemetry.py` and `apps/coordinator/src/coordinator.py`.

After each agent task dispatch (success or failure), the coordinator emits a `TaskCompletionEvent` capturing:

- Task duration from start to finish
- Agent model, provider, and harness (resolved from the `assigned_agent` field)
- Task outcome (`success`, `failure`, `partial`, `timeout`)
- Quality gate results (build, lint, test, etc.)
- Retry count for the issue
- Complexity level from issue metadata

The coordinator uses the `build_task_event()` helper function, which provides sensible defaults for the coordinator context (Claude Code harness, Anthropic provider, TypeScript language).
### Event Lifecycle

```
1. Application code calls trackTaskCompletion() or client.track()
         |
         v
2. Event is added to in-memory queue (max 1,000 events)
         |
         v
3. Background timer fires every 5 minutes (submitIntervalMs)
         |
         v
4. Queue is drained in batches of up to 100 events (batchSize)
         |
         v
5. Each batch is POSTed to the telemetry API
         |
         v
6. API validates, stores, and acknowledges each event
```

If the telemetry API is unreachable, events remain in the queue and are retried on the next interval (up to 3 retries per submission). Telemetry errors are logged but never propagated to calling code.

---
## 4. Prediction System

### How Predictions Work

The Mosaic Telemetry API aggregates historical task completion data across all contributing instances. From this data, it generates statistical predictions for new tasks based on their characteristics (task type, model, provider, complexity).

Predictions include percentile distributions (p10, p25, median, p75, p90) for token usage and cost, plus quality metrics (gate pass rate, success rate).
### Querying Predictions via API

The API exposes a prediction endpoint at:

```
GET /api/telemetry/estimate?taskType=<taskType>&model=<model>&provider=<provider>&complexity=<complexity>
```

**Authentication:** Requires a valid session (Bearer token via `AuthGuard`).

**Query Parameters (all required):**

| Parameter | Type | Example | Description |
| ------------ | ------------ | ------------------- | --------------------- |
| `taskType` | `TaskType` | `implementation` | Task type to estimate |
| `model` | `string` | `claude-sonnet-4-5` | Model name |
| `provider` | `Provider` | `anthropic` | Provider name |
| `complexity` | `Complexity` | `medium` | Complexity level |

**Example Request:**

```bash
curl -X GET \
  'http://localhost:3001/api/telemetry/estimate?taskType=implementation&model=claude-sonnet-4-5&provider=anthropic&complexity=medium' \
  -H 'Authorization: Bearer YOUR_SESSION_TOKEN'
```
**Response:**

```json
{
  "data": {
    "prediction": {
      "input_tokens": {
        "p10": 500,
        "p25": 1200,
        "median": 2500,
        "p75": 5000,
        "p90": 10000
      },
      "output_tokens": {
        "p10": 200,
        "p25": 800,
        "median": 1500,
        "p75": 3000,
        "p90": 6000
      },
      "cost_usd_micros": {
        "median": 30000
      },
      "duration_ms": {
        "median": 5000
      },
      "correction_factors": {
        "input": 1.0,
        "output": 1.0
      },
      "quality": {
        "gate_pass_rate": 0.85,
        "success_rate": 0.92
      }
    },
    "metadata": {
      "sample_size": 150,
      "fallback_level": 0,
      "confidence": "high",
      "last_updated": "2026-02-15T10:00:00Z",
      "cache_hit": true
    }
  }
}
```

If no prediction data is available, the response returns `{ "data": null }`.
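One way a caller might consume the percentile data is to gate dispatch on the p90 output-token estimate rather than the median. This is a hypothetical consumer, not part of the SDK or API:

```typescript
// Hypothetical consumer of the estimate response: flag tasks whose
// 90th-percentile output-token estimate exceeds a caller-supplied budget.
interface TokenDistribution {
  p10: number;
  p25: number;
  median: number;
  p75: number;
  p90: number;
}

function withinTokenBudget(
  outputTokens: TokenDistribution,
  budgetTokens: number,
): boolean {
  // Conservative check: compare the p90, not the median, so roughly
  // 9 in 10 historical tasks of this shape fit the budget.
  return outputTokens.p90 <= budgetTokens;
}
```

With the sample response above (`p90: 6000`), a 5,000-token budget would be flagged while an 8,000-token budget would pass.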
### Confidence Levels

The prediction system reports a confidence level based on sample size and data freshness:

| Confidence | Meaning |
| ---------- | -------------------------------------------------------------- |
| `high` | Substantial sample size, recent data, all dimensions matched |
| `medium` | Moderate sample, some dimension fallback |
| `low` | Small sample or significant fallback from requested dimensions |
| `none` | No data available for this combination |

### Fallback Behavior

When exact matches are unavailable, the prediction system falls back through progressively broader aggregations:

1. **Exact match** -- task_type + model + provider + complexity
2. **Drop complexity** -- task_type + model + provider
3. **Drop model** -- task_type + provider
4. **Global** -- task_type only

The `fallback_level` field in metadata indicates which level was used (0 = exact match).
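The four-level cascade can be expressed as a simple loop. This is a sketch; the real fallback runs server-side, and `predictWithFallback`/`lookup` are hypothetical names:

```typescript
// Sketch of the four-level prediction fallback described above.
// `lookup` stands in for the server-side aggregate query.
interface Query {
  taskType: string;
  model?: string;
  provider?: string;
  complexity?: string;
}

function predictWithFallback<T>(
  q: Required<Query>,
  lookup: (q: Query) => T | null,
): { result: T; fallbackLevel: number } | null {
  const levels: Query[] = [
    q, // 0: exact match
    { taskType: q.taskType, model: q.model, provider: q.provider }, // 1: drop complexity
    { taskType: q.taskType, provider: q.provider }, // 2: drop model
    { taskType: q.taskType }, // 3: global
  ];
  for (let level = 0; level < levels.length; level++) {
    const result = lookup(levels[level]);
    if (result !== null) return { result, fallbackLevel: level };
  }
  return null;
}
```

If aggregates only exist at the task_type + provider granularity, the cascade returns that result with `fallbackLevel: 2`, matching the `fallback_level` field in the response metadata.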
### Cache Strategy

Predictions are cached in-memory by the SDK with a **6-hour TTL** (`predictionCacheTtlMs: 21_600_000`). The `PredictionService` pre-fetches common combinations on startup to warm the cache:

- **Models:** claude-sonnet-4-5, claude-opus-4, claude-haiku-4-5, gpt-4o, gpt-4o-mini
- **Task types:** implementation, planning, code_review
- **Complexities:** low, medium

This produces 30 pre-cached queries (5 models x 3 task types x 2 complexities). Subsequent requests for these combinations are served from cache without any HTTP call.
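The 30 warm-up combinations are just a cross product of the lists above, which can be generated as follows (a sketch using plain string literals rather than the SDK enums):

```typescript
// Cross product of the pre-fetched models, task types, and complexities
// described above: 5 x 3 x 2 = 30 warm-up queries.
const MODELS = [
  "claude-sonnet-4-5",
  "claude-opus-4",
  "claude-haiku-4-5",
  "gpt-4o",
  "gpt-4o-mini",
];
const TASK_TYPES = ["implementation", "planning", "code_review"];
const COMPLEXITIES = ["low", "medium"];

const warmupQueries = MODELS.flatMap((model) =>
  TASK_TYPES.flatMap((taskType) =>
    COMPLEXITIES.map((complexity) => ({ model, taskType, complexity })),
  ),
);
```

Each of these objects would then be passed to the prediction fetch on startup, so the cache is warm before the first real request arrives.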
---
## 5. SDK Reference

### JavaScript: @mosaicstack/telemetry-client

**Registry:** Gitea npm registry at `git.mosaicstack.dev`
**Version:** 0.1.0

**Installation:**

```bash
pnpm add @mosaicstack/telemetry-client
```
**Key Exports:**

```typescript
// Client
import { TelemetryClient, EventBuilder, resolveConfig } from "@mosaicstack/telemetry-client";

// Types
import type {
  TelemetryConfig,
  TaskCompletionEvent,
  EventBuilderParams,
  PredictionQuery,
  PredictionResponse,
  PredictionData,
  PredictionMetadata,
  TokenDistribution,
} from "@mosaicstack/telemetry-client";

// Enums
import {
  TaskType,
  Complexity,
  Harness,
  Provider,
  QualityGate,
  Outcome,
  RepoSizeCategory,
} from "@mosaicstack/telemetry-client";
```
**TelemetryClient API:**

| Method | Description |
| ------------------------------------------------------------------- | ------------------------------------------------------------ |
| `constructor(config: TelemetryConfig)` | Create a new client with the given configuration |
| `start(): void` | Start background batch submission (idempotent) |
| `stop(): Promise<void>` | Stop background submission, flush remaining events |
| `track(event: TaskCompletionEvent): void` | Queue an event for batch submission (never throws) |
| `getPrediction(query: PredictionQuery): PredictionResponse \| null` | Get a cached prediction (returns null if not cached/expired) |
| `refreshPredictions(queries: PredictionQuery[]): Promise<void>` | Force-refresh predictions from the server |
| `eventBuilder: EventBuilder` | Get the EventBuilder for constructing events |
| `queueSize: number` | Number of events currently queued |
| `isRunning: boolean` | Whether the client is currently running |

**TelemetryConfig Options:**

| Option | Type | Default | Description |
| ---------------------- | ------------------------ | ------------------- | ---------------------------------- |
| `serverUrl` | `string` | (required) | Base URL of the telemetry server |
| `apiKey` | `string` | (required) | 64-char hex API key |
| `instanceId` | `string` | (required) | UUID for this instance |
| `enabled` | `boolean` | `true` | Enable/disable telemetry |
| `submitIntervalMs` | `number` | `300_000` (5 min) | Interval between batch submissions |
| `maxQueueSize` | `number` | `1000` | Maximum queued events |
| `batchSize` | `number` | `100` | Maximum events per batch |
| `requestTimeoutMs` | `number` | `10_000` (10 sec) | HTTP request timeout |
| `predictionCacheTtlMs` | `number` | `21_600_000` (6 hr) | Prediction cache TTL |
| `dryRun` | `boolean` | `false` | Log events instead of sending |
| `maxRetries` | `number` | `3` | Retries per submission |
| `onError` | `(error: Error) => void` | noop | Error callback |
**EventBuilder Usage:**

```typescript
const event = client.eventBuilder.build({
  task_duration_ms: 1500,
  task_type: TaskType.IMPLEMENTATION,
  complexity: Complexity.LOW,
  harness: Harness.API_DIRECT,
  model: "claude-sonnet-4-5",
  provider: Provider.ANTHROPIC,
  estimated_input_tokens: 0,
  estimated_output_tokens: 0,
  actual_input_tokens: 200,
  actual_output_tokens: 500,
  estimated_cost_usd_micros: 0,
  actual_cost_usd_micros: 8100,
  quality_gate_passed: true,
  quality_gates_run: [QualityGate.LINT, QualityGate.TEST],
  quality_gates_failed: [],
  context_compactions: 0,
  context_rotations: 0,
  context_utilization_final: 0.3,
  outcome: Outcome.SUCCESS,
  retry_count: 0,
  language: "typescript",
});

client.track(event);
```
### Python: mosaicstack-telemetry

**Registry:** Gitea PyPI registry at `git.mosaicstack.dev`
**Version:** 0.1.0

**Installation:**

```bash
pip install mosaicstack-telemetry
```

**Key Imports:**

```python
from mosaicstack_telemetry import (
    TelemetryClient,
    TelemetryConfig,
    EventBuilder,
    TaskType,
    Complexity,
    Harness,
    Provider,
    QualityGate,
    Outcome,
)
```
**Python Client Usage:**

```python
# Create config (reads MOSAIC_TELEMETRY_* env vars automatically)
config = TelemetryConfig()
errors = config.validate()

# Create and start client
client = TelemetryClient(config)
await client.start_async()

# Build and track an event
builder = EventBuilder(instance_id=config.instance_id)
event = (
    builder
    .task_type(TaskType.IMPLEMENTATION)
    .complexity_level(Complexity.MEDIUM)
    .harness_type(Harness.CLAUDE_CODE)
    .model("claude-sonnet-4-5")
    .provider(Provider.ANTHROPIC)
    .duration_ms(5000)
    .outcome_value(Outcome.SUCCESS)
    .tokens(
        estimated_in=0,
        estimated_out=0,
        actual_in=3000,
        actual_out=1500,
    )
    .cost(estimated=0, actual=52500)
    .quality(
        passed=True,
        gates_run=[QualityGate.BUILD, QualityGate.LINT, QualityGate.TEST],
        gates_failed=[],
    )
    .context(compactions=0, rotations=0, utilization=0.4)
    .retry_count(0)
    .language("typescript")
    .build()
)

client.track(event)

# Shutdown (flushes remaining events)
await client.stop_async()
```

---
## 6. Development Guide

### Testing Locally with Dry-Run Mode

The fastest way to develop with telemetry is to use dry-run mode. This logs event payloads to the console without needing a running telemetry API:

```bash
# In your .env
MOSAIC_TELEMETRY_ENABLED=true
MOSAIC_TELEMETRY_DRY_RUN=true
MOSAIC_TELEMETRY_SERVER_URL=http://localhost:8000
MOSAIC_TELEMETRY_API_KEY=0000000000000000000000000000000000000000000000000000000000000000
MOSAIC_TELEMETRY_INSTANCE_ID=00000000-0000-0000-0000-000000000000
```

Start the API server and trigger LLM operations. You will see telemetry event payloads logged in the console output.
### Adding New Tracking Points

To add telemetry tracking to a new service in the NestJS API:

**Step 1:** Inject `MosaicTelemetryService` into your service. Because `MosaicTelemetryModule` is global, no module import is needed:

```typescript
import { Injectable } from "@nestjs/common";
import { MosaicTelemetryService } from "../mosaic-telemetry/mosaic-telemetry.service";
import { TaskType, Complexity, Harness, Provider, Outcome } from "@mosaicstack/telemetry-client";

@Injectable()
export class MyService {
  constructor(private readonly telemetry: MosaicTelemetryService) {}
}
```
**Step 2:** Build and track events after task completion:

```typescript
async performTask(): Promise<void> {
  const start = Date.now();

  // ... perform the task ...

  const duration = Date.now() - start;
  const builder = this.telemetry.eventBuilder;

  if (builder) {
    const event = builder.build({
      task_duration_ms: duration,
      task_type: TaskType.IMPLEMENTATION,
      complexity: Complexity.MEDIUM,
      harness: Harness.API_DIRECT,
      model: "claude-sonnet-4-5",
      provider: Provider.ANTHROPIC,
      estimated_input_tokens: 0,
      estimated_output_tokens: 0,
      actual_input_tokens: inputTokens,
      actual_output_tokens: outputTokens,
      estimated_cost_usd_micros: 0,
      actual_cost_usd_micros: costMicros,
      quality_gate_passed: true,
      quality_gates_run: [],
      quality_gates_failed: [],
      context_compactions: 0,
      context_rotations: 0,
      context_utilization_final: 0,
      outcome: Outcome.SUCCESS,
      retry_count: 0,
    });

    this.telemetry.trackTaskCompletion(event);
  }
}
```
**Step 3:** For LLM-specific tracking, use `LlmTelemetryTrackerService` instead, which handles cost calculation and task type inference automatically:
|
||||
|
||||
```typescript
|
||||
import { LlmTelemetryTrackerService } from "../llm/llm-telemetry-tracker.service";
|
||||
|
||||
@Injectable()
|
||||
export class MyLlmService {
|
||||
constructor(private readonly telemetryTracker: LlmTelemetryTrackerService) {}
|
||||
|
||||
async chat(): Promise<void> {
|
||||
const start = Date.now();
|
||||
|
||||
// ... call LLM ...
|
||||
|
||||
this.telemetryTracker.trackLlmCompletion({
|
||||
model: "claude-sonnet-4-5",
|
||||
providerType: "claude",
|
||||
operation: "chat",
|
||||
durationMs: Date.now() - start,
|
||||
inputTokens: 150,
|
||||
outputTokens: 300,
|
||||
callingContext: "brain", // Used for task type inference
|
||||
success: true,
|
||||
});
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||

### Adding Tracking in the Coordinator (Python)

Use the `build_task_event()` helper from `src/mosaic_telemetry.py`:

```python
from src.mosaic_telemetry import build_task_event, get_telemetry_client

client = get_telemetry_client(app)
if client is not None:
    event = build_task_event(
        instance_id=instance_id,
        task_type=TaskType.IMPLEMENTATION,
        complexity=Complexity.MEDIUM,
        outcome=Outcome.SUCCESS,
        duration_ms=5000,
        model="claude-sonnet-4-5",
        provider=Provider.ANTHROPIC,
        harness=Harness.CLAUDE_CODE,
        actual_input_tokens=3000,
        actual_output_tokens=1500,
        actual_cost_micros=52500,
    )
    client.track(event)
```

### Troubleshooting

**Telemetry events not appearing:**

1. Check that `MOSAIC_TELEMETRY_ENABLED=true` is set
2. Verify all three required variables are set: `SERVER_URL`, `API_KEY`, `INSTANCE_ID`
3. Look for warning logs: `"Mosaic Telemetry is enabled but missing configuration"` indicates a missing variable
4. Try dry-run mode to confirm events are being generated

**Console shows "Mosaic Telemetry is disabled":**

This is the expected message when `MOSAIC_TELEMETRY_ENABLED=false`. If you intended telemetry to be active, set it to `true`.

**Events queuing but not submitting:**

- Check that the telemetry API server at `MOSAIC_TELEMETRY_SERVER_URL` is reachable
- Verify the API key is a valid 64-character hex string
- The default submission interval is 5 minutes; wait at least one interval or call `stop()` to force a flush
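A quick way to rule out a malformed key is to check its format locally. The sketch below is a hypothetical helper (not part of the telemetry SDK) that only validates the expected 64-character hex shape, not whether the server accepts the key:

```typescript
// Hypothetical helper, not part of the telemetry SDK: checks only the
// *format* of MOSAIC_TELEMETRY_API_KEY (64 hex characters), not whether
// the server will accept it.
function isValidTelemetryApiKey(key: string): boolean {
  return /^[0-9a-f]{64}$/i.test(key);
}

// The all-zero dev key from the dry-run config is well-formed:
console.log(isValidTelemetryApiKey("0".repeat(64))); // true
console.log(isValidTelemetryApiKey("not-a-key")); // false
```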

**Prediction endpoint returns null:**

- Predictions require sufficient historical data in the telemetry API
- Check the `metadata.confidence` field; `"none"` means no data exists for this combination
- Predictions are cached for 6 hours; new data takes time to appear
- The `PredictionService` logs startup refresh status; check logs for errors
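When consuming predictions, it helps to treat low-confidence responses explicitly rather than branching only on `null`. A minimal sketch, assuming a response shape around the documented `metadata.confidence` field (the interface, the `cost_usd_micros` field, and the helper name are illustrative, not the actual API):

```typescript
// Illustrative response shape; only metadata.confidence is documented.
interface PredictionResponse {
  prediction: { cost_usd_micros: number } | null;
  metadata: { confidence: "none" | "low" | "medium" | "high" };
}

// Treat both a null payload and confidence "none" as "no usable
// prediction", so callers fall back to static estimates instead of
// trusting an empty result.
function usablePrediction(res: PredictionResponse) {
  if (res.prediction === null || res.metadata.confidence === "none") {
    return null;
  }
  return res.prediction;
}
```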

**"Telemetry client error" in logs:**

- These are non-fatal. The SDK never blocks application logic.
- Common causes: network timeout, invalid API key, server-side validation failure
- Check the telemetry API logs for corresponding errors