matrix-bot-sdk depends on the deprecated `request` library which pulls
in vulnerable form-data (<2.5.4, critical: unsafe random boundary) and
qs (<6.14.1, high: DoS via memory exhaustion). Add pnpm overrides to
force patched versions since matrix-bot-sdk has no newer release.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep both Mosaic Telemetry section (from develop) and Matrix Dev
Environment section (from feature branch) in .env.example.
Regenerate pnpm-lock.yaml with both dependency trees merged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Docker build failed because pip couldn't find mosaicstack-telemetry
from the private Gitea PyPI registry. Copy pip.conf into the image so
pip resolves the extra-index-url during docker build.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix sendThreadMessage room mismatch: use channelId from options instead of hardcoded controlRoomId
- Add .catch() to fire-and-forget handleRoomMessage to prevent silent error swallowing
- Wrap dispatchJob in try-catch for user-visible error reporting in handleFixCommand
- Add MATRIX_BOT_USER_ID validation in connect() to prevent infinite message loops
- Fix streamResponse error masking: wrap finally/catch side-effects in try-catch
- Replace unsafe type assertion with public getClient() in MatrixRoomService
- Add orphaned room warning in provisionRoom on DB failure
- Add provider identity to Herald error logs
- Add channelId to ThreadMessageOptions interface and all callers
- Add missing env var warnings in BridgeModule factory
- Fix JSON injection in setup-bot.sh: use jq for safe JSON construction
Fixes#377
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Quick start guide for dev environment
- Architecture overview with service responsibilities
- Command reference with examples
- Configuration reference
- Streaming response architecture
- Deployment considerations
Refs #386
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MB-007 (Streaming AI responses) done in commit 93cd314.
20 new tests, 132 total bridge tests pass.
Launching MB-008 (E2E tests) and MB-009 (Docs) in parallel.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MB-005 (Matrix command handling) and MB-006 (Herald adapter) done.
Both committed in ad24720 (bundled by pre-commit hooks).
49 Matrix tests pass, 112 total bridge tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix prettier formatting for Tooltip formatter props (single-line)
- Fix no-base-to-string by using typed props instead of Record<string, unknown>
- Fix restrict-template-expressions by wrapping number in String()
Refs #375
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix Tooltip formatter/labelFormatter type overload conflicts
- Fix Pie label render props type mismatch
- Fix telemetry.ts date split array access type
Refs #375
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add CHAT_PROVIDERS injection token for bridge-agnostic access
- Conditional loading based on env vars (DISCORD_BOT_TOKEN, MATRIX_ACCESS_TOKEN)
- Both bridges can run simultaneously
- No crash if neither bridge is configured
- Tests verify all configuration combinations
Refs #379
The mosaicstack-telemetry package lacks py.typed marker. Add type
ignore comment consistent with other import sites.
Refs #370
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add matrix_room_id column to workspace table (migration)
- Create MatrixRoomService for room provisioning and mapping
- Auto-create Matrix room on workspace provisioning (when configured)
- Support manual room linking for existing workspaces
- Unit tests for all mapping operations
Refs #380
The mosaicstack-telemetry package is hosted on the Gitea PyPI registry.
CI pip install needs --extra-index-url to find it.
Refs #370
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- llm-cost-table.ts: Add undefined guard for MODEL_COSTS lookup
- llm-telemetry-tracker.service.ts: Allow undefined in callingContext
for exactOptionalPropertyTypes compatibility
Refs #371
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Install recharts for data visualization
- Add Usage nav item to sidebar navigation
- Create telemetry API service with data fetching functions
- Build dashboard page with summary cards, charts, and time range selector
- Token usage line chart, cost breakdown bar chart, task outcome pie chart
- Loading and empty states handled
- Responsive layout with PDA-friendly design
- Add unit tests (14 tests passing)
Refs #375
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create comprehensive telemetry documentation at docs/telemetry.md
- Cover configuration, event schema, predictions, SDK reference
- Include development guide with dry-run mode and troubleshooting
- Link from main README.md
Refs #376
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Instrument Coordinator.process_queue() with timing and telemetry events
- Instrument OrchestrationLoop.process_next_issue() with quality gate tracking
- Add agent-to-telemetry mapping (model, provider, harness per agent name)
- Map difficulty levels to Complexity enum and gate names to QualityGate enum
- Track retry counts per issue (increment on failure, clear on success)
- Emit FAILURE outcome on agent spawn failure or quality gate rejection
- Non-blocking: telemetry errors are logged and swallowed, never delay tasks
- Pass telemetry client from FastAPI lifespan to Coordinator constructor
- Add 33 unit tests covering all telemetry scenarios
Refs #372
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create PredictionService for pre-task cost/token estimates
- Refresh common predictions on startup
- Integrate predictions into LLM telemetry tracker
- Add GET /api/telemetry/estimate endpoint
- Graceful degradation when no prediction data available
- Add unit tests for prediction service
Refs #373
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create LlmTelemetryTrackerService for non-blocking event emission
- Normalize token usage across Anthropic, OpenAI, Ollama providers
- Add cost table with per-token pricing in microdollars
- Instrument chat, chatStream, and embed methods
- Infer task type from calling context
- Aggregate streaming tokens after stream ends with fallback estimation
- Add 69 unit tests for tracker service, cost table, and LLM service
Refs #371
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MOSAIC_TELEMETRY_* variables to .env.example with descriptions
- Pass telemetry env vars to api service in production compose
- Pass telemetry env vars to coordinator service in dev and swarm composes
- Swarm composes default to production URL (https://tel-api.mosaicstack.dev)
- Dev compose includes commented-out telemetry-api service placeholder
- All compose files default MOSAIC_TELEMETRY_ENABLED to false for safety
Refs #374
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add mosaicstack-telemetry>=0.1.0 to pyproject.toml dependencies
- Configure Gitea PyPI registry via pip.conf (extra-index-url)
- Integrate TelemetryClient in FastAPI lifespan (start_async/stop_async)
- Store client on app.state.mosaic_telemetry for downstream access
- Create mosaic_telemetry.py helper module with:
- get_telemetry_client(): retrieve client from app state
- build_task_event(): construct TaskCompletionEvent with coordinator defaults
- create_telemetry_config(): create config from MOSAIC_TELEMETRY_* env vars
- Add 28 unit tests covering config, helpers, disabled mode, and lifespan
- New module has 100% test coverage
Refs #370
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Database: 6 models in the Prisma schema had no CREATE TABLE migration:
cron_schedules, workspace_llm_settings, quality_gates, task_rejections,
token_budgets, llm_usage_logs. Same root cause as the federation tables.
CORS: Health check requests (Docker, load balancers) don't send Origin
headers. The CORS config was rejecting these in production, causing
/health to return 500 and Docker to mark the container as unhealthy.
Requests without Origin headers are not cross-origin per the CORS spec
and should be allowed through.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added CSRF_SECRET to docker-compose.swarm.portainer.yml (the active
Portainer deployment) and both example compose files. Also added
ENCRYPTION_KEY to the example files where it was missing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both env vars were missing from the API service environment in
docker-compose.prod.yml and docker-compose.build.yml, causing the
CSRF_SECRET check to fail at startup even when set in .env.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Standalone Synapse + Element Web deployment for Docker Swarm/Portainer.
Separate infrastructure from Mosaic Stack (same pattern as Authentik).
Includes: Synapse, Element Web, dedicated PostgreSQL, optional coturn.
Traefik labels match existing Stack conventions.
Refs #387
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Federation is optional and should not prevent the app from starting
when DEFAULT_WORKSPACE_ID is not set. Changed from throwing (crash)
to logging a warning. The endpoint-level validation in the controller
still rejects requests when federation is unconfigured.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The NestJS tsconfig compiles to CommonJS (module: "CommonJS") but
package.json had "type": "module", causing Node.js v24 to treat the
CJS output as ESM and fail with "exports is not defined in ES module
scope" at startup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Federation models (FederationConnection, FederatedIdentity,
FederationMessage) and their enums were defined in the Prisma schema
but never had CREATE TABLE migrations. This caused the
20260203_add_federation_event_subscriptions migration to fail with
"relation federation_messages does not exist".
Adds new migration 20260202200000 to create the 3 missing enums,
3 missing tables, all indexes, and foreign keys. Removes the
now-redundant ALTER TABLE from the 20260203 migration since
event_type is created with the table.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The base openbao image's docker-entrypoint.sh injects -dev-root-token-id
and -dev-listen-address flags when it sees 'server' as $1, causing the
server to exit immediately (code 0). Override entrypoint with dumb-init
and call bao directly to avoid the dev-mode flag injection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CMD exec form drops everything after & in the healthcheck URL,
causing uninitcode=200 and sealedcode=200 params to be lost. Without
them, OpenBao returns 501 when uninitialized, healthcheck fails, and
Swarm kills the container before the init sidecar can reach it.
Switch to CMD-SHELL with single-quoted URL to preserve query params.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
npx is unavailable in production image since npm is removed.
Use ./node_modules/.bin/prisma directly instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BullMQ requires noeviction to prevent silent job data loss. With
allkeys-lru, Valkey could evict keys BullMQ depends on for job tracking.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add docker-entrypoint.sh that runs prisma migrate deploy before
starting the app, ensuring all tables exist on deployment
- Add "type": "module" to package.json to eliminate Node.js ESM
reparsing warning for eslint.config.js
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>