The mosaicstack-telemetry package is hosted on the Gitea PyPI registry.
CI pip install needs --extra-index-url to find it.
Refs #370
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- llm-cost-table.ts: Add undefined guard for MODEL_COSTS lookup
- llm-telemetry-tracker.service.ts: Allow undefined in callingContext
  for exactOptionalPropertyTypes compatibility
Refs #371
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Install recharts for data visualization
- Add Usage nav item to sidebar navigation
- Create telemetry API service with data fetching functions
- Build dashboard page with summary cards, charts, and time range selector
- Token usage line chart, cost breakdown bar chart, task outcome pie chart
- Loading and empty states handled
- Responsive layout with PDA-friendly design
- Add unit tests (14 tests passing)
Refs #375
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create comprehensive telemetry documentation at docs/telemetry.md
- Cover configuration, event schema, predictions, SDK reference
- Include development guide with dry-run mode and troubleshooting
- Link from main README.md
Refs #376
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Instrument Coordinator.process_queue() with timing and telemetry events
- Instrument OrchestrationLoop.process_next_issue() with quality gate tracking
- Add agent-to-telemetry mapping (model, provider, harness per agent name)
- Map difficulty levels to Complexity enum and gate names to QualityGate enum
- Track retry counts per issue (increment on failure, clear on success)
- Emit FAILURE outcome on agent spawn failure or quality gate rejection
- Non-blocking: telemetry errors are logged and swallowed, never delay tasks
- Pass telemetry client from FastAPI lifespan to Coordinator constructor
- Add 33 unit tests covering all telemetry scenarios
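A minimal Python sketch of the retry bookkeeping, difficulty mapping, and error-swallowing emission described above. The Complexity members, the difficulty labels, and the names to_complexity, RetryTracker, and emit_safely are hypothetical stand-ins, not the coordinator's actual code or the telemetry SDK's API. It covers only error isolation; keeping emission off the task's critical path (queuing) is assumed to be the client's job.

```python
import logging
from enum import Enum
from typing import Callable, Dict

logger = logging.getLogger(__name__)


class Complexity(Enum):
    # Hypothetical members; the real enum ships with the telemetry SDK.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical coordinator difficulty labels mapped onto Complexity.
_DIFFICULTY_TO_COMPLEXITY = {
    "easy": Complexity.LOW,
    "medium": Complexity.MEDIUM,
    "hard": Complexity.HIGH,
}


def to_complexity(difficulty: str) -> Complexity:
    # Unknown labels fall back to MEDIUM (an assumption, not documented behavior).
    return _DIFFICULTY_TO_COMPLEXITY.get(difficulty.lower(), Complexity.MEDIUM)


class RetryTracker:
    """Per-issue retry counter: increment on failure, clear on success."""

    def __init__(self) -> None:
        self._counts: Dict[int, int] = {}

    def record_failure(self, issue: int) -> int:
        self._counts[issue] = self._counts.get(issue, 0) + 1
        return self._counts[issue]

    def record_success(self, issue: int) -> None:
        self._counts.pop(issue, None)

    def retries(self, issue: int) -> int:
        return self._counts.get(issue, 0)


def emit_safely(emit: Callable[[dict], None], event: dict) -> bool:
    """Send an event; log and swallow any error so the task never fails on telemetry."""
    try:
        emit(event)
        return True
    except Exception:  # telemetry errors must never break task processing
        logger.warning("telemetry emission failed", exc_info=True)
        return False
```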
Refs #372
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create PredictionService for pre-task cost/token estimates
- Refresh common predictions on startup
- Integrate predictions into LLM telemetry tracker
- Add GET /api/telemetry/estimate endpoint
- Graceful degradation when no prediction data available
- Add unit tests for prediction service
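A sketch of the graceful-degradation behavior, assuming predictions are cached in memory and keyed by (task_type, model, provider); that key shape and the names PredictionCache and Prediction are hypothetical. A failed refresh keeps the previous data, and a missing entry returns None so the estimate endpoint can answer "no data" instead of erroring.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

logger = logging.getLogger(__name__)

# Hypothetical lookup key: (task_type, model, provider).
Key = Tuple[str, str, str]


@dataclass(frozen=True)
class Prediction:
    input_tokens: int
    output_tokens: int
    cost_microdollars: int


class PredictionCache:
    def __init__(self) -> None:
        self._predictions: Dict[Key, Prediction] = {}

    def refresh(self, fetch: Callable[[], Dict[Key, Prediction]]) -> bool:
        """Replace cached predictions; on fetch failure keep the old ones."""
        try:
            fresh = fetch()
        except Exception:
            logger.warning("prediction refresh failed; keeping cached data", exc_info=True)
            return False
        self._predictions = dict(fresh)
        return True

    def estimate(self, task_type: str, model: str, provider: str) -> Optional[Prediction]:
        # None means "no prediction data": callers treat it as unknown, not as an error.
        return self._predictions.get((task_type, model, provider))
```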
Refs #373
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create LlmTelemetryTrackerService for non-blocking event emission
- Normalize token usage across Anthropic, OpenAI, Ollama providers
- Add cost table with per-token pricing in microdollars
- Instrument chat, chatStream, and embed methods
- Infer task type from calling context
- Aggregate streaming tokens after stream ends with fallback estimation
- Add 69 unit tests for tracker service, cost table, and LLM service
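The normalization and microdollar pricing can be sketched in Python (the service itself is TypeScript). The provider field names are the ones each API reports in its usage data; the model name, its prices, and the 4-characters-per-token fallback are hypothetical placeholders, not the project's cost table.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Tuple


@dataclass(frozen=True)
class TokenUsage:
    input_tokens: int
    output_tokens: int


# Provider-specific usage field names: (input field, output field).
_USAGE_FIELDS: Dict[str, Tuple[str, str]] = {
    "anthropic": ("input_tokens", "output_tokens"),
    "openai": ("prompt_tokens", "completion_tokens"),
    "ollama": ("prompt_eval_count", "eval_count"),
}

# Hypothetical prices in microdollars per token (1 USD = 1,000,000 microdollars),
# e.g. $3 per 1M input tokens = 3 microdollars per token.
MODEL_COSTS: Dict[str, Tuple[int, int]] = {
    "example-model": (3, 15),
}


def normalize_usage(provider: str, usage: Mapping[str, int]) -> TokenUsage:
    # Raises KeyError for providers not in the table.
    in_field, out_field = _USAGE_FIELDS[provider]
    return TokenUsage(int(usage.get(in_field, 0)), int(usage.get(out_field, 0)))


def cost_microdollars(model: str, usage: TokenUsage) -> int:
    costs = MODEL_COSTS.get(model)
    if costs is None:
        # Guard for models missing from the table: report zero instead of crashing.
        return 0
    input_price, output_price = costs
    return usage.input_tokens * input_price + usage.output_tokens * output_price


def estimate_tokens(text: str) -> int:
    # Fallback for streams that end without usage data: ~4 characters per token, rounded up.
    return (len(text) + 3) // 4
```

Integer microdollars avoid floating-point drift when costs are summed: 1,000 input plus 500 output tokens on the placeholder model cost 3 × 1,000 + 15 × 500 = 10,500 microdollars ($0.0105).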
Refs #371
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MOSAIC_TELEMETRY_* variables to .env.example with descriptions
- Pass telemetry env vars to api service in production compose
- Pass telemetry env vars to coordinator service in dev and swarm composes
- Swarm composes default to production URL (https://tel-api.mosaicstack.dev)
- Dev compose includes commented-out telemetry-api service placeholder
- All compose files default MOSAIC_TELEMETRY_ENABLED to false for safety
Refs #374
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add mosaicstack-telemetry>=0.1.0 to pyproject.toml dependencies
- Configure Gitea PyPI registry via pip.conf (extra-index-url)
- Integrate TelemetryClient in FastAPI lifespan (start_async/stop_async)
- Store client on app.state.mosaic_telemetry for downstream access
- Create mosaic_telemetry.py helper module with:
  - get_telemetry_client(): retrieve client from app state
  - build_task_event(): construct TaskCompletionEvent with coordinator defaults
  - create_telemetry_config(): create config from MOSAIC_TELEMETRY_* env vars
- Add 28 unit tests covering config, helpers, disabled mode, and lifespan
- New module has 100% test coverage
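A sketch of reading telemetry settings from MOSAIC_TELEMETRY_* variables with a safe "disabled" default. Only MOSAIC_TELEMETRY_ENABLED comes from these commits; MOSAIC_TELEMETRY_SERVER_URL, MOSAIC_TELEMETRY_API_KEY, MOSAIC_TELEMETRY_DRY_RUN, and the names load_telemetry_settings and TelemetrySettings are hypothetical, and this is not the actual create_telemetry_config() implementation.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}


def _env_flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class TelemetrySettings:
    enabled: bool
    server_url: str
    api_key: str
    dry_run: bool


def load_telemetry_settings(env: Mapping[str, str] = os.environ) -> TelemetrySettings:
    # Hypothetical variable names except MOSAIC_TELEMETRY_ENABLED.
    return TelemetrySettings(
        # Disabled unless explicitly enabled, matching the compose defaults.
        enabled=_env_flag(env, "MOSAIC_TELEMETRY_ENABLED", default=False),
        # Default mirrors the production URL used by the swarm composes.
        server_url=env.get("MOSAIC_TELEMETRY_SERVER_URL", "https://tel-api.mosaicstack.dev"),
        api_key=env.get("MOSAIC_TELEMETRY_API_KEY", ""),
        dry_run=_env_flag(env, "MOSAIC_TELEMETRY_DRY_RUN", default=False),
    )
```

Passing a plain dict instead of os.environ keeps the function easy to unit-test without mutating the process environment.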
Refs #370
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Database: 6 models in the Prisma schema had no CREATE TABLE migration:
cron_schedules, workspace_llm_settings, quality_gates, task_rejections,
token_budgets, llm_usage_logs. Same root cause as the federation tables.
CORS: Health check requests (Docker, load balancers) don't send Origin
headers. The CORS config was rejecting these in production, causing
/health to return 500 and Docker to mark the container as unhealthy.
Requests without Origin headers are not cross-origin per the CORS spec
and should be allowed through.
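The origin decision reduces to a small predicate. This is a Python sketch of the logic only (the API is NestJS); the allow-list contents and the name is_origin_allowed are hypothetical.

```python
from typing import AbstractSet, Optional

# Hypothetical allow-list; the real list comes from the API's configuration.
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def is_origin_allowed(origin: Optional[str], allowed: AbstractSet[str] = ALLOWED_ORIGINS) -> bool:
    if not origin:
        # No Origin header: same-origin or non-browser caller (Docker healthcheck,
        # load balancer probe). CORS does not apply, so let it through.
        return True
    return origin in allowed
```

Origin-less requests still go through the normal authentication guards; CORS only governs what browsers expose to cross-origin scripts.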
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added CSRF_SECRET to docker-compose.swarm.portainer.yml (the active
Portainer deployment) and both example compose files. Also added
ENCRYPTION_KEY to the example files where it was missing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both env vars were missing from the API service environment in
docker-compose.prod.yml and docker-compose.build.yml, causing the
CSRF_SECRET check to fail at startup even when set in .env.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Standalone Synapse + Element Web deployment for Docker Swarm/Portainer.
Separate infrastructure from Mosaic Stack (same pattern as Authentik).
Includes: Synapse, Element Web, dedicated PostgreSQL, optional coturn.
Traefik labels match existing Stack conventions.
Refs #387
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Federation is optional and should not prevent the app from starting
when DEFAULT_WORKSPACE_ID is not set. Changed from throwing (crash)
to logging a warning. The endpoint-level validation in the controller
still rejects requests when federation is unconfigured.
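The "warn at startup, reject at the endpoint" split can be sketched as follows. It is written in Python for illustration (the service is NestJS), and the names FederationConfig and FederationNotConfiguredError are hypothetical.

```python
import logging
from typing import Optional

logger = logging.getLogger("federation")


class FederationNotConfiguredError(Exception):
    """Raised by endpoint handlers when federation has no default workspace."""


class FederationConfig:
    def __init__(self, default_workspace_id: Optional[str]) -> None:
        self.default_workspace_id = default_workspace_id or None
        if self.default_workspace_id is None:
            # Startup continues; only federation requests are affected.
            logger.warning("DEFAULT_WORKSPACE_ID is not set; federation is disabled")

    def require_workspace_id(self) -> str:
        # Endpoint-level check: reject the request instead of crashing the app.
        if self.default_workspace_id is None:
            raise FederationNotConfiguredError("federation is not configured")
        return self.default_workspace_id
```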
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The NestJS tsconfig compiles to CommonJS (module: "CommonJS") but
package.json had "type": "module", causing Node.js v24 to treat the
CJS output as ESM and fail with "exports is not defined in ES module
scope" at startup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Federation models (FederationConnection, FederatedIdentity,
FederationMessage) and their enums were defined in the Prisma schema
but never had CREATE TABLE migrations. This caused the
20260203_add_federation_event_subscriptions migration to fail with
"relation federation_messages does not exist".
Adds new migration 20260202200000 to create the 3 missing enums,
3 missing tables, all indexes, and foreign keys. Removes the
now-redundant ALTER TABLE from the 20260203 migration since
event_type is created with the table.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The base openbao image's docker-entrypoint.sh injects -dev-root-token-id
and -dev-listen-address flags when it sees 'server' as $1, causing the
server to exit immediately (code 0). Override entrypoint with dumb-init
and call bao directly to avoid the dev-mode flag injection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CMD exec form drops everything after & in the healthcheck URL,
causing uninitcode=200 and sealedcode=200 params to be lost. Without
them, OpenBao returns 501 when uninitialized, healthcheck fails, and
Swarm kills the container before the init sidecar can reach it.
Switch to CMD-SHELL with single-quoted URL to preserve query params.
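A sketch of building the quoted CMD-SHELL string, assuming OpenBao listens on its default port 8200 on localhost and that wget is available in the image; both are assumptions, not details taken from the compose file.

```python
import shlex
from urllib.parse import urlencode

# Assumed address: OpenBao's default API port 8200 on localhost.
base = "http://127.0.0.1:8200/v1/sys/health"
# Report 200 while uninitialized or sealed so the healthcheck passes during init.
query = urlencode({"uninitcode": 200, "sealedcode": 200})
url = f"{base}?{query}"

# shlex.quote wraps the URL in single quotes because '?' and '&' are shell
# metacharacters; an unquoted '&' would background the command and drop the rest.
health_cmd = f"wget -q -O /dev/null {shlex.quote(url)}"

print(health_cmd)
# wget -q -O /dev/null 'http://127.0.0.1:8200/v1/sys/health?uninitcode=200&sealedcode=200'
```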
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
npx is unavailable in production image since npm is removed.
Use ./node_modules/.bin/prisma directly instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BullMQ requires noeviction to prevent silent job data loss. With
allkeys-lru, Valkey could evict keys BullMQ depends on for job tracking.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add docker-entrypoint.sh that runs prisma migrate deploy before
  starting the app, ensuring all tables exist on deployment
- Add "type": "module" to package.json to eliminate Node.js ESM
  reparsing warning for eslint.config.js
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Pass BETTER_AUTH_SECRET through all 6 docker-compose files to API container
- Fix BullModule to parse VALKEY_URL instead of VALKEY_HOST/VALKEY_PORT,
  matching all other Redis consumers in the codebase
- Migrate Prisma encryption from removed $use() middleware to $extends()
  client extensions (Prisma 6.x compatibility), keeping the extends
  PrismaClient pattern with only the account and llmProviderInstance
  getters overridden
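Splitting a single connection URL into the discrete fields a queue client expects can be sketched in Python with the standard urllib.parse module. The function name parse_valkey_url, the returned field names, and the treatment of rediss:// as TLS are assumptions for illustration, not the BullModule code.

```python
from typing import Optional, TypedDict
from urllib.parse import urlparse


class ConnectionOptions(TypedDict):
    host: str
    port: int
    password: Optional[str]
    db: int
    tls: bool


def parse_valkey_url(url: str) -> ConnectionOptions:
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    return {
        "host": parsed.hostname or "localhost",
        "port": parsed.port or 6379,  # default Redis/Valkey port
        "password": parsed.password,
        "db": int(path) if path else 0,  # e.g. redis://host:6379/2 -> database 2
        "tls": parsed.scheme == "rediss",  # assumed: rediss:// means TLS
    }
```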
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AuthGuard used across federation controllers depends on AuthService,
which requires AuthModule to be imported. Matches pattern used by
TasksModule, ProjectsModule, and CredentialsModule.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CsrfGuard is already applied globally via APP_GUARD in AppModule.
The explicit @UseGuards(CsrfGuard) on FederationController caused a
DI error because CsrfService is not provided in FederationModule.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Node.js 24 (Krypton) entered Active LTS in October 2025. Update all
Dockerfiles, CI pipelines, and engine constraint from node:20-alpine
to node:24-alpine. Corrected .trivyignore: tar CVEs come from Next.js
16.1.6 bundled tar@7.5.2 (not npm). Orchestrator and API images are
clean; web image needs Next.js upstream fix.
Fixes #367
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two Trivy fixes:
1. Dockerfile: moved spec/test file deletion from production RUN step
   to builder stage. The previous approach (COPY then RUN rm) left files
   in the COPY layer, and Trivy scans all layers, not just the final FS.
   Now spec files are deleted in builder BEFORE COPY to production.
2. .trivyignore: added 3 tar CVEs (CVE-2026-23745/23950/24842) with
   documented rationale. tar@7.5.2 is bundled inside npm which ships
   with node:20-alpine. Not upgradeable: not our dependency. npm is
   already removed from all production images.
Verified: local Trivy scan passes (exit code 0, 0 findings)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add build-shared step to web.yml so lint/typecheck/test can resolve
  @mosaic/shared types (same fix previously applied to api.yml)
- Remove compiled .spec.js/.test.js files from orchestrator production
  image to prevent Trivy secret scanning false positives from test
  fixtures (fake AWS keys and RSA private keys in secret-scanner tests)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- docker/postgres/Dockerfile: build gosu from source with Go 1.26 via
  multi-stage build (eliminates 1 CRITICAL + 5 HIGH Go stdlib CVEs)
- apps/{api,web,orchestrator}/Dockerfile: remove npm from production
  images (eliminates 5 HIGH CVEs in npm's bundled cross-spawn/glob/tar)
- .trivyignore: trimmed from 16 to 5 CVEs (OpenBao only: 4 false
  positives from Go pseudo-version + 1 real Go stdlib CVE waiting on
  upstream)
Fixes #363
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 16 suppressed CVEs are in upstream binaries/packages we don't control:
- Go stdlib CVEs in openbao bin/bao (Go 1.25.6) and postgres gosu (Go 1.24.6)
- OpenBao CVE false positives (Trivy reads Go pseudo-version, we run 2.5.0)
- npm bundled cross-spawn/glob/tar CVEs in node:20-alpine base image
Updated all 6 Trivy scan steps across 5 pipelines to use --ignorefile.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes for the coordinator pipeline:
1. Use bandit.yaml config file (-c bandit.yaml) so global skips
   and exclude_dirs are respected in CI.
2. Upgrade pip to >=25.3 in the install step so pip-audit doesn't
   fail on the stale pip 24.0 bundled with python:3.11-slim.
3. Clean up nosec inline comments to bare "# nosec BXXX" format,
   moving explanations to a separate comment line above. This
   prevents bandit from misinterpreting trailing text as test IDs.
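An illustration of the bare nosec format, using a hypothetical helper that is not part of the coordinator: each justification sits on its own comment line above, and the trailing comment carries only the bandit test ID (B404 for importing subprocess, B603 for subprocess calls without a shell).

```python
# subprocess is needed to invoke the interpreter with a fixed argument list.
import subprocess  # nosec B404
import sys


def interpreter_version() -> str:
    cmd = [sys.executable, "--version"]
    # Fixed argument list, shell=False, no user input: B603 is a false positive here.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)  # nosec B603
    return result.stdout.strip()
```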
Fixes#365
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The lint and typecheck steps fail because @mosaic/shared isn't built.
Add a build-shared step that compiles the shared package before lint
and typecheck run, both of which now depend on build-shared in
addition to prisma-generate.
Fixes#364
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gosu doesn't publish proper Go module semver tags, so
`go install github.com/tianon/gosu@v1.19` fails with "no matching
versions". Replace the multi-stage golang builder with
`COPY --from=tianon/gosu /gosu /usr/local/bin/gosu`, which pulls the
pre-built binary from the official tianon/gosu Docker image. This image
is rebuilt with recent Go toolchains, so it still addresses the Go
stdlib CVEs documented in the Dockerfile comments.
Fixes#363
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The lint step in .woodpecker/api.yml depended only on install, but
ESLint needs Prisma-generated client types to resolve imports. Without
prisma-generate running first, all Prisma type references produce
false-positive errors (3,919 total). Changing the dependency from
install to prisma-generate fixes the issue since prisma-generate
already depends on install.
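Why depending on prisma-generate alone is enough can be shown by modeling the step graph as a dict of depends_on edges. This is a Python illustration using the standard graphlib module, not Woodpecker's scheduler; transitive_deps is a hypothetical helper.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

# depends_on edges after the fix: lint -> prisma-generate -> install.
STEPS: Dict[str, Set[str]] = {
    "install": set(),
    "prisma-generate": {"install"},
    "lint": {"prisma-generate"},
}


def transitive_deps(graph: Dict[str, Set[str]], step: str) -> Set[str]:
    """Collect every step that must finish before `step` can start."""
    seen: Set[str] = set()
    stack = list(graph[step])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


# A valid execution order for the whole graph.
order = list(TopologicalSorter(STEPS).static_order())
```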
Fixes#364
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gosu 1.19 binary bundled in the postgres base image was compiled
with Go 1.24.6, which contains CVE-2025-68121 (CRITICAL) and 5 HIGH
severity Go stdlib vulnerabilities. Since upstream gosu has not released
a version built with patched Go (1.24.13+ / 1.25.7+), this adds a
multi-stage Docker build that recompiles gosu from source using Go 1.26.
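The patched-toolchain thresholds above can be checked mechanically. A hypothetical helper, is_patched, encodes 1.24.13+ and 1.25.7+ as patched; treating 1.26 and later as patched and anything older than 1.24 as unpatched are assumptions for illustration.

```python
from typing import Tuple

# Minimum patched release per affected Go minor line (1.24.13+ / 1.25.7+).
_PATCHED_MINIMUM = {(1, 24): 13, (1, 25): 7}


def parse_go_version(version: str) -> Tuple[int, int, int]:
    # Accepts "go1.24.6" or "1.24.6"; missing components count as 0.
    if version.startswith("go"):
        version = version[2:]
    nums = [int(part) for part in version.split(".")] + [0, 0]
    return nums[0], nums[1], nums[2]


def is_patched(version: str) -> bool:
    major, minor, patch = parse_go_version(version)
    minimum = _PATCHED_MINIMUM.get((major, minor))
    if minimum is not None:
        return patch >= minimum
    # Assumption: lines newer than 1.25 (e.g. 1.26) ship with the fixes;
    # older lines are treated as unpatched.
    return (major, minor) > (1, 25)
```

For example, the Go 1.24.6 toolchain in the bundled gosu binary fails the check, while Go 1.26 passes.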
Changes:
- Pin postgres base image to 17.7-alpine3.22 for reproducibility
- Add golang:1.26-alpine3.22 builder stage to compile gosu v1.19
- Replace bundled gosu binary with freshly built version
- Pin all postgres:17-alpine references across compose files and CI
CVEs fixed:
- CVE-2025-68121 (CRITICAL): Go crypto/tls vulnerability
- CVE-2025-58183 (HIGH): Go archive/tar unbounded allocation
- CVE-2025-61726 (HIGH): Go net/url memory exhaustion
- CVE-2025-61728 (HIGH): Go archive/zip CPU exhaustion
- CVE-2025-61729 (HIGH): Go crypto/x509 DoS
- CVE-2025-61730 (HIGH): Go TLS 1.3 handshake vulnerability
Fixes #363
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pin OpenBao base image from unpinned :2 tag to :2.5.0 (latest stable,
released 2026-02-04) in both the Dockerfile and the dev docker-compose.
CVEs resolved:
- CVE-2025-68121 (CRITICAL): Go stdlib crypto/tls session resumption
- CVE-2024-8185 (HIGH): DoS via Raft join requests
- CVE-2024-9180 (HIGH): Root namespace privilege escalation
- CVE-2025-59043 (HIGH): DoS via malicious JSON
- CVE-2025-64761 (HIGH): Identity group root escalation
All fixed in OpenBao >= 2.4.4; v2.5.0 includes all patches plus new
features (horizontal read scalability, OCI plugin distribution).
Files changed:
- docker/openbao/Dockerfile: FROM tag 2 -> 2.5.0
- docker/docker-compose.yml: openbao + openbao-init image tags 2 -> 2.5.0
The production/swarm compose files use the custom-built
git.mosaicstack.dev/mosaic/stack-openbao image which is built FROM
this Dockerfile, so they inherit the fix on next CI build.
Fixes#363