Keep both Mosaic Telemetry section (from develop) and Matrix Dev
Environment section (from feature branch) in .env.example.
Regenerate pnpm-lock.yaml with both dependency trees merged.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix sendThreadMessage room mismatch: use channelId from options instead of hardcoded controlRoomId
- Add .catch() to fire-and-forget handleRoomMessage to prevent silent error swallowing
- Wrap dispatchJob in try-catch for user-visible error reporting in handleFixCommand
- Add MATRIX_BOT_USER_ID validation in connect() to prevent infinite message loops
- Fix streamResponse error masking: wrap finally/catch side-effects in try-catch
- Replace unsafe type assertion with public getClient() in MatrixRoomService
- Add orphaned room warning in provisionRoom on DB failure
- Add provider identity to Herald error logs
- Add channelId to ThreadMessageOptions interface and all callers
- Add missing env var warnings in BridgeModule factory
- Fix JSON injection in setup-bot.sh: use jq for safe JSON construction
Fixes#377
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add CHAT_PROVIDERS injection token for bridge-agnostic access
- Conditional loading based on env vars (DISCORD_BOT_TOKEN, MATRIX_ACCESS_TOKEN)
- Both bridges can run simultaneously
- No crash if neither bridge is configured
- Tests verify all configuration combinations
Refs #379
- Add matrix_room_id column to workspace table (migration)
- Create MatrixRoomService for room provisioning and mapping
- Auto-create Matrix room on workspace provisioning (when configured)
- Support manual room linking for existing workspaces
- Unit tests for all mapping operations
Refs #380
- llm-cost-table.ts: Add undefined guard for MODEL_COSTS lookup
- llm-telemetry-tracker.service.ts: Allow undefined in callingContext
for exactOptionalPropertyTypes compatibility
Refs #371
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create PredictionService for pre-task cost/token estimates
- Refresh common predictions on startup
- Integrate predictions into LLM telemetry tracker
- Add GET /api/telemetry/estimate endpoint
- Graceful degradation when no prediction data available
- Add unit tests for prediction service
Refs #373
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create LlmTelemetryTrackerService for non-blocking event emission
- Normalize token usage across Anthropic, OpenAI, Ollama providers
- Add cost table with per-token pricing in microdollars
- Instrument chat, chatStream, and embed methods
- Infer task type from calling context
- Aggregate streaming tokens after stream ends with fallback estimation
- Add 69 unit tests for tracker service, cost table, and LLM service
Refs #371
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Database: 6 models in the Prisma schema had no CREATE TABLE migration:
cron_schedules, workspace_llm_settings, quality_gates, task_rejections,
token_budgets, llm_usage_logs. Same root cause as the federation tables.
CORS: Health check requests (Docker, load balancers) don't send Origin
headers. The CORS config was rejecting these in production, causing
/health to return 500 and Docker to mark the container as unhealthy.
Requests without Origin headers are not cross-origin per the CORS spec
and should be allowed through.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Federation is optional and should not prevent the app from starting
when DEFAULT_WORKSPACE_ID is not set. Changed from throwing (crash)
to logging a warning. The endpoint-level validation in the controller
still rejects requests when federation is unconfigured.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The NestJS tsconfig compiles to CommonJS (module: "CommonJS") but
package.json had "type": "module", causing Node.js v24 to treat the
CJS output as ESM and fail with "exports is not defined in ES module
scope" at startup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Federation models (FederationConnection, FederatedIdentity,
FederationMessage) and their enums were defined in the Prisma schema
but never had CREATE TABLE migrations. This caused the
20260203_add_federation_event_subscriptions migration to fail with
"relation federation_messages does not exist".
Adds new migration 20260202200000 to create the 3 missing enums,
3 missing tables, all indexes, and foreign keys. Removes the
now-redundant ALTER TABLE from the 20260203 migration since
event_type is created with the table.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
npx is unavailable in production image since npm is removed.
Use ./node_modules/.bin/prisma directly instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add docker-entrypoint.sh that runs prisma migrate deploy before
starting the app, ensuring all tables exist on deployment
- Add "type": "module" to package.json to eliminate Node.js ESM
reparsing warning for eslint.config.js
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Pass BETTER_AUTH_SECRET through all 6 docker-compose files to API container
- Fix BullModule to parse VALKEY_URL instead of VALKEY_HOST/VALKEY_PORT,
matching all other Redis consumers in the codebase
- Migrate Prisma encryption from removed $use() middleware to $extends()
client extensions (Prisma 6.x compatibility), keeping extends PrismaClient
pattern with only account and llmProviderInstance getters overridden
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AuthGuard used across federation controllers depends on AuthService,
which requires AuthModule to be imported. Matches pattern used by
TasksModule, ProjectsModule, and CredentialsModule.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CsrfGuard is already applied globally via APP_GUARD in AppModule.
The explicit @UseGuards(CsrfGuard) on FederationController caused a
DI error because CsrfService is not provided in FederationModule.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Node.js 24 (Krypton) entered Active LTS on 2026-02-09. Update all
Dockerfiles, CI pipelines, and engine constraint from node:20-alpine
to node:24-alpine. Corrected .trivyignore: tar CVEs come from Next.js
16.1.6 bundled tar@7.5.2 (not npm). Orchestrator and API images are
clean; web image needs Next.js upstream fix.
Fixes#367
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- docker/postgres/Dockerfile: build gosu from source with Go 1.26 via
multi-stage build (eliminates 1 CRITICAL + 5 HIGH Go stdlib CVEs)
- apps/{api,web,orchestrator}/Dockerfile: remove npm from production
images (eliminates 5 HIGH CVEs in npm's bundled cross-spawn/glob/tar)
- .trivyignore: trimmed from 16 to 5 CVEs (OpenBao only — 4 false
positives from Go pseudo-version + 1 real Go stdlib waiting on upstream)
Fixes#363
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CredentialsController uses AuthGuard which depends on AuthService.
NestJS resolves guard dependencies in the module context, so
CredentialsModule needs to import AuthModule directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove temporary debug RUN layers that were added during initial
build troubleshooting. These add build time and leak directory
structure into build logs unnecessarily.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Coordinator: install all dependencies from pyproject.toml instead of
hardcoded subset (missing slowapi, anthropic, opentelemetry-*).
API: FederationAgentService now gracefully disables when orchestrator
URL is not configured instead of throwing and crashing the app.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds directory-specific agent context templates for AI-assisted
development across all apps and packages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bump axios ^1.13.4→^1.13.5 (GHSA-43fc-jf86-j433). Add pnpm overrides for
lodash/lodash-es >=4.17.23 and undici >=6.23.0 to resolve transitive
vulnerabilities via chevrotain and discord.js.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
API:
- Add AuthModule import to JobEventsModule
- Add AuthModule import to JobStepsModule
- Fixes: AuthGuard dependency resolution in job modules
Orchestrator:
- Add @Optional() decorator to docker parameter in DockerSandboxService
- Fixes: NestJS trying to inject Docker class as dependency
All modules using AuthGuard must import AuthModule.
Docker parameter is optional for testing, needs @Optional() decorator.
- Add ConfigService mock for encryption configuration
- Add VaultService and CryptoService to test module
- Fixes: PrismaService dependency injection error in test
PrismaService requires VaultService for credential encryption.
Performance tests now properly provide all required dependencies.
Refs #341 (pipeline test failure)
API:
- Add AuthModule import to RunnerJobsModule
- Fixes: Nest can't resolve dependencies of AuthGuard
Orchestrator:
- Remove --prod flag from dependency installation
- Copy full node_modules tree to production stage
- Align Dockerfile with API pattern for monorepo builds
- Fixes: Cannot find module '@nestjs/core'
Both services now match the working API Dockerfile pattern.
Woodpecker sets CI=woodpecker and CI_PIPELINE_EVENT, not CI=true.
Updated the CI detection to check for both.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The .env.test file was being loaded in CI and overriding the CI-provided
DATABASE_URL, causing tests to try connecting to localhost:5432 instead of
the postgres:5432 service.
Fix: Only load .env.test when NOT in CI (check for CI or WOODPECKER env vars).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes integration test failures caused by missing DATABASE_URL environment variable.
Changes:
- Add dotenv as dev dependency to load .env.test in vitest setup
- Add .env.test to .gitignore to prevent committing test credentials
- Create .env.test.example with warning comments for documentation
- Add conditional test skipping when DATABASE_URL is not available
- Add DATABASE_URL format validation in vitest setup
- Add error handling to test cleanup to prevent silent failures
- Remove filesystem path disclosure from error messages
The fix allows integration tests to:
- Load DATABASE_URL from .env.test locally for developers with database setup
- Skip gracefully if DATABASE_URL is not available (no database running)
- Connect to postgres service in CI where DATABASE_URL is explicitly provided
Tests affected: auth-rls.integration.spec.ts and other integration tests
requiring real database connections.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implemented transparent encryption/decryption of LLM provider API keys
stored in llm_provider_instances.config JSON field using OpenBao Transit
encryption.
Implementation:
- Created llm-encryption.middleware.ts with encryption/decryption logic
- Auto-detects format (vault:v1: vs plaintext) for backward compatibility
- Idempotent encryption prevents double-encryption
- Registered middleware in PrismaService
- Created data migration script for active encryption
- Added migrate:encrypt-llm-keys command to package.json
Tests:
- 14 comprehensive unit tests
- 90.76% code coverage (exceeds 85% requirement)
- Tests create, read, update, upsert operations
- Tests error handling and backward compatibility
Migration:
- Lazy migration: New keys encrypted, old keys work until re-saved
- Active migration: pnpm --filter @mosaic/api migrate:encrypt-llm-keys
- No schema changes required
- Zero downtime
Security:
- Uses TransitKey.LLM_CONFIG from OpenBao Transit
- Keys never touch disk in plaintext (in-memory only)
- Transparent to LlmManagerService and providers
- Follows proven pattern from account-encryption.middleware.ts
Files:
- apps/api/src/prisma/llm-encryption.middleware.ts (new)
- apps/api/src/prisma/llm-encryption.middleware.spec.ts (new)
- apps/api/scripts/encrypt-llm-keys.ts (new)
- apps/api/prisma/migrations/20260207_encrypt_llm_api_keys/ (new)
- apps/api/src/prisma/prisma.service.ts (modified)
- apps/api/package.json (modified)
Note: The migration script (encrypt-llm-keys.ts) is not included in
tsconfig.json to avoid rootDir conflicts. It's executed via tsx which
handles TypeScript directly.
Refs #359
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements secure credential encryption using OpenBao Transit API with
automatic fallback to AES-256-GCM when OpenBao is unavailable.
Features:
- AppRole authentication with automatic token renewal at 50% TTL
- Transit encrypt/decrypt with 4 named keys
- Automatic fallback to CryptoService when OpenBao unavailable
- Auto-detection of ciphertext format (vault:v1: vs AES)
- Request timeout protection (5s default)
- Health indicator for monitoring
- Backward compatible with existing AES-encrypted data
Security:
- ERROR-level logging for fallback
- Proper error propagation (no silent failures)
- Request timeouts prevent hung operations
- Secure credential file reading
Migrations:
- Account encryption middleware uses VaultService
- Uses TransitKey.ACCOUNT_TOKENS for OAuth tokens
- Backward compatible with existing encrypted data
Tests: 56 tests passing (36 VaultService + 20 middleware)
Closes#353
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements Row-Level Security (RLS) policies on accounts and sessions tables with FORCE enforcement.
Core Implementation:
- Added FORCE ROW LEVEL SECURITY to accounts and sessions tables
- Created conditional owner bypass policies (when current_user_id() IS NULL)
- Created user-scoped access policies using current_user_id() helper
- Documented PostgreSQL superuser limitation with production deployment guide
Security Features:
- Prevents cross-user data access at database level
- Defense-in-depth security layer complementing application logic
- Owner bypass allows migrations and BetterAuth operations when no RLS context
- Production requires non-superuser application role (documented in migration)
Test Coverage:
- 22 comprehensive integration tests (9 accounts + 9 sessions + 4 context)
- Complete CRUD coverage: CREATE, READ, UPDATE, DELETE (own + others)
- Superuser detection with fail-fast error message
- Verification that blocked DELETE operations preserve data
- 100% test coverage, all tests passing
Integration:
- Uses RLS context provider from #351 (runWithRlsClient, getRlsClient)
- Parameterized queries using set_config() for security
- Transaction-scoped session variables with SET LOCAL
Files Created:
- apps/api/prisma/migrations/20260207_add_auth_rls_policies/migration.sql
- apps/api/src/auth/auth-rls.integration.spec.ts
Fixes#350
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive JSDoc and inline comments documenting the known race
condition in the in-memory fallback path of ThrottlerValkeyStorageService.
The non-atomic read-modify-write in incrementMemory() is intentionally
left without a mutex because:
- It is only the fallback path when Valkey is unavailable
- The primary Valkey path uses atomic INCR and is race-free
- Adding locking to a rarely-used degraded path adds complexity
with minimal benefit
Also adds Logger.warn calls when falling back to in-memory mode
at runtime (Redis command failures).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace all console.error calls in MCP services with NestJS Logger
instances for consistent structured logging in production.
- mcp-hub.service.ts: Add Logger instance, replace console.error in
onModuleDestroy cleanup
- stdio-transport.ts: Add Logger instance, replace console.error for
stderr output (as warn) and JSON parse failures (as error)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
createAuthMiddleware was calling SET LOCAL on the raw PrismaClient
outside of any transaction. In PostgreSQL, SET LOCAL without a
transaction acts as a session-level SET, which can leak RLS context
to subsequent requests sharing the same pooled connection, enabling
cross-tenant data access.
Wrapped the setCurrentUser call and downstream handler execution
inside a $transaction block so SET LOCAL is automatically reverted
when the transaction ends (on both success and failure).
Added comprehensive test suite for db-context module verifying:
- RLS context is set on the transaction client, not the raw client
- next() executes inside the transaction boundary
- Authentication errors prevent any transaction from starting
- Errors in downstream handlers propagate correctly
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Set forbidNonWhitelisted: true in ValidationPipe to reject requests
with unknown DTO properties, preventing mass assignment vulnerabilities
- Reject requests with no Origin header in production (SEC-API-26)
- Restrict localhost:3001 to development mode only
- Update CORS tests to cover production/development origin validation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Graceful container shutdown: detect "not running" containers and skip
force-remove escalation, only SIGKILL for genuine stop failures
- data: URI stripping: add security audit logging via NestJS Logger
when data: URIs are blocked in markdown links and images
- Orchestrator bootstrap: replace void bootstrap() with .catch() handler
for clear startup failure logging and clean process.exit(1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All other search DTOs (SemanticSearchBodyDto, HybridSearchBodyDto,
BrainQueryDto, BrainSearchDto) already enforce @MaxLength(500) on their
query fields. SearchQueryDto.q was missed, leaving the full-text
knowledge search endpoint accepting arbitrarily long queries.
Adds @MaxLength(500) decorator and validation test coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Promise.all of individual findUnique queries per tag with a
single findMany batch query. Only missing tags are created individually.
Tag associations now use createMany instead of individual creates.
Also deduplicates tags by slug via Map, preventing duplicate entries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>