- Validate error against allowlist of OAuth error codes
- Unknown errors map to generic message
- Encode all URL parameters
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Created Zod schemas for TaskState, AgentState, and OrchestratorEvent
- Added ValkeyValidationError class for detailed error context
- Validate task and agent state data after JSON.parse
- Validate events in subscribeToEvents handler
- Corrupted/tampered data now rejected with clear errors including:
- Key name for context
- Data snippet (truncated to 100 chars)
- Underlying Zod validation error
- Prevents silent propagation of invalid data (SEC-ORCH-6)
- Added 20 new tests for validation scenarios
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use SCAN with cursor for non-blocking iteration
- Prevents Redis DoS under high key counts
- Same API, safer implementation
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add COORDINATOR_API_KEY config option to orchestrator.config.ts
- Include X-API-Key header in coordinator requests when configured
- Log security warning if COORDINATOR_API_KEY not configured in production
- Log security warning if coordinator URL uses HTTP in production
- Add tests verifying API key inclusion in requests and warning behavior
Refs #337
- Sandbox now enabled by default for security
- Logs prominent warning when explicitly disabled
- Agents run in containers unless SANDBOX_ENABLED=false
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add OIDC_ENABLED environment variable to control OIDC authentication
- Validate required OIDC env vars (OIDC_ISSUER, OIDC_CLIENT_ID, OIDC_CLIENT_SECRET)
are present when OIDC is enabled
- Validate OIDC_ISSUER ends with trailing slash for correct discovery URL
- Throw descriptive error at startup if configuration is invalid
- Skip OIDC plugin registration when OIDC is disabled
- Add comprehensive tests for validation logic (17 test cases)
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SEC-API-2: WorkspaceGuard now propagates database errors as 500s instead of
returning "access denied". Only Prisma P2025 (record not found) is treated
as "user not a member".
SEC-API-3: PermissionGuard now propagates database errors as 500s instead of
returning null role (which caused permission denied). Only Prisma P2025 is
treated as "not a member".
This prevents connection timeouts, pool exhaustion, and other infrastructure
errors from being misreported to users as authorization failures.
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add scanError field and scannedSuccessfully flag to SecretScanResult
- File read errors no longer falsely report as "clean"
- Callers can distinguish clean files from scan failures
- Update getScanSummary to track filesWithErrors count
- SecretsDetectedError now reports files that couldn't be scanned
- Add tests verifying error handling behavior for file access issues
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add OrchestratorApiKeyGuard to protect agent management endpoints (spawn,
kill, kill-all, status) from unauthorized access. Uses X-API-Key header
with constant-time comparison to prevent timing attacks.
- Create apps/orchestrator/src/common/guards/api-key.guard.ts
- Add comprehensive tests for all guard scenarios
- Apply guard to AgentsController (controller-level protection)
- Document ORCHESTRATOR_API_KEY in .env.example files
- Health endpoints remain unauthenticated for monitoring
Security: Prevents unauthorized users from draining API credits or
killing all agents via unprotected endpoints.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix CRITICAL: Correct 5 environment variable names to match actual config
(VALKEY_HOST not ORCHESTRATOR_VALKEY_HOST, CLAUDE_API_KEY not ORCHESTRATOR_CLAUDE_API_KEY, etc.)
- Fix CRITICAL: Correct quality gate profiles table to match actual gate-config service
(minimal = tests only, not typecheck+lint; add agent type defaults)
- Fix IMPORTANT: Add missing gateProfile optional field to spawn request docs
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix CRITICAL: Increase single-spawn threshold from 10ms to 50ms (CI flakiness)
- Fix CRITICAL: Replace no-op validation test with real backoff scale tests
- Fix IMPORTANT: Add warmup iterations before all timed measurements
- Fix IMPORTANT: Increase scan position ratio tolerance to 10x for sub-ms noise
- Refactored queue perf tests to use actual service methods (calculateBackoffDelay)
- Helper function to reduce spawn request duplication
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Create TaskProgressWidget showing live agent task execution progress:
- Fetches from orchestrator /agents API with 15s auto-refresh
- Shows stats (total/active/done/stopped), sorted task list
- Agent type badges (worker/reviewer/tester)
- Elapsed time tracking, error display
- Dark mode support, PDA-friendly language
- Registered in WidgetRegistry for dashboard use
Includes 7 unit tests covering all states.
Fixes#101
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update README with complete API reference, module architecture tree,
service catalog, Valkey state keys, quality gate profiles, and
configuration reference.
Fixes#230
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add GET /agents endpoint to orchestrator controller
- Update AgentStatusWidget to fetch from real API instead of mock data
- Add comprehensive tests for listAgents endpoint
- Auto-refresh agent list every 30 seconds
- Display agent status with proper icons and formatting
- Show error states when API is unavailable
Fixes#233
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements comprehensive LLM usage tracking with analytics endpoints.
Implementation:
- Added LlmUsageLog model to Prisma schema
- Created llm-usage module with service, controller, and DTOs
- Added tracking for token usage, costs, and durations
- Implemented analytics aggregation by provider, model, and task type
- Added filtering by workspace, provider, model, user, and date range
Testing:
- 20 unit tests with 90.8% coverage (exceeds 85% requirement)
- Tests for service and controller with full error handling
- Tests use Vitest following project conventions
API Endpoints:
- GET /api/llm-usage/analytics - Aggregated usage analytics
- GET /api/llm-usage/by-workspace/:workspaceId - Workspace usage logs
- GET /api/llm-usage/by-workspace/:workspaceId/provider/:provider - Provider logs
- GET /api/llm-usage/by-workspace/:workspaceId/model/:model - Model logs
Database:
- LlmUsageLog table with indexes for efficient queries
- Relations to User, Workspace, and LlmProviderInstance
- Ready for migration with: pnpm prisma migrate dev
Refs #309
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Closes three CSRF security gaps identified in code review:
1. Added X-CSRF-Token and X-Workspace-Id to CORS allowed headers
- Updated apps/api/src/main.ts to accept CSRF token headers
2. Integrated CSRF token handling in web client
- Added fetchCsrfToken() to fetch token from API
- Store token in memory (not localStorage for security)
- Automatically include X-CSRF-Token in POST/PUT/PATCH/DELETE
- Implement automatic token refresh on 403 CSRF errors
- Added comprehensive test coverage for CSRF functionality
3. Applied CSRF Guard globally
- Added CsrfGuard as APP_GUARD in app.module.ts
- Verified @SkipCsrf() decorator works for exempted endpoints
All tests passing. CSRF protection now enforced application-wide.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replaced setTimeout hacks with proper polling mechanism:
- Added pollForQueryResponse() function with configurable polling interval
- Polls every 500ms with 30s timeout
- Properly handles DELIVERED and FAILED message states
- Throws errors for failures and timeouts
Updated dashboard to use polling instead of arbitrary delays:
- Removed setTimeout(resolve, 1000) hacks
- Added proper async/await for query responses
- Improved response data parsing for new query format
- Better error handling via polling exceptions
This fixes race conditions and unreliable data loading.
Fixes#298
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Update WorkspaceGuard to support query string as fallback (backward compatibility)
- Priority order: Header > Param > Body > Query
- Update web client to send workspace ID via X-Workspace-Id header (recommended)
- Extend apiRequest helpers to accept workspace ID option
- Update fetchTasks to use header instead of query parameter
- Add comprehensive tests for all workspace ID transmission methods
- Tests passing: API 11 tests, Web 6 new tests (total 494)
This ensures consistent workspace ID handling with proper multi-tenant isolation
while maintaining backward compatibility with existing query string approaches.
Fixes#194
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Update AuthUser type in @mosaic/shared to include workspace fields
- Update AuthGuard to support both cookie-based and Bearer token authentication
- Add /auth/session endpoint for session validation
- Install and configure cookie-parser middleware
- Update CurrentUser decorator to use shared AuthUser type
- Update tests for cookie and token authentication (20 tests passing)
This ensures consistent authentication handling across API and web client,
with proper type safety and support for both web browsers (cookies) and
API clients (Bearer tokens).
Fixes#193
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add retry capability with exponential backoff for HTTP requests.
- Implement withRetry utility with configurable retry logic
- Exponential backoff: 1s, 2s, 4s, 8s (max)
- Maximum 3 retries by default
- Retry on network errors (ECONNREFUSED, ETIMEDOUT, etc.)
- Retry on 5xx server errors and 429 rate limit
- Do NOT retry on 4xx client errors
- Integrate with connection service for HTTP requests
Fixes#293
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add DTO validation for FederationCapabilities to ensure proper structure.
- Create FederationCapabilitiesDto with class-validator decorators
- Validate boolean types for capability flags
- Validate string type for protocolVersion
- Update IncomingConnectionRequestDto to use validated DTO
- Add comprehensive unit tests for DTO validation
Fixes#295
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add protocol version validation during connection handshake.
- Define FEDERATION_PROTOCOL_VERSION constant (1.0)
- Validate version on both outgoing and incoming connections
- Require exact version match for compatibility
- Log and audit version mismatches
Fixes#292
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add test to verify workspace connection limit enforcement.
Default limit is 100 connections per workspace.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Security improvements:
- Create redaction utility to prevent PII leakage in logs
- Redact sensitive fields: privateKey, tokens, passwords, metadata, payloads
- Redact user IDs: convert to "user-***"
- Redact instance IDs: convert to "instance-***"
- Support recursive redaction for nested objects and arrays
Changes:
- Add redact.util.ts with redaction functions
- Add comprehensive test coverage for redaction
- Support for:
- Sensitive field detection (privateKey, token, etc.)
- User ID redaction (userId, remoteUserId, localUserId, user.id)
- Instance ID redaction (instanceId, remoteInstanceId, instance.id)
- Nested object and array redaction
- Primitive and null/undefined handling
Next steps:
- Apply redactSensitiveData() to all logger calls in federation services
- Use debug level for detailed logs with sensitive data
Part of M7.1 Remediation Sprint P1 security fixes.
Refs #287
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added @UseGuards(AuthGuard) and rate limiting (@Throttle) to
/api/v1/federation/identity/verify endpoint. Configured strict
rate limit (10 req/min) to prevent abuse of this previously
public endpoint. Added test to verify guards are applied.
Security improvement: Prevents unauthorized access and rate limit
abuse of identity verification endpoint.
Fixes#290
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Modified decrypt() error handling to only log error type without
stack traces, error details, or encrypted content. Added test to
verify sensitive data is not exposed in logs.
Security improvement: Prevents leakage of encrypted data or partial
decryption results through error logs.
Fixes#289
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed modulusLength from 2048 to 4096 in generateKeypair() method
following NIST recommendations for long-term security. Added test to
verify generated keys meet the minimum size requirement.
Security improvement: RSA-4096 provides better protection against
future cryptographic attacks as computational power increases.
Fixes#288
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Move status validation from post-retrieval checks into Prisma WHERE
clauses. This prevents TOCTOU issues and ensures only ACTIVE
connections are retrieved. Removed redundant status checks after
retrieval in both query and command services.
Security improvement: Enforces status=ACTIVE in database query rather
than checking after retrieval, preventing race conditions.
Fixes#283
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>