- Add sanitize_for_prompt() function to security module
- Remove suspicious control characters (except whitespace)
- Detect and log common prompt injection patterns
- Escape dangerous XML-like tags used for prompt manipulation
- Truncate user content to max length (default 50000 chars)
- Integrate sanitization in parser before building LLM prompts
- Add comprehensive test suite (12 new tests)
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add isProductionEnvironment() check to prevent YOLO mode bypass
- Log warning when YOLO mode request is blocked in production
- Fall back to process.env.NODE_ENV when config service returns undefined
- Add comprehensive tests for production blocking behavior
SECURITY: YOLO mode bypasses all quality gates which is dangerous in
production environments. This change ensures quality gates are always
enforced when NODE_ENV=production.
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add MAX_CONCURRENT_AGENTS configuration (default: 20)
- Check current agent count before spawning
- Reject spawn requests with 429 Too Many Requests when limit reached
- Add comprehensive tests for limit enforcement
Refs #338
- Add @nestjs/throttler for rate limiting support
- Configure multiple throttle profiles: default (100/min), strict (10/min for spawn/kill), status (200/min for polling)
- Apply strict rate limits to spawn and kill endpoints to prevent DoS
- Apply higher rate limits to status/health endpoints for monitoring
- Add OrchestratorThrottlerGuard with X-Forwarded-For support for proxy setups
- Add unit tests for throttler guard
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Drop all Linux capabilities by default (CapDrop: ALL)
- Enable read-only root filesystem (agents write to mounted /workspace volume)
- Limit process count to 100 to prevent fork bombs (PidsLimit)
- Add no-new-privileges security option to prevent privilege escalation
- Add DockerSecurityOptions type with configurable security settings
- All options are configurable via config but secure by default
- Add comprehensive tests for security hardening options (20+ new tests)
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add DEFAULT_ENV_WHITELIST constant with safe env vars (AGENT_ID, TASK_ID,
NODE_ENV, LOG_LEVEL, TZ, MOSAIC_* vars, etc.)
- Implement filterEnvVars() to separate allowed/filtered vars
- Log security warning when non-whitelisted vars are filtered
- Support custom whitelist via orchestrator.sandbox.envWhitelist config
- Add comprehensive tests for whitelist functionality (39 tests passing)
Prevents accidental leakage of secrets like API keys, database credentials,
AWS secrets, etc. to Docker containers.
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Log ERROR when queue corruption detected with error details
- Create timestamped backup before discarding corrupted data
- Add comprehensive tests for corruption handling
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement circuit breaker pattern to prevent infinite retry loops on
repeated failures (SEC-ORCH-7). The circuit breaker tracks consecutive
failures and opens after a threshold is reached, blocking further
requests until a cooldown period elapses.
Circuit breaker states:
- CLOSED: Normal operation, requests pass through
- OPEN: After N consecutive failures, all requests blocked
- HALF_OPEN: After cooldown, allow one test request
Changes:
- Add circuit_breaker.py with CircuitBreaker class
- Integrate circuit breaker into Coordinator.start() loop
- Integrate circuit breaker into OrchestrationLoop.start() loop
- Integrate per-agent circuit breakers into ContextMonitor
- Add comprehensive tests for circuit breaker behavior
- Log state transitions and circuit breaker stats on shutdown
Configuration (defaults):
- failure_threshold: 5 consecutive failures
- cooldown_seconds: 30 seconds
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create centralized config module (apps/web/src/lib/config.ts) exporting:
- API_BASE_URL: Main API server URL from NEXT_PUBLIC_API_URL
- ORCHESTRATOR_URL: Orchestrator service URL from NEXT_PUBLIC_ORCHESTRATOR_URL
- Helper functions for building full URLs
- Update client.ts to import from central config
- Update LoginButton.tsx to use API_BASE_URL from config
- Update useWebSocket.ts to use API_BASE_URL from config
- Update AgentStatusWidget.tsx to use ORCHESTRATOR_URL from config
- Update TaskProgressWidget.tsx to use ORCHESTRATOR_URL from config
- Update useGraphData.ts to use API_BASE_URL from config
- Fixed wrong default port (was 8000, now uses correct 3001)
- Add comprehensive tests for config module
- Update useWebSocket tests to properly mock config module
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Show Coming Soon placeholder in production for both widget versions
- Widget available in development mode only
- Added tests verifying environment-based behavior
- Use runtime check for testability (isDevelopment function vs constant)
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add error state tracking for both projects and agents API calls
- Show error UI (amber alert icon + message) when fetch fails
- Clear data on error to avoid showing stale information
- Added tests for error handling: API failures, network errors
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Store previous state before PATCH request
- Apply optimistic update immediately on drag
- Rollback UI to original position on API error
- Show error toast notification on failure
- Add comprehensive tests for optimistic updates and rollback
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add validateWebSocketSecurity() to warn when using ws:// in production
- Add connect_error event handler to capture connection failures
- Expose connectionError state to consumers via hook and provider
- Add comprehensive tests for WSS enforcement and error handling
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add error logging for auth check failures in development mode
- Distinguish network/backend errors from normal unauthenticated state
- Expose authError state to UI (network | backend | null)
- Add comprehensive tests for error handling scenarios
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create ComingSoon component for production placeholders
- Federation connections page shows Coming Soon in production
- Workspaces settings page shows Coming Soon in production
- Teams page shows Coming Soon in production
- Add comprehensive tests for environment-based rendering
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace raw fetch() with apiPost/apiPatch/apiDelete in:
- ImportExportActions.tsx: POST for file imports
- KanbanBoard.tsx: PATCH for task status updates
- ActiveProjectsWidget.tsx: POST for widget data fetches
- useLayouts.ts: POST/PATCH/DELETE for layout management
- Add apiPostFormData() method to API client for FormData uploads
- Ensures CSRF token is included in all state-changing requests
- Update tests to mock CSRF token fetch for API client usage
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add federation.config.ts with UUID v4 validation for DEFAULT_WORKSPACE_ID
- Validate at module initialization (fail fast if misconfigured)
- Replace hardcoded "default" fallback with proper validation
- Add 18 tests covering valid UUIDs, invalid formats, and missing values
- Clear error messages with expected UUID format
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Apply restrictive rate limits (10 req/min) to prevent brute-force attacks
- Log requests with path and client IP for monitoring and debugging
- Extract client IP handling for proxy setups (X-Forwarded-For)
- Add comprehensive tests for rate limiting and logging behavior
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace workspace ownership check with explicit SYSTEM_ADMIN_IDS env var
- System admin access is now explicit and configurable via environment
- Workspace owners no longer automatically get system admin privileges
- Add 15 unit tests verifying security separation
- Add SYSTEM_ADMIN_IDS documentation to .env.example
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New package providing CLI tools that work with both Gitea and GitHub:
Commands:
- mosaic-issue-{create,list,view,assign,edit,close,reopen,comment}
- mosaic-pr-{create,list,view,merge,review,close}
- mosaic-milestone-{create,list,close}
Features:
- Auto-detects platform (Gitea vs GitHub) from git remote
- Unified interface regardless of platform
- Available via `pnpm exec mosaic-*` in monorepo context
Updated docs/claude/orchestrator.md:
- Added CLI Tools section with usage examples
- Updated issue creation to use package commands
This makes Mosaic Stack fully self-contained for orchestration tooling.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Log at ERROR level when falling back to in-memory storage
- Track and expose degraded mode status for health checks
- Add isUsingFallback() method to check fallback state
- Add getHealthStatus() method for health check endpoints
- Add comprehensive tests for fallback behavior and health status
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Token now includes HMAC binding to session ID
- Validates session binding on verification
- Adds CSRF_SECRET configuration requirement
- Requires authentication for CSRF token endpoint
- 51 new tests covering session binding security
Security: CSRF tokens are now cryptographically tied to user sessions,
preventing token reuse across sessions and mitigating session fixation
attacks.
Token format: {random_part}:{hmac(random_part + user_id, secret)}
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace console.error with NestJS Logger
- Include entry ID and workspace ID in error context
- Easier to track and debug embedding issues
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Skip client initialization when OPENAI_API_KEY not configured
- Set openai property to null instead of creating with dummy key
- Methods return gracefully when embeddings not available
- Updated tests to verify client is not instantiated without key
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Verify tasks.service includes workspaceId in all queries
- Verify knowledge.service includes workspaceId in all queries
- Verify projects.service includes workspaceId in all queries
- Verify events.service includes workspaceId in all queries
- Add 39 tests covering create, findAll, findOne, update, remove operations
- Document security concern: findAll accepts empty query without workspaceId
- Ensures tenant isolation is maintained at query level
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
QA automation reports in docs/reports/qa-automation/ are ephemeral and
should not be committed. They are cleaned up by the orchestrator after
task completion.
- Nullish coalescing (??) doesn't work with booleans as expected
- When readOnly=false, ?? never evaluates right side (!selectedNode)
- Changed to logical OR (||) for correct disabled state calculation
- Added comprehensive tests verifying the fix:
* readOnly=false with no selection: editing disabled
* readOnly=false with selection: editing enabled
* readOnly=true: editing always disabled
- Removed unused eslint-disable directive
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use OIDC_ISSUER and OIDC_CLIENT_ID from environment for JWT validation
- Federation OIDC properly configured from environment variables
- Fail fast with clear error when OIDC config is missing
- Handle trailing slash normalization for issuer URL
- Add tests verifying env var usage and missing config error handling
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Validate error against allowlist of OAuth error codes
- Unknown errors map to generic message
- Encode all URL parameters
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Created Zod schemas for TaskState, AgentState, and OrchestratorEvent
- Added ValkeyValidationError class for detailed error context
- Validate task and agent state data after JSON.parse
- Validate events in subscribeToEvents handler
- Corrupted/tampered data now rejected with clear errors including:
- Key name for context
- Data snippet (truncated to 100 chars)
- Underlying Zod validation error
- Prevents silent propagation of invalid data (SEC-ORCH-6)
- Added 20 new tests for validation scenarios
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use SCAN with cursor for non-blocking iteration
- Prevents Redis DoS under high key counts
- Same API, safer implementation
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add COORDINATOR_API_KEY config option to orchestrator.config.ts
- Include X-API-Key header in coordinator requests when configured
- Log security warning if COORDINATOR_API_KEY not configured in production
- Log security warning if coordinator URL uses HTTP in production
- Add tests verifying API key inclusion in requests and warning behavior
Refs #337
- Sandbox now enabled by default for security
- Logs prominent warning when explicitly disabled
- Agents run in containers unless SANDBOX_ENABLED=false
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add OIDC_ENABLED environment variable to control OIDC authentication
- Validate required OIDC env vars (OIDC_ISSUER, OIDC_CLIENT_ID, OIDC_CLIENT_SECRET)
are present when OIDC is enabled
- Validate OIDC_ISSUER ends with trailing slash for correct discovery URL
- Throw descriptive error at startup if configuration is invalid
- Skip OIDC plugin registration when OIDC is disabled
- Add comprehensive tests for validation logic (17 test cases)
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SEC-API-2: WorkspaceGuard now propagates database errors as 500s instead of
returning "access denied". Only Prisma P2025 (record not found) is treated
as "user not a member".
SEC-API-3: PermissionGuard now propagates database errors as 500s instead of
returning null role (which caused permission denied). Only Prisma P2025 is
treated as "not a member".
This prevents connection timeouts, pool exhaustion, and other infrastructure
errors from being misreported to users as authorization failures.
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add scanError field and scannedSuccessfully flag to SecretScanResult
- File read errors no longer falsely report as "clean"
- Callers can distinguish clean files from scan failures
- Update getScanSummary to track filesWithErrors count
- SecretsDetectedError now reports files that couldn't be scanned
- Add tests verifying error handling behavior for file access issues
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add OrchestratorApiKeyGuard to protect agent management endpoints (spawn,
kill, kill-all, status) from unauthorized access. Uses X-API-Key header
with constant-time comparison to prevent timing attacks.
- Create apps/orchestrator/src/common/guards/api-key.guard.ts
- Add comprehensive tests for all guard scenarios
- Apply guard to AgentsController (controller-level protection)
- Document ORCHESTRATOR_API_KEY in .env.example files
- Health endpoints remain unauthenticated for monitoring
Security: Prevents unauthorized users from draining API credits or
killing all agents via unprotected endpoints.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update CLAUDE.md to point to universal orchestrator guide
- Add docs/tasks.md with 28 tasks across 4 phases:
- Phase 1: Critical Security (MS-SEC-001 to MS-SEC-010)
- Phase 2: High Security (MS-HIGH-001 to MS-HIGH-006)
- Phase 3: Code Quality (MS-CQ-001 to MS-CQ-007)
- Phase 4: Test Coverage (MS-TEST-001 to MS-TEST-005)
- Add project-specific task-tracking.md reference
Based on comprehensive codebase review (124 findings).