# Mosaic Orchestrator

Agent orchestration service for Mosaic Stack built with NestJS.

## Overview

The Orchestrator is the execution plane of Mosaic Stack, responsible for:

- Spawning and managing Claude agents (worker, reviewer, tester)
- Task queue management via BullMQ with Valkey backend
- Agent lifecycle state machine (spawning → running → completed/failed/killed)
- Git workflow automation with worktree isolation per agent
- Quality gate enforcement via Coordinator integration
- Killswitch emergency stop with cleanup
- Docker sandbox isolation (optional)
- Secret scanning on agent commits

## Architecture

```
AppModule
├── HealthModule          → GET /health, GET /health/ready
├── AgentsModule          → POST /agents/spawn, GET /agents/:id/status, kill endpoints
│   ├── QueueModule       → BullMQ task queue (priority 1-10, retry with backoff)
│   ├── SpawnerModule     → Agent session management, Docker sandbox, lifecycle FSM
│   ├── KillswitchModule  → Emergency kill + cleanup (Docker, worktree, Valkey state)
│   └── ValkeyModule      → Distributed state persistence and pub/sub events
├── CoordinatorModule     → Quality gate checks (typecheck, lint, tests, coverage, AI review)
├── GitModule             → Clone, branch, commit, push, conflict detection, secret scanning
└── MonitorModule         → Agent health monitoring (placeholder)
```

Part of the Mosaic Stack monorepo at `apps/orchestrator/`.
Controlled by `apps/coordinator/` (Quality Coordinator).
Monitored via `apps/web/` (Agent Dashboard).
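
The agent lifecycle state machine listed in the Overview (spawning → running → completed/failed/killed) can be illustrated with a small transition table. This is a minimal sketch, not the code in `AgentLifecycleService`: the names `AgentStatus`, `TRANSITIONS`, `canTransition`, and `transition` are hypothetical, and the sketch assumes that an agent may also fail or be killed while it is still spawning.

```typescript
// Hypothetical sketch of the agent lifecycle FSM; the real service may differ.
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

// Allowed transitions per state.
// Assumption: a spawning agent can fail or be killed before it ever runs.
// completed, failed, and killed are treated as terminal states.
const TRANSITIONS: Record<AgentStatus, readonly AgentStatus[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

// Returns true when moving from `from` to `to` is allowed by the table above.
function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Applies a transition, throwing on an illegal move so callers cannot
// silently resurrect an agent that has already reached a terminal state.
function transition(from: AgentStatus, to: AgentStatus): AgentStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid agent transition: ${from} -> ${to}`);
  }
  return to;
}

// Example: the normal happy path of an agent.
let status: AgentStatus = "spawning";
status = transition(status, "running");
status = transition(status, "completed");
```

Keeping the allowed moves in one table makes the terminal states explicit. Publishing each transition as a Valkey pub/sub event, as the Services table describes, is outside the scope of this sketch.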

## API Reference

### Health

| Method | Path            | Description       |
| ------ | --------------- | ----------------- |
| GET    | `/health`       | Uptime and status |
| GET    | `/health/ready` | Readiness check   |

### Agents

| Method | Path                      | Description               |
| ------ | ------------------------- | ------------------------- |
| POST   | `/agents/spawn`           | Spawn a new agent         |
| GET    | `/agents/:agentId/status` | Get agent status          |
| POST   | `/agents/:agentId/kill`   | Kill a single agent       |
| POST   | `/agents/kill-all`        | Kill all active agents    |
| GET    | `/agents/events`          | SSE lifecycle/task events |

### Queue

| Method | Path            | Description                  |
| ------ | --------------- | ---------------------------- |
| GET    | `/queue/stats`  | Queue depth and worker stats |
| POST   | `/queue/pause`  | Pause queue processing       |
| POST   | `/queue/resume` | Resume queue processing      |

#### POST /agents/spawn

```json
{
  "taskId": "string (required)",
  "agentType": "worker | reviewer | tester",
  "gateProfile": "strict | standard | minimal | custom (optional)",
  "context": {
    "repository": "https://git.example.com/repo.git",
    "branch": "main",
    "workItems": ["US-001"],
    "skills": ["typescript"]
  }
}
```

Response:

```json
{
  "agentId": "uuid",
  "status": "spawning"
}
```

#### GET /agents/:agentId/status

Response:

```json
{
  "agentId": "uuid",
  "taskId": "string",
  "status": "spawning | running | completed | failed | killed",
  "spawnedAt": "ISO timestamp",
  "startedAt": "ISO timestamp (optional)",
  "completedAt": "ISO timestamp (optional)",
  "error": "string (optional)"
}
```

#### POST /agents/kill-all

Response:

```json
{
  "message": "Kill all completed: 3 killed, 0 failed",
  "total": 3,
  "killed": 3,
  "failed": 0,
  "errors": []
}
```

## Services

| Service                  | Module      | Responsibility                                       |
| ------------------------ | ----------- | ---------------------------------------------------- |
| AgentSpawnerService      | Spawner     | Create agent sessions, generate UUIDs, track state   |
| AgentLifecycleService    | Spawner     | State machine transitions with Valkey pub/sub events |
| DockerSandboxService     | Spawner     | Container creation with memory/CPU limits            |
| QueueService             | Queue       | BullMQ priority queue with exponential backoff retry |
| KillswitchService        | Killswitch  | Emergency agent termination with audit logging       |
| CleanupService           | Killswitch  | Multi-step cleanup (Docker, worktree, Valkey state)  |
| GitOperationsService     | Git         | Clone, branch, commit, push operations               |
| WorktreeManagerService   | Git         | Per-agent worktree isolation                         |
| ConflictDetectionService | Git         | Merge conflict detection before push                 |
| SecretScannerService     | Git         | Detect hardcoded secrets (AWS, API keys, JWTs, etc.) |
| ValkeyService            | Valkey      | Distributed state and event pub/sub                  |
| CoordinatorClientService | Coordinator | HTTP client for quality gate API with retry          |
| QualityGatesService      | Coordinator | Pre-commit and post-commit gate evaluation           |

## Valkey State Keys

```
orchestrator:task:{taskId}   → TaskState (status, agentId, context, timestamps)
orchestrator:agent:{agentId} → AgentState (status, taskId, timestamps, error)
orchestrator:events          → Pub/sub channel for lifecycle events
```

## Quality Gate Profiles

| Profile  | Default For | Gates                                                                 |
| -------- | ----------- | --------------------------------------------------------------------- |
| strict   | reviewer    | typecheck, lint, tests, coverage (85%), build, integration, AI review |
| standard | worker      | typecheck, lint, tests, coverage (85%)                                |
| minimal  | tester      | tests only                                                            |

## Development

```bash
# Install dependencies (from monorepo root)
pnpm install

# Run in dev mode
pnpm --filter @mosaic/orchestrator dev

# Build
pnpm --filter @mosaic/orchestrator build

# Run unit tests
pnpm --filter @mosaic/orchestrator test

# Run E2E/integration tests
pnpm --filter @mosaic/orchestrator test:e2e

# Type check
pnpm --filter @mosaic/orchestrator typecheck

# Lint
pnpm --filter @mosaic/orchestrator lint
```

## Testing

- **Unit tests:** Co-located `*.spec.ts` files (19 test files, 447+ tests)
- **Integration tests:** `tests/integration/*.e2e-spec.ts` (17 E2E tests)
- **Coverage threshold:** 85% (lines, functions, branches, statements)

## Configuration

Environment variables loaded via `@nestjs/config`. Key variables:

| Variable                         | Description                                        |
| -------------------------------- | -------------------------------------------------- |
| `ORCHESTRATOR_PORT`              | HTTP port (default: 3001)                          |
| `CLAUDE_API_KEY`                 | Claude API key for agents                          |
| `VALKEY_HOST`                    | Valkey/Redis host (default: localhost)             |
| `VALKEY_PORT`                    | Valkey/Redis port (default: 6379)                  |
| `COORDINATOR_URL`                | Quality Coordinator base URL                       |
| `SANDBOX_ENABLED`                | Enable Docker sandbox (true/false)                 |
| `MAX_CONCURRENT_AGENTS`          | Maximum concurrent in-memory sessions (default: 2) |
| `ORCHESTRATOR_QUEUE_CONCURRENCY` | BullMQ worker concurrency (default: 1)             |
| `SANDBOX_DEFAULT_MEMORY_MB`      | Sandbox memory limit in MB (default: 256)          |

## Related Documentation

- Design: `docs/design/agent-orchestration.md`
- Setup: `docs/ORCHESTRATOR-MONOREPO-SETUP.md`
- Milestone: M6-AgentOrchestration (0.0.6)
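
## Example: Reading Configuration Defaults

The defaults listed in the Configuration table can be collected into a typed loader. The sketch below is illustrative only and does not reproduce how the service wires `@nestjs/config`. The names `OrchestratorConfig`, `intFromEnv`, and `loadOrchestratorConfig` are hypothetical. The table states no default for `SANDBOX_ENABLED`, so treating it as off unless set to `true` is an assumption. `CLAUDE_API_KEY` and `COORDINATOR_URL` are left out because the table gives no defaults for them.

```typescript
// Hypothetical typed view of the documented environment variables.
interface OrchestratorConfig {
  port: number;
  valkeyHost: string;
  valkeyPort: number;
  sandboxEnabled: boolean;
  maxConcurrentAgents: number;
  queueConcurrency: number;
  sandboxDefaultMemoryMb: number;
}

type Env = Record<string, string | undefined>;

// Reads a non-negative integer variable, using the documented default
// when the variable is unset or blank.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name]?.trim();
  if (raw === undefined || raw === "") {
    return fallback;
  }
  if (!/^\d+$/.test(raw)) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return Number(raw);
}

// Builds the config object from an environment map.
function loadOrchestratorConfig(env: Env): OrchestratorConfig {
  return {
    port: intFromEnv(env, "ORCHESTRATOR_PORT", 3001),
    valkeyHost: env["VALKEY_HOST"] ?? "localhost",
    valkeyPort: intFromEnv(env, "VALKEY_PORT", 6379),
    // Assumption: sandboxing is off unless explicitly set to "true".
    sandboxEnabled: env["SANDBOX_ENABLED"] === "true",
    maxConcurrentAgents: intFromEnv(env, "MAX_CONCURRENT_AGENTS", 2),
    queueConcurrency: intFromEnv(env, "ORCHESTRATOR_QUEUE_CONCURRENCY", 1),
    sandboxDefaultMemoryMb: intFromEnv(env, "SANDBOX_DEFAULT_MEMORY_MB", 256),
  };
}
```

In a Node process this could be called as `loadOrchestratorConfig(process.env)`. Rejecting malformed numbers at startup, rather than falling back silently, is a design choice of this sketch and not something the source specifies.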