diff --git a/apps/orchestrator/README.md b/apps/orchestrator/README.md
index a0a442c..74d7834 100644
--- a/apps/orchestrator/README.md
+++ b/apps/orchestrator/README.md
@@ -6,59 +6,186 @@ Agent orchestration service for Mosaic Stack built with NestJS.
 
 The Orchestrator is the execution plane of Mosaic Stack, responsible for:
 
-- Spawning and managing Claude agents
-- Task queue management (Valkey-backed)
-- Agent health monitoring and recovery
-- Git workflow automation
-- Quality gate enforcement callbacks
-- Killswitch emergency stop
+- Spawning and managing Claude agents (worker, reviewer, tester)
+- Task queue management via BullMQ with Valkey backend
+- Agent lifecycle state machine (spawning → running → completed/failed/killed)
+- Git workflow automation with worktree isolation per agent
+- Quality gate enforcement via Coordinator integration
+- Killswitch emergency stop with cleanup
+- Docker sandbox isolation (optional)
+- Secret scanning on agent commits
 
 ## Architecture
 
-Part of the Mosaic Stack monorepo at `apps/orchestrator/`.
+```
+AppModule
+├── HealthModule         → GET /health, GET /health/ready
+├── AgentsModule         → POST /agents/spawn, GET /agents/:id/status, kill endpoints
+│   ├── QueueModule      → BullMQ task queue (priority 1-10, retry with backoff)
+│   ├── SpawnerModule    → Agent session management, Docker sandbox, lifecycle FSM
+│   ├── KillswitchModule → Emergency kill + cleanup (Docker, worktree, Valkey state)
+│   └── ValkeyModule     → Distributed state persistence and pub/sub events
+├── CoordinatorModule    → Quality gate checks (typecheck, lint, tests, coverage, AI review)
+├── GitModule            → Clone, branch, commit, push, conflict detection, secret scanning
+└── MonitorModule        → Agent health monitoring (placeholder)
+```
 
+Part of the Mosaic Stack monorepo at `apps/orchestrator/`. Controlled by `apps/coordinator/` (Quality Coordinator). Monitored via `apps/web/` (Agent Dashboard).
+## API Reference
+
+### Health
+
+| Method | Path            | Description       |
+| ------ | --------------- | ----------------- |
+| GET    | `/health`       | Uptime and status |
+| GET    | `/health/ready` | Readiness check   |
+
+### Agents
+
+| Method | Path                      | Description            |
+| ------ | ------------------------- | ---------------------- |
+| POST   | `/agents/spawn`           | Spawn a new agent      |
+| GET    | `/agents/:agentId/status` | Get agent status       |
+| POST   | `/agents/:agentId/kill`   | Kill a single agent    |
+| POST   | `/agents/kill-all`        | Kill all active agents |
+
+#### POST /agents/spawn
+
+```json
+{
+  "taskId": "string (required)",
+  "agentType": "worker | reviewer | tester",
+  "context": {
+    "repository": "https://git.example.com/repo.git",
+    "branch": "main",
+    "workItems": ["US-001"],
+    "skills": ["typescript"]
+  }
+}
+```
+
+Response:
+
+```json
+{
+  "agentId": "uuid",
+  "status": "spawning"
+}
+```
+
+#### GET /agents/:agentId/status
+
+Response:
+
+```json
+{
+  "agentId": "uuid",
+  "taskId": "string",
+  "status": "spawning | running | completed | failed | killed",
+  "spawnedAt": "ISO timestamp",
+  "startedAt": "ISO timestamp (optional)",
+  "completedAt": "ISO timestamp (optional)",
+  "error": "string (optional)"
+}
+```
+
+#### POST /agents/kill-all
+
+Response:
+
+```json
+{
+  "message": "Kill all completed: 3 killed, 0 failed",
+  "total": 3,
+  "killed": 3,
+  "failed": 0,
+  "errors": []
+}
+```
+
+## Services
+
+| Service                  | Module      | Responsibility                                       |
+| ------------------------ | ----------- | ---------------------------------------------------- |
+| AgentSpawnerService      | Spawner     | Create agent sessions, generate UUIDs, track state   |
+| AgentLifecycleService    | Spawner     | State machine transitions with Valkey pub/sub events |
+| DockerSandboxService     | Spawner     | Container creation with memory/CPU limits            |
+| QueueService             | Queue       | BullMQ priority queue with exponential backoff retry |
+| KillswitchService        | Killswitch  | Emergency agent termination with audit logging       |
+| CleanupService           | Killswitch  | Multi-step cleanup (Docker, worktree, Valkey state)  |
+| GitOperationsService     | Git         | Clone, branch, commit, push operations               |
+| WorktreeManagerService   | Git         | Per-agent worktree isolation                         |
+| ConflictDetectionService | Git         | Merge conflict detection before push                 |
+| SecretScannerService     | Git         | Detect hardcoded secrets (AWS, API keys, JWTs, etc.) |
+| ValkeyService            | Valkey      | Distributed state and event pub/sub                  |
+| CoordinatorClientService | Coordinator | HTTP client for quality gate API with retry          |
+| QualityGatesService      | Coordinator | Pre-commit and post-commit gate evaluation           |
+
+## Valkey State Keys
+
+```
+orchestrator:task:{taskId}   → TaskState (status, agentId, context, timestamps)
+orchestrator:agent:{agentId} → AgentState (status, taskId, timestamps, error)
+orchestrator:events          → Pub/sub channel for lifecycle events
+```
+
+## Quality Gate Profiles
+
+| Profile  | Pre-commit             | Post-commit                                   |
+| -------- | ---------------------- | --------------------------------------------- |
+| strict   | typecheck, lint, tests | coverage (85%), build, integration, AI review |
+| standard | typecheck, lint, tests | coverage (85%), build                         |
+| minimal  | typecheck, lint        | build                                         |
+
 ## Development
 
 ```bash
 # Install dependencies (from monorepo root)
 pnpm install
 
-# Run in dev mode (watch mode)
+# Run in dev mode
 pnpm --filter @mosaic/orchestrator dev
 
 # Build
 pnpm --filter @mosaic/orchestrator build
 
-# Start production
-pnpm --filter @mosaic/orchestrator start:prod
-
-# Test
+# Run unit tests
 pnpm --filter @mosaic/orchestrator test
 
-# Generate module (NestJS CLI)
-cd apps/orchestrator
-nest generate module
-nest generate controller
-nest generate service
+# Run E2E/integration tests
+pnpm --filter @mosaic/orchestrator test:e2e
+
+# Type check
+pnpm --filter @mosaic/orchestrator typecheck
+
+# Lint
+pnpm --filter @mosaic/orchestrator lint
 ```
 
-## NestJS Architecture
+## Testing
 
-- **Modules:** Feature-based organization (spawner, queue, monitor, etc.)
-- **Controllers:** HTTP endpoints (health, agents, tasks)
-- **Services:** Business logic
-- **Providers:** Dependency injection
+- **Unit tests:** Co-located `*.spec.ts` files (19 test files, 447+ tests)
+- **Integration tests:** `tests/integration/*.e2e-spec.ts` (17 E2E tests)
+- **Coverage threshold:** 85% (lines, functions, branches, statements)
 
 ## Configuration
 
-Environment variables loaded via @nestjs/config.
-See `.env.example` for required vars.
+Environment variables loaded via `@nestjs/config`. Key variables:
 
-## Documentation
+| Variable                       | Description                  |
+| ------------------------------ | ---------------------------- |
+| `ORCHESTRATOR_PORT`            | HTTP port (default: 3001)    |
+| `ORCHESTRATOR_CLAUDE_API_KEY`  | Claude API key for agents    |
+| `ORCHESTRATOR_VALKEY_HOST`     | Valkey/Redis host            |
+| `ORCHESTRATOR_VALKEY_PORT`     | Valkey/Redis port            |
+| `ORCHESTRATOR_COORDINATOR_URL` | Quality Coordinator base URL |
+| `ORCHESTRATOR_DOCKER_ENABLED`  | Enable Docker sandbox        |
 
-- Architecture: `/docs/ORCHESTRATOR-MONOREPO-SETUP.md`
-- API Contracts: `/docs/M6-ISSUE-AUDIT.md`
+## Related Documentation
+
+- Design: `docs/design/agent-orchestration.md`
+- Setup: `docs/ORCHESTRATOR-MONOREPO-SETUP.md`
 - Milestone: M6-AgentOrchestration (0.0.6)
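
Review note: the README changes name an agent lifecycle (spawning → running → completed/failed/killed) but do not spell out which transitions are legal. The TypeScript sketch below shows one way such a transition table could look. `AgentStatus`, `ALLOWED_TRANSITIONS`, `canTransition`, `transition`, and `agentKey` are hypothetical names, not taken from the service. The rule that a `spawning` agent may move directly to `failed` or `killed` is an assumption. The key format in `agentKey` follows the `orchestrator:agent:{agentId}` pattern listed under "Valkey State Keys".

```typescript
// Hypothetical sketch of the agent lifecycle; not the service's actual code.
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

// Assumed transition table: terminal states have no outgoing edges.
const ALLOWED_TRANSITIONS: Record<AgentStatus, readonly AgentStatus[]> = {
  spawning: ["running", "failed", "killed"], // direct failed/killed is an assumption
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

// Returns true when the table permits moving from `from` to `to`.
function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

// Returns the new status, or throws when the move is not permitted.
function transition(from: AgentStatus, to: AgentStatus): AgentStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid agent transition: ${from} -> ${to}`);
  }
  return to;
}

// Builds the per-agent state key in the format documented in the README.
function agentKey(agentId: string): string {
  return `orchestrator:agent:${agentId}`;
}
```

In this sketch, `transition("spawning", "running")` returns `"running"`, while `transition("completed", "running")` throws because terminal states cannot be left. Keeping the table as data rather than branching logic makes it straightforward to compare against whatever rules `AgentLifecycleService` actually enforces.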