From 751005391bf05dd4b244ae4003eae22dbf5ebdc6 Mon Sep 17 00:00:00 2001
From: Jason Woltje
Date: Thu, 5 Feb 2026 12:49:54 -0600
Subject: [PATCH 1/2] docs(#230): Comprehensive orchestrator documentation

Update README with complete API reference, module architecture tree,
service catalog, Valkey state keys, quality gate profiles, and
configuration reference.

Fixes #230

Co-Authored-By: Claude Opus 4.5
---
 apps/orchestrator/README.md | 181 ++++++++++++++++++++++++++++++------
 1 file changed, 154 insertions(+), 27 deletions(-)

diff --git a/apps/orchestrator/README.md b/apps/orchestrator/README.md
index a0a442c..74d7834 100644
--- a/apps/orchestrator/README.md
+++ b/apps/orchestrator/README.md
@@ -6,59 +6,186 @@ Agent orchestration service for Mosaic Stack built with NestJS.
 
 The Orchestrator is the execution plane of Mosaic Stack, responsible for:
 
-- Spawning and managing Claude agents
-- Task queue management (Valkey-backed)
-- Agent health monitoring and recovery
-- Git workflow automation
-- Quality gate enforcement callbacks
-- Killswitch emergency stop
+- Spawning and managing Claude agents (worker, reviewer, tester)
+- Task queue management via BullMQ with Valkey backend
+- Agent lifecycle state machine (spawning → running → completed/failed/killed)
+- Git workflow automation with worktree isolation per agent
+- Quality gate enforcement via Coordinator integration
+- Killswitch emergency stop with cleanup
+- Docker sandbox isolation (optional)
+- Secret scanning on agent commits
 
 ## Architecture
 
-Part of the Mosaic Stack monorepo at `apps/orchestrator/`.
+```
+AppModule
+├── HealthModule           → GET /health, GET /health/ready
+├── AgentsModule           → POST /agents/spawn, GET /agents/:id/status, kill endpoints
+│   ├── QueueModule        → BullMQ task queue (priority 1-10, retry with backoff)
+│   ├── SpawnerModule      → Agent session management, Docker sandbox, lifecycle FSM
+│   ├── KillswitchModule   → Emergency kill + cleanup (Docker, worktree, Valkey state)
+│   └── ValkeyModule       → Distributed state persistence and pub/sub events
+├── CoordinatorModule      → Quality gate checks (typecheck, lint, tests, coverage, AI review)
+├── GitModule              → Clone, branch, commit, push, conflict detection, secret scanning
+└── MonitorModule          → Agent health monitoring (placeholder)
+```
 
+Part of the Mosaic Stack monorepo at `apps/orchestrator/`.
 Controlled by `apps/coordinator/` (Quality Coordinator).
 Monitored via `apps/web/` (Agent Dashboard).
 
+## API Reference
+
+### Health
+
+| Method | Path            | Description       |
+| ------ | --------------- | ----------------- |
+| GET    | `/health`       | Uptime and status |
+| GET    | `/health/ready` | Readiness check   |
+
+### Agents
+
+| Method | Path                      | Description            |
+| ------ | ------------------------- | ---------------------- |
+| POST   | `/agents/spawn`           | Spawn a new agent      |
+| GET    | `/agents/:agentId/status` | Get agent status       |
+| POST   | `/agents/:agentId/kill`   | Kill a single agent    |
+| POST   | `/agents/kill-all`        | Kill all active agents |
+
+#### POST /agents/spawn
+
+```json
+{
+  "taskId": "string (required)",
+  "agentType": "worker | reviewer | tester",
+  "context": {
+    "repository": "https://git.example.com/repo.git",
+    "branch": "main",
+    "workItems": ["US-001"],
+    "skills": ["typescript"]
+  }
+}
+```
+
+Response:
+
+```json
+{
+  "agentId": "uuid",
+  "status": "spawning"
+}
+```
+
+#### GET /agents/:agentId/status
+
+Response:
+
+```json
+{
+  "agentId": "uuid",
+  "taskId": "string",
+  "status": "spawning | running | completed | failed | killed",
+  "spawnedAt": "ISO timestamp",
+  "startedAt": "ISO timestamp (optional)",
+  "completedAt": "ISO timestamp (optional)",
+  "error": "string (optional)"
+}
+```
+
+#### POST /agents/kill-all
+
+Response:
+
+```json
+{
+  "message": "Kill all completed: 3 killed, 0 failed",
+  "total": 3,
+  "killed": 3,
+  "failed": 0,
+  "errors": []
+}
+```
+
+## Services
+
+| Service                  | Module      | Responsibility                                       |
+| ------------------------ | ----------- | ---------------------------------------------------- |
+| AgentSpawnerService      | Spawner     | Create agent sessions, generate UUIDs, track state   |
+| AgentLifecycleService    | Spawner     | State machine transitions with Valkey pub/sub events |
+| DockerSandboxService     | Spawner     | Container creation with memory/CPU limits            |
+| QueueService             | Queue       | BullMQ priority queue with exponential backoff retry |
+| KillswitchService        | Killswitch  | Emergency agent termination with audit logging       |
+| CleanupService           | Killswitch  | Multi-step cleanup (Docker, worktree, Valkey state)  |
+| GitOperationsService     | Git         | Clone, branch, commit, push operations               |
+| WorktreeManagerService   | Git         | Per-agent worktree isolation                         |
+| ConflictDetectionService | Git         | Merge conflict detection before push                 |
+| SecretScannerService     | Git         | Detect hardcoded secrets (AWS, API keys, JWTs, etc.) |
+| ValkeyService            | Valkey      | Distributed state and event pub/sub                  |
+| CoordinatorClientService | Coordinator | HTTP client for quality gate API with retry          |
+| QualityGatesService      | Coordinator | Pre-commit and post-commit gate evaluation           |
+
+## Valkey State Keys
+
+```
+orchestrator:task:{taskId}    → TaskState (status, agentId, context, timestamps)
+orchestrator:agent:{agentId}  → AgentState (status, taskId, timestamps, error)
+orchestrator:events           → Pub/sub channel for lifecycle events
+```
+
+## Quality Gate Profiles
+
+| Profile  | Pre-commit             | Post-commit                                   |
+| -------- | ---------------------- | --------------------------------------------- |
+| strict   | typecheck, lint, tests | coverage (85%), build, integration, AI review |
+| standard | typecheck, lint, tests | coverage (85%), build                         |
+| minimal  | typecheck, lint        | build                                         |
+
 ## Development
 
 ```bash
 # Install dependencies (from monorepo root)
 pnpm install
 
-# Run in dev mode (watch mode)
+# Run in dev mode
 pnpm --filter @mosaic/orchestrator dev
 
 # Build
 pnpm --filter @mosaic/orchestrator build
 
-# Start production
-pnpm --filter @mosaic/orchestrator start:prod
-
-# Test
+# Run unit tests
 pnpm --filter @mosaic/orchestrator test
 
-# Generate module (NestJS CLI)
-cd apps/orchestrator
-nest generate module
-nest generate controller
-nest generate service
+# Run E2E/integration tests
+pnpm --filter @mosaic/orchestrator test:e2e
+
+# Type check
+pnpm --filter @mosaic/orchestrator typecheck
+
+# Lint
+pnpm --filter @mosaic/orchestrator lint
 ```
 
-## NestJS Architecture
+## Testing
 
-- **Modules:** Feature-based organization (spawner, queue, monitor, etc.)
-- **Controllers:** HTTP endpoints (health, agents, tasks)
-- **Services:** Business logic
-- **Providers:** Dependency injection
+- **Unit tests:** Co-located `*.spec.ts` files (19 test files, 447+ tests)
+- **Integration tests:** `tests/integration/*.e2e-spec.ts` (17 E2E tests)
+- **Coverage threshold:** 85% (lines, functions, branches, statements)
 
 ## Configuration
 
-Environment variables loaded via @nestjs/config.
-See `.env.example` for required vars.
+Environment variables loaded via `@nestjs/config`. Key variables:
 
-## Documentation
+| Variable                       | Description                  |
+| ------------------------------ | ---------------------------- |
+| `ORCHESTRATOR_PORT`            | HTTP port (default: 3001)    |
+| `ORCHESTRATOR_CLAUDE_API_KEY`  | Claude API key for agents    |
+| `ORCHESTRATOR_VALKEY_HOST`     | Valkey/Redis host            |
+| `ORCHESTRATOR_VALKEY_PORT`     | Valkey/Redis port            |
+| `ORCHESTRATOR_COORDINATOR_URL` | Quality Coordinator base URL |
+| `ORCHESTRATOR_DOCKER_ENABLED`  | Enable Docker sandbox        |
 
-- Architecture: `/docs/ORCHESTRATOR-MONOREPO-SETUP.md`
-- API Contracts: `/docs/M6-ISSUE-AUDIT.md`
+## Related Documentation
+
+- Design: `docs/design/agent-orchestration.md`
+- Setup: `docs/ORCHESTRATOR-MONOREPO-SETUP.md`
 - Milestone: M6-AgentOrchestration (0.0.6)
-- 
2.49.1

From 5a0f090cc54d251fd4dc28fb3c07eac315a83c2f Mon Sep 17 00:00:00 2001
From: Jason Woltje
Date: Thu, 5 Feb 2026 13:24:54 -0600
Subject: [PATCH 2/2] fix(#230): Correct documentation errors from code review

- Fix CRITICAL: Correct 5 environment variable names to match actual config
  (VALKEY_HOST not ORCHESTRATOR_VALKEY_HOST, CLAUDE_API_KEY not
  ORCHESTRATOR_CLAUDE_API_KEY, etc.)
- Fix CRITICAL: Correct quality gate profiles table to match actual
  gate-config service (minimal = tests only, not typecheck+lint; add
  agent type defaults)
- Fix IMPORTANT: Add missing gateProfile optional field to spawn request docs

Co-Authored-By: Claude Opus 4.5
---
 apps/orchestrator/README.md | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/apps/orchestrator/README.md b/apps/orchestrator/README.md
index 74d7834..3621f7d 100644
--- a/apps/orchestrator/README.md
+++ b/apps/orchestrator/README.md
@@ -58,6 +58,7 @@ Monitored via `apps/web/` (Agent Dashboard).
 {
   "taskId": "string (required)",
   "agentType": "worker | reviewer | tester",
+  "gateProfile": "strict | standard | minimal | custom (optional)",
   "context": {
     "repository": "https://git.example.com/repo.git",
     "branch": "main",
@@ -134,11 +135,11 @@ orchestrator:events           → Pub/sub channel for lifecycle events
 
 ## Quality Gate Profiles
 
-| Profile  | Pre-commit             | Post-commit                                   |
-| -------- | ---------------------- | --------------------------------------------- |
-| strict   | typecheck, lint, tests | coverage (85%), build, integration, AI review |
-| standard | typecheck, lint, tests | coverage (85%), build                         |
-| minimal  | typecheck, lint        | build                                         |
+| Profile  | Default For | Gates                                                                 |
+| -------- | ----------- | --------------------------------------------------------------------- |
+| strict   | reviewer    | typecheck, lint, tests, coverage (85%), build, integration, AI review |
+| standard | worker      | typecheck, lint, tests, coverage (85%)                                |
+| minimal  | tester      | tests only                                                            |
 
 ## Development
 
@@ -175,14 +176,14 @@ pnpm --filter @mosaic/orchestrator lint
 
 Environment variables loaded via `@nestjs/config`. Key variables:
 
-| Variable                       | Description                  |
-| ------------------------------ | ---------------------------- |
-| `ORCHESTRATOR_PORT`            | HTTP port (default: 3001)    |
-| `ORCHESTRATOR_CLAUDE_API_KEY`  | Claude API key for agents    |
-| `ORCHESTRATOR_VALKEY_HOST`     | Valkey/Redis host            |
-| `ORCHESTRATOR_VALKEY_PORT`     | Valkey/Redis port            |
-| `ORCHESTRATOR_COORDINATOR_URL` | Quality Coordinator base URL |
-| `ORCHESTRATOR_DOCKER_ENABLED`  | Enable Docker sandbox        |
+| Variable            | Description                            |
+| ------------------- | -------------------------------------- |
+| `ORCHESTRATOR_PORT` | HTTP port (default: 3001)              |
+| `CLAUDE_API_KEY`    | Claude API key for agents              |
+| `VALKEY_HOST`       | Valkey/Redis host (default: localhost) |
+| `VALKEY_PORT`       | Valkey/Redis port (default: 6379)      |
+| `COORDINATOR_URL`   | Quality Coordinator base URL           |
+| `SANDBOX_ENABLED`   | Enable Docker sandbox (true/false)     |
 
 ## Related Documentation
 
-- 
2.49.1
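
Reviewer note: the spawn request body and the per-agent default gate profiles documented in PATCH 2/2 can be sketched in TypeScript roughly as follows. This is a minimal illustration only. The names `SpawnRequest`, `DEFAULT_PROFILE`, and `resolveGateProfile` are hypothetical and do not come from the orchestrator source, and the assumption that an omitted `gateProfile` falls back to the "Default For" column is inferred from the README table, not confirmed by the code.

```typescript
// Hypothetical types mirroring the JSON shapes in the patched README.
type AgentType = "worker" | "reviewer" | "tester";
type GateProfile = "strict" | "standard" | "minimal" | "custom";

interface SpawnRequest {
  taskId: string;
  agentType: AgentType;
  gateProfile?: GateProfile; // optional field added by PATCH 2/2
  context: {
    repository: string;
    branch: string;
    workItems: string[];
    skills: string[];
  };
}

// Defaults copied from the "Default For" column of the corrected table.
// Assumption: the service applies them when gateProfile is omitted.
const DEFAULT_PROFILE: Record<AgentType, GateProfile> = {
  reviewer: "strict",
  worker: "standard",
  tester: "minimal",
};

// Pick the explicit profile if present, otherwise the agent-type default.
function resolveGateProfile(req: SpawnRequest): GateProfile {
  return req.gateProfile ?? DEFAULT_PROFILE[req.agentType];
}

// Example body for POST /agents/spawn, using the README's sample values.
const request: SpawnRequest = {
  taskId: "task-123", // illustrative value
  agentType: "worker",
  context: {
    repository: "https://git.example.com/repo.git",
    branch: "main",
    workItems: ["US-001"],
    skills: ["typescript"],
  },
};

console.log(JSON.stringify(request), resolveGateProfile(request));
```

Keeping the defaults in a single lookup table mirrors how the README presents them and makes a mismatch between documentation and configuration easier to spot in review.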