Merge pull request 'docs(#230): Comprehensive orchestrator documentation' (#333) from feature/230-documentation into develop
Reviewed-on: #333
This commit was merged in pull request #333.
@@ -6,59 +6,187 @@ Agent orchestration service for Mosaic Stack built with NestJS.

The Orchestrator is the execution plane of Mosaic Stack, responsible for:

- Spawning and managing Claude agents (worker, reviewer, tester)
- Task queue management via BullMQ with Valkey backend
- Agent lifecycle state machine (spawning → running → completed/failed/killed)
- Git workflow automation with worktree isolation per agent
- Quality gate enforcement via Coordinator integration
- Killswitch emergency stop with cleanup
- Docker sandbox isolation (optional)
- Secret scanning on agent commits

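The lifecycle in the list above can be sketched as a small transition table. This is a minimal illustration, not the service's actual code: the status names come from this README, while `TRANSITIONS`, `canTransition`, and `transition` are hypothetical names, and allowing `spawning` to go directly to `failed` or `killed` is an assumption.

```typescript
// Status values as listed in this README.
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

// Assumed transition table. The README gives spawning → running →
// completed/failed/killed; early failure or kill from "spawning" is a guess.
const TRANSITIONS: Record<AgentStatus, readonly AgentStatus[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

// True when moving from `from` to `to` is listed in the table.
function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the move is not allowed.
function transition(from: AgentStatus, to: AgentStatus): AgentStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the terminal states (`completed`, `failed`, `killed`) with empty transition lists makes it cheap to reject late updates for an agent that has already finished.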
## Architecture

```
AppModule
├── HealthModule      → GET /health, GET /health/ready
├── AgentsModule      → POST /agents/spawn, GET /agents/:id/status, kill endpoints
│   ├── QueueModule      → BullMQ task queue (priority 1-10, retry with backoff)
│   ├── SpawnerModule    → Agent session management, Docker sandbox, lifecycle FSM
│   ├── KillswitchModule → Emergency kill + cleanup (Docker, worktree, Valkey state)
│   └── ValkeyModule     → Distributed state persistence and pub/sub events
├── CoordinatorModule → Quality gate checks (typecheck, lint, tests, coverage, AI review)
├── GitModule         → Clone, branch, commit, push, conflict detection, secret scanning
└── MonitorModule     → Agent health monitoring (placeholder)
```

Part of the Mosaic Stack monorepo at `apps/orchestrator/`.
Controlled by `apps/coordinator/` (Quality Coordinator).
Monitored via `apps/web/` (Agent Dashboard).

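The queue entry in the tree mentions a 1-10 priority range and retry with backoff. The sketch below shows one plausible way to compute both. The helper names and the base delay are assumptions; the README does not state the actual retry settings.

```typescript
// Hypothetical helper: clamp a requested priority into the 1-10 range
// named in the architecture tree.
function clampPriority(priority: number): number {
  return Math.min(10, Math.max(1, Math.round(priority)));
}

// Exponential backoff: the delay doubles for every attempt already made.
// baseDelayMs is an assumed setting, not a value taken from this README.
function backoffDelayMs(attemptsMade: number, baseDelayMs: number): number {
  return baseDelayMs * Math.pow(2, Math.max(0, attemptsMade - 1));
}
```

With an assumed base delay of 1000 ms, the first three retries would wait 1000, 2000, and 4000 ms.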
## API Reference

### Health

| Method | Path            | Description       |
| ------ | --------------- | ----------------- |
| GET    | `/health`       | Uptime and status |
| GET    | `/health/ready` | Readiness check   |

### Agents

| Method | Path                      | Description            |
| ------ | ------------------------- | ---------------------- |
| POST   | `/agents/spawn`           | Spawn a new agent      |
| GET    | `/agents/:agentId/status` | Get agent status       |
| POST   | `/agents/:agentId/kill`   | Kill a single agent    |
| POST   | `/agents/kill-all`        | Kill all active agents |

#### POST /agents/spawn

```json
{
  "taskId": "string (required)",
  "agentType": "worker | reviewer | tester",
  "gateProfile": "strict | standard | minimal | custom (optional)",
  "context": {
    "repository": "https://git.example.com/repo.git",
    "branch": "main",
    "workItems": ["US-001"],
    "skills": ["typescript"]
  }
}
```

Response:

```json
{
  "agentId": "uuid",
  "status": "spawning"
}
```

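As a sketch of how a client might check a spawn body before sending it, the types below mirror the request shape shown above. Only `taskId` is marked required and `gateProfile` optional in this README, so treating `agentType` and `context` as required is an assumption, and `SpawnAgentRequest` and `validateSpawnRequest` are hypothetical names.

```typescript
type AgentType = "worker" | "reviewer" | "tester";
type GateProfile = "strict" | "standard" | "minimal" | "custom";

// Assumed shape, inferred from the example request body.
interface SpawnAgentRequest {
  taskId: string;
  agentType: AgentType;
  gateProfile?: GateProfile;
  context: {
    repository: string;
    branch: string;
    workItems: string[];
    skills?: string[];
  };
}

const AGENT_TYPES: readonly string[] = ["worker", "reviewer", "tester"];
const GATE_PROFILES: readonly string[] = ["strict", "standard", "minimal", "custom"];

// Returns a list of problems; an empty list means the body looks well-formed.
function validateSpawnRequest(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be an object"];
  }
  const b = body as Record<string, unknown>;
  const errors: string[] = [];
  const taskId = b["taskId"];
  const agentType = b["agentType"];
  const gateProfile = b["gateProfile"];
  const context = b["context"];
  if (typeof taskId !== "string" || taskId.length === 0) {
    errors.push("taskId is required");
  }
  if (typeof agentType !== "string" || AGENT_TYPES.indexOf(agentType) === -1) {
    errors.push("agentType must be worker, reviewer, or tester");
  }
  if (
    gateProfile !== undefined &&
    (typeof gateProfile !== "string" || GATE_PROFILES.indexOf(gateProfile) === -1)
  ) {
    errors.push("gateProfile must be strict, standard, minimal, or custom");
  }
  if (typeof context !== "object" || context === null) {
    errors.push("context is required");
  }
  return errors;
}

// Example body using the values from the README.
const exampleRequest: SpawnAgentRequest = {
  taskId: "task-1",
  agentType: "worker",
  context: {
    repository: "https://git.example.com/repo.git",
    branch: "main",
    workItems: ["US-001"],
    skills: ["typescript"],
  },
};
```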
#### GET /agents/:agentId/status

Response:

```json
{
  "agentId": "uuid",
  "taskId": "string",
  "status": "spawning | running | completed | failed | killed",
  "spawnedAt": "ISO timestamp",
  "startedAt": "ISO timestamp (optional)",
  "completedAt": "ISO timestamp (optional)",
  "error": "string (optional)"
}
```

#### POST /agents/kill-all

Response:

```json
{
  "message": "Kill all completed: 3 killed, 0 failed",
  "total": 3,
  "killed": 3,
  "failed": 0,
  "errors": []
}
```

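The response above can be read as a summary over per-agent kill results. The following sketch builds that shape from a list of outcomes. `KillResult` and `summarizeKillAll` are hypothetical names, and the format of entries in `errors` is an assumption, since the README only shows an empty array.

```typescript
// Hypothetical per-agent outcome of a kill attempt.
interface KillResult {
  agentId: string;
  ok: boolean;
  error?: string;
}

// Field names match the kill-all response shown in this README.
interface KillAllResponse {
  message: string;
  total: number;
  killed: number;
  failed: number;
  errors: string[];
}

function summarizeKillAll(results: KillResult[]): KillAllResponse {
  const failures = results.filter((r) => !r.ok);
  const killed = results.length - failures.length;
  return {
    message: `Kill all completed: ${killed} killed, ${failures.length} failed`,
    total: results.length,
    killed,
    failed: failures.length,
    // Assumed error format: "<agentId>: <reason>".
    errors: failures.map((r) => `${r.agentId}: ${r.error ?? "unknown error"}`),
  };
}
```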
## Services

| Service                  | Module      | Responsibility                                       |
| ------------------------ | ----------- | ---------------------------------------------------- |
| AgentSpawnerService      | Spawner     | Create agent sessions, generate UUIDs, track state   |
| AgentLifecycleService    | Spawner     | State machine transitions with Valkey pub/sub events |
| DockerSandboxService     | Spawner     | Container creation with memory/CPU limits            |
| QueueService             | Queue       | BullMQ priority queue with exponential backoff retry |
| KillswitchService        | Killswitch  | Emergency agent termination with audit logging       |
| CleanupService           | Killswitch  | Multi-step cleanup (Docker, worktree, Valkey state)  |
| GitOperationsService     | Git         | Clone, branch, commit, push operations               |
| WorktreeManagerService   | Git         | Per-agent worktree isolation                         |
| ConflictDetectionService | Git         | Merge conflict detection before push                 |
| SecretScannerService     | Git         | Detect hardcoded secrets (AWS, API keys, JWTs, etc.) |
| ValkeyService            | Valkey      | Distributed state and event pub/sub                  |
| CoordinatorClientService | Coordinator | HTTP client for quality gate API with retry          |
| QualityGatesService      | Coordinator | Pre-commit and post-commit gate evaluation           |

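To illustrate the kind of check a secret scanner performs, here is a minimal regex-based sketch. It is not the `SecretScannerService` implementation: the rule names, the two patterns, and the `scanForSecrets` function are assumptions chosen for illustration, and a real scanner would carry many more rules.

```typescript
interface SecretFinding {
  rule: string;
  line: number; // 1-based line number within the scanned content
}

// Illustrative rules only. The AWS pattern matches the common "AKIA" access
// key ID form; the JWT pattern matches three dot-separated base64url parts.
const RULES: { name: string; pattern: RegExp }[] = [
  { name: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: "jwt", pattern: /\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/ },
];

// Scans content line by line and reports which rule matched on which line.
function scanForSecrets(content: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  content.split("\n").forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ rule: rule.name, line: index + 1 });
      }
    }
  });
  return findings;
}
```

Reporting the rule name and line number rather than the matched text avoids copying the secret itself into logs.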
## Valkey State Keys

```
orchestrator:task:{taskId}    → TaskState (status, agentId, context, timestamps)
orchestrator:agent:{agentId}  → AgentState (status, taskId, timestamps, error)
orchestrator:events           → Pub/sub channel for lifecycle events
```

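Building these keys through small helpers keeps the naming scheme in one place. The key patterns below are taken from the listing above; the helper names are hypothetical.

```typescript
const KEY_PREFIX = "orchestrator";

// Pub/sub channel for lifecycle events.
const EVENTS_CHANNEL = `${KEY_PREFIX}:events`;

// Key under which a task's state is stored.
function taskKey(taskId: string): string {
  return `${KEY_PREFIX}:task:${taskId}`;
}

// Key under which an agent's state is stored.
function agentKey(agentId: string): string {
  return `${KEY_PREFIX}:agent:${agentId}`;
}
```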
## Quality Gate Profiles

| Profile  | Default For | Gates                                                                 |
| -------- | ----------- | --------------------------------------------------------------------- |
| strict   | reviewer    | typecheck, lint, tests, coverage (85%), build, integration, AI review |
| standard | worker      | typecheck, lint, tests, coverage (85%)                                |
| minimal  | tester      | tests only                                                            |

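The table can be expressed as two lookups: agent type to default profile, and profile to gate list. This is a sketch with hypothetical names (`DEFAULT_PROFILE`, `PROFILE_GATES`, `gatesFor`) and assumed gate identifiers. The `custom` profile is left out because the README does not define its gates.

```typescript
type AgentType = "worker" | "reviewer" | "tester";
type GateProfile = "strict" | "standard" | "minimal";

// Default profile per agent type, from the "Default For" column.
const DEFAULT_PROFILE: Record<AgentType, GateProfile> = {
  reviewer: "strict",
  worker: "standard",
  tester: "minimal",
};

// Gates per profile, from the "Gates" column. Identifiers are assumed spellings.
const PROFILE_GATES: Record<GateProfile, readonly string[]> = {
  strict: ["typecheck", "lint", "tests", "coverage", "build", "integration", "ai-review"],
  standard: ["typecheck", "lint", "tests", "coverage"],
  minimal: ["tests"],
};

// Resolves the gates for an agent, honoring an explicit profile override.
function gatesFor(agentType: AgentType, override?: GateProfile): readonly string[] {
  return PROFILE_GATES[override ?? DEFAULT_PROFILE[agentType]];
}
```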
## Development

```bash
# Install dependencies (from monorepo root)
pnpm install

# Run in dev mode
pnpm --filter @mosaic/orchestrator dev

# Build
pnpm --filter @mosaic/orchestrator build

# Run unit tests
pnpm --filter @mosaic/orchestrator test

# Run E2E/integration tests
pnpm --filter @mosaic/orchestrator test:e2e

# Type check
pnpm --filter @mosaic/orchestrator typecheck

# Lint
pnpm --filter @mosaic/orchestrator lint
```

## Testing

- **Unit tests:** Co-located `*.spec.ts` files (19 test files, 447+ tests)
- **Integration tests:** `tests/integration/*.e2e-spec.ts` (17 E2E tests)
- **Coverage threshold:** 85% (lines, functions, branches, statements)

## Configuration

Environment variables loaded via `@nestjs/config`. Key variables:

| Variable            | Description                            |
| ------------------- | -------------------------------------- |
| `ORCHESTRATOR_PORT` | HTTP port (default: 3001)              |
| `CLAUDE_API_KEY`    | Claude API key for agents              |
| `VALKEY_HOST`       | Valkey/Redis host (default: localhost) |
| `VALKEY_PORT`       | Valkey/Redis port (default: 6379)      |
| `COORDINATOR_URL`   | Quality Coordinator base URL           |
| `SANDBOX_ENABLED`   | Enable Docker sandbox (true/false)     |

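The defaults in the table can be applied with a small loader. This sketch takes a plain key/value object (such as `process.env`) rather than using `@nestjs/config`, so it runs on its own. `OrchestratorConfig` and `loadConfig` are hypothetical names, and treating an unset `SANDBOX_ENABLED` as `false` is an assumption.

```typescript
interface OrchestratorConfig {
  port: number;
  claudeApiKey: string | undefined;
  valkeyHost: string;
  valkeyPort: number;
  coordinatorUrl: string | undefined;
  sandboxEnabled: boolean;
}

// Reads the variables listed in the table, applying the documented defaults.
function loadConfig(env: Record<string, string | undefined>): OrchestratorConfig {
  return {
    port: Number(env["ORCHESTRATOR_PORT"] ?? 3001),
    claudeApiKey: env["CLAUDE_API_KEY"],
    valkeyHost: env["VALKEY_HOST"] ?? "localhost",
    valkeyPort: Number(env["VALKEY_PORT"] ?? 6379),
    coordinatorUrl: env["COORDINATOR_URL"],
    // Assumption: the sandbox is off unless explicitly set to "true".
    sandboxEnabled: env["SANDBOX_ENABLED"] === "true",
  };
}
```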
## Related Documentation

- Design: `docs/design/agent-orchestration.md`
- Setup: `docs/ORCHESTRATOR-MONOREPO-SETUP.md`
- Milestone: M6-AgentOrchestration (0.0.6)