Mosaic Orchestrator
Agent orchestration service for Mosaic Stack built with NestJS.
Overview
The Orchestrator is the execution plane of Mosaic Stack, responsible for:
- Spawning and managing Claude agents (worker, reviewer, tester)
- Task queue management via BullMQ with Valkey backend
- Agent lifecycle state machine (spawning → running → completed/failed/killed)
- Git workflow automation with worktree isolation per agent
- Quality gate enforcement via Coordinator integration
- Killswitch emergency stop with cleanup
- Docker sandbox isolation (optional)
- Secret scanning on agent commits
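The lifecycle bullet above lists five agent states. The sketch below shows one way to enforce such a state machine with a transition table. The README documents only the happy path, so letting `spawning` move directly to `failed` or `killed` is an assumption. `canTransition`, `isTerminal`, and `transition` are hypothetical names, not the `AgentLifecycleService` API.

```typescript
// Agent states as listed in the README.
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

// Assumed transition table. Only spawning → running → completed/failed/killed
// is documented; the extra exits from "spawning" are illustrative.
const TRANSITIONS: Record<AgentStatus, readonly AgentStatus[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// A state with no outgoing transitions is final.
function isTerminal(status: AgentStatus): boolean {
  return TRANSITIONS[status].length === 0;
}

// Returns the new state, or throws if the move is not in the table.
function transition(from: AgentStatus, to: AgentStatus): AgentStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid agent transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as data makes terminal states explicit. A failed or killed agent cannot be revived by a stray event because those states have no outgoing transitions.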
Architecture
```text
AppModule
├── HealthModule → GET /health, GET /health/ready
├── AgentsModule → POST /agents/spawn, GET /agents/:id/status, kill endpoints
│   ├── QueueModule → BullMQ task queue (priority 1-10, retry with backoff)
│   ├── SpawnerModule → Agent session management, Docker sandbox, lifecycle FSM
│   ├── KillswitchModule → Emergency kill + cleanup (Docker, worktree, Valkey state)
│   └── ValkeyModule → Distributed state persistence and pub/sub events
├── CoordinatorModule → Quality gate checks (typecheck, lint, tests, coverage, AI review)
├── GitModule → Clone, branch, commit, push, conflict detection, secret scanning
└── MonitorModule → Agent health monitoring (placeholder)
```
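The QueueModule is described as a BullMQ task queue with priorities 1-10 and retry with backoff. The sketch below builds job options for such a queue and computes retry delays with the exponential formula BullMQ documents, `2^(attemptsMade - 1) * delay`. The attempt count, base delay, and helper names are hypothetical. The option type is declared locally so the example runs without `bullmq` installed.

```typescript
// Local mirror of the subset of BullMQ job options used here.
interface JobOptionsSketch {
  priority: number;
  attempts: number;
  backoff: { type: "exponential"; delay: number };
}

// Hypothetical defaults; the orchestrator's real values are not documented.
const DEFAULT_ATTEMPTS = 3;
const DEFAULT_BACKOFF_MS = 2000;

// Validates the documented 1-10 priority range and attaches retry settings.
function buildJobOptions(
  priority: number,
  attempts = DEFAULT_ATTEMPTS,
  delayMs = DEFAULT_BACKOFF_MS,
): JobOptionsSketch {
  if (!Number.isInteger(priority) || priority < 1 || priority > 10) {
    throw new RangeError(`priority must be an integer in 1-10, got ${priority}`);
  }
  return { priority, attempts, backoff: { type: "exponential", delay: delayMs } };
}

// Delay before the next retry after `attemptsMade` failed attempts:
// 2^(attemptsMade - 1) * delay.
function retryDelayMs(attemptsMade: number, delayMs = DEFAULT_BACKOFF_MS): number {
  return 2 ** (attemptsMade - 1) * delayMs;
}
```

With BullMQ installed, the object returned by `buildJobOptions` can be passed as the third argument to `queue.add(name, data, opts)`. BullMQ processes lower `priority` values first, so 1 is the most urgent.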
Part of the Mosaic Stack monorepo at `apps/orchestrator/`.
Controlled by `apps/coordinator/` (Quality Coordinator).
Monitored via `apps/web/` (Agent Dashboard).
API Reference
Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Uptime and status |
| GET | /health/ready | Readiness check |
Agents
| Method | Path | Description |
|---|---|---|
| POST | /agents/spawn | Spawn a new agent |
| GET | /agents/:agentId/status | Get agent status |
| POST | /agents/:agentId/kill | Kill a single agent |
| POST | /agents/kill-all | Kill all active agents |
POST /agents/spawn

Request body:

```json
{
  "taskId": "string (required)",
  "agentType": "worker | reviewer | tester",
  "gateProfile": "strict | standard | minimal | custom (optional)",
  "context": {
    "repository": "https://git.example.com/repo.git",
    "branch": "main",
    "workItems": ["US-001"],
    "skills": ["typescript"]
  }
}
```
Response:

```json
{
  "agentId": "uuid",
  "status": "spawning"
}
```
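The spawn request and response shapes above map directly onto TypeScript types. Below is a minimal client sketch. The types mirror the documented JSON. `spawnAgent` and the injected `FetchLike` parameter, which lets the HTTP call be stubbed in tests, are illustrative assumptions rather than an existing SDK.

```typescript
type AgentType = "worker" | "reviewer" | "tester";
type GateProfile = "strict" | "standard" | "minimal" | "custom";

// Mirrors the documented POST /agents/spawn body.
interface SpawnAgentRequest {
  taskId: string;
  agentType: AgentType;
  gateProfile?: GateProfile;
  context: {
    repository: string;
    branch: string;
    workItems: string[];
    skills: string[];
  };
}

interface SpawnAgentResponse {
  agentId: string;
  status: "spawning";
}

// Minimal fetch-compatible signature; the global fetch (Node 18+) satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function spawnAgent(
  baseUrl: string,
  request: SpawnAgentRequest,
  fetchFn: FetchLike,
): Promise<SpawnAgentResponse> {
  const res = await fetchFn(`${baseUrl}/agents/spawn`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  // Surface any non-2xx status to the caller instead of parsing an error body.
  if (!res.ok) {
    throw new Error(`Spawn failed with HTTP ${res.status}`);
  }
  return (await res.json()) as SpawnAgentResponse;
}
```

In Node 18 or later, the global `fetch` can be passed directly, for example `spawnAgent("http://localhost:3001", request, fetch)`, which uses the default `ORCHESTRATOR_PORT`.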
GET /agents/:agentId/status

Response:

```json
{
  "agentId": "uuid",
  "taskId": "string",
  "status": "spawning | running | completed | failed | killed",
  "spawnedAt": "ISO timestamp",
  "startedAt": "ISO timestamp (optional)",
  "completedAt": "ISO timestamp (optional)",
  "error": "string (optional)"
}
```
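A spawn returns immediately with `status: "spawning"`, so a caller that needs the outcome must wait for a terminal state. One option is to poll the status endpoint, as sketched below. The polling interval, poll limit, and `waitForAgent` helper are assumptions. Subscribing to the `orchestrator:events` pub/sub channel would avoid polling altogether.

```typescript
// Status values as documented for GET /agents/:agentId/status.
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

interface AgentStatusResponse {
  agentId: string;
  taskId: string;
  status: AgentStatus;
  spawnedAt: string;
  startedAt?: string;
  completedAt?: string;
  error?: string;
}

// Fetches a URL and returns the parsed JSON body; injected so tests can stub it.
type GetJson = (url: string) => Promise<unknown>;

const TERMINAL_STATUSES: ReadonlySet<AgentStatus> = new Set<AgentStatus>(["completed", "failed", "killed"]);

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the agent reaches a terminal state or the poll budget runs out.
// The intervalMs and maxPolls defaults are hypothetical.
async function waitForAgent(
  baseUrl: string,
  agentId: string,
  getJson: GetJson,
  intervalMs = 2000,
  maxPolls = 150,
): Promise<AgentStatusResponse> {
  const url = `${baseUrl}/agents/${encodeURIComponent(agentId)}/status`;
  for (let poll = 1; poll <= maxPolls; poll++) {
    const body = (await getJson(url)) as AgentStatusResponse;
    if (TERMINAL_STATUSES.has(body.status)) {
      return body;
    }
    // Skip the wait after the final poll.
    if (poll < maxPolls) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Agent ${agentId} did not reach a terminal state after ${maxPolls} polls`);
}
```

With the global `fetch`, a suitable `getJson` is `(url) => fetch(url).then((r) => r.json())`.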
POST /agents/kill-all

Response:

```json
{
  "message": "Kill all completed: 3 killed, 0 failed",
  "total": 3,
  "killed": 3,
  "failed": 0,
  "errors": []
}
```
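The kill-all response reports per-agent outcomes rather than failing as a whole. The sketch below shows how such a summary could be produced. It assumes a hypothetical per-agent `kill` function and string-valued `errors` entries, since the README shows only an empty array. It uses `Promise.allSettled` so that one failed kill does not abort the others.

```typescript
// Shape of the documented POST /agents/kill-all response.
interface KillAllResult {
  message: string;
  total: number;
  killed: number;
  failed: number;
  errors: string[];
}

// Hypothetical per-agent kill; resolves on success, rejects on failure.
type KillFn = (agentId: string) => Promise<void>;

async function killAll(agentIds: string[], kill: KillFn): Promise<KillAllResult> {
  // Run all kills concurrently and wait for every one to settle.
  const results = await Promise.allSettled(agentIds.map((id) => kill(id)));

  const errors: string[] = [];
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
      errors.push(`${agentIds[i]}: ${reason}`);
    }
  });

  const total = agentIds.length;
  const failed = errors.length;
  const killed = total - failed;
  return {
    message: `Kill all completed: ${killed} killed, ${failed} failed`,
    total,
    killed,
    failed,
    errors,
  };
}
```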
Services
| Service | Module | Responsibility |
|---|---|---|
| AgentSpawnerService | Spawner | Create agent sessions, generate UUIDs, track state |
| AgentLifecycleService | Spawner | State machine transitions with Valkey pub/sub events |
| DockerSandboxService | Spawner | Container creation with memory/CPU limits |
| QueueService | Queue | BullMQ priority queue with exponential backoff retry |
| KillswitchService | Killswitch | Emergency agent termination with audit logging |
| CleanupService | Killswitch | Multi-step cleanup (Docker, worktree, Valkey state) |
| GitOperationsService | Git | Clone, branch, commit, push operations |
| WorktreeManagerService | Git | Per-agent worktree isolation |
| ConflictDetectionService | Git | Merge conflict detection before push |
| SecretScannerService | Git | Detect hardcoded secrets (AWS, API keys, JWTs, etc.) |
| ValkeyService | Valkey | Distributed state and event pub/sub |
| CoordinatorClientService | Coordinator | HTTP client for quality gate API with retry |
| QualityGatesService | Coordinator | Pre-commit and post-commit gate evaluation |
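`SecretScannerService` is described as detecting hardcoded secrets such as AWS keys, API keys, and JWTs. The sketch below illustrates a line-by-line, regex-based approach with a few sample rules. The patterns, rule names, and `scanForSecrets` function are assumptions, not the service's actual rule set. A production scanner might also use entropy checks and allowlists to reduce false positives.

```typescript
interface SecretFinding {
  rule: string;
  line: number; // 1-based line number
  match: string;
}

// Illustrative rules only. Every pattern needs the g flag for matchAll.
const RULES: ReadonlyArray<{ name: string; pattern: RegExp }> = [
  { name: "aws-access-key-id", pattern: /\b(?:AKIA|ASIA)[0-9A-Z]{16}\b/g },
  { name: "jwt", pattern: /\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g },
  {
    name: "generic-api-key",
    pattern: /\b(?:api[_-]?key|secret|token)\s*[:=]\s*["'][A-Za-z0-9_\-]{16,}["']/gi,
  },
  { name: "private-key", pattern: /-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----/g },
];

// Scans text (e.g. the added lines of a diff) and reports every rule hit.
function scanForSecrets(text: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  text.split("\n").forEach((lineText, idx) => {
    for (const { name, pattern } of RULES) {
      // matchAll works on an internal copy of the regex, so lastIndex is not shared.
      for (const m of lineText.matchAll(pattern)) {
        findings.push({ rule: name, line: idx + 1, match: m[0] });
      }
    }
  });
  return findings;
}
```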
Valkey State Keys
```text
orchestrator:task:{taskId}   → TaskState (status, agentId, context, timestamps)
orchestrator:agent:{agentId} → AgentState (status, taskId, timestamps, error)
orchestrator:events          → Pub/sub channel for lifecycle events
```
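Centralizing the key layout above in small helpers keeps producers and consumers consistent. The key and channel strings below follow the documented layout. The `LifecycleEvent` payload shape and the helper names are hypothetical, because the README does not document the event schema.

```typescript
// Key and channel names follow the documented layout.
const KEY_PREFIX = "orchestrator";
const EVENTS_CHANNEL = `${KEY_PREFIX}:events`;

const taskKey = (taskId: string): string => `${KEY_PREFIX}:task:${taskId}`;
const agentKey = (agentId: string): string => `${KEY_PREFIX}:agent:${agentId}`;

// Hypothetical lifecycle event payload.
interface LifecycleEvent {
  agentId: string;
  taskId: string;
  from: string;
  to: string;
  at: string; // ISO timestamp
}

// Pub/sub messages are strings, so events are serialized as JSON.
function encodeEvent(event: LifecycleEvent): string {
  return JSON.stringify(event);
}

// Parses a message from the events channel and rejects payloads
// missing the identifying fields.
function decodeEvent(raw: string): LifecycleEvent {
  const parsed: unknown = JSON.parse(raw);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as Record<string, unknown>).agentId !== "string" ||
    typeof (parsed as Record<string, unknown>).to !== "string"
  ) {
    throw new Error("Malformed lifecycle event");
  }
  return parsed as LifecycleEvent;
}
```

The README does not name the Valkey client library. With an ioredis client, for example, these helpers would be used as `redis.set(agentKey(id), JSON.stringify(state))` and `redis.publish(EVENTS_CHANNEL, encodeEvent(event))`.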
Quality Gate Profiles
| Profile | Default For | Gates |
|---|---|---|
| strict | reviewer | typecheck, lint, tests, coverage (85%), build, integration, AI review |
| standard | worker | typecheck, lint, tests, coverage (85%) |
| minimal | tester | tests only |
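The profile table can be expressed as data. The sketch below encodes the three built-in profiles and the default profile for each agent type as listed. The gate identifiers (for example `ai-review`), the helper names, and treating 85% as an inclusive threshold are assumptions. `custom` profiles are omitted because their contents are not documented.

```typescript
type AgentType = "worker" | "reviewer" | "tester";
type Gate = "typecheck" | "lint" | "tests" | "coverage" | "build" | "integration" | "ai-review";
type BuiltInProfile = "strict" | "standard" | "minimal";

// Gates per profile, as listed in the table.
const PROFILE_GATES: Record<BuiltInProfile, readonly Gate[]> = {
  strict: ["typecheck", "lint", "tests", "coverage", "build", "integration", "ai-review"],
  standard: ["typecheck", "lint", "tests", "coverage"],
  minimal: ["tests"],
};

// Default profile per agent type, as listed in the table.
const DEFAULT_PROFILE: Record<AgentType, BuiltInProfile> = {
  reviewer: "strict",
  worker: "standard",
  tester: "minimal",
};

const COVERAGE_THRESHOLD_PERCENT = 85;

// An explicit gateProfile from the spawn request overrides the agent-type default.
function gatesFor(agentType: AgentType, profile?: BuiltInProfile): readonly Gate[] {
  return PROFILE_GATES[profile ?? DEFAULT_PROFILE[agentType]];
}

// Assumed inclusive: exactly 85% passes.
function coveragePasses(percent: number): boolean {
  return percent >= COVERAGE_THRESHOLD_PERCENT;
}
```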
Development
```bash
# Install dependencies (from monorepo root)
pnpm install

# Run in dev mode
pnpm --filter @mosaic/orchestrator dev

# Build
pnpm --filter @mosaic/orchestrator build

# Run unit tests
pnpm --filter @mosaic/orchestrator test

# Run E2E/integration tests
pnpm --filter @mosaic/orchestrator test:e2e

# Type check
pnpm --filter @mosaic/orchestrator typecheck

# Lint
pnpm --filter @mosaic/orchestrator lint
```
Testing
- Unit tests: co-located `*.spec.ts` files (19 test files, 447+ tests)
- Integration tests: `tests/integration/*.e2e-spec.ts` (17 E2E tests)
- Coverage threshold: 85% (lines, functions, branches, statements)
Configuration
Environment variables are loaded via `@nestjs/config`. Key variables:
| Variable | Description |
|---|---|
| `ORCHESTRATOR_PORT` | HTTP port (default: 3001) |
| `CLAUDE_API_KEY` | Claude API key for agents |
| `VALKEY_HOST` | Valkey/Redis host (default: localhost) |
| `VALKEY_PORT` | Valkey/Redis port (default: 6379) |
| `COORDINATOR_URL` | Quality Coordinator base URL |
| `SANDBOX_ENABLED` | Enable Docker sandbox (`true`/`false`) |
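The table lists the variables but not how missing or malformed values are handled. Below is a minimal sketch of a typed loader that applies the documented defaults (port 3001, Valkey at localhost:6379). Treating `CLAUDE_API_KEY` and `COORDINATOR_URL` as required, defaulting `SANDBOX_ENABLED` to false unless set to `"true"`, and the `loadConfig` name are all assumptions.

```typescript
interface OrchestratorConfig {
  port: number;
  claudeApiKey: string;
  valkey: { host: string; port: number };
  coordinatorUrl: string;
  sandboxEnabled: boolean;
}

type Env = Record<string, string | undefined>;

// Uses the fallback when unset, otherwise requires a valid TCP port number.
function parsePort(value: string | undefined, fallback: number, name: string): number {
  if (value === undefined || value === "") return fallback;
  const n = Number(value);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    throw new Error(`${name} must be a valid port, got "${value}"`);
  }
  return n;
}

// Assumed required: the README gives no default for these.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`${name} is required`);
  return value;
}

function loadConfig(env: Env): OrchestratorConfig {
  return {
    port: parsePort(env["ORCHESTRATOR_PORT"], 3001, "ORCHESTRATOR_PORT"),
    claudeApiKey: requireVar(env, "CLAUDE_API_KEY"),
    valkey: {
      host: env["VALKEY_HOST"] || "localhost",
      port: parsePort(env["VALKEY_PORT"], 6379, "VALKEY_PORT"),
    },
    coordinatorUrl: requireVar(env, "COORDINATOR_URL"),
    // Only the exact string "true" enables the sandbox.
    sandboxEnabled: env["SANDBOX_ENABLED"] === "true",
  };
}
```

Calling `loadConfig(process.env)` at startup makes bad configuration fail fast. In a NestJS app the same function could be wrapped with `registerAs` from `@nestjs/config`, though the orchestrator's actual configuration structure is not documented here.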
Related Documentation
- Design: `docs/design/agent-orchestration.md`
- Setup: `docs/ORCHESTRATOR-MONOREPO-SETUP.md`
- Milestone: M6-AgentOrchestration (0.0.6)