Mosaic Orchestrator

Agent orchestration service for Mosaic Stack built with NestJS.

Overview

The Orchestrator is the execution plane of Mosaic Stack, responsible for:

  • Spawning and managing Claude agents (worker, reviewer, tester)
  • Task queue management via BullMQ with Valkey backend
  • Agent lifecycle state machine (spawning → running → completed/failed/killed)
  • Git workflow automation with worktree isolation per agent
  • Quality gate enforcement via Coordinator integration
  • Killswitch emergency stop with cleanup
  • Docker sandbox isolation (optional)
  • Secret scanning on agent commits

Architecture

AppModule
├── HealthModule          → GET /health, GET /health/ready
├── AgentsModule          → POST /agents/spawn, GET /agents/:agentId/status, kill endpoints
│   ├── QueueModule       → BullMQ task queue (priority 1-10, retry with backoff)
│   ├── SpawnerModule     → Agent session management, Docker sandbox, lifecycle FSM
│   ├── KillswitchModule  → Emergency kill + cleanup (Docker, worktree, Valkey state)
│   └── ValkeyModule      → Distributed state persistence and pub/sub events
├── CoordinatorModule     → Quality gate checks (typecheck, lint, tests, coverage, AI review)
├── GitModule             → Clone, branch, commit, push, conflict detection, secret scanning
└── MonitorModule         → Agent health monitoring (placeholder)

The Orchestrator lives in the Mosaic Stack monorepo at apps/orchestrator/. It is controlled by apps/coordinator/ (Quality Coordinator) and monitored via apps/web/ (Agent Dashboard).
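
The QueueModule entry above mentions retry with backoff. As a rough illustration of what exponential backoff means for retry timing, here is a minimal sketch. The RetryPolicy type, the backoffDelayMs helper, and the numbers (1000 ms base delay, 4 attempts) are assumptions made for this example; they are not taken from the orchestrator's queue configuration, and BullMQ's built-in strategy may differ in detail.

```typescript
// Hypothetical retry settings; the real values live in the queue configuration.
interface RetryPolicy {
  baseDelayMs: number; // delay before the first retry
  maxAttempts: number; // total attempts, including the first run
}

// Exponential backoff: the delay doubles after each failed attempt.
// attemptsMade is the number of attempts that have already failed (1-based).
// Returns null when no retries are left.
function backoffDelayMs(policy: RetryPolicy, attemptsMade: number): number | null {
  if (attemptsMade < 1) {
    throw new RangeError("attemptsMade must be >= 1");
  }
  if (attemptsMade >= policy.maxAttempts) {
    return null;
  }
  return policy.baseDelayMs * Math.pow(2, attemptsMade - 1);
}

const examplePolicy: RetryPolicy = { baseDelayMs: 1000, maxAttempts: 4 };
const exampleDelays = [1, 2, 3, 4].map((n) => backoffDelayMs(examplePolicy, n));
console.log(exampleDelays); // 1000, 2000, 4000, then null
```

Doubling the delay keeps a briefly unavailable dependency from being hammered by immediate retries, while the attempt cap ensures a permanently failing task eventually surfaces as failed.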

API Reference

Health

Method  Path           Description
GET     /health        Uptime and status
GET     /health/ready  Readiness check

Agents

Method  Path                     Description
POST    /agents/spawn            Spawn a new agent
GET     /agents/:agentId/status  Get agent status
POST    /agents/:agentId/kill    Kill a single agent
POST    /agents/kill-all         Kill all active agents

POST /agents/spawn

{
  "taskId": "string (required)",
  "agentType": "worker | reviewer | tester",
  "gateProfile": "strict | standard | minimal | custom (optional)",
  "context": {
    "repository": "https://git.example.com/repo.git",
    "branch": "main",
    "workItems": ["US-001"],
    "skills": ["typescript"]
  }
}

Response:

{
  "agentId": "uuid",
  "status": "spawning"
}
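
A caller may want to check a spawn request body before sending it. The sketch below is a hypothetical client-side pre-check written from the request shape shown above. The checkSpawnRequest name, the error messages, and the decision to treat agentType, context.repository, and context.branch as required are assumptions; the service's own validation rules are authoritative and may differ.

```typescript
const AGENT_TYPES = new Set<string>(["worker", "reviewer", "tester"]);
const GATE_PROFILES = new Set<string>(["strict", "standard", "minimal", "custom"]);

// Returns a list of problems; an empty list means the body looks well-formed.
function checkSpawnRequest(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be an object"];
  }
  const fields = body as Record<string, unknown>;
  const errors: string[] = [];

  const taskId = fields["taskId"];
  if (typeof taskId !== "string" || taskId.trim() === "") {
    errors.push("taskId is required");
  }

  const agentType = fields["agentType"];
  if (typeof agentType !== "string" || !AGENT_TYPES.has(agentType)) {
    errors.push("agentType must be worker, reviewer, or tester");
  }

  // gateProfile is optional, but must be a known profile when present.
  const gateProfile = fields["gateProfile"];
  if (gateProfile !== undefined &&
      (typeof gateProfile !== "string" || !GATE_PROFILES.has(gateProfile))) {
    errors.push("gateProfile must be strict, standard, minimal, or custom");
  }

  const context = fields["context"];
  if (typeof context !== "object" || context === null) {
    errors.push("context must be an object");
  } else {
    const ctx = context as Record<string, unknown>;
    if (typeof ctx["repository"] !== "string") errors.push("context.repository must be a string");
    if (typeof ctx["branch"] !== "string") errors.push("context.branch must be a string");
  }
  return errors;
}

const validErrors = checkSpawnRequest({
  taskId: "task-1",
  agentType: "worker",
  context: { repository: "https://git.example.com/repo.git", branch: "main", workItems: ["US-001"] },
});
const invalidErrors = checkSpawnRequest({ agentType: "planner", context: {} });
console.log(validErrors, invalidErrors);
```

Returning every problem at once, rather than throwing on the first one, lets a caller fix a malformed request in a single round trip.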

GET /agents/:agentId/status

Response:

{
  "agentId": "uuid",
  "taskId": "string",
  "status": "spawning | running | completed | failed | killed",
  "spawnedAt": "ISO timestamp",
  "startedAt": "ISO timestamp (optional)",
  "completedAt": "ISO timestamp (optional)",
  "error": "string (optional)"
}
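
The status values in this response correspond to the lifecycle state machine (spawning → running → completed/failed/killed). A minimal sketch of such a machine follows. Only the state names come from this document; the exact set of allowed transitions (for example, that an agent can fail or be killed while still spawning) and the helper names are assumptions.

```typescript
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

// Assumed transition table: terminal states have no outgoing edges.
const ALLOWED_TRANSITIONS: Record<AgentStatus, readonly AgentStatus[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

// A status is terminal when nothing can follow it.
function isTerminal(state: AgentStatus): boolean {
  return ALLOWED_TRANSITIONS[state].length === 0;
}

// Returns the new state, or throws if the move is not in the table.
function transitionTo(from: AgentStatus, to: AgentStatus): AgentStatus {
  if (ALLOWED_TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error(`Invalid transition: ${from} -> ${to}`);
  }
  return to;
}

let currentState: AgentStatus = "spawning";
currentState = transitionTo(currentState, "running");
currentState = transitionTo(currentState, "completed");
console.log(currentState, isTerminal(currentState));
```

A client polling this endpoint can stop once isTerminal returns true, since no further status change is expected under these assumptions.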

POST /agents/kill-all

Response:

{
  "message": "Kill all completed: 3 killed, 0 failed",
  "total": 3,
  "killed": 3,
  "failed": 0,
  "errors": []
}
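
The response aggregates the outcome of each individual kill. One way such a summary could be assembled is sketched below. The KillResult type and the summarizeKillAll helper are hypothetical, and the format of the entries in errors is an assumption, since the example above only shows an empty array.

```typescript
// Hypothetical per-agent outcome of a kill attempt.
interface KillResult {
  agentId: string;
  ok: boolean;
  error?: string;
}

// Mirrors the response fields shown above.
interface KillAllResponse {
  message: string;
  total: number;
  killed: number;
  failed: number;
  errors: string[];
}

function summarizeKillAll(results: KillResult[]): KillAllResponse {
  const failures = results.filter((r) => !r.ok);
  const killed = results.length - failures.length;
  return {
    message: `Kill all completed: ${killed} killed, ${failures.length} failed`,
    total: results.length,
    killed,
    failed: failures.length,
    // Assumed format: "<agentId>: <reason>".
    errors: failures.map((r) => `${r.agentId}: ${r.error ?? "unknown error"}`),
  };
}

const killSummary = summarizeKillAll([
  { agentId: "a1", ok: true },
  { agentId: "a2", ok: true },
  { agentId: "a3", ok: true },
]);
console.log(killSummary.message); // Kill all completed: 3 killed, 0 failed
```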

Services

Service                   Module       Responsibility
AgentSpawnerService       Spawner      Create agent sessions, generate UUIDs, track state
AgentLifecycleService     Spawner      State machine transitions with Valkey pub/sub events
DockerSandboxService      Spawner      Container creation with memory/CPU limits
QueueService              Queue        BullMQ priority queue with exponential backoff retry
KillswitchService         Killswitch   Emergency agent termination with audit logging
CleanupService            Killswitch   Multi-step cleanup (Docker, worktree, Valkey state)
GitOperationsService      Git          Clone, branch, commit, push operations
WorktreeManagerService    Git          Per-agent worktree isolation
ConflictDetectionService  Git          Merge conflict detection before push
SecretScannerService      Git          Detect hardcoded secrets (AWS, API keys, JWTs, etc.)
ValkeyService             Valkey       Distributed state and event pub/sub
CoordinatorClientService  Coordinator  HTTP client for quality gate API with retry
QualityGatesService       Coordinator  Pre-commit and post-commit gate evaluation
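
To give a feel for what pattern-based secret detection involves, here is a small sketch. It is not SecretScannerService's implementation: the rule names, the regular expressions, and the scanForSecrets helper are illustrative assumptions, and a production scanner would need many more rules plus allow-listing for false positives.

```typescript
interface SecretFinding {
  rule: string;
  line: number; // 1-based line number
}

// Illustrative rules only. None use the global flag, so test() is stateless.
const SECRET_RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { rule: "jwt", pattern: /\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/ },
  { rule: "generic-api-key", pattern: /api[_-]?key\s*[:=]\s*["'][A-Za-z0-9_-]{16,}["']/i },
];

// Checks every line against every rule and records each match.
function scanForSecrets(content: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  content.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of SECRET_RULES) {
      if (pattern.test(text)) {
        findings.push({ rule, line: index + 1 });
      }
    }
  });
  return findings;
}

// The AWS value below is the placeholder key from AWS documentation, not a real credential.
const sampleSource = [
  'const region = "us-east-1";',
  'const accessKey = "AKIAIOSFODNN7EXAMPLE";',
  'const apiKey = "abcd1234abcd1234abcd";',
].join("\n");
const sampleFindings = scanForSecrets(sampleSource);
console.log(sampleFindings);
```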

Valkey State Keys

orchestrator:task:{taskId}    → TaskState (status, agentId, context, timestamps)
orchestrator:agent:{agentId}  → AgentState (status, taskId, timestamps, error)
orchestrator:events           → Pub/sub channel for lifecycle events
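
Building these keys through small helpers keeps the naming scheme in one place. The sketch below follows the key patterns listed above; the helper names (taskKey, agentKey, parseStateKey) are hypothetical and not taken from ValkeyService.

```typescript
const TASK_KEY_PREFIX = "orchestrator:task:";
const AGENT_KEY_PREFIX = "orchestrator:agent:";
const EVENTS_CHANNEL = "orchestrator:events";

function taskKey(taskId: string): string {
  return TASK_KEY_PREFIX + taskId;
}

function agentKey(agentId: string): string {
  return AGENT_KEY_PREFIX + agentId;
}

// Reverse mapping: recover the kind and id from a state key, or null if it is not one.
function parseStateKey(key: string): { kind: "task" | "agent"; id: string } | null {
  if (key.indexOf(TASK_KEY_PREFIX) === 0) {
    return { kind: "task", id: key.slice(TASK_KEY_PREFIX.length) };
  }
  if (key.indexOf(AGENT_KEY_PREFIX) === 0) {
    return { kind: "agent", id: key.slice(AGENT_KEY_PREFIX.length) };
  }
  return null;
}

console.log(taskKey("task-42"));   // orchestrator:task:task-42
console.log(agentKey("abc-123"));  // orchestrator:agent:abc-123
console.log(parseStateKey(EVENTS_CHANNEL)); // null
```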

Quality Gate Profiles

Profile   Default For  Gates
strict    reviewer     typecheck, lint, tests, coverage (85%), build, integration, AI review
standard  worker       typecheck, lint, tests, coverage (85%)
minimal   tester       tests only
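
The table can be read as two lookups: agent type to default profile, and profile to gate list. A minimal sketch of that mapping is below. The type names, the gate identifiers (such as ai-review), and the gatesFor helper are assumptions for illustration; the custom profile is left out because its gates are not defined here.

```typescript
type AgentType = "worker" | "reviewer" | "tester";
type GateProfile = "strict" | "standard" | "minimal";
type Gate = "typecheck" | "lint" | "tests" | "coverage" | "build" | "integration" | "ai-review";

// Gates per profile, as listed in the table above.
const PROFILE_GATES: Record<GateProfile, readonly Gate[]> = {
  strict: ["typecheck", "lint", "tests", "coverage", "build", "integration", "ai-review"],
  standard: ["typecheck", "lint", "tests", "coverage"],
  minimal: ["tests"],
};

// Default profile per agent type, as listed in the table above.
const DEFAULT_PROFILE: Record<AgentType, GateProfile> = {
  reviewer: "strict",
  worker: "standard",
  tester: "minimal",
};

// Coverage threshold (percent) applied by the coverage gate.
const COVERAGE_THRESHOLD = 85;

// An explicit profile wins; otherwise the agent type's default applies.
function gatesFor(agentType: AgentType, override?: GateProfile): readonly Gate[] {
  return PROFILE_GATES[override ?? DEFAULT_PROFILE[agentType]];
}

console.log(gatesFor("tester"));           // tests only
console.log(gatesFor("worker", "strict")); // all seven strict gates
console.log(COVERAGE_THRESHOLD);
```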

Development

# Install dependencies (from monorepo root)
pnpm install

# Run in dev mode
pnpm --filter @mosaic/orchestrator dev

# Build
pnpm --filter @mosaic/orchestrator build

# Run unit tests
pnpm --filter @mosaic/orchestrator test

# Run E2E/integration tests
pnpm --filter @mosaic/orchestrator test:e2e

# Type check
pnpm --filter @mosaic/orchestrator typecheck

# Lint
pnpm --filter @mosaic/orchestrator lint

Testing

  • Unit tests: Co-located *.spec.ts files (19 test files, 447+ tests)
  • Integration tests: tests/integration/*.e2e-spec.ts (17 E2E tests)
  • Coverage threshold: 85% (lines, functions, branches, statements)

Configuration

Environment variables are loaded via @nestjs/config. Key variables:

Variable           Description
ORCHESTRATOR_PORT  HTTP port (default: 3001)
CLAUDE_API_KEY     Claude API key for agents
VALKEY_HOST        Valkey/Redis host (default: localhost)
VALKEY_PORT        Valkey/Redis port (default: 6379)
COORDINATOR_URL    Quality Coordinator base URL
SANDBOX_ENABLED    Enable Docker sandbox (true/false)
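
To show how the documented defaults resolve, here is a plain TypeScript sketch that reads these variables from an environment map. It deliberately bypasses @nestjs/config to stay self-contained. The OrchestratorConfig shape, the loadConfig and parsePort helpers, and the choice to treat the sandbox as disabled unless SANDBOX_ENABLED is exactly "true" are assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface OrchestratorConfig {
  port: number;
  claudeApiKey: string | undefined;
  valkeyHost: string;
  valkeyPort: number;
  coordinatorUrl: string | undefined;
  sandboxEnabled: boolean;
}

// Parses a TCP port, falling back to the default when the variable is unset or empty.
function parsePort(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") {
    return fallback;
  }
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${raw}`);
  }
  return port;
}

function loadConfig(env: Env): OrchestratorConfig {
  return {
    port: parsePort(env["ORCHESTRATOR_PORT"], 3001),
    claudeApiKey: env["CLAUDE_API_KEY"],
    valkeyHost: env["VALKEY_HOST"] ?? "localhost",
    valkeyPort: parsePort(env["VALKEY_PORT"], 6379),
    coordinatorUrl: env["COORDINATOR_URL"],
    // Assumption: the sandbox stays off unless explicitly enabled.
    sandboxEnabled: env["SANDBOX_ENABLED"] === "true",
  };
}

const exampleConfig = loadConfig({ VALKEY_PORT: "6380", SANDBOX_ENABLED: "true" });
console.log(exampleConfig);
```

In a Node process the function would typically be called with process.env. Failing fast on an invalid port surfaces misconfiguration at startup rather than at the first connection attempt.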

Related Documentation

  • Design: docs/design/agent-orchestration.md
  • Setup: docs/ORCHESTRATOR-MONOREPO-SETUP.md
  • Milestone: M6-AgentOrchestration (0.0.6)