Mosaic Orchestrator

Agent orchestration service for Mosaic Stack, built with NestJS.

Overview

The Orchestrator is the execution plane of Mosaic Stack, responsible for:

  • Spawning and managing Claude agents (worker, reviewer, tester)
  • Task queue management via BullMQ with Valkey backend
  • Agent lifecycle state machine (spawning → running → completed/failed/killed)
  • Git workflow automation with worktree isolation per agent
  • Quality gate enforcement via Coordinator integration
  • Killswitch emergency stop with cleanup
  • Docker sandbox isolation (optional)
  • Secret scanning on agent commits
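The lifecycle state machine listed above can be sketched as a transition table. This is a minimal illustration, not the service's actual code: the names `TRANSITIONS`, `canTransition`, and `transition` are hypothetical, and the sketch assumes an agent may also fail or be killed while it is still spawning.

```typescript
// Hypothetical sketch of the agent lifecycle FSM; the real service may differ.
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

// Allowed transitions. Assumption: failure or a kill can also occur during spawning.
const TRANSITIONS: Record<AgentStatus, readonly AgentStatus[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [], // terminal
  failed: [], // terminal
  killed: [], // terminal
};

// True when moving from `from` to `to` is permitted by the table.
function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new status, or throws on an illegal transition.
function transition(from: AgentStatus, to: AgentStatus): AgentStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the rules in a single table makes it easy to see that completed, failed, and killed are terminal states.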

Architecture

AppModule
├── HealthModule          → GET /health, GET /health/ready
├── AgentsModule          → POST /agents/spawn, GET /agents/:agentId/status, kill endpoints
│   ├── QueueModule       → BullMQ task queue (priority 1-10, retry with backoff)
│   ├── SpawnerModule     → Agent session management, Docker sandbox, lifecycle FSM
│   ├── KillswitchModule  → Emergency kill + cleanup (Docker, worktree, Valkey state)
│   └── ValkeyModule      → Distributed state persistence and pub/sub events
├── CoordinatorModule     → Quality gate checks (typecheck, lint, tests, coverage, AI review)
├── GitModule             → Clone, branch, commit, push, conflict detection, secret scanning
└── MonitorModule         → Agent health monitoring (placeholder)

The Orchestrator lives in the Mosaic Stack monorepo at apps/orchestrator/. It is controlled by apps/coordinator/ (Quality Coordinator) and monitored via apps/web/ (Agent Dashboard).
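The QueueModule entry in the tree above mentions priorities from 1 to 10 and retry with backoff. A generic sketch of both ideas follows. It assumes the delay doubles on each attempt and is capped at a maximum; `backoffDelayMs`, `clampPriority`, and the 60-second cap are hypothetical and not taken from the orchestrator or from BullMQ.

```typescript
// Generic exponential backoff: base delay doubles on every attempt.
// Assumption: attemptsMade starts at 1 for the first retry.
function backoffDelayMs(
  attemptsMade: number,
  baseDelayMs: number,
  maxDelayMs: number = 60000, // hypothetical cap
): number {
  const delay = baseDelayMs * Math.pow(2, attemptsMade - 1);
  return Math.min(delay, maxDelayMs);
}

// Keep a requested priority inside the documented 1-10 range.
function clampPriority(priority: number): number {
  return Math.min(10, Math.max(1, Math.round(priority)));
}
```

With a 1000 ms base delay, the first three retries would wait 1000, 2000, and 4000 ms under these assumptions.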

API Reference

Health

Method  Path           Description
GET     /health        Uptime and status
GET     /health/ready  Readiness check

Agents

Method  Path                     Description
POST    /agents/spawn            Spawn a new agent
GET     /agents/:agentId/status  Get agent status
POST    /agents/:agentId/kill    Kill a single agent
POST    /agents/kill-all         Kill all active agents
GET     /agents/events           SSE lifecycle/task events
GET     /agents/events/recent    Recent events (polling)

Queue

Method  Path           Description
GET     /queue/stats   Queue depth and worker stats
POST    /queue/pause   Pause queue processing
POST    /queue/resume  Resume queue processing

POST /agents/spawn

Request:

{
  "taskId": "string (required)",
  "agentType": "worker | reviewer | tester",
  "gateProfile": "strict | standard | minimal | custom (optional)",
  "context": {
    "repository": "https://git.example.com/repo.git",
    "branch": "main",
    "workItems": ["US-001"],
    "skills": ["typescript"]
  }
}

Response:

{
  "agentId": "uuid",
  "status": "spawning"
}
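The request shape above could be typed and checked on the client before it is sent. The sketch below is an assumption-laden illustration: the interface and `validateSpawnRequest` are hypothetical names, and apart from `taskId` (required) and `gateProfile` (optional), which fields are mandatory is a guess.

```typescript
type AgentType = "worker" | "reviewer" | "tester";
type GateProfile = "strict" | "standard" | "minimal" | "custom";

// Hypothetical client-side type mirroring the documented request body.
interface SpawnAgentRequest {
  taskId: string;
  agentType: AgentType;
  gateProfile?: GateProfile;
  context: {
    repository: string;
    branch: string;
    workItems: string[];
    skills?: string[]; // assumption: optional
  };
}

const AGENT_TYPES: readonly string[] = ["worker", "reviewer", "tester"];

// Returns a list of problems; an empty list means the request looks valid.
function validateSpawnRequest(req: SpawnAgentRequest): string[] {
  const problems: string[] = [];
  if (req.taskId.trim() === "") problems.push("taskId is required");
  if (AGENT_TYPES.indexOf(req.agentType) === -1) {
    problems.push("agentType must be worker, reviewer, or tester");
  }
  if (req.context.repository.trim() === "") {
    problems.push("context.repository is required");
  }
  return problems;
}
```

A caller would run this check first and only POST the body to /agents/spawn when the returned list is empty.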

GET /agents/:agentId/status

Response:

{
  "agentId": "uuid",
  "taskId": "string",
  "status": "spawning | running | completed | failed | killed",
  "spawnedAt": "ISO timestamp",
  "startedAt": "ISO timestamp (optional)",
  "completedAt": "ISO timestamp (optional)",
  "error": "string (optional)"
}
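The timestamps in the status response can be used to derive how long an agent took to start and to run. The following is a sketch only: `agentTimings` and `AgentTimings` are hypothetical helpers, and it assumes `startedAt` and `completedAt` are ISO 8601 strings that appear once the agent reaches the corresponding state.

```typescript
// Mirrors the documented status response.
interface AgentStatusResponse {
  agentId: string;
  taskId: string;
  status: "spawning" | "running" | "completed" | "failed" | "killed";
  spawnedAt: string;
  startedAt?: string;
  completedAt?: string;
  error?: string;
}

interface AgentTimings {
  startupMs: number | null; // spawnedAt -> startedAt
  runMs: number | null; // startedAt -> completedAt
}

// Computes durations; null when a timestamp is missing or unparseable.
function agentTimings(s: AgentStatusResponse): AgentTimings {
  const spawned = Date.parse(s.spawnedAt);
  const started = s.startedAt ? Date.parse(s.startedAt) : NaN;
  const completed = s.completedAt ? Date.parse(s.completedAt) : NaN;
  return {
    startupMs: isNaN(spawned) || isNaN(started) ? null : started - spawned,
    runMs: isNaN(started) || isNaN(completed) ? null : completed - started,
  };
}
```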

POST /agents/kill-all

Response:

{
  "message": "Kill all completed: 3 killed, 0 failed",
  "total": 3,
  "killed": 3,
  "failed": 0,
  "errors": []
}
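One way a response of this shape could be assembled from per-agent kill results is sketched below. `KillResult` and `summarizeKillAll` are hypothetical, and the sketch assumes that `errors` holds one string per failed agent; the element type is not documented above.

```typescript
// Mirrors the documented kill-all response.
interface KillAllResponse {
  message: string;
  total: number;
  killed: number;
  failed: number;
  errors: string[]; // assumption: plain strings
}

// Hypothetical per-agent outcome; `error` is set only when the kill failed.
interface KillResult {
  agentId: string;
  error?: string;
}

function summarizeKillAll(results: KillResult[]): KillAllResponse {
  const errors = results
    .filter((r) => r.error !== undefined)
    .map((r) => `${r.agentId}: ${r.error}`);
  const failed = errors.length;
  const killed = results.length - failed;
  return {
    message: `Kill all completed: ${killed} killed, ${failed} failed`,
    total: results.length,
    killed,
    failed,
    errors,
  };
}
```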

Services

Service                   Module       Responsibility
AgentSpawnerService       Spawner      Create agent sessions, generate UUIDs, track state
AgentLifecycleService     Spawner      State machine transitions with Valkey pub/sub events
DockerSandboxService      Spawner      Container creation with memory/CPU limits
QueueService              Queue        BullMQ priority queue with exponential backoff retry
KillswitchService         Killswitch   Emergency agent termination with audit logging
CleanupService            Killswitch   Multi-step cleanup (Docker, worktree, Valkey state)
GitOperationsService      Git          Clone, branch, commit, push operations
WorktreeManagerService    Git          Per-agent worktree isolation
ConflictDetectionService  Git          Merge conflict detection before push
SecretScannerService      Git          Detect hardcoded secrets (AWS, API keys, JWTs, etc.)
ValkeyService             Valkey       Distributed state and event pub/sub
CoordinatorClientService  Coordinator  HTTP client for quality gate API with retry
QualityGatesService       Coordinator  Pre-commit and post-commit gate evaluation
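To illustrate the kind of check SecretScannerService performs, here is a minimal pattern-based scanner. It is a sketch, not the service's rule set: the two patterns (the common `AKIA` prefix format for AWS access key IDs and a three-segment JWT shape) and the names `RULES` and `scanForSecrets` are assumptions chosen for illustration.

```typescript
interface SecretFinding {
  rule: string;
  line: number; // 1-based line number
}

// Illustrative patterns only; a real scanner would carry many more rules.
const RULES: { name: string; pattern: RegExp }[] = [
  { name: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: "jwt", pattern: /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/ },
];

// Scans text line by line and reports which rule matched on which line.
function scanForSecrets(text: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  text.split("\n").forEach((content, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(content)) {
        findings.push({ rule: rule.name, line: index + 1 });
      }
    }
  });
  return findings;
}
```

Reporting only the rule name and line number, rather than the matched text, avoids copying the secret into logs.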

Valkey State Keys

orchestrator:task:{taskId}    → TaskState (status, agentId, context, timestamps)
orchestrator:agent:{agentId}  → AgentState (status, taskId, timestamps, error)
orchestrator:events           → Pub/sub channel for lifecycle events
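The key layout above lends itself to small builder helpers so that key strings are not assembled by hand throughout the code. The helper names below (`taskKey`, `agentKey`, `parseKey`) are hypothetical; only the key formats come from the listing above.

```typescript
const KEY_PREFIX = "orchestrator";

// Pub/sub channel for lifecycle events.
const EVENTS_CHANNEL = `${KEY_PREFIX}:events`;

function taskKey(taskId: string): string {
  return `${KEY_PREFIX}:task:${taskId}`;
}

function agentKey(agentId: string): string {
  return `${KEY_PREFIX}:agent:${agentId}`;
}

// Reverse mapping: returns the kind and id for a state key, or null otherwise.
function parseKey(key: string): { kind: "task" | "agent"; id: string } | null {
  const match = /^orchestrator:(task|agent):(.+)$/.exec(key);
  if (match === null) return null;
  return { kind: match[1] as "task" | "agent", id: match[2] };
}
```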

Quality Gate Profiles

Profile   Default For  Gates
strict    reviewer     typecheck, lint, tests, coverage (85%), build, integration, AI review
standard  worker       typecheck, lint, tests, coverage (85%)
minimal   tester       tests only
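The table above can be read as two lookups: agent type to default profile, and profile to gate list. A sketch under that reading follows. The gate identifiers (for example `ai-review`) and the function `resolveGates` are hypothetical, and the custom profile is left out because its gates are not listed here.

```typescript
type AgentType = "worker" | "reviewer" | "tester";
type BuiltInProfile = "strict" | "standard" | "minimal";

// Default profile per agent type, as given in the table.
const DEFAULT_PROFILE: Record<AgentType, BuiltInProfile> = {
  reviewer: "strict",
  worker: "standard",
  tester: "minimal",
};

// Gates per profile; identifiers are illustrative, not the Coordinator's names.
const PROFILE_GATES: Record<BuiltInProfile, readonly string[]> = {
  strict: ["typecheck", "lint", "tests", "coverage", "build", "integration", "ai-review"],
  standard: ["typecheck", "lint", "tests", "coverage"],
  minimal: ["tests"],
};

// An explicitly requested profile wins; otherwise fall back to the agent type's default.
function resolveGates(agentType: AgentType, requested?: BuiltInProfile): readonly string[] {
  const profile = requested ?? DEFAULT_PROFILE[agentType];
  return PROFILE_GATES[profile];
}
```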

Development

# Install dependencies (from monorepo root)
pnpm install

# Run in dev mode
pnpm --filter @mosaic/orchestrator dev

# Build
pnpm --filter @mosaic/orchestrator build

# Run unit tests
pnpm --filter @mosaic/orchestrator test

# Run E2E/integration tests
pnpm --filter @mosaic/orchestrator test:e2e

# Type check
pnpm --filter @mosaic/orchestrator typecheck

# Lint
pnpm --filter @mosaic/orchestrator lint

Testing

  • Unit tests: Co-located *.spec.ts files (19 test files, 447+ tests)
  • Integration tests: tests/integration/*.e2e-spec.ts (17 E2E tests)
  • Coverage threshold: 85% (lines, functions, branches, statements)

Configuration

Environment variables are loaded via @nestjs/config. Key variables:

Variable                        Description
ORCHESTRATOR_PORT               HTTP port (default: 3001)
CLAUDE_API_KEY                  Claude API key for agents
VALKEY_HOST                     Valkey/Redis host (default: localhost)
VALKEY_PORT                     Valkey/Redis port (default: 6379)
COORDINATOR_URL                 Quality Coordinator base URL
SANDBOX_ENABLED                 Enable Docker sandbox (true/false)
MAX_CONCURRENT_AGENTS           Maximum concurrent in-memory sessions (default: 2)
ORCHESTRATOR_QUEUE_CONCURRENCY  BullMQ worker concurrency (default: 1)
SANDBOX_DEFAULT_MEMORY_MB       Sandbox memory limit in MB (default: 256)
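
The documented defaults can be expressed as a small typed loader. This is a framework-free sketch rather than the service's @nestjs/config setup: `OrchestratorConfig`, `intFromEnv`, and `loadConfig` are hypothetical names, and treating an unset SANDBOX_ENABLED as false is an assumption, since no default is documented for it.

```typescript
interface OrchestratorConfig {
  port: number;
  valkeyHost: string;
  valkeyPort: number;
  sandboxEnabled: boolean;
  maxConcurrentAgents: number;
  queueConcurrency: number;
  sandboxDefaultMemoryMb: number;
}

// Parses an integer variable, using the fallback when it is unset or blank.
function intFromEnv(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = parseInt(value, 10);
  if (isNaN(parsed)) throw new Error(`Expected an integer, got "${value}"`);
  return parsed;
}

function loadConfig(env: Record<string, string | undefined>): OrchestratorConfig {
  return {
    port: intFromEnv(env.ORCHESTRATOR_PORT, 3001),
    valkeyHost: env.VALKEY_HOST || "localhost",
    valkeyPort: intFromEnv(env.VALKEY_PORT, 6379),
    // Assumption: the sandbox is off unless explicitly set to "true".
    sandboxEnabled: env.SANDBOX_ENABLED === "true",
    maxConcurrentAgents: intFromEnv(env.MAX_CONCURRENT_AGENTS, 2),
    queueConcurrency: intFromEnv(env.ORCHESTRATOR_QUEUE_CONCURRENCY, 1),
    sandboxDefaultMemoryMb: intFromEnv(env.SANDBOX_DEFAULT_MEMORY_MB, 256),
  };
}
```

In a Node.js process the function would typically be called with `process.env`; passing the environment in as a parameter keeps the loader easy to test.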

Related Documentation

  • Design: docs/design/agent-orchestration.md
  • Setup: docs/ORCHESTRATOR-MONOREPO-SETUP.md
  • Milestone: M6-AgentOrchestration (0.0.6)