Jason Woltje d2ed1f2817
fix: eliminate apt-get from Kaniko builds, use static dumb-init binary
Kaniko fundamentally cannot run apt-get update on bookworm (Debian 12)
due to GPG signature verification failures during filesystem snapshots.
Neither --snapshot-mode=redo nor clearing /var/lib/apt/lists/* resolves
this.

Changes:
- Replace apt-get install dumb-init with ADD from GitHub releases
  (static x86_64 binary) in api, web, and orchestrator Dockerfiles
- Switch coordinator builder from python:3.11-slim to python:3.11
  (full image includes build tools, avoids 336MB build-essential)
- Replace wget healthcheck with node-based check in orchestrator
  (wget no longer installed)
- Exclude telemetry lifecycle integration tests in CI (fail due to
  runner disk pressure on PostgreSQL, not code issues)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:06:06 -06:00

Mosaic Orchestrator

Agent orchestration service for Mosaic Stack built with NestJS.

Overview

The Orchestrator is the execution plane of Mosaic Stack, responsible for:

  • Spawning and managing Claude agents (worker, reviewer, tester)
  • Task queue management via BullMQ with Valkey backend
  • Agent lifecycle state machine (spawning → running → completed/failed/killed)
  • Git workflow automation with worktree isolation per agent
  • Quality gate enforcement via Coordinator integration
  • Killswitch emergency stop with cleanup
  • Docker sandbox isolation (optional)
  • Secret scanning on agent commits

Architecture

AppModule
├── HealthModule          → GET /health, GET /health/ready
├── AgentsModule          → POST /agents/spawn, GET /agents/:agentId/status, kill endpoints
│   ├── QueueModule       → BullMQ task queue (priority 1-10, retry with backoff)
│   ├── SpawnerModule     → Agent session management, Docker sandbox, lifecycle FSM
│   ├── KillswitchModule  → Emergency kill + cleanup (Docker, worktree, Valkey state)
│   └── ValkeyModule      → Distributed state persistence and pub/sub events
├── CoordinatorModule     → Quality gate checks (typecheck, lint, tests, coverage, AI review)
├── GitModule             → Clone, branch, commit, push, conflict detection, secret scanning
└── MonitorModule         → Agent health monitoring (placeholder)

The Orchestrator lives in the Mosaic Stack monorepo at apps/orchestrator/. It is controlled by apps/coordinator/ (Quality Coordinator) and monitored via apps/web/ (Agent Dashboard).
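The QueueModule entry in the tree mentions priorities from 1 to 10 and retry with backoff. The sketch below shows one way those two rules could be expressed. The helper names, the 1000 ms base delay, and the 60000 ms cap are assumptions for illustration, not values taken from the Orchestrator; BullMQ ships its own backoff strategies that would normally be configured instead.

```typescript
// Hypothetical helpers; base delay and cap are illustrative assumptions.
const MIN_PRIORITY = 1;
const MAX_PRIORITY = 10;

/** Clamp a requested priority into the documented 1-10 range. */
function clampPriority(priority: number): number {
  if (!Number.isFinite(priority)) {
    throw new RangeError("priority must be a finite number");
  }
  return Math.min(MAX_PRIORITY, Math.max(MIN_PRIORITY, Math.round(priority)));
}

/**
 * Delay before retry number `attempt` (1-based).
 * Doubles each time: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
 */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be an integer >= 1");
  }
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}
```

Capping the delay keeps a long retry chain from pushing a task minutes into the future; whether the real queue applies a cap is not stated here.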

API Reference

Health

Method  Path           Description
GET     /health        Uptime and status
GET     /health/ready  Readiness check

Agents

Method  Path                     Description
GET     /agents/:agentId/status  Get agent status
POST    /agents/spawn            Spawn a new agent
POST    /agents/:agentId/kill    Kill a single agent
POST    /agents/kill-all         Kill all active agents

POST /agents/spawn

{
  "taskId": "string (required)",
  "agentType": "worker | reviewer | tester",
  "gateProfile": "strict | standard | minimal | custom (optional)",
  "context": {
    "repository": "https://git.example.com/repo.git",
    "branch": "main",
    "workItems": ["US-001"],
    "skills": ["typescript"]
  }
}

Response:

{
  "agentId": "uuid",
  "status": "spawning"
}
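The request shape above can be written down as TypeScript types with a small validator. This is a sketch, not the service's actual DTO: the names SpawnAgentRequest and validateSpawnRequest are hypothetical, and because the README marks only taskId as required and gateProfile as optional, the sketch checks just those two fields plus agentType and leaves context unvalidated.

```typescript
type AgentType = "worker" | "reviewer" | "tester";
type GateProfile = "strict" | "standard" | "minimal" | "custom";

// Hypothetical type mirroring the JSON body shown above.
interface SpawnAgentRequest {
  taskId: string;
  agentType: AgentType;
  gateProfile?: GateProfile;
  context?: {
    repository?: string;
    branch?: string;
    workItems?: string[];
    skills?: string[];
  };
}

const AGENT_TYPES: readonly string[] = ["worker", "reviewer", "tester"];
const GATE_PROFILES: readonly string[] = ["strict", "standard", "minimal", "custom"];

/** Returns a list of validation errors; an empty list means the body is acceptable. */
function validateSpawnRequest(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be a JSON object"];
  }
  const fields = body as Record<string, unknown>;
  const errors: string[] = [];

  const taskId = fields["taskId"];
  if (typeof taskId !== "string" || taskId.length === 0) {
    errors.push("taskId is required");
  }

  const agentType = fields["agentType"];
  if (typeof agentType !== "string" || AGENT_TYPES.indexOf(agentType) === -1) {
    errors.push("agentType must be one of: " + AGENT_TYPES.join(", "));
  }

  const gateProfile = fields["gateProfile"];
  if (
    gateProfile !== undefined &&
    (typeof gateProfile !== "string" || GATE_PROFILES.indexOf(gateProfile) === -1)
  ) {
    errors.push("gateProfile must be one of: " + GATE_PROFILES.join(", "));
  }
  return errors;
}
```

In a NestJS controller the same rules would more likely live in a DTO with validation decorators; the plain function keeps the example dependency-free.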

GET /agents/:agentId/status

Response:

{
  "agentId": "uuid",
  "taskId": "string",
  "status": "spawning | running | completed | failed | killed",
  "spawnedAt": "ISO timestamp",
  "startedAt": "ISO timestamp (optional)",
  "completedAt": "ISO timestamp (optional)",
  "error": "string (optional)"
}
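The status values follow the lifecycle spawning → running → completed/failed/killed. A transition table is a compact way to sketch that state machine. The function names are hypothetical, and allowing spawning to move directly to failed or killed is an assumption; the README does not say whether the real AgentLifecycleService permits it.

```typescript
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

// Allowed next states per state. The spawning -> failed/killed edges are assumed.
const TRANSITIONS: Record<AgentStatus, readonly AgentStatus[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

/** A status is terminal when no further transition is allowed from it. */
function isTerminal(status: AgentStatus): boolean {
  return TRANSITIONS[status].length === 0;
}

/** Returns the new status, or throws if the move is not allowed. */
function transition(from: AgentStatus, to: AgentStatus): AgentStatus {
  if (!canTransition(from, to)) {
    throw new Error("Invalid agent transition: " + from + " -> " + to);
  }
  return to;
}
```

A client polling GET /agents/:agentId/status could stop once isTerminal returns true for the reported status.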

POST /agents/kill-all

Response:

{
  "message": "Kill all completed: 3 killed, 0 failed",
  "total": 3,
  "killed": 3,
  "failed": 0,
  "errors": []
}
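The kill-all response is an aggregate over individual kill attempts. The sketch below shows how such a summary could be assembled. KillOutcome and summarizeKillAll are hypothetical names, and the format of each entry in errors is an assumption, since the example response only shows an empty array.

```typescript
// Hypothetical per-agent result; `error` is set only when the kill failed.
interface KillOutcome {
  agentId: string;
  error?: string;
}

interface KillAllResult {
  message: string;
  total: number;
  killed: number;
  failed: number;
  errors: string[];
}

function summarizeKillAll(outcomes: KillOutcome[]): KillAllResult {
  // Assumed error format: "<agentId>: <reason>".
  const errors = outcomes
    .filter((outcome) => outcome.error !== undefined)
    .map((outcome) => outcome.agentId + ": " + outcome.error);
  const failed = errors.length;
  const killed = outcomes.length - failed;
  return {
    message: "Kill all completed: " + killed + " killed, " + failed + " failed",
    total: outcomes.length,
    killed,
    failed,
    errors,
  };
}
```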

Services

Service                   Module       Responsibility
AgentSpawnerService       Spawner      Create agent sessions, generate UUIDs, track state
AgentLifecycleService     Spawner      State machine transitions with Valkey pub/sub events
DockerSandboxService      Spawner      Container creation with memory/CPU limits
QueueService              Queue        BullMQ priority queue with exponential backoff retry
KillswitchService         Killswitch   Emergency agent termination with audit logging
CleanupService            Killswitch   Multi-step cleanup (Docker, worktree, Valkey state)
GitOperationsService      Git          Clone, branch, commit, push operations
WorktreeManagerService    Git          Per-agent worktree isolation
ConflictDetectionService  Git          Merge conflict detection before push
SecretScannerService      Git          Detect hardcoded secrets (AWS, API keys, JWTs, etc.)
ValkeyService             Valkey       Distributed state and event pub/sub
CoordinatorClientService  Coordinator  HTTP client for quality gate API with retry
QualityGatesService       Coordinator  Pre-commit and post-commit gate evaluation
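To illustrate the kind of check SecretScannerService performs, here is a minimal regex-based scanner. The rule set is illustrative only: the README names the categories (AWS, API keys, JWTs) but not the actual patterns, so the three expressions below and the names scanForSecrets, SecretRule, and Finding are assumptions.

```typescript
// Illustrative rules only; the real scanner's patterns are not shown in this README.
interface SecretRule {
  name: string;
  pattern: RegExp;
}

interface Finding {
  rule: string;
  line: number; // 1-based line number
}

const RULES: SecretRule[] = [
  // AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
  { name: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  // JWTs are three base64url segments; header and payload both begin with "eyJ".
  { name: "jwt", pattern: /eyJ[A-Za-z0-9_-]{5,}\.eyJ[A-Za-z0-9_-]{5,}\.[A-Za-z0-9_-]+/ },
  // A quoted value of 16+ characters assigned to a key-like name.
  {
    name: "generic-api-key",
    pattern: /(api[_-]?key|secret|token)\s*[:=]\s*["'][A-Za-z0-9_-]{16,}["']/i,
  },
];

/** Scan text line by line and report which rule matched on which line. */
function scanForSecrets(text: string): Finding[] {
  const findings: Finding[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((content, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(content)) {
        findings.push({ rule: rule.name, line: index + 1 });
      }
    }
  });
  return findings;
}
```

The patterns carry no global flag, so RegExp.prototype.test keeps no state between lines.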

Valkey State Keys

orchestrator:task:{taskId}    → TaskState (status, agentId, context, timestamps)
orchestrator:agent:{agentId}  → AgentState (status, taskId, timestamps, error)
orchestrator:events           → Pub/sub channel for lifecycle events
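Building these keys through small helpers avoids typos in the prefix. The sketch below assumes nothing beyond the key layout listed above; the helper names are hypothetical, and how the state values are serialized in Valkey is not covered.

```typescript
const KEY_PREFIX = "orchestrator";
const EVENTS_CHANNEL = KEY_PREFIX + ":events";

function taskKey(taskId: string): string {
  return KEY_PREFIX + ":task:" + taskId;
}

function agentKey(agentId: string): string {
  return KEY_PREFIX + ":agent:" + agentId;
}

interface ParsedKey {
  kind: "task" | "agent";
  id: string;
}

/** Inverse of taskKey/agentKey; returns null for keys outside this layout. */
function parseStateKey(key: string): ParsedKey | null {
  const match = /^orchestrator:(task|agent):(.+)$/.exec(key);
  if (match === null) {
    return null;
  }
  return { kind: match[1] as "task" | "agent", id: match[2] };
}
```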

Quality Gate Profiles

Profile   Default For  Gates
strict    reviewer     typecheck, lint, tests, coverage (85%), build, integration, AI review
standard  worker       typecheck, lint, tests, coverage (85%)
minimal   tester       tests only
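The table maps directly onto two lookup objects: one from agent type to default profile, one from profile to gate list. The sketch below encodes exactly the three documented profiles. The names are hypothetical, and the custom profile accepted by POST /agents/spawn is left out because its gates are not described here.

```typescript
type AgentKind = "worker" | "reviewer" | "tester";
type BuiltInProfile = "strict" | "standard" | "minimal";
type Gate =
  | "typecheck" | "lint" | "tests" | "coverage"
  | "build" | "integration" | "ai-review";

const COVERAGE_THRESHOLD_PERCENT = 85;

const DEFAULT_PROFILE: Record<AgentKind, BuiltInProfile> = {
  reviewer: "strict",
  worker: "standard",
  tester: "minimal",
};

const PROFILE_GATES: Record<BuiltInProfile, readonly Gate[]> = {
  strict: ["typecheck", "lint", "tests", "coverage", "build", "integration", "ai-review"],
  standard: ["typecheck", "lint", "tests", "coverage"],
  minimal: ["tests"],
};

/** Gates for an agent: an explicit profile wins, otherwise the agent type's default. */
function resolveGates(agentType: AgentKind, profile?: BuiltInProfile): readonly Gate[] {
  return PROFILE_GATES[profile !== undefined ? profile : DEFAULT_PROFILE[agentType]];
}
```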

Development

# Install dependencies (from monorepo root)
pnpm install

# Run in dev mode
pnpm --filter @mosaic/orchestrator dev

# Build
pnpm --filter @mosaic/orchestrator build

# Run unit tests
pnpm --filter @mosaic/orchestrator test

# Run E2E/integration tests
pnpm --filter @mosaic/orchestrator test:e2e

# Type check
pnpm --filter @mosaic/orchestrator typecheck

# Lint
pnpm --filter @mosaic/orchestrator lint

Testing

  • Unit tests: Co-located *.spec.ts files (19 test files, 447+ tests)
  • Integration tests: tests/integration/*.e2e-spec.ts (17 E2E tests)
  • Coverage threshold: 85% (lines, functions, branches, statements)

Configuration

Environment variables are loaded via @nestjs/config. Key variables:

Variable           Description
ORCHESTRATOR_PORT  HTTP port (default: 3001)
CLAUDE_API_KEY     Claude API key for agents
VALKEY_HOST        Valkey/Redis host (default: localhost)
VALKEY_PORT        Valkey/Redis port (default: 6379)
COORDINATOR_URL    Quality Coordinator base URL
SANDBOX_ENABLED    Enable Docker sandbox (true/false)
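As a rough illustration of how these variables and their defaults fit together, the sketch below parses a plain environment object into a typed config. It does not use @nestjs/config; loadConfig and OrchestratorConfig are hypothetical names, and treating SANDBOX_ENABLED as false when unset is an assumption based on the sandbox being described as optional.

```typescript
interface OrchestratorConfig {
  port: number;
  claudeApiKey?: string;
  valkeyHost: string;
  valkeyPort: number;
  coordinatorUrl?: string;
  sandboxEnabled: boolean;
}

type Env = Record<string, string | undefined>;

function parsePort(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw === "") {
    return fallback;
  }
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(name + ' must be a valid port, got "' + raw + '"');
  }
  return port;
}

function loadConfig(env: Env): OrchestratorConfig {
  return {
    port: parsePort(env["ORCHESTRATOR_PORT"], 3001, "ORCHESTRATOR_PORT"),
    claudeApiKey: env["CLAUDE_API_KEY"],
    valkeyHost: env["VALKEY_HOST"] || "localhost",
    valkeyPort: parsePort(env["VALKEY_PORT"], 6379, "VALKEY_PORT"),
    coordinatorUrl: env["COORDINATOR_URL"],
    // Assumed default: sandbox stays off unless explicitly set to "true".
    sandboxEnabled: env["SANDBOX_ENABLED"] === "true",
  };
}
```

In a Node process, loadConfig(process.env) would produce the effective settings.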

Related Documentation

  • Design: docs/design/agent-orchestration.md
  • Setup: docs/ORCHESTRATOR-MONOREPO-SETUP.md
  • Milestone: M6-AgentOrchestration (0.0.6)