mosaic/stack

Jason Woltje 12abdfe81d feat(#93 ): implement agent spawn via federation

Implements FED-010: Agent Spawn via Federation feature that enables
spawning and managing Claude agents on remote federated Mosaic Stack
instances via COMMAND message type.

Features:
- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:
- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService →
  CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:
- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:
- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:
- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-03 14:37:06 -06:00

Issue #93: Agent Spawn via Federation (FED-010)

Objective

Implement the ability to spawn and manage agents on remote Mosaic Stack instances via the federation COMMAND message type. This enables distributed agent execution where the hub can delegate agent tasks to spoke instances.

Requirements

- Send agent spawn commands to remote instances via federation COMMAND messages
- Handle incoming agent spawn requests from remote instances
- Track agent lifecycle (spawn → running → completed/failed/killed)
- Return agent status and results to the requesting instance
- Enforce authorization and security checks
- Maintain TypeScript type safety (no explicit 'any')
- Provide comprehensive error handling and validation
- Reach at least 85% test coverage
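The lifecycle requirement maps naturally onto a small state machine. The sketch below is illustrative only: it assumes the state names spawning, running, completed, failed, and killed (taken from the lifecycle above and the example responses in this document) and assumes a spawning agent may fail or be killed before it starts running. The orchestrator's AgentLifecycleService may define different states or transitions.

```typescript
// Hypothetical state names; the real AgentLifecycleService may differ.
type AgentState = "spawning" | "running" | "completed" | "failed" | "killed";

// Allowed transitions. Terminal states have no outgoing edges.
const ALLOWED_TRANSITIONS: Record<AgentState, readonly AgentState[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: []
};

function isTerminalState(state: AgentState): boolean {
  return ALLOWED_TRANSITIONS[state].length === 0;
}

// Returns the next state, or throws if the lifecycle forbids the transition.
function transitionAgent(current: AgentState, next: AgentState): AgentState {
  if (ALLOWED_TRANSITIONS[current].indexOf(next) === -1) {
    throw new Error(`Invalid agent state transition: ${current} -> ${next}`);
  }
  return next;
}
```

Because terminal states have no outgoing transitions, a kill request against an agent that has already completed is rejected by the same check.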

Background

This builds on the complete foundation from Phases 1-4:

- Phases 1-2: Instance Identity, Connection Protocol
- Phase 3: OIDC, Identity Linking, QUERY/COMMAND/EVENT message types
- Phase 4: Connection Manager UI, Aggregated Dashboard

The orchestrator app already has:

- AgentSpawnerService: spawns agents using the Anthropic SDK
- AgentLifecycleService: manages agent state transitions
- ValkeyService: persists agent state and handles pub/sub events
- Docker sandbox capabilities

Approach

Phase 1: Define Federation Agent Command Types (TDD)

Create federation-agent.types.ts with:
- SpawnAgentCommandPayload interface
- AgentStatusCommandPayload interface
- KillAgentCommandPayload interface
- AgentCommandResponse interface
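A possible shape for these types, reconstructed from the command and response examples under Design Decisions. This is a sketch, not the committed federation-agent.types.ts: which fields are optional is an assumption, and AgentCommandResponse is modeled here as a discriminated union rather than a single interface so the compiler can tell success and error responses apart.

```typescript
// Sketch reconstructed from the Design Decisions examples; field optionality
// is an assumption, not taken from federation-agent.types.ts.
type FederationAgentType = "worker" | "reviewer" | "tester";

interface SpawnAgentCommandPayload {
  taskId: string;
  agentType: FederationAgentType;
  context: {
    repository: string;
    branch: string;
    workItems: string[];
    instructions: string;
  };
  options?: {
    timeout?: number; // milliseconds
    maxRetries?: number;
  };
}

interface AgentStatusCommandPayload {
  agentId: string;
}

interface KillAgentCommandPayload {
  agentId: string;
}

// Discriminated on `success`: checking it narrows to either `data` or `error`.
type AgentCommandResponse<T = Record<string, unknown>> =
  | { success: true; data: T }
  | { success: false; error: string };
```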

Phase 2: Implement Federation Agent Service (TDD)

Create federation-agent.service.ts in the API app that:
- Sends spawn/status/kill commands to remote instances
- Handles incoming agent commands from remote instances
- Integrates with orchestrator services via HTTP
- Validates permissions and workspace access
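A minimal sketch of the outbound half of this service, assuming hypothetical CommandSender and ConnectionLookup interfaces in place of the real CommandService and connection storage (their method names and signatures are not taken from the codebase). It shows the checks that happen before any command leaves the instance: the connection must exist in the caller's workspace and be ACTIVE.

```typescript
// Hypothetical stand-ins for the real CommandService and connection storage;
// method names and signatures are assumptions.
type AgentCommandType = "agent.spawn" | "agent.status" | "agent.kill";

interface FederationConnectionRecord {
  id: string;
  workspaceId: string;
  status: string; // e.g. "ACTIVE"
}

interface ConnectionLookup {
  findConnection(workspaceId: string, connectionId: string): FederationConnectionRecord | undefined;
}

interface CommandSender {
  sendCommand(connectionId: string, commandType: AgentCommandType, payload: object): Promise<unknown>;
}

class FederationAgentServiceSketch {
  private readonly connections: ConnectionLookup;
  private readonly commands: CommandSender;

  constructor(connections: ConnectionLookup, commands: CommandSender) {
    this.connections = connections;
    this.commands = commands;
  }

  spawnAgent(workspaceId: string, connectionId: string, payload: { taskId: string; agentType: string }): Promise<unknown> {
    return this.send(workspaceId, connectionId, "agent.spawn", payload);
  }

  getAgentStatus(workspaceId: string, connectionId: string, agentId: string): Promise<unknown> {
    return this.send(workspaceId, connectionId, "agent.status", { agentId });
  }

  killAgent(workspaceId: string, connectionId: string, agentId: string): Promise<unknown> {
    return this.send(workspaceId, connectionId, "agent.kill", { agentId });
  }

  // Rejects unless the connection belongs to the caller's workspace and is ACTIVE.
  private async send(workspaceId: string, connectionId: string, commandType: AgentCommandType, payload: object): Promise<unknown> {
    const connection = this.connections.findConnection(workspaceId, connectionId);
    if (!connection) {
      throw new Error(`Connection ${connectionId} not found in workspace ${workspaceId}`);
    }
    if (connection.status !== "ACTIVE") {
      throw new Error(`Connection ${connectionId} is not ACTIVE`);
    }
    return this.commands.sendCommand(connectionId, commandType, payload);
  }
}
```

Looking the connection up by workspace and connection ID together means a caller can never reach a connection owned by another workspace, even with a valid connection ID.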

Phase 3: Implement Agent Command Handler in Orchestrator (TDD)

Create agent-command.controller.ts in the orchestrator that:
- Exposes HTTP endpoints for federation agent commands
- Delegates to AgentSpawnerService and AgentLifecycleService
- Returns agent status and results
- Validates authentication and authorization
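The dispatch logic of such a controller could look roughly like the following. It is framework-free and synchronous for brevity; SpawnerPort and LifecyclePort are hypothetical stand-ins for AgentSpawnerService and AgentLifecycleService, whose real methods are likely asynchronous and named differently. Unknown agents produce the "Agent not found" error used in the example responses.

```typescript
// Hypothetical ports standing in for AgentSpawnerService and
// AgentLifecycleService; the real services are likely async.
interface AgentRecord {
  agentId: string;
  taskId: string;
  state: string;
  spawnedAt: string;
}

interface SpawnerPort {
  spawn(taskId: string, agentType: string): AgentRecord;
}

interface LifecyclePort {
  find(agentId: string): AgentRecord | undefined;
  kill(agentId: string): AgentRecord;
}

type AgentHandlerResult = { success: true; data: AgentRecord } | { success: false; error: string };

function handleAgentCommand(
  commandType: string,
  payload: Record<string, unknown>,
  spawner: SpawnerPort,
  lifecycle: LifecyclePort
): AgentHandlerResult {
  switch (commandType) {
    case "agent.spawn": {
      const taskId = payload["taskId"];
      const agentType = payload["agentType"];
      if (typeof taskId !== "string" || typeof agentType !== "string") {
        return { success: false, error: "taskId and agentType are required" };
      }
      return { success: true, data: spawner.spawn(taskId, agentType) };
    }
    case "agent.status":
    case "agent.kill": {
      const agentId = payload["agentId"];
      if (typeof agentId !== "string") {
        return { success: false, error: "agentId is required" };
      }
      const agent = lifecycle.find(agentId);
      if (!agent) {
        return { success: false, error: "Agent not found" };
      }
      return { success: true, data: commandType === "agent.kill" ? lifecycle.kill(agentId) : agent };
    }
    default:
      return { success: false, error: `Unsupported command type: ${commandType}` };
  }
}
```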

Phase 4: Integrate with Command Service (TDD)

- Update command.service.ts to route agent commands (agent.spawn, agent.status, agent.kill)
- Add command type handlers
- Update response processing for agent commands
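One way to route agent commands is to dispatch on the namespace before the first dot, so agent.spawn, agent.status, and agent.kill all reach a single handler. The class below is a hypothetical sketch of that idea, not CommandService's actual API.

```typescript
// Hypothetical routing layer; names are illustrative, not CommandService's real API.
type CommandHandler = (commandType: string, payload: unknown) => unknown;

class CommandRouterSketch {
  private readonly handlers = new Map<string, CommandHandler>();

  // Registers one handler for every command type in a namespace, e.g. "agent".
  register(namespace: string, handler: CommandHandler): void {
    this.handlers.set(namespace, handler);
  }

  // "agent.spawn" -> namespace "agent"; a type without a dot is its own namespace.
  route(commandType: string, payload: unknown): unknown {
    const dot = commandType.indexOf(".");
    const namespace = dot === -1 ? commandType : commandType.slice(0, dot);
    const handler = this.handlers.get(namespace);
    if (!handler) {
      throw new Error(`No handler registered for command type: ${commandType}`);
    }
    return handler(commandType, payload);
  }
}
```

Passing the full command type to the handler lets the agent handler do its own spawn/status/kill dispatch without the router knowing about individual agent operations.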

Phase 5: Add Federation Agent API Endpoints (TDD)

Add endpoints to federation controller:
- POST /api/v1/federation/agents/spawn - Spawn agent on remote instance
- GET /api/v1/federation/agents/:agentId/status - Get agent status
- POST /api/v1/federation/agents/:agentId/kill - Kill agent on remote instance
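Each endpoint corresponds to exactly one command type. The function below sketches that mapping under the assumption that the spawn request body is forwarded as the command payload; how the controller selects the target connection is not specified here and is left out.

```typescript
// Illustrative mapping; the body-to-payload rule is an assumption.
type AgentCommandName = "agent.spawn" | "agent.status" | "agent.kill";

interface MappedAgentCommand {
  commandType: AgentCommandName;
  payload: Record<string, unknown>;
}

const AGENT_ROUTE_PREFIX = "/api/v1/federation/agents/";

// Returns the federation command for a request, or null if no endpoint matches.
function toAgentCommand(method: string, path: string, body: Record<string, unknown> = {}): MappedAgentCommand | null {
  if (!path.startsWith(AGENT_ROUTE_PREFIX)) {
    return null;
  }
  const segments = path.slice(AGENT_ROUTE_PREFIX.length).split("/");
  if (method === "POST" && segments.length === 1 && segments[0] === "spawn") {
    return { commandType: "agent.spawn", payload: body };
  }
  const [rawAgentId, action] = segments;
  if (segments.length !== 2 || !rawAgentId) {
    return null;
  }
  const agentId = decodeURIComponent(rawAgentId);
  if (method === "GET" && action === "status") {
    return { commandType: "agent.status", payload: { agentId } };
  }
  if (method === "POST" && action === "kill") {
    return { commandType: "agent.kill", payload: { agentId } };
  }
  return null;
}
```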

Phase 6: End-to-End Testing

- Create integration tests for the full spawn → run → complete flow
- Test error scenarios (connection failures, auth failures, etc.)
- Test concurrent agent execution
- Verify state persistence and recovery
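Error-scenario tests are simpler when every remote failure surfaces in the error response shape documented under Response Format rather than as an exception. A minimal sketch, assuming a hypothetical callRemoteSafely wrapper that is not part of the existing code:

```typescript
// Same envelope as the Response Format examples.
type RemoteCallEnvelope =
  | { success: true; data: unknown }
  | { success: false; error: string };

// Hypothetical wrapper: awaits the remote call and converts any rejection
// (for example a refused connection) into an error envelope.
async function callRemoteSafely(call: () => Promise<unknown>): Promise<RemoteCallEnvelope> {
  try {
    const data = await call();
    return { success: true, data };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { success: false, error: `Remote call failed: ${message}` };
  }
}
```

A test can then inject a failing fake transport and assert on result.success and result.error directly, without try/catch around each scenario.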

Design Decisions

Command Types

```typescript
// Spawn agent on remote instance
const spawnAgentCommand = {
  commandType: "agent.spawn",
  payload: {
    taskId: "task-123",
    agentType: "worker", // one of "worker" | "reviewer" | "tester"
    context: {
      repository: "git.example.com/org/repo",
      branch: "feature-branch",
      workItems: ["item-1", "item-2"],
      instructions: "Task instructions..."
    },
    options: {
      timeout: 3600000, // 1 hour, in milliseconds
      maxRetries: 3
    }
  }
};

// Get agent status
const agentStatusCommand = {
  commandType: "agent.status",
  payload: {
    agentId: "agent-uuid"
  }
};

// Kill agent
const killAgentCommand = {
  commandType: "agent.kill",
  payload: {
    agentId: "agent-uuid"
  }
};
```
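Because spawn payloads arrive from another instance, they are untrusted input and should be validated before reaching the spawner. The hand-rolled validator below is one possible sketch. The allowed agent types come from the example above; the ID pattern, the 4-hour timeout ceiling, the retry limit, and the defaults are assumptions chosen for illustration.

```typescript
// Allowed values come from the example above; every limit and default below is an assumption.
const ALLOWED_AGENT_TYPES = ["worker", "reviewer", "tester"] as const;
type AllowedAgentType = (typeof ALLOWED_AGENT_TYPES)[number];

const ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;
const DEFAULT_TIMEOUT_MS = 60 * 60 * 1000; // 1 hour
const MAX_TIMEOUT_MS = 4 * 60 * 60 * 1000; // 4 hours
const DEFAULT_MAX_RETRIES = 0;
const MAX_RETRIES_LIMIT = 5;

interface ValidatedSpawnPayload {
  taskId: string;
  agentType: AllowedAgentType;
  timeoutMs: number;
  maxRetries: number;
}

function isPlainRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Rejects anything outside the allowed shape instead of trying to "clean" it.
function validateSpawnPayload(raw: unknown): ValidatedSpawnPayload {
  if (!isPlainRecord(raw)) {
    throw new Error("payload must be an object");
  }
  const taskId = raw["taskId"];
  if (typeof taskId !== "string" || !ID_PATTERN.test(taskId)) {
    throw new Error("invalid taskId");
  }
  const requestedType = raw["agentType"];
  const agentType = ALLOWED_AGENT_TYPES.find((allowed) => allowed === requestedType);
  if (agentType === undefined) {
    throw new Error("invalid agentType");
  }
  const rawOptions = raw["options"];
  const options: Record<string, unknown> = isPlainRecord(rawOptions) ? rawOptions : {};
  const timeoutMs = options["timeout"] ?? DEFAULT_TIMEOUT_MS;
  if (typeof timeoutMs !== "number" || !Number.isInteger(timeoutMs) || timeoutMs <= 0 || timeoutMs > MAX_TIMEOUT_MS) {
    throw new Error("invalid options.timeout");
  }
  const maxRetries = options["maxRetries"] ?? DEFAULT_MAX_RETRIES;
  if (typeof maxRetries !== "number" || !Number.isInteger(maxRetries) || maxRetries < 0 || maxRetries > MAX_RETRIES_LIMIT) {
    throw new Error("invalid options.maxRetries");
  }
  return { taskId, agentType, timeoutMs, maxRetries };
}
```

Rejecting rather than sanitizing keeps the behavior predictable: a payload either matches the allowed shape exactly or never reaches the spawner.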

Response Format

```typescript
// Spawn response
const spawnResponse = {
  success: true,
  data: {
    agentId: "agent-uuid",
    state: "spawning",
    spawnedAt: "2026-02-03T14:30:00Z"
  }
};

// Status response
const statusResponse = {
  success: true,
  data: {
    agentId: "agent-uuid",
    taskId: "task-123",
    status: "running",
    spawnedAt: "2026-02-03T14:30:00Z",
    startedAt: "2026-02-03T14:30:05Z",
    progress: {
      // Agent-specific progress data
    }
  }
};

// Error response
const errorResponse = {
  success: false,
  error: "Agent not found"
};
```
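On the requesting side, a response body received over the network should be checked before it is trusted. This sketch validates the envelope shown above and turns an error response into an exception; the function names are hypothetical.

```typescript
// Envelope shape from the response examples above.
type AgentResponseEnvelope =
  | { success: true; data: unknown }
  | { success: false; error: string };

// Checks an untrusted response body from a remote instance.
function parseAgentResponse(raw: unknown): AgentResponseEnvelope {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Malformed agent response: expected an object");
  }
  const body = raw as { success?: unknown; data?: unknown; error?: unknown };
  if (body.success === true) {
    return { success: true, data: body.data };
  }
  if (body.success === false && typeof body.error === "string") {
    return { success: false, error: body.error };
  }
  throw new Error("Malformed agent response: missing success flag or error message");
}

// Returns `data` on success and turns an error envelope into an exception.
function unwrapAgentResponse(raw: unknown): unknown {
  const envelope = parseAgentResponse(raw);
  if (!envelope.success) {
    throw new Error(`Remote agent command failed: ${envelope.error}`);
  }
  return envelope.data;
}
```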

Architecture

```
┌─────────────┐                    ┌─────────────┐
│   Hub API   │                    │  Spoke API  │
│ (Federation)│◄──────────────────►│ (Federation)│
└──────┬──────┘  COMMAND Messages  └──────┬──────┘
       │                                  │
       │                                  │
┌──────▼──────┐                    ┌──────▼──────┐
│ Orchestrator│                    │ Orchestrator│
│   (HTTP)    │                    │   (HTTP)    │
└──────┬──────┘                    └──────┬──────┘
       │                                  │
  ┌────┴────┐                        ┌────┴────┐
  │ Spawner │                        │ Spawner │
  │Lifecycle│                        │Lifecycle│
  └─────────┘                        └─────────┘
```

Security Considerations

- Validate that the federation connection is ACTIVE
- Verify the signature on all incoming commands
- Check workspace permissions for agent operations
- Rate-limit agent spawn requests
- Validate agent ownership before status/kill operations
- Sanitize all inputs to prevent injection attacks
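Two of these checks can be sketched concretely. The rate limiter below uses a fixed window per connection, and the ownership check compares the requesting connection with the one that spawned the agent. Both use assumed names and limits and are illustrations, not the implemented code; a production limiter would also need shared state (for example in Valkey) if the API runs as more than one process.

```typescript
// Fixed-window limiter per connection; limits are hypothetical and would come from config.
class SpawnRateLimiter {
  private readonly windows = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    if (limit < 1 || windowMs <= 0) {
      throw new Error("limit must be >= 1 and windowMs must be > 0");
    }
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Records the request and returns true if the connection is under its limit.
  tryAcquire(connectionId: string): boolean {
    const timestamp = this.now();
    const current = this.windows.get(connectionId);
    if (!current || timestamp - current.windowStart >= this.windowMs) {
      this.windows.set(connectionId, { windowStart: timestamp, count: 1 });
      return true;
    }
    if (current.count >= this.limit) {
      return false;
    }
    current.count += 1;
    return true;
  }
}

// Only the connection that spawned an agent may query or kill it.
function assertAgentOwnership(agentOwners: ReadonlyMap<string, string>, agentId: string, connectionId: string): void {
  if (agentOwners.get(agentId) !== connectionId) {
    throw new Error(`Agent ${agentId} not found`);
  }
}
```

The ownership check reports a mismatch as "not found" rather than "forbidden", so a caller cannot probe for agent IDs that belong to other connections.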

File Structure

```
apps/api/src/federation/
├── types/
│   ├── federation-agent.types.ts       # NEW
│   └── message.types.ts                # EXISTING
├── federation-agent.service.ts         # NEW
├── federation-agent.service.spec.ts    # NEW
├── command.service.ts                  # UPDATE
└── federation.controller.ts            # UPDATE

apps/orchestrator/src/api/
├── agent-command.controller.ts         # NEW
├── agent-command.controller.spec.ts    # NEW
└── ...
```

Progress

- Create scratchpad
- Review existing architecture
- Define federation agent types (federation-agent.types.ts)
- Write tests for FederationAgentService (12 tests)
- Implement FederationAgentService
- Update CommandService to route agent commands
- Add FederationAgentService to federation module
- Add federation agent endpoints to FederationController
- Add agent status endpoint to orchestrator AgentsController
- Update AgentsModule to include lifecycle service
- Run all tests (12/12 passing for FederationAgentService)
- TypeScript type checking (passing)
- Run full test suite
- Linting
- Security review
- Integration testing
- Documentation update
- Commit changes

Testing Strategy

- Unit Tests: Test each service method in isolation
- Integration Tests: Test full command flow (API → Orchestrator → Agent)
- Error Tests: Test failure scenarios (network, auth, validation)
- Concurrent Tests: Test multiple agents spawning simultaneously
- State Tests: Test agent lifecycle state transitions

Notes

- The orchestrator already has complete agent spawner/lifecycle infrastructure
- Need to expose an HTTP API in the orchestrator for federation to call
- Agent state is persisted in Valkey (Redis-compatible)
- Consider WebSocket for real-time agent status updates (future enhancement)
- May need to add the orchestrator URL to federation connection metadata
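If the orchestrator URL does end up in the connection metadata, the federation service would need to build orchestrator endpoints from it. A sketch, assuming a hypothetical optional orchestratorUrl field and an illustrative /agents/:agentId/status path on the orchestrator:

```typescript
// Hypothetical metadata shape; orchestratorUrl is the field the note proposes.
interface ConnectionMetadata {
  orchestratorUrl?: string;
  [key: string]: unknown;
}

// Joins the base URL and path, rejecting anything that is not an http(s) URL.
function resolveOrchestratorEndpoint(metadata: ConnectionMetadata, path: string): string {
  const base = metadata.orchestratorUrl;
  if (base === undefined || !/^https?:\/\/[^\/\s]+/i.test(base)) {
    throw new Error("Connection metadata has no valid http(s) orchestratorUrl");
  }
  // Trim slashes on both sides so the join never doubles or drops one.
  return `${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}
```

Rejecting non-http(s) schemes keeps a tampered metadata record from steering the API toward file: or other unexpected URL schemes.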