feat(#93): implement agent spawn via federation

Implements FED-010 (Agent Spawn via Federation), which enables
spawning and managing Claude agents on remote federated Mosaic Stack
instances via the COMMAND message type.

Features:
- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:
- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService →
  CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:
- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:
- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:
- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Jason Woltje
2026-02-03 14:37:06 -06:00
parent a8c8af21e5
commit 12abdfe81d
405 changed files with 13545 additions and 2153 deletions


@@ -1,15 +1,19 @@
# Issue ORCH-109: Agent lifecycle management
## Objective
Implement agent lifecycle management service to manage state transitions through the agent lifecycle (spawning → running → completed/failed/killed).
## Approach
Following TDD principles:
1. Write failing tests first for all state transition scenarios
2. Implement minimal code to make tests pass
3. Refactor while keeping tests green
The service will:
- Enforce valid state transitions using a state machine
- Persist agent state changes to Valkey
- Emit pub/sub events on state changes
@@ -17,6 +21,7 @@ The service will:
- Integrate with ValkeyService and AgentSpawnerService
## Acceptance Criteria
- [x] `src/spawner/agent-lifecycle.service.ts` implemented
- [x] State transitions: spawning → running → completed/failed/killed
- [x] State persisted in Valkey
@@ -29,7 +34,9 @@ The service will:
## Implementation Details
### State Machine
Valid transitions (from `state.types.ts`):
- `spawning` → `running`, `failed`, `killed`
- `running` → `completed`, `failed`, `killed`
- `completed` → (terminal state)
@@ -37,6 +44,7 @@ Valid transitions (from `state.types.ts`):
- `killed` → (terminal state)
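
The transition table above could be expressed roughly as follows. This is a minimal sketch, not the contents of `state.types.ts`: the names `AgentState`, `VALID_TRANSITIONS`, `isValidTransition`, and `assertTransition` are hypothetical, and `failed` is assumed to be a terminal state because its line is not visible in this excerpt.

```typescript
// Hypothetical names; the real definitions live in state.types.ts and may differ.
type AgentState = "spawning" | "running" | "completed" | "failed" | "killed";

// Allowed target states for each source state.
const VALID_TRANSITIONS: Record<AgentState, readonly AgentState[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [], // terminal
  failed: [], // assumed terminal (not shown in this excerpt)
  killed: [], // terminal
};

// Returns true when moving from `from` to `to` is listed in the table.
function isValidTransition(from: AgentState, to: AgentState): boolean {
  return VALID_TRANSITIONS[from].includes(to);
}

// Throws on an invalid transition so callers cannot silently skip validation.
function assertTransition(from: AgentState, to: AgentState): void {
  if (!isValidTransition(from, to)) {
    throw new Error(`Invalid agent state transition: ${from} -> ${to}`);
  }
}
```

Keeping the table as data rather than as branching logic means a new state only requires one new entry, and the same table can drive both validation and tests.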
### Key Methods
1. `transitionToRunning(agentId)` - Move agent from spawning to running
2. `transitionToCompleted(agentId)` - Mark agent as completed
3. `transitionToFailed(agentId, error)` - Mark agent as failed with error
@@ -44,12 +52,14 @@ Valid transitions (from `state.types.ts`):
5. `getAgentLifecycleState(agentId)` - Get current lifecycle state
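
To illustrate how the methods listed above might fit together, here is a rough in-memory sketch. It is not the actual service: the class name `InMemoryLifecycle`, the `register` helper, and the `AgentRecord` shape are hypothetical, a `Map` stands in for Valkey, the methods are synchronous for brevity (the real ones are likely asynchronous), and only the methods visible in this excerpt are included.

```typescript
type AgentState = "spawning" | "running" | "completed" | "failed" | "killed";

// Hypothetical record shape; the real persisted state may carry more fields.
interface AgentRecord {
  agentId: string;
  state: AgentState;
  error?: string;
}

const ALLOWED: Record<AgentState, readonly AgentState[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [], // assumed terminal
  killed: [],
};

class InMemoryLifecycle {
  // Stand-in for Valkey persistence.
  private readonly agents = new Map<string, AgentRecord>();

  // Hypothetical helper: records a newly spawned agent in the `spawning` state.
  register(agentId: string): AgentRecord {
    const record: AgentRecord = { agentId, state: "spawning" };
    this.agents.set(agentId, record);
    return record;
  }

  transitionToRunning(agentId: string): AgentRecord {
    return this.transition(agentId, "running");
  }

  transitionToCompleted(agentId: string): AgentRecord {
    return this.transition(agentId, "completed");
  }

  transitionToFailed(agentId: string, error: string): AgentRecord {
    return this.transition(agentId, "failed", error);
  }

  getAgentLifecycleState(agentId: string): AgentRecord | undefined {
    return this.agents.get(agentId);
  }

  // Validates the transition against the table, then stores the new record.
  private transition(agentId: string, to: AgentState, error?: string): AgentRecord {
    const current = this.agents.get(agentId);
    if (current === undefined) {
      throw new Error(`Unknown agent: ${agentId}`);
    }
    if (!ALLOWED[current.state].includes(to)) {
      throw new Error(`Invalid transition for ${agentId}: ${current.state} -> ${to}`);
    }
    const next: AgentRecord =
      error === undefined ? { agentId, state: to } : { agentId, state: to, error };
    this.agents.set(agentId, next);
    return next;
  }
}
```

Routing every public method through one private `transition` function keeps validation and persistence in a single place, so no method can update state without passing the state machine check.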
### Events Emitted
- `agent.running` - When transitioning to running
- `agent.completed` - When transitioning to completed
- `agent.failed` - When transitioning to failed
- `agent.killed` - When transitioning to killed
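
As a sketch of how these event names might be derived from the target state and published, assuming a generic publish callback rather than the real ValkeyService API: `EVENT_FOR_STATE`, `LifecycleMessage`, `Publish`, and `emitLifecycleEvent` are hypothetical names, and the JSON payload shape is an assumption.

```typescript
type TargetState = "running" | "completed" | "failed" | "killed";
type LifecycleEvent = "agent.running" | "agent.completed" | "agent.failed" | "agent.killed";

// One event name per target state, matching the list above.
const EVENT_FOR_STATE: Record<TargetState, LifecycleEvent> = {
  running: "agent.running",
  completed: "agent.completed",
  failed: "agent.failed",
  killed: "agent.killed",
};

// Assumed payload shape; the real messages may carry different fields.
interface LifecycleMessage {
  event: LifecycleEvent;
  agentId: string;
  error?: string;
}

// Stand-in for a pub/sub client: receives a channel name and a serialized payload.
type Publish = (channel: string, payload: string) => void;

// Builds the message for a state change and hands it to the publisher.
function emitLifecycleEvent(
  publish: Publish,
  agentId: string,
  state: TargetState,
  error?: string,
): LifecycleMessage {
  const event = EVENT_FOR_STATE[state];
  const message: LifecycleMessage =
    error === undefined ? { event, agentId } : { event, agentId, error };
  publish(event, JSON.stringify(message));
  return message;
}
```

Passing the publisher in as a function keeps the mapping testable without a running Valkey instance; a test can collect published messages in an array.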
## Progress
- [x] Read issue requirements
- [x] Create scratchpad
- [x] Write unit tests (TDD - RED phase)
@@ -62,9 +72,11 @@ Valid transitions (from `state.types.ts`):
- [x] Close Gitea issue with completion notes
## Testing
Test coverage: **100%** (28 tests)
Coverage areas:
- Valid state transitions (spawning→running→completed)
- Valid state transitions (spawning→failed, running→failed)
- Valid state transitions (spawning→killed, running→killed)
@@ -77,6 +89,7 @@ Coverage areas:
- List operations
## Notes
- State transition validation logic already exists in `state.types.ts`
- ValkeyService provides state persistence and pub/sub
- AgentSpawnerService manages agent sessions in memory
@@ -87,14 +100,17 @@ Coverage areas:
Successfully implemented ORCH-109 following TDD principles:
### Files Created
1. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/spawner/agent-lifecycle.service.ts` - Main service implementation
2. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/spawner/agent-lifecycle.service.spec.ts` - Comprehensive tests (28 tests, 100% coverage)
### Files Modified
1. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/spawner/spawner.module.ts` - Added service to module
2. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/spawner/index.ts` - Exported service
### Key Features Implemented
- State transition enforcement via state machine
- State persistence in Valkey
- Pub/sub event emission on state changes
@@ -103,11 +119,14 @@ Successfully implemented ORCH-109 following TDD principles:
- 100% test coverage (28 tests)
### Gitea Issue
- Created: #244
- Status: Closed
- URL: https://git.mosaicstack.dev/mosaic/stack/issues/244
### Next Steps
This service is now ready for integration with:
- ORCH-117: Killswitch implementation (depends on this)
- ORCH-127: E2E test for concurrent agents (depends on this)