stack/docs/scratchpads/orch-109-lifecycle.md

# Issue ORCH-109: Agent lifecycle management

## Objective
Implement an agent lifecycle management service that manages state transitions through the agent lifecycle (spawning → running → completed/failed/killed).

## Approach
Following TDD principles:
1. Write failing tests first for all state transition scenarios
2. Implement minimal code to make tests pass
3. Refactor while keeping tests green

The service will:
- Enforce valid state transitions using a state machine
- Persist agent state changes to Valkey
- Emit pub/sub events on state changes
- Track agent metadata (startedAt, completedAt, error)
- Integrate with ValkeyService and AgentSpawnerService

## Acceptance Criteria
- [x] `src/spawner/agent-lifecycle.service.ts` implemented
- [x] State transitions: spawning → running → completed/failed/killed
- [x] State persisted in Valkey
- [x] Events emitted on state changes (pub/sub)
- [x] Agent metadata tracked (startedAt, completedAt, error)
- [x] State machine enforces valid transitions only
- [x] Comprehensive unit tests with ≥85% coverage
- [x] Tests follow TDD (written first)

## Implementation Details

### State Machine
Valid transitions (from `state.types.ts`):
- `spawning` → `running`, `failed`, `killed`
- `running` → `completed`, `failed`, `killed`
- `completed` → (terminal state)
- `failed` → (terminal state)
- `killed` → (terminal state)
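The transition list above can be encoded as a lookup from each state to its allowed next states. This is a minimal sketch, not the contents of `state.types.ts`; the names `AgentState`, `VALID_TRANSITIONS`, `isValidTransition`, and `isTerminalState` are hypothetical.

```typescript
// Hypothetical names: the real definitions live in state.types.ts and may differ.
type AgentState = "spawning" | "running" | "completed" | "failed" | "killed";

// Allowed next states for each state; terminal states map to an empty list.
const VALID_TRANSITIONS: Readonly<Record<AgentState, readonly AgentState[]>> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

// A transition is valid only if the target appears in the source state's list.
function isValidTransition(from: AgentState, to: AgentState): boolean {
  return VALID_TRANSITIONS[from].includes(to);
}

// A state is terminal when nothing can follow it.
function isTerminalState(state: AgentState): boolean {
  return VALID_TRANSITIONS[state].length === 0;
}

console.log(isValidTransition("spawning", "running")); // true
console.log(isValidTransition("completed", "running")); // false
console.log(isTerminalState("killed")); // true
```

Encoding terminal states as empty lists lets one table answer both "is this transition allowed?" and "is this state terminal?".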

### Key Methods
1. `transitionToRunning(agentId)` - Move agent from spawning to running
2. `transitionToCompleted(agentId)` - Mark agent as completed
3. `transitionToFailed(agentId, error)` - Mark agent as failed with error
4. `transitionToKilled(agentId)` - Mark agent as killed
5. `getAgentLifecycleState(agentId)` - Get current lifecycle state
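A rough sketch of how the key methods could share one guarded `transition` helper. It assumes two hypothetical interfaces, `StateStore` and `EventPublisher`, standing in for ValkeyService (whose real method names are not shown in these notes), stores timestamps as ISO-8601 strings, and assumes `completedAt` is set on every terminal state, including failed and killed. `AgentLifecycleSketch` is an illustrative name, not the real `AgentLifecycleService`.

```typescript
// Sketch only. StateStore and EventPublisher are hypothetical stand-ins for
// ValkeyService; the real service's method names and signatures may differ.
type AgentState = "spawning" | "running" | "completed" | "failed" | "killed";

interface AgentLifecycleState {
  agentId: string;
  state: AgentState;
  startedAt?: string; // assumption: ISO-8601 timestamps
  completedAt?: string;
  error?: string;
}

interface StateStore {
  get(agentId: string): Promise<AgentLifecycleState | undefined>;
  set(record: AgentLifecycleState): Promise<void>;
}

interface EventPublisher {
  publish(event: string, payload: AgentLifecycleState): Promise<void>;
}

const VALID_TRANSITIONS: Record<AgentState, readonly AgentState[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

class AgentLifecycleSketch {
  private readonly store: StateStore;
  private readonly events: EventPublisher;

  constructor(store: StateStore, events: EventPublisher) {
    this.store = store;
    this.events = events;
  }

  transitionToRunning(agentId: string): Promise<AgentLifecycleState> {
    return this.transition(agentId, "running");
  }

  transitionToCompleted(agentId: string): Promise<AgentLifecycleState> {
    return this.transition(agentId, "completed");
  }

  transitionToFailed(agentId: string, error: string): Promise<AgentLifecycleState> {
    return this.transition(agentId, "failed", error);
  }

  transitionToKilled(agentId: string): Promise<AgentLifecycleState> {
    return this.transition(agentId, "killed");
  }

  getAgentLifecycleState(agentId: string): Promise<AgentLifecycleState | undefined> {
    return this.store.get(agentId);
  }

  // Validates the move, updates metadata, persists, then publishes agent.<state>.
  private async transition(agentId: string, to: AgentState, error?: string): Promise<AgentLifecycleState> {
    const current = await this.store.get(agentId);
    if (!current) {
      throw new Error(`Agent ${agentId} not found`);
    }
    if (!VALID_TRANSITIONS[current.state].includes(to)) {
      throw new Error(`Invalid state transition from ${current.state} to ${to}`);
    }
    const now = new Date().toISOString();
    const next: AgentLifecycleState = { ...current, state: to };
    // Only set timestamps that are not already present.
    if (to === "running" && !next.startedAt) next.startedAt = now;
    if (to !== "running" && !next.completedAt) next.completedAt = now;
    if (error !== undefined) next.error = error;

    await this.store.set(next); // persist first...
    await this.events.publish(`agent.${to}`, next); // ...then notify subscribers
    return next;
  }
}

// Usage with an in-memory Map in place of Valkey.
const records = new Map<string, AgentLifecycleState>();
const lifecycle = new AgentLifecycleSketch(
  {
    get: async (id) => records.get(id),
    set: async (record) => { records.set(record.agentId, record); },
  },
  { publish: async (event) => console.log("published", event) },
);

records.set("agent-1", { agentId: "agent-1", state: "spawning" });
lifecycle
  .transitionToRunning("agent-1")
  .then(() => lifecycle.transitionToFailed("agent-1", "process exited"))
  .then((state) => console.log(state.state, state.error)); // failed process exited
```

Persisting before publishing means a subscriber that reacts to an event and then reads the agent's state back should observe the new state rather than the previous one.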

### Events Emitted
- `agent.running` - When transitioning to running
- `agent.completed` - When transitioning to completed
- `agent.failed` - When transitioning to failed
- `agent.killed` - When transitioning to killed
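The event names map one-to-one onto the states an agent can transition into, so the publisher can derive them from the target state. A minimal sketch; the payload shape (`AgentStateEvent` with `agentId`, `timestamp`, and an optional `error`) and the helper `buildStateEvent` are assumptions, not the service's actual event schema.

```typescript
// Hypothetical helper; the real payload schema is not documented in these notes.
type AgentState = "spawning" | "running" | "completed" | "failed" | "killed";

// Every state except the initial "spawning" is entered via a transition and emits an event.
type TransitionTarget = Exclude<AgentState, "spawning">;

const STATE_EVENTS: Readonly<Record<TransitionTarget, string>> = {
  running: "agent.running",
  completed: "agent.completed",
  failed: "agent.failed",
  killed: "agent.killed",
};

interface AgentStateEvent {
  type: string; // one of the STATE_EVENTS values
  agentId: string;
  timestamp: string; // assumption: ISO-8601
  error?: string; // assumption: only present on agent.failed
}

function buildStateEvent(agentId: string, state: TransitionTarget, error?: string): AgentStateEvent {
  const event: AgentStateEvent = {
    type: STATE_EVENTS[state],
    agentId,
    timestamp: new Date().toISOString(),
  };
  // Attach the error message only to failure events.
  if (state === "failed" && error !== undefined) {
    event.error = error;
  }
  return event;
}

console.log(buildStateEvent("agent-1", "failed", "timeout").type); // agent.failed
```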

## Progress
- [x] Read issue requirements
- [x] Create scratchpad
- [x] Write unit tests (TDD - RED phase)
- [x] Implement service (TDD - GREEN phase)
- [x] Refactor and add edge case tests
- [x] Verify test coverage = 100%
- [x] Add service to module exports
- [x] Verify build passes
- [x] Create Gitea issue
- [x] Close Gitea issue with completion notes

## Testing
Test coverage: **100%** (28 tests)

Coverage areas:
- Valid state transitions (spawning→running→completed)
- Valid state transitions (spawning→failed, running→failed)
- Valid state transitions (spawning→killed, running→killed)
- Invalid state transitions (should throw errors)
- Event emission on state changes
- State persistence in Valkey
- Metadata tracking (timestamps, errors)
- Conditional timestamp setting (startedAt, completedAt)
- Agent not found error handling
- List operations
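Because the state set is small, the invalid-transition tests can cover every `(from, to)` pair rather than a hand-picked few. The sketch below is framework-free and uses hypothetical names (`STATES`, `assertTransition`, `classifyAllTransitions`); in the real spec file the same loop could drive the test runner's parameterized cases.

```typescript
// Hypothetical names; the real transition table lives in state.types.ts.
type AgentState = "spawning" | "running" | "completed" | "failed" | "killed";
const STATES: readonly AgentState[] = ["spawning", "running", "completed", "failed", "killed"];

const VALID_TRANSITIONS: Record<AgentState, readonly AgentState[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [],
  failed: [],
  killed: [],
};

// Throws on any transition the state machine does not allow.
function assertTransition(from: AgentState, to: AgentState): void {
  if (!VALID_TRANSITIONS[from].includes(to)) {
    throw new Error(`Invalid state transition from ${from} to ${to}`);
  }
}

// Tries every (from, to) pair and sorts the outcomes into valid and rejected.
function classifyAllTransitions(): { valid: string[]; rejected: string[] } {
  const valid: string[] = [];
  const rejected: string[] = [];
  for (const from of STATES) {
    for (const to of STATES) {
      try {
        assertTransition(from, to);
        valid.push(`${from}->${to}`);
      } catch {
        rejected.push(`${from}->${to}`);
      }
    }
  }
  return { valid, rejected };
}

const { valid, rejected } = classifyAllTransitions();
console.log(valid.length, rejected.length); // 6 19
```

Of the 25 pairs, only the six transitions listed in the state machine should pass; the other 19, including every self-transition and every move out of a terminal state, should be rejected.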

## Notes
- State transition validation logic already exists in `state.types.ts`
- ValkeyService provides state persistence and pub/sub
- AgentSpawnerService manages agent sessions in memory
- This service bridges the two by owning lifecycle state transitions and persisting them to Valkey

## Completion Summary

Successfully implemented ORCH-109 following TDD principles:

### Files Created
1. `apps/orchestrator/src/spawner/agent-lifecycle.service.ts` - Main service implementation
2. `apps/orchestrator/src/spawner/agent-lifecycle.service.spec.ts` - Comprehensive tests (28 tests, 100% coverage)

### Files Modified
1. `apps/orchestrator/src/spawner/spawner.module.ts` - Added service to module
2. `apps/orchestrator/src/spawner/index.ts` - Exported service

### Key Features Implemented
- State transition enforcement via state machine
- State persistence in Valkey
- Pub/sub event emission on state changes
- Metadata tracking (startedAt, completedAt, error)
- Error handling for invalid state transitions and unknown agents
- 100% test coverage (28 tests)

### Gitea Issue
- Created: #244
- Status: Closed
- URL: https://git.mosaicstack.dev/mosaic/stack/issues/244

### Next Steps
This service is now ready for integration with:
- ORCH-117: Killswitch implementation (depends on this)
- ORCH-127: E2E test for concurrent agents (depends on this)