stack/docs/scratchpads/orch-107-valkey.md
Jason Woltje 12abdfe81d feat(#93): implement agent spawn via federation
Implements FED-010: Agent Spawn via Federation feature that enables
spawning and managing Claude agents on remote federated Mosaic Stack
instances via COMMAND message type.

Features:
- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:
- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService →
  CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:
- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:
- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:
- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 14:37:06 -06:00


# Issue ORCH-107: Valkey client and state management
## Objective
Implement Valkey client and state management system for the orchestrator service using ioredis for:
- Connection management
- State persistence for tasks and agents
- Pub/sub for events (agent spawned, completed, failed)
- Task and agent state machines
## Acceptance Criteria
- [x] Create scratchpad document
- [x] `src/valkey/valkey.client.ts` with ioredis connection
- [x] State schema implemented (tasks, agents, queue)
- [x] Pub/sub for events (agent spawned, completed, failed)
- [x] Task state: pending, assigned, executing, completed, failed
- [x] Agent state: spawning, running, completed, failed, killed
- [x] Unit tests with ≥85% coverage (TDD approach) - **Achieved 96.96% branch coverage**
- [x] Configuration from environment variables
## Approach
### TDD Implementation Plan (Red-Green-Refactor)
1. **Phase 1: Valkey Client Foundation**
   - Write tests for ValkeyClient connection management
   - Implement ValkeyClient with ioredis
   - Write tests for basic get/set/delete operations
   - Implement basic operations
2. **Phase 2: State Schema & Persistence**
   - Write tests for task state persistence
   - Implement task state operations
   - Write tests for agent state persistence
   - Implement agent state operations
3. **Phase 3: Pub/Sub Events**
   - Write tests for event publishing
   - Implement event publishing
   - Write tests for event subscription
   - Implement event subscription
4. **Phase 4: NestJS Service Integration**
   - Write tests for ValkeyService
   - Implement ValkeyService with dependency injection
   - Update ValkeyModule with providers
### State Schema Design
**Task State:**
```typescript
interface TaskState {
  taskId: string;
  status: "pending" | "assigned" | "executing" | "completed" | "failed";
  agentId?: string;
  context: TaskContext;
  createdAt: string;
  updatedAt: string;
  metadata?: Record<string, unknown>;
}
```
**Agent State:**
```typescript
interface AgentState {
  agentId: string;
  status: "spawning" | "running" | "completed" | "failed" | "killed";
  taskId: string;
  startedAt?: string;
  completedAt?: string;
  error?: string;
  metadata?: Record<string, unknown>;
}
```
**Event Types:**
```typescript
type EventType =
  | "agent.spawned"
  | "agent.running"
  | "agent.completed"
  | "agent.failed"
  | "agent.killed"
  | "task.assigned"
  | "task.executing"
  | "task.completed"
  | "task.failed";
```
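
The status unions above imply a state machine, but this scratchpad does not list which transitions are allowed. The following is a minimal sketch of how transition validation could look. The transition tables and the `canTransition` / `assertTransition` helpers are assumptions made for illustration, not the actual implementation.

```typescript
// Status unions copied from the TaskState and AgentState interfaces.
type TaskStatus = "pending" | "assigned" | "executing" | "completed" | "failed";
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

// ASSUMPTION: illustrative transition tables. The real allowed transitions
// are not documented here and may differ.
const TASK_TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  pending: ["assigned", "failed"],
  assigned: ["executing", "failed"],
  executing: ["completed", "failed"],
  completed: [], // terminal
  failed: [], // terminal
};

const AGENT_TRANSITIONS: Record<AgentStatus, readonly AgentStatus[]> = {
  spawning: ["running", "failed", "killed"],
  running: ["completed", "failed", "killed"],
  completed: [], // terminal
  failed: [], // terminal
  killed: [], // terminal
};

// Returns true when `to` is listed as a valid successor of `from`.
// Generic so the same helper works with either table.
function canTransition<S extends string>(
  table: Record<S, readonly S[]>,
  from: S,
  to: S,
): boolean {
  return table[from].indexOf(to) !== -1;
}

// Throwing variant for callers that want the transition enforced.
function assertTransition<S extends string>(
  table: Record<S, readonly S[]>,
  from: S,
  to: S,
): void {
  if (!canTransition(table, from, to)) {
    throw new Error(`Invalid state transition: ${from} -> ${to}`);
  }
}
```

Keeping the tables as plain data means the tests can iterate over every pair of states, which is one way to reach the "all state transitions tested" goal.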
### File Structure
```
apps/orchestrator/src/valkey/
├── valkey.module.ts # NestJS module (exists, needs update)
├── valkey.client.ts # ioredis client wrapper (new)
├── valkey.client.spec.ts # Client tests (new)
├── valkey.service.ts # NestJS service (new)
├── valkey.service.spec.ts # Service tests (new)
├── types/
│ ├── index.ts # Type exports (new)
│ ├── state.types.ts # State interfaces (new)
│ └── events.types.ts # Event interfaces (new)
└── index.ts # Public API exports (new)
```
## Progress
### Phase 1: Types and Interfaces
- [x] Create state.types.ts with TaskState and AgentState
- [x] Create events.types.ts with event interfaces
- [x] Create index.ts for type exports
### Phase 2: Valkey Client (TDD)
- [x] Write ValkeyClient tests (connection, basic ops)
- [x] Implement ValkeyClient
- [x] Write state persistence tests
- [x] Implement state persistence methods
### Phase 3: Pub/Sub (TDD)
- [x] Write pub/sub tests
- [x] Implement pub/sub methods
### Phase 4: NestJS Service (TDD)
- [x] Write ValkeyService tests
- [x] Implement ValkeyService
- [x] Update ValkeyModule
- [x] Add configuration support for VALKEY_PASSWORD
- [x] Update .env.example with VALKEY_HOST and VALKEY_PASSWORD
## Testing
- Using vitest for unit tests
- Mock ioredis using ioredis-mock or manual mocks
- Target: ≥85% coverage
- Run: `pnpm test` in apps/orchestrator
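
The manual-mock option can be as small as an in-memory class. The sketch below is a hypothetical `FakeValkey`, not the project's actual mock. It mirrors only a handful of ioredis-style method names (`get`, `set`, `del`, `subscribe`, `on`, `publish`) and, unlike a real connection in subscriber mode, lets a single instance both subscribe and publish.

```typescript
// Hand-written in-memory stand-in for tests. The class is hypothetical;
// only the method names follow ioredis conventions.
type MessageListener = (channel: string, message: string) => void;

class FakeValkey {
  private readonly store = new Map<string, string>();
  private readonly subscribed = new Set<string>();
  private readonly listeners: MessageListener[] = [];

  get(key: string): Promise<string | null> {
    const value = this.store.get(key);
    return Promise.resolve(value === undefined ? null : value);
  }

  set(key: string, value: string): Promise<"OK"> {
    this.store.set(key, value);
    return Promise.resolve("OK" as const);
  }

  // Resolves to the number of keys that were actually removed.
  del(...keys: string[]): Promise<number> {
    let removed = 0;
    for (const key of keys) {
      if (this.store.delete(key)) removed += 1;
    }
    return Promise.resolve(removed);
  }

  // Resolves to the number of channels this fake is subscribed to.
  subscribe(channel: string): Promise<number> {
    this.subscribed.add(channel);
    return Promise.resolve(this.subscribed.size);
  }

  on(event: "message", listener: MessageListener): this {
    if (event === "message") this.listeners.push(listener);
    return this;
  }

  // Delivers synchronously so tests stay deterministic. Resolves to the
  // number of listeners notified (0 if nobody subscribed to the channel).
  publish(channel: string, message: string): Promise<number> {
    if (!this.subscribed.has(channel)) return Promise.resolve(0);
    for (const listener of this.listeners) listener(channel, message);
    return Promise.resolve(this.listeners.length);
  }
}
```

A fake like this only helps if the code under test accepts its connection through the constructor or a provider, so the test can hand it the fake instead of a live client.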
## Summary
Implementation of ORCH-107 is complete. All acceptance criteria have been met.
### What Was Built
1. **State Management Types** (`types/state.types.ts`, `types/events.types.ts`)
   - TaskState and AgentState interfaces
   - State transition validation
   - Event types for pub/sub
   - Full TypeScript type safety
2. **Valkey Client** (`valkey.client.ts`)
   - ioredis connection management
   - Task state CRUD operations
   - Agent state CRUD operations
   - Pub/sub event system
   - State transition enforcement
   - Error handling
3. **NestJS Service** (`valkey.service.ts`)
   - Dependency injection integration
   - Configuration management via ConfigService
   - Lifecycle management (onModuleDestroy)
   - Convenience methods for common operations
4. **Module Integration** (`valkey.module.ts`)
   - Proper NestJS module setup
   - Service provider configuration
   - ConfigModule import
5. **Comprehensive Tests** (45 tests, 96.96% coverage)
   - ValkeyClient unit tests (27 tests)
   - ValkeyService unit tests (18 tests)
   - All state transitions tested
   - Error handling tested
   - Pub/sub functionality tested
   - Edge cases covered
### Configuration
Added environment variable support:
- `VALKEY_HOST` - Valkey server host (default: localhost)
- `VALKEY_PORT` - Valkey server port (default: 6379)
- `VALKEY_PASSWORD` - Optional password for authentication
- `VALKEY_URL` - Alternative connection string format
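
As a rough illustration of how these variables could be resolved into connection settings, here is a minimal sketch. The `resolveValkeyConnection` function, the `ValkeyConnection` shape, and the rule that `VALKEY_URL` wins over host and port are all assumptions. Only the variable names and defaults come from the list above.

```typescript
// Hypothetical result shape: either a connection string or discrete settings.
type ValkeyConnection =
  | { kind: "url"; url: string }
  | { kind: "host"; host: string; port: number; password?: string };

// Environment passed in explicitly so the function is easy to test.
type Env = Record<string, string | undefined>;

// ASSUMPTION: a non-empty VALKEY_URL takes precedence over
// VALKEY_HOST / VALKEY_PORT. The real precedence is not documented here.
function resolveValkeyConnection(env: Env): ValkeyConnection {
  const url = env["VALKEY_URL"];
  if (url !== undefined && url !== "") {
    return { kind: "url", url };
  }

  const rawPort = env["VALKEY_PORT"];
  const port = rawPort === undefined || rawPort === "" ? 6379 : Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid VALKEY_PORT: ${rawPort}`);
  }

  const host = env["VALKEY_HOST"] || "localhost";
  const password = env["VALKEY_PASSWORD"];

  // Only include the password when one was actually provided.
  return password
    ? { kind: "host", host, port, password }
    : { kind: "host", host, port };
}
```

Validating the port up front turns a typo in the environment into an immediate startup error rather than a connection failure later on.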
### Key Features
- **State Machines**: Enforces valid state transitions for tasks and agents
- **Type Safety**: Full TypeScript types with validation
- **Pub/Sub Events**: Real-time event notifications for state changes
- **Modularity**: Clean separation of concerns (client, service, module)
- **Testability**: Fully mocked tests, no actual Valkey connection required
- **Configuration**: Environment-based configuration via NestJS ConfigService
### Next Steps
This implementation provides the foundation for:
- ORCH-108: BullMQ task queue (uses Valkey for state persistence)
- ORCH-109: Agent lifecycle management (uses state management)
- Future orchestrator features that need state persistence
## Notes
### Environment Variables
From orchestrator.config.ts:
- VALKEY_HOST (default: localhost)
- VALKEY_PORT (default: 6379)
- VALKEY_URL (default: redis://localhost:6379)
- VALKEY_PASSWORD (optional, from .env.example)
### Dependencies
- ioredis: Already installed in package.json (^5.9.2)
- @nestjs/config: Already installed
- Configuration already set up in src/config/orchestrator.config.ts
### Key Design Decisions
1. Use ioredis for Valkey client (Redis-compatible)
2. State keys pattern: `orchestrator:{type}:{id}`
   - Tasks: `orchestrator:task:{taskId}`
   - Agents: `orchestrator:agent:{agentId}`
3. Pub/sub channel: `orchestrator:events`
4. All timestamps in ISO 8601 format
5. State transitions enforced by state machine logic
6. Mock ioredis in tests (no actual Valkey connection needed)
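
The key pattern, the single events channel, and the ISO 8601 timestamps can be sketched together. The helper names (`stateKey`, `buildEvent`, `serializeEvent`, `parseEvent`) and the `OrchestratorEvent` envelope fields are hypothetical. Only the key format, the channel name, and the event type strings come from this document.

```typescript
type EntityType = "task" | "agent";

// Event type strings copied from the Event Types definition.
type EventType =
  | "agent.spawned" | "agent.running" | "agent.completed" | "agent.failed" | "agent.killed"
  | "task.assigned" | "task.executing" | "task.completed" | "task.failed";

// Single pub/sub channel shared by all orchestrator events.
const EVENTS_CHANNEL = "orchestrator:events";

// Builds a state key following `orchestrator:{type}:{id}`.
function stateKey(type: EntityType, id: string): string {
  return `orchestrator:${type}:${id}`;
}

// ASSUMPTION: illustrative envelope; the real event interfaces may differ.
interface OrchestratorEvent {
  type: EventType;
  timestamp: string; // ISO 8601
  agentId?: string;
  taskId?: string;
}

// `now` is injectable so tests can pin the timestamp.
function buildEvent(
  type: EventType,
  ids: { agentId?: string; taskId?: string },
  now: Date = new Date(),
): OrchestratorEvent {
  return { type, timestamp: now.toISOString(), ...ids };
}

// Pub/sub messages are plain strings, so events travel as JSON.
function serializeEvent(event: OrchestratorEvent): string {
  return JSON.stringify(event);
}

// No schema validation here; a real consumer would likely want some.
function parseEvent(raw: string): OrchestratorEvent {
  return JSON.parse(raw) as OrchestratorEvent;
}
```

With a single channel, subscribers receive every event and filter on the `type` field. Per-type channels would be an alternative design, at the cost of more subscriptions to manage.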