Implements FED-010: Agent Spawn via Federation feature that enables spawning and managing Claude agents on remote federated Mosaic Stack instances via COMMAND message type.

Features:

- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:

- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService → CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:

- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:

- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:

- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

# Issue ORCH-107: Valkey client and state management

## Objective

Implement a Valkey client and state management system for the orchestrator service, built on ioredis, covering:

- Connection management
- State persistence for tasks and agents
- Pub/sub for events (agent spawned, completed, failed)
- Task and agent state machines

## Acceptance Criteria

- [x] Create scratchpad document
- [x] `src/valkey/client.ts` with ioredis connection
- [x] State schema implemented (tasks, agents, queue)
- [x] Pub/sub for events (agent spawned, completed, failed)
- [x] Task state: pending, assigned, executing, completed, failed
- [x] Agent state: spawning, running, completed, failed, killed
- [x] Unit tests with ≥85% coverage (TDD approach) - **Achieved 96.96% branch coverage**
- [x] Configuration from environment variables

## Approach

### TDD Implementation Plan (Red-Green-Refactor)

1. **Phase 1: Valkey Client Foundation**
   - Write tests for ValkeyClient connection management
   - Implement ValkeyClient with ioredis
   - Write tests for basic get/set/delete operations
   - Implement basic operations

2. **Phase 2: State Schema & Persistence**
   - Write tests for task state persistence
   - Implement task state operations
   - Write tests for agent state persistence
   - Implement agent state operations

3. **Phase 3: Pub/Sub Events**
   - Write tests for event publishing
   - Implement event publishing
   - Write tests for event subscription
   - Implement event subscription

4. **Phase 4: NestJS Service Integration**
   - Write tests for ValkeyService
   - Implement ValkeyService with dependency injection
   - Update ValkeyModule with providers

### State Schema Design

**Task State:**

```typescript
interface TaskState {
  taskId: string;
  status: "pending" | "assigned" | "executing" | "completed" | "failed";
  agentId?: string;
  context: TaskContext;
  createdAt: string;
  updatedAt: string;
  metadata?: Record<string, unknown>;
}
```

**Agent State:**

```typescript
interface AgentState {
  agentId: string;
  status: "spawning" | "running" | "completed" | "failed" | "killed";
  taskId: string;
  startedAt?: string;
  completedAt?: string;
  error?: string;
  metadata?: Record<string, unknown>;
}
```

**Event Types:**

```typescript
type EventType =
  | "agent.spawned"
  | "agent.running"
  | "agent.completed"
  | "agent.failed"
  | "agent.killed"
  | "task.assigned"
  | "task.executing"
  | "task.completed"
  | "task.failed";
```

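The task states are listed in this document and transitions are said to be enforced by state machine logic, but the allowed edges are not spelled out here. The following is a minimal sketch of one way to encode them. The transition table itself and the names `TASK_TRANSITIONS`, `canTransitionTask`, and `assertTaskTransition` are assumptions for illustration, not the contents of `state.types.ts`.

```typescript
type TaskStatus = "pending" | "assigned" | "executing" | "completed" | "failed";

// Assumed transition table: the document names the states but not the edges.
// Terminal states (completed, failed) allow no further transitions here.
const TASK_TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  pending: ["assigned", "failed"],
  assigned: ["executing", "failed"],
  executing: ["completed", "failed"],
  completed: [],
  failed: [],
};

// Returns true when moving from `from` to `to` is listed in the table.
function canTransitionTask(from: TaskStatus, to: TaskStatus): boolean {
  return TASK_TRANSITIONS[from].includes(to);
}

// Throws on an unlisted transition so callers can reject the state update.
function assertTaskTransition(from: TaskStatus, to: TaskStatus): void {
  if (!canTransitionTask(from, to)) {
    throw new Error(`Invalid task transition: ${from} -> ${to}`);
  }
}
```

The same table shape would likely work for the agent states (spawning, running, completed, failed, killed), with its own set of assumed edges.
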
### File Structure

```
apps/orchestrator/src/valkey/
├── valkey.module.ts         # NestJS module (exists, needs update)
├── valkey.client.ts         # ioredis client wrapper (new)
├── valkey.client.spec.ts    # Client tests (new)
├── valkey.service.ts        # NestJS service (new)
├── valkey.service.spec.ts   # Service tests (new)
├── types/
│   ├── index.ts             # Type exports (new)
│   ├── state.types.ts       # State interfaces (new)
│   └── events.types.ts      # Event interfaces (new)
└── index.ts                 # Public API exports (new)
```

## Progress

### Phase 1: Types and Interfaces

- [x] Create state.types.ts with TaskState and AgentState
- [x] Create events.types.ts with event interfaces
- [x] Create index.ts for type exports

### Phase 2: Valkey Client (TDD)

- [x] Write ValkeyClient tests (connection, basic ops)
- [x] Implement ValkeyClient
- [x] Write state persistence tests
- [x] Implement state persistence methods

### Phase 3: Pub/Sub (TDD)

- [x] Write pub/sub tests
- [x] Implement pub/sub methods

### Phase 4: NestJS Service (TDD)

- [x] Write ValkeyService tests
- [x] Implement ValkeyService
- [x] Update ValkeyModule
- [x] Add configuration support for VALKEY_PASSWORD
- [x] Update .env.example with VALKEY_HOST and VALKEY_PASSWORD

## Testing

- Using vitest for unit tests
- Mock ioredis using ioredis-mock or manual mocks
- Target: ≥85% coverage
- Run: `pnpm test` in `apps/orchestrator`

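As an illustration of the "manual mocks" option, here is a minimal in-memory stand-in for the few key/value commands that state persistence needs. This is a sketch under assumptions: the `KeyValueStore` interface and `InMemoryStore` class are hypothetical names, not files in the repository, and only the `get`/`set`/`del` result shapes are modeled on ioredis.

```typescript
// Subset of the command surface used for state persistence. The result
// shapes follow ioredis: get resolves to string | null, set resolves to
// "OK", and del resolves to the number of keys actually removed.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<"OK">;
  del(...keys: string[]): Promise<number>;
}

// Hypothetical in-memory fake for unit tests; no Valkey connection involved.
class InMemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<"OK"> {
    this.data.set(key, value);
    return "OK";
  }

  async del(...keys: string[]): Promise<number> {
    let removed = 0;
    for (const key of keys) {
      if (this.data.delete(key)) removed += 1;
    }
    return removed;
  }

  // Synchronous helper so a test can inspect the store without awaiting.
  get size(): number {
    return this.data.size;
  }
}
```

Code under test can then depend on `KeyValueStore` rather than on a concrete client, which keeps the tests independent of a running server.
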
## Summary

Implementation of ORCH-107 is complete. All acceptance criteria have been met.

### What Was Built

1. **State Management Types** (`types/state.types.ts`, `types/events.types.ts`)
   - TaskState and AgentState interfaces
   - State transition validation
   - Event types for pub/sub
   - Full TypeScript type safety

2. **Valkey Client** (`valkey.client.ts`)
   - ioredis connection management
   - Task state CRUD operations
   - Agent state CRUD operations
   - Pub/sub event system
   - State transition enforcement
   - Error handling

3. **NestJS Service** (`valkey.service.ts`)
   - Dependency injection integration
   - Configuration management via ConfigService
   - Lifecycle management (onModuleDestroy)
   - Convenience methods for common operations

4. **Module Integration** (`valkey.module.ts`)
   - Proper NestJS module setup
   - Service provider configuration
   - ConfigModule import

5. **Comprehensive Tests** (45 tests, 96.96% coverage)
   - ValkeyClient unit tests (27 tests)
   - ValkeyService unit tests (18 tests)
   - All state transitions tested
   - Error handling tested
   - Pub/sub functionality tested
   - Edge cases covered

### Configuration

Added environment variable support:

- `VALKEY_HOST` - Valkey server host (default: localhost)
- `VALKEY_PORT` - Valkey server port (default: 6379)
- `VALKEY_PASSWORD` - Optional password for authentication
- `VALKEY_URL` - Alternative connection string format

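A minimal sketch of how the host, port, and password variables above might be resolved into connection options, using the documented defaults. The function name `resolveValkeyOptions`, the `ValkeyConnectionOptions` shape, and the port validation are assumptions for illustration. The actual service reads configuration through NestJS ConfigService, and `VALKEY_URL` handling is left out because its precedence is not described here.

```typescript
// Hypothetical options shape; host/port/password mirror common client options.
interface ValkeyConnectionOptions {
  host: string;
  port: number;
  password?: string;
}

// Resolves connection options from an environment-like record.
// Defaults (localhost, 6379) come from the variable list above.
function resolveValkeyOptions(
  env: Record<string, string | undefined>,
): ValkeyConnectionOptions {
  const host = env.VALKEY_HOST ?? "localhost";
  const port = Number.parseInt(env.VALKEY_PORT ?? "6379", 10);

  // Assumed validation: reject anything that is not a usable TCP port.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid VALKEY_PORT: ${env.VALKEY_PORT}`);
  }

  // Treat an empty password the same as an unset one.
  const password = env.VALKEY_PASSWORD || undefined;
  return password ? { host, port, password } : { host, port };
}
```

Passing the environment in as a parameter, rather than reading a global, keeps the function easy to exercise in unit tests.
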
### Key Features

- **State Machines**: Enforces valid state transitions for tasks and agents
- **Type Safety**: Full TypeScript types with validation
- **Pub/Sub Events**: Real-time event notifications for state changes
- **Modularity**: Clean separation of concerns (client, service, module)
- **Testability**: Fully mocked tests, no actual Valkey connection required
- **Configuration**: Environment-based configuration via NestJS ConfigService

### Next Steps

This implementation provides the foundation for:

- ORCH-108: BullMQ task queue (uses Valkey for state persistence)
- ORCH-109: Agent lifecycle management (uses state management)
- Future orchestrator features that need state persistence

## Notes

### Environment Variables

From orchestrator.config.ts:

- VALKEY_HOST (default: localhost)
- VALKEY_PORT (default: 6379)
- VALKEY_URL (default: redis://localhost:6379)
- VALKEY_PASSWORD (optional, from .env.example)

### Dependencies

- ioredis: Already installed in package.json (^5.9.2)
- @nestjs/config: Already installed
- Configuration already set up in src/config/orchestrator.config.ts

### Key Design Decisions

1. Use ioredis for Valkey client (Redis-compatible)
2. State key pattern: `orchestrator:{type}:{id}`
   - Tasks: `orchestrator:task:{taskId}`
   - Agents: `orchestrator:agent:{agentId}`
3. Pub/sub channel: `orchestrator:events`
4. All timestamps in ISO 8601 format
5. State transitions enforced by state machine logic
6. Mock ioredis in tests (no actual Valkey connection needed)

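The key pattern, channel name, and timestamp format above can be sketched as a few small helpers. The helper names (`taskKey`, `agentKey`, `buildEvent`, `publishEvent`), the `OrchestratorEvent` envelope, and the `Publisher` interface are hypothetical. Only the key strings, the `orchestrator:events` channel, and the ISO 8601 timestamps come from the decisions listed here. The `publish(channel, message)` shape matches the ioredis method of the same name.

```typescript
const KEY_PREFIX = "orchestrator";
const EVENTS_CHANNEL = "orchestrator:events";

// Key builders following the documented `orchestrator:{type}:{id}` pattern.
function taskKey(taskId: string): string {
  return `${KEY_PREFIX}:task:${taskId}`;
}

function agentKey(agentId: string): string {
  return `${KEY_PREFIX}:agent:${agentId}`;
}

// Hypothetical event envelope; the real interfaces live in events.types.ts.
interface OrchestratorEvent {
  type: string;
  timestamp: string; // ISO 8601
  taskId?: string;
  agentId?: string;
}

// Builds an event stamped with an ISO 8601 timestamp.
function buildEvent(
  type: string,
  ids: { taskId?: string; agentId?: string },
  now: Date = new Date(),
): OrchestratorEvent {
  return { type, timestamp: now.toISOString(), ...ids };
}

// Minimal publisher surface, so a fake can replace the real client in tests.
interface Publisher {
  publish(channel: string, message: string): Promise<number>;
}

// Serializes the event as JSON and publishes it on the shared channel.
function publishEvent(publisher: Publisher, event: OrchestratorEvent): Promise<number> {
  return publisher.publish(EVENTS_CHANNEL, JSON.stringify(event));
}
```

With a single shared channel, subscribers would need to filter on the `type` field of the decoded message.
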