feat(#93): implement agent spawn via federation

Implements FED-010: Agent Spawn via Federation feature that enables spawning and managing Claude agents on remote federated Mosaic Stack instances via COMMAND message type.

Features:
- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:
- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService → CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:
- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:
- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:
- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
docs/scratchpads/orch-118-cleanup.md
@@ -0,0 +1,128 @@
# Issue ORCH-118: Resource cleanup

## Objective

Create a dedicated CleanupService that handles resource cleanup when agents terminate (completion, failure, or killswitch). Extract cleanup logic from KillswitchService into a reusable service with proper event emission.

## Approach

1. Create `CleanupService` in `src/killswitch/cleanup.service.ts`
2. Extract cleanup logic from `KillswitchService.performCleanup()`
3. Add event emission for cleanup operations
4. Integrate with existing services (DockerSandboxService, WorktreeManagerService, ValkeyService)
5. Update KillswitchService to use CleanupService
6. Write comprehensive unit tests following TDD

## Acceptance Criteria

- [x] `src/killswitch/cleanup.service.ts` implemented
- [x] Stop Docker container
- [x] Remove Docker container
- [x] Remove git worktree
- [x] Clear Valkey state
- [x] Emit cleanup event
- [x] Run cleanup on: agent completion, agent failure, killswitch
- [x] NestJS service with proper dependency injection
- [x] Comprehensive unit tests with ≥85% coverage

## Progress

- [x] Read ORCH-118 requirements
- [x] Analyze existing KillswitchService implementation
- [x] Understand event system (Valkey pub/sub)
- [x] Create scratchpad
- [x] Write tests for CleanupService (TDD - RED)
- [x] Implement CleanupService (TDD - GREEN)
- [x] Refactor KillswitchService to use CleanupService
- [x] Update KillswitchModule with CleanupService
- [x] Run tests - all 25 tests pass (10 cleanup, 8 killswitch, 7 controller)
- [x] Add agent.cleanup event type to events.types.ts
- [x] Create Gitea issue #253
- [x] Close Gitea issue with completion notes

## Testing

### Test Scenarios

1. **Successful cleanup**: All resources cleaned up successfully
2. **Docker cleanup failure**: Continue to other cleanup steps
3. **Worktree cleanup failure**: Continue to other cleanup steps
4. **Missing containerId**: Skip Docker cleanup
5. **Missing repository**: Skip worktree cleanup
6. **Docker disabled**: Skip Docker cleanup
7. **Event emission**: Verify cleanup event published
8. **Valkey state clearing**: Verify agent state deleted

## Technical Notes

- CleanupService should be reusable by KillswitchService, lifecycle service, etc.
- Best-effort cleanup: log errors but continue with other cleanup steps
- Event emission: Use `agent.cleanup` event type (need to add to EventType)
- Valkey state: Use `deleteAgentState()` to clear state after cleanup
- Integration: Service should be injectable and testable
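
The best-effort rule in the notes above can be sketched as a small step runner. This is only an illustration under stated assumptions: `StepResult`, `attemptStep`, and `runCleanupSteps` are hypothetical names, not part of the actual CleanupService, and the real service may log and collect results differently.

```typescript
// Hypothetical result shape for one cleanup step (not from the source).
type StepResult = { name: string; ok: boolean; error?: string };

// Runs a single step and converts any thrown error into a logged failure,
// so the caller never has to wrap individual steps in try/catch.
async function attemptStep(
  name: string,
  step: () => Promise<void>,
  log: (message: string) => void = console.error,
): Promise<StepResult> {
  try {
    await step();
    return { name, ok: true };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    log(`cleanup step "${name}" failed: ${error}`);
    return { name, ok: false, error };
  }
}

// Runs every step in order. A failure in one step is recorded and logged,
// and the remaining steps still run.
async function runCleanupSteps(
  steps: Array<[string, () => Promise<void>]>,
  log: (message: string) => void = console.error,
): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const [name, step] of steps) {
    results.push(await attemptStep(name, step, log));
  }
  return results;
}
```

Returning a result per step, rather than throwing, lets the caller report which resources were actually released.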

## Dependencies

- DockerSandboxService (container cleanup)
- WorktreeManagerService (git worktree cleanup)
- ValkeyService (state management + event emission)

## Event Structure

```typescript
{
  type: 'agent.cleanup',
  agentId: string,
  taskId: string,
  timestamp: string,
  cleanup: {
    docker: boolean,
    worktree: boolean,
    state: boolean
  }
}
```
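
As a compilable counterpart to the shape above, the event could be declared as an interface with a small builder. The names `AgentCleanupEvent` and `buildCleanupEvent` are hypothetical, and two details are assumptions rather than facts from the source: that `timestamp` is an ISO 8601 string, and that each `cleanup` flag records whether that step succeeded.

```typescript
// Hypothetical type name; the fields mirror the event structure above.
interface AgentCleanupEvent {
  type: "agent.cleanup";
  agentId: string;
  taskId: string;
  timestamp: string; // assumed to be ISO 8601
  cleanup: {
    docker: boolean; // assumed: true when the container step succeeded
    worktree: boolean; // assumed: true when the worktree step succeeded
    state: boolean; // assumed: true when the Valkey state was deleted
  };
}

// Hypothetical builder. `now` is injectable so tests can pin the timestamp.
function buildCleanupEvent(
  agentId: string,
  taskId: string,
  cleanup: AgentCleanupEvent["cleanup"],
  now: Date = new Date(),
): AgentCleanupEvent {
  return {
    type: "agent.cleanup",
    agentId,
    taskId,
    timestamp: now.toISOString(),
    cleanup: { ...cleanup }, // copy so later mutation by the caller has no effect
  };
}
```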

## Completion Summary

**Issue:** #253 [ORCH-118] Resource cleanup
**Status:** CLOSED ✓

### Implementation Details

Created a dedicated CleanupService that provides reusable agent resource cleanup with the following features:

1. **Best-effort cleanup strategy** - Continues even if individual steps fail
2. **Comprehensive logging** - Logs each step and any errors
3. **Event emission** - Publishes cleanup events with detailed status
4. **Service integration** - Properly integrated via NestJS dependency injection
5. **Reusability** - Can be used by KillswitchService, lifecycle service, or any other service

### Files Created

- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/cleanup.service.ts` (135 lines)
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/cleanup.service.spec.ts` (386 lines, 10 tests)

### Files Modified

- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/killswitch.service.ts` - Refactored to use CleanupService
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/killswitch.service.spec.ts` - Updated tests
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/killswitch.module.ts` - Added CleanupService provider/export
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/valkey/types/events.types.ts` - Added agent.cleanup event type

### Test Results

✓ All 25 tests pass

- 10 CleanupService tests (comprehensive coverage)
- 8 KillswitchService tests (refactored)
- 7 Controller tests (API endpoints)

### Cleanup Flow

1. Docker container (stop and remove) - skipped if no containerId or sandbox disabled
2. Git worktree (remove) - skipped if no repository
3. Valkey state (delete agent state) - always attempted
4. Event emission (agent.cleanup with results) - always attempted

Each step is independent and runs even if previous steps fail.
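
The skip conditions in the flow above can be expressed as a pure planning function, which keeps them easy to unit test. This is a minimal sketch, not the actual CleanupService code: `CleanupContext`, `CleanupStep`, `planCleanup`, and the `sandboxEnabled` flag are hypothetical names chosen for illustration.

```typescript
// Hypothetical input describing the agent being cleaned up.
interface CleanupContext {
  agentId: string;
  taskId: string;
  containerId?: string; // absent when no container was started
  repository?: string; // absent when no worktree was created
  sandboxEnabled: boolean; // assumed flag for "Docker sandbox enabled"
}

type CleanupStep = "docker" | "worktree" | "state" | "event";

// Returns the steps to attempt, in the order listed in the cleanup flow.
function planCleanup(ctx: CleanupContext): CleanupStep[] {
  const steps: CleanupStep[] = [];
  if (ctx.sandboxEnabled && ctx.containerId) {
    steps.push("docker"); // stop and remove the container
  }
  if (ctx.repository) {
    steps.push("worktree"); // remove the git worktree
  }
  // State deletion and event emission are always attempted.
  steps.push("state", "event");
  return steps;
}
```

For example, an agent with a repository but no `containerId` would get `["worktree", "state", "event"]`, which corresponds to the "Missing containerId" test scenario.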