stack/docs/scratchpads/orch-118-cleanup.md
Jason Woltje 12abdfe81d feat(#93): implement agent spawn via federation
Implements FED-010: Agent Spawn via Federation feature that enables
spawning and managing Claude agents on remote federated Mosaic Stack
instances via COMMAND message type.

Features:
- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:
- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService →
  CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:
- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:
- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:
- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 14:37:06 -06:00


# Issue ORCH-118: Resource cleanup
## Objective
Create a dedicated CleanupService that handles resource cleanup when agents terminate (completion, failure, or killswitch). Extract cleanup logic from KillswitchService into a reusable service with proper event emission.
## Approach
1. Create `CleanupService` in `src/killswitch/cleanup.service.ts`
2. Extract cleanup logic from `KillswitchService.performCleanup()`
3. Add event emission for cleanup operations
4. Integrate with existing services (DockerSandboxService, WorktreeManagerService, ValkeyService)
5. Update KillswitchService to use CleanupService
6. Write comprehensive unit tests following TDD
## Acceptance Criteria
- [x] `src/killswitch/cleanup.service.ts` implemented
- [x] Stop Docker container
- [x] Remove Docker container
- [x] Remove git worktree
- [x] Clear Valkey state
- [x] Emit cleanup event
- [x] Run cleanup on: agent completion, agent failure, killswitch
- [x] NestJS service with proper dependency injection
- [x] Comprehensive unit tests with ≥85% coverage
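Running cleanup on completion, failure, and killswitch through one injectable service comes down to a single cleanup entry point that several callers share. Below is a minimal plain-TypeScript sketch that assumes constructor injection in the style NestJS uses. `CleanupRunner`, `AgentTerminationHandler`, `TerminationReason`, and `onAgentTerminated` are hypothetical names, not the orchestrator's API.

```typescript
// Hypothetical terminal states that should all trigger cleanup.
type TerminationReason = 'completed' | 'failed' | 'killed';

// Minimal agent record for this sketch (fields assumed, not the real type).
interface AgentRef {
  agentId: string;
  taskId: string;
}

// Narrow interface the callers depend on; in NestJS this role would be
// filled by the injected CleanupService.
interface CleanupRunner {
  cleanup(agent: AgentRef): Promise<void>;
}

// One handler shared by the lifecycle path and the killswitch path, so every
// terminal state goes through the same cleanup code.
class AgentTerminationHandler {
  private readonly cleanupService: CleanupRunner;

  constructor(cleanupService: CleanupRunner) {
    this.cleanupService = cleanupService;
  }

  async onAgentTerminated(agent: AgentRef, reason: TerminationReason): Promise<void> {
    // The reason is only logged: cleanup is identical for all three cases.
    console.log(`Agent ${agent.agentId} terminated (${reason}); cleaning up`);
    await this.cleanupService.cleanup(agent);
  }
}
```

Depending on the narrow `CleanupRunner` interface instead of the concrete service makes it easy to substitute a fake in unit tests.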
## Progress
- [x] Read ORCH-118 requirements
- [x] Analyze existing KillswitchService implementation
- [x] Understand event system (Valkey pub/sub)
- [x] Create scratchpad
- [x] Write tests for CleanupService (TDD - RED)
- [x] Implement CleanupService (TDD - GREEN)
- [x] Refactor KillswitchService to use CleanupService
- [x] Update KillswitchModule with CleanupService
- [x] Run tests - all 25 tests pass (10 cleanup, 8 killswitch, 7 controller)
- [x] Add agent.cleanup event type to events.types.ts
- [x] Create Gitea issue #253
- [x] Close Gitea issue with completion notes
## Testing
### Test Scenarios
1. **Successful cleanup**: All resources cleaned up successfully
2. **Docker cleanup failure**: Remaining cleanup steps still run
3. **Worktree cleanup failure**: Remaining cleanup steps still run
4. **Missing containerId**: Skip Docker cleanup
5. **Missing repository**: Skip worktree cleanup
6. **Docker disabled**: Skip Docker cleanup
7. **Event emission**: Verify cleanup event published
8. **Valkey state clearing**: Verify agent state deleted
## Technical Notes
- CleanupService should be reusable by KillswitchService, lifecycle service, etc.
- Best-effort cleanup: log errors but continue with other cleanup steps
- Event emission: Use the `agent.cleanup` event type (must be added to `EventType`)
- Valkey state: Use `deleteAgentState()` to clear state after cleanup
- Integration: Service should be injectable and testable
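The best-effort rule can be isolated in one helper. The helper runs a step, logs and swallows any error, and reports success as a boolean, so the next step still runs. This is a minimal sketch. `attemptStep` and `LogFn` are hypothetical names, and a plain callback stands in for whatever logger the service actually uses.

```typescript
// Hypothetical logging callback standing in for the service's logger.
type LogFn = (message: string) => void;

// Runs a single cleanup step without letting its failure escape.
// Resolves to true on success, false if the step threw or rejected.
async function attemptStep(
  name: string,
  step: () => Promise<void>,
  logError: LogFn,
): Promise<boolean> {
  try {
    await step();
    return true;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    logError(`Cleanup step "${name}" failed: ${reason}`);
    return false;
  }
}
```

Each call resolves to a boolean instead of throwing. The results can therefore be collected directly into the `docker`/`worktree`/`state` flags of the cleanup event.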
## Dependencies
- DockerSandboxService (container cleanup)
- WorktreeManagerService (git worktree cleanup)
- ValkeyService (state management + event emission)
## Event Structure
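```typescript
// Payload published on the `agent.cleanup` event (interface name illustrative)
interface AgentCleanupEvent {
  type: 'agent.cleanup';
  agentId: string;
  taskId: string;
  timestamp: string;
  cleanup: {
    docker: boolean;   // Docker container stop/remove result
    worktree: boolean; // git worktree removal result
    state: boolean;    // Valkey agent state deletion result
  };
}
```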
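The event structure can be expressed as TypeScript types plus a small factory that keeps the shape in one place. This is a sketch. `CleanupResults`, `AgentCleanupEvent`, and `buildCleanupEvent` are hypothetical names. The ISO 8601 `timestamp` produced by `Date.prototype.toISOString()` is an assumption, because the structure only specifies `string`.

```typescript
// Per-resource outcome, mirroring the `cleanup` object of the event.
interface CleanupResults {
  docker: boolean;
  worktree: boolean;
  state: boolean;
}

// Event shape as described in the structure above.
interface AgentCleanupEvent {
  type: 'agent.cleanup';
  agentId: string;
  taskId: string;
  timestamp: string;
  cleanup: CleanupResults;
}

// Builds the event; `now` is a parameter so tests can pin the timestamp.
function buildCleanupEvent(
  agentId: string,
  taskId: string,
  cleanup: CleanupResults,
  now: Date = new Date(),
): AgentCleanupEvent {
  return {
    type: 'agent.cleanup',
    agentId,
    taskId,
    timestamp: now.toISOString(), // assumption: ISO 8601 timestamps
    // Copy so later mutation of the caller's object cannot alter the event.
    cleanup: { ...cleanup },
  };
}
```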
## Completion Summary
**Issue:** #253 [ORCH-118] Resource cleanup
**Status:** CLOSED ✓
### Implementation Details
Created a dedicated CleanupService that provides reusable agent resource cleanup with the following features:
1. **Best-effort cleanup strategy** - Continues even if individual steps fail
2. **Comprehensive logging** - Logs each step and any errors
3. **Event emission** - Publishes cleanup events with detailed status
4. **Service integration** - Properly integrated via NestJS dependency injection
5. **Reusability** - Can be used by KillswitchService, lifecycle service, or any other service
### Files Created
- `apps/orchestrator/src/killswitch/cleanup.service.ts` (135 lines)
- `apps/orchestrator/src/killswitch/cleanup.service.spec.ts` (386 lines, 10 tests)
### Files Modified
- `apps/orchestrator/src/killswitch/killswitch.service.ts` - Refactored to use CleanupService
- `apps/orchestrator/src/killswitch/killswitch.service.spec.ts` - Updated tests
- `apps/orchestrator/src/killswitch/killswitch.module.ts` - Added CleanupService provider/export
- `apps/orchestrator/src/valkey/types/events.types.ts` - Added agent.cleanup event type
### Test Results
✓ All 25 tests pass
- 10 CleanupService tests (comprehensive coverage)
- 8 KillswitchService tests (refactored)
- 7 Controller tests (API endpoints)
### Cleanup Flow
1. Docker container (stop and remove) - skipped if no containerId or sandbox disabled
2. Git worktree (remove) - skipped if no repository
3. Valkey state (delete agent state) - always attempted
4. Event emission (agent.cleanup with results) - always attempted
Each step is independent. A failure in one step does not prevent the steps after it from running.
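The four-step flow can be sketched as a single async function. This is a minimal sketch, not the service's actual code. The dependency interfaces and their method names (`stopContainer`, `removeContainer`, `removeWorktree`, `publishEvent`) are hypothetical stand-ins for DockerSandboxService, WorktreeManagerService, and ValkeyService; of these method names, only `deleteAgentState()` appears in these notes. The sketch also makes two assumptions. First, a skipped step is reported as `false` in the event. Second, stop and remove run as one step, so a failed stop skips the remove.

```typescript
// Minimal agent record for this sketch (field names assumed).
interface AgentInfo {
  agentId: string;
  taskId: string;
  containerId?: string;
  repository?: string;
}

// Hypothetical dependency shapes standing in for the real services.
interface SandboxLike {
  enabled: boolean;
  stopContainer(containerId: string): Promise<void>;
  removeContainer(containerId: string): Promise<void>;
}
interface WorktreeLike {
  removeWorktree(repository: string, agentId: string): Promise<void>;
}
interface StateStoreLike {
  deleteAgentState(agentId: string): Promise<void>;
  publishEvent(event: object): Promise<void>;
}

interface CleanupResults {
  docker: boolean;
  worktree: boolean;
  state: boolean;
}

// Runs a step, logging and swallowing errors so later steps still run.
async function bestEffort(name: string, step: () => Promise<void>): Promise<boolean> {
  try {
    await step();
    return true;
  } catch (err) {
    console.warn(`cleanup step ${name} failed:`, err);
    return false;
  }
}

async function cleanupAgent(
  agent: AgentInfo,
  sandbox: SandboxLike,
  worktrees: WorktreeLike,
  store: StateStoreLike,
): Promise<CleanupResults> {
  const { containerId, repository } = agent;

  // 1. Docker: skipped (reported false) without a containerId or when the sandbox is disabled.
  const docker =
    sandbox.enabled && containerId !== undefined
      ? await bestEffort('docker', async () => {
          await sandbox.stopContainer(containerId);
          await sandbox.removeContainer(containerId);
        })
      : false;

  // 2. Git worktree: skipped (reported false) without a repository.
  const worktree =
    repository !== undefined
      ? await bestEffort('worktree', () => worktrees.removeWorktree(repository, agent.agentId))
      : false;

  // 3. Valkey state: always attempted.
  const state = await bestEffort('state', () => store.deleteAgentState(agent.agentId));

  const results: CleanupResults = { docker, worktree, state };

  // 4. Event emission: always attempted; a publish failure is logged, not thrown.
  await bestEffort('event', () =>
    store.publishEvent({
      type: 'agent.cleanup',
      agentId: agent.agentId,
      taskId: agent.taskId,
      timestamp: new Date().toISOString(),
      cleanup: results,
    }),
  );

  return results;
}
```

Because the skip checks sit outside `bestEffort`, a skipped step and a failed step both appear as `false`. If that distinction matters to consumers, the event would need a richer per-step result, such as `'ok' | 'failed' | 'skipped'`.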