stack/docs/scratchpads/orch-118-cleanup.md
Jason Woltje 12abdfe81d feat(#93): implement agent spawn via federation
Implements FED-010: Agent Spawn via Federation feature that enables
spawning and managing Claude agents on remote federated Mosaic Stack
instances via COMMAND message type.

Features:
- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:
- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService →
  CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:
- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:
- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:
- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 14:37:06 -06:00


Issue ORCH-118: Resource cleanup

Objective

Create a dedicated CleanupService that handles resource cleanup when agents terminate (completion, failure, or killswitch). Extract the cleanup logic from KillswitchService into a reusable service that emits an agent.cleanup event when cleanup runs.

Approach

  1. Create CleanupService in src/killswitch/cleanup.service.ts
  2. Extract cleanup logic from KillswitchService.performCleanup()
  3. Add event emission for cleanup operations
  4. Integrate with existing services (DockerSandboxService, WorktreeManagerService, ValkeyService)
  5. Update KillswitchService to use CleanupService
  6. Write comprehensive unit tests following TDD

Acceptance Criteria

  • src/killswitch/cleanup.service.ts implemented
  • Stop Docker container
  • Remove Docker container
  • Remove git worktree
  • Clear Valkey state
  • Emit cleanup event
  • Run cleanup on: agent completion, agent failure, killswitch
  • NestJS service with proper dependency injection
  • Comprehensive unit tests with ≥85% coverage
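The "run cleanup on" criterion amounts to a check for terminal states. A minimal sketch, assuming hypothetical status names (`completed`, `failed`, `killed`) and a hypothetical `onStatusChange` hook; the orchestrator's real lifecycle states and hook points may differ:

```typescript
// Hypothetical agent states; the real orchestrator may name these differently.
type AgentStatus = "spawning" | "running" | "completed" | "failed" | "killed";

// Cleanup runs on completion, failure, and killswitch (here: "killed").
const CLEANUP_TRIGGERS: ReadonlySet<AgentStatus> = new Set<AgentStatus>([
  "completed",
  "failed",
  "killed",
]);

function shouldCleanup(status: AgentStatus): boolean {
  return CLEANUP_TRIGGERS.has(status);
}

// Hypothetical hook a lifecycle service could call on every state transition.
// Returns true if cleanup was triggered.
async function onStatusChange(
  agentId: string,
  status: AgentStatus,
  cleanup: (agentId: string) => Promise<void>,
): Promise<boolean> {
  if (!shouldCleanup(status)) {
    return false; // agent still active: keep its container, worktree, and state
  }
  await cleanup(agentId);
  return true;
}
```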

Progress

  • Read ORCH-118 requirements
  • Analyze existing KillswitchService implementation
  • Understand event system (Valkey pub/sub)
  • Create scratchpad
  • Write tests for CleanupService (TDD - RED)
  • Implement CleanupService (TDD - GREEN)
  • Refactor KillswitchService to use CleanupService
  • Update KillswitchModule with CleanupService
  • Run tests - all 25 tests pass (10 cleanup, 8 killswitch, 7 controller)
  • Add agent.cleanup event type to events.types.ts
  • Create Gitea issue #253
  • Close Gitea issue with completion notes

Testing

Test Scenarios

  1. Successful cleanup: All resources cleaned up successfully
  2. Docker cleanup failure: Continue to other cleanup steps
  3. Worktree cleanup failure: Continue to other cleanup steps
  4. Missing containerId: Skip Docker cleanup
  5. Missing repository: Skip worktree cleanup
  6. Docker disabled: Skip Docker cleanup
  7. Event emission: Verify cleanup event published
  8. Valkey state clearing: Verify agent state deleted
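Scenarios 4 to 6 reduce to a pure decision about which steps apply, so that part can be tested without mocking Docker or git. A sketch under assumptions: the `AgentResources` shape, its optional `containerId`/`repository` fields, and the `sandboxEnabled` flag are illustrative, not the service's confirmed inputs:

```typescript
// Hypothetical shape of the agent data the cleanup receives.
interface AgentResources {
  agentId: string;
  taskId: string;
  containerId?: string; // absent if no sandbox container was started
  repository?: string; // absent if no git worktree was created
}

interface CleanupPlan {
  docker: boolean;
  worktree: boolean;
  state: boolean;
}

// Decide up front which steps apply; Valkey state is always attempted.
function planCleanup(agent: AgentResources, sandboxEnabled: boolean): CleanupPlan {
  return {
    docker: sandboxEnabled && Boolean(agent.containerId), // scenarios 4 and 6
    worktree: Boolean(agent.repository), // scenario 5
    state: true,
  };
}
```

Keeping the plan separate from execution turns scenarios 1 and 4 to 6 into plain input/output assertions; scenarios 2, 3, 7, and 8 still need fakes for the dependencies.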

Technical Notes

  • CleanupService should be reusable by KillswitchService, lifecycle service, etc.
  • Best-effort cleanup: log errors but continue with other cleanup steps
  • Event emission: Use agent.cleanup event type (need to add to EventType)
  • Valkey state: Use deleteAgentState() to clear state after cleanup
  • Integration: Service should be injectable and testable
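The best-effort rule can be captured in one wrapper that never throws and reports success as a boolean, which is also what the cleanup event needs. A sketch with a hypothetical `runBestEffort` helper and a minimal `StepLogger` interface (NestJS's `Logger` has `log` and `error` methods that would fit, but that wiring is an assumption):

```typescript
// Minimal logger contract (hypothetical).
interface StepLogger {
  log(message: string): void;
  error(message: string): void;
}

// Run one cleanup step. Never throws: returns true on success,
// logs the error and returns false on failure so the caller can move on.
async function runBestEffort(
  label: string,
  step: () => Promise<void>,
  logger: StepLogger,
): Promise<boolean> {
  try {
    await step();
    logger.log(`${label}: done`);
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    logger.error(`${label}: failed (${message}); continuing`);
    return false;
  }
}
```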

Dependencies

  • DockerSandboxService (container cleanup)
  • WorktreeManagerService (git worktree cleanup)
  • ValkeyService (state management + event emission)

Event Structure

{
  type: 'agent.cleanup',
  agentId: string,
  taskId: string,
  timestamp: string,
  cleanup: {
    docker: boolean,
    worktree: boolean,
    state: boolean
  }
}
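The event structure above translates directly into a TypeScript type. A sketch, assuming the hypothetical names `AgentCleanupEvent` and `buildCleanupEvent`; the real definition lives in events.types.ts and may differ. Passing `now` explicitly keeps the timestamp deterministic in tests:

```typescript
// Mirrors the event structure above; the interface name is hypothetical.
interface AgentCleanupEvent {
  type: "agent.cleanup";
  agentId: string;
  taskId: string;
  timestamp: string; // ISO 8601
  cleanup: {
    docker: boolean;
    worktree: boolean;
    state: boolean;
  };
}

function buildCleanupEvent(
  agentId: string,
  taskId: string,
  cleanup: AgentCleanupEvent["cleanup"],
  now: Date = new Date(),
): AgentCleanupEvent {
  return {
    type: "agent.cleanup",
    agentId,
    taskId,
    timestamp: now.toISOString(),
    cleanup: { ...cleanup }, // copy so later changes to the caller's object don't leak in
  };
}
```

One consequence of this shape: a bare boolean per step cannot tell a skipped step (for example, no containerId) from a failed one. If event consumers need that distinction, a small status union such as `"ok" | "skipped" | "failed"` would be an alternative.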

Completion Summary

Issue: #253 [ORCH-118] Resource cleanup
Status: CLOSED ✓

Implementation Details

Created a dedicated CleanupService that provides reusable agent resource cleanup with the following features:

  1. Best-effort cleanup strategy - Continues even if individual steps fail
  2. Comprehensive logging - Logs each step and any errors
  3. Event emission - Publishes cleanup events with detailed status
  4. Service integration - Properly integrated via NestJS dependency injection
  5. Reusability - Can be used by KillswitchService, lifecycle service, or any other service
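Reusability comes from injecting one CleanupService into each caller instead of each caller owning its own cleanup code. A framework-free sketch: `AgentCleanup`, `KillswitchSketch`, and `LifecycleSketch` are hypothetical stand-ins, and the NestJS `@Injectable()` decorators and module wiring are omitted so the example runs on its own:

```typescript
// Hypothetical minimal contract; the real CleanupService signature may differ.
interface AgentCleanup {
  cleanup(agentId: string, taskId: string): Promise<void>;
}

// Killswitch that delegates cleanup instead of owning it. In NestJS the
// CleanupService provider would be exported by KillswitchModule and
// injected through the constructor.
class KillswitchSketch {
  private readonly cleanupService: AgentCleanup;

  constructor(cleanupService: AgentCleanup) {
    this.cleanupService = cleanupService;
  }

  async killAgent(agentId: string, taskId: string): Promise<void> {
    // Stopping the agent and updating its status would happen here (omitted).
    await this.cleanupService.cleanup(agentId, taskId);
  }
}

// A second consumer, e.g. a lifecycle service reacting to completion or failure.
class LifecycleSketch {
  private readonly cleanupService: AgentCleanup;

  constructor(cleanupService: AgentCleanup) {
    this.cleanupService = cleanupService;
  }

  async onAgentFinished(agentId: string, taskId: string): Promise<void> {
    await this.cleanupService.cleanup(agentId, taskId);
  }
}
```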

Files Created

  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/cleanup.service.ts (135 lines)
  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/cleanup.service.spec.ts (386 lines, 10 tests)

Files Modified

  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/killswitch.service.ts - Refactored to use CleanupService
  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/killswitch.service.spec.ts - Updated tests
  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/killswitch.module.ts - Added CleanupService provider/export
  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/valkey/types/events.types.ts - Added agent.cleanup event type

Test Results

✓ All 25 tests pass

  • 10 CleanupService tests (comprehensive coverage)
  • 8 KillswitchService tests (refactored)
  • 7 Controller tests (API endpoints)

Cleanup Flow

  1. Docker container (stop and remove) - skipped if no containerId or sandbox disabled
  2. Git worktree (remove) - skipped if no repository
  3. Valkey state (delete agent state) - always attempted
  4. Event emission (agent.cleanup with results) - always attempted

Each step runs independently: if one fails, the error is logged and the remaining steps still run.
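The four steps can be combined into a sketch of the whole flow, under stated assumptions: `Sandbox`, `Worktrees`, and `StateStore` are hypothetical contracts standing in for DockerSandboxService, WorktreeManagerService, and ValkeyService, and apart from `deleteAgentState()` their method names are placeholders rather than the real APIs:

```typescript
// Hypothetical contracts; only deleteAgentState() comes from the notes.
interface Sandbox {
  enabled: boolean;
  stopContainer(containerId: string): Promise<void>;
  removeContainer(containerId: string): Promise<void>;
}

interface Worktrees {
  removeWorktree(repository: string, agentId: string): Promise<void>;
}

interface StateStore {
  deleteAgentState(agentId: string): Promise<void>;
  publishEvent(event: object): Promise<void>;
}

interface Agent {
  agentId: string;
  taskId: string;
  containerId?: string;
  repository?: string;
}

interface CleanupResult {
  docker: boolean;
  worktree: boolean;
  state: boolean;
}

// Run one step; log and swallow any error so later steps still run.
async function attempt(label: string, step: () => Promise<void>): Promise<boolean> {
  try {
    await step();
    return true;
  } catch (err) {
    console.error(`cleanup step "${label}" failed:`, err);
    return false;
  }
}

async function cleanupAgent(
  agent: Agent,
  sandbox: Sandbox,
  worktrees: Worktrees,
  store: StateStore,
): Promise<CleanupResult> {
  const result: CleanupResult = { docker: false, worktree: false, state: false };

  // 1. Docker container: skipped if there is no containerId or the sandbox is disabled.
  const containerId = agent.containerId;
  if (containerId && sandbox.enabled) {
    result.docker = await attempt("docker", async () => {
      await sandbox.stopContainer(containerId);
      await sandbox.removeContainer(containerId);
    });
  }

  // 2. Git worktree: skipped if there is no repository.
  const repository = agent.repository;
  if (repository) {
    result.worktree = await attempt("worktree", () =>
      worktrees.removeWorktree(repository, agent.agentId),
    );
  }

  // 3. Valkey state: always attempted.
  result.state = await attempt("state", () => store.deleteAgentState(agent.agentId));

  // 4. agent.cleanup event: always attempted, carrying the per-step results.
  await attempt("event", () =>
    store.publishEvent({
      type: "agent.cleanup",
      agentId: agent.agentId,
      taskId: agent.taskId,
      timestamp: new Date().toISOString(),
      cleanup: { ...result },
    }),
  );

  return result;
}
```

In this sketch a failed stop also skips the remove call for that container, and a skipped step is reported as `false`, the same as a failed one.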