stack/docs/scratchpads/orch-117-killswitch.md
Jason Woltje 12abdfe81d feat(#93): implement agent spawn via federation
Implements FED-010: Agent Spawn via Federation feature that enables
spawning and managing Claude agents on remote federated Mosaic Stack
instances via COMMAND message type.

Features:
- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:
- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService →
  CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:
- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:
- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:
- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 14:37:06 -06:00


# Issue ORCH-117: Killswitch Implementation
## Objective
Implement emergency-stop functionality that kills a single agent or all agents immediately, with proper cleanup of Docker containers and git worktrees, plus state updates.
## Approach
1. Create KillswitchService with methods:
   - `killAgent(agentId)` - Kill single agent
   - `killAllAgents()` - Kill all active agents
2. Implement cleanup orchestration:
   - Immediate termination (SIGKILL)
   - Cleanup Docker containers (via DockerSandboxService)
   - Cleanup git worktrees (via WorktreeManagerService)
   - Update agent state to 'killed' (via AgentLifecycleService)
   - Audit trail logging
3. Add API endpoints to AgentsController:
   - POST /agents/:agentId/kill
   - POST /agents/kill-all
4. Follow TDD: write tests first, then implementation
5. Ensure test coverage >= 85%
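
The cleanup orchestration in step 2 could be sketched roughly as follows. This is a minimal illustration, not the actual KillswitchService: the `Cleaner` and `Lifecycle` interfaces, their method names, the `KillResult` shape, and the ordering (state transition before cleanup) are all assumptions made for the example, and the SIGKILL termination step is omitted.

```typescript
// Hypothetical stand-ins for DockerSandboxService / WorktreeManagerService.
// The real service signatures may differ.
interface Cleaner {
  cleanup(agentId: string): Promise<void>;
}

// Hypothetical stand-in for AgentLifecycleService.
interface Lifecycle {
  transitionToKilled(agentId: string): Promise<void>;
}

// Hypothetical result shape: which cleanup steps failed, if any.
interface KillResult {
  agentId: string;
  cleanupErrors: string[];
}

class KillswitchSketch {
  private readonly sandbox: Cleaner;
  private readonly worktrees: Cleaner;
  private readonly lifecycle: Lifecycle;
  private readonly audit: (message: string) => void;

  constructor(
    sandbox: Cleaner,
    worktrees: Cleaner,
    lifecycle: Lifecycle,
    audit: (message: string) => void,
  ) {
    this.sandbox = sandbox;
    this.worktrees = worktrees;
    this.lifecycle = lifecycle;
    this.audit = audit;
  }

  async killAgent(agentId: string): Promise<KillResult> {
    this.audit(`killswitch requested for agent ${agentId}`);

    // Not wrapped in try/catch: a failed state transition propagates to the caller.
    await this.lifecycle.transitionToKilled(agentId);

    // Best-effort cleanup: record and log each failure, then continue.
    const cleanupErrors: string[] = [];
    const steps: Array<[string, () => Promise<void>]> = [
      ["docker", () => this.sandbox.cleanup(agentId)],
      ["worktree", () => this.worktrees.cleanup(agentId)],
    ];
    for (const [name, run] of steps) {
      try {
        await run();
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        cleanupErrors.push(`${name}: ${reason}`);
        this.audit(`${name} cleanup failed for agent ${agentId}: ${reason}`);
      }
    }

    this.audit(`killswitch completed for agent ${agentId}`);
    return { agentId, cleanupErrors };
  }
}
```

Returning the collected cleanup errors instead of throwing keeps a failed Docker or worktree cleanup from masking the fact that the agent was killed, while the audit callback still records every failure.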
## Progress
- [x] Read ORCH-117 requirements
- [x] Understand existing service interfaces
- [x] Create scratchpad
- [x] Write killswitch.service.spec.ts tests (13 tests)
- [x] Implement killswitch.service.ts
- [x] Add controller endpoints (POST /agents/:agentId/kill, POST /agents/kill-all)
- [x] Write controller tests (7 tests)
- [x] Update killswitch.module.ts
- [x] Verify test coverage (100% statements, 85% branches, 100% functions)
- [x] Create Gitea issue
- [x] Close Gitea issue
## Testing
Following TDD (Red-Green-Refactor):
1. RED: Write failing tests for killswitch functionality
2. GREEN: Implement minimal code to pass tests
3. REFACTOR: Clean up implementation
Test coverage areas:
- Single agent kill with successful cleanup
- Kill all agents
- Error handling for non-existent agents
- Partial cleanup failures (Docker but not worktree)
- Audit logging verification
## Notes
- Killswitch bypasses all queues - must respond within seconds
- Cleanup should be best-effort (log failures but continue)
- State transition to 'killed' enforced by AgentLifecycleService
- Need to handle agents in different states (spawning, running)
- Docker containers may not exist if sandbox is disabled
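
The state handling mentioned above can be illustrated with a small filter that picks kill targets. This is only a sketch: the source names the `spawning`, `running`, and `killed` states, while `completed` and `failed`, the `AgentRecord` shape, and the function name are hypothetical.

```typescript
// 'spawning', 'running' and 'killed' come from the notes above;
// 'completed' and 'failed' are assumed here for illustration only.
type AgentState = "spawning" | "running" | "completed" | "failed" | "killed";

// Hypothetical minimal view of an agent record.
interface AgentRecord {
  agentId: string;
  state: AgentState;
}

// States the killswitch treats as active.
const ACTIVE_STATES: ReadonlySet<AgentState> = new Set<AgentState>([
  "spawning",
  "running",
]);

// Returns the IDs of agents that are still active, preserving input order.
function selectKillTargets(agents: readonly AgentRecord[]): string[] {
  return agents
    .filter((agent) => ACTIVE_STATES.has(agent.state))
    .map((agent) => agent.agentId);
}

// Usage: only the spawning and running agents are selected.
const targets = selectKillTargets([
  { agentId: "a1", state: "spawning" },
  { agentId: "a2", state: "running" },
  { agentId: "a3", state: "completed" },
  { agentId: "a4", state: "killed" },
]);
console.log(targets);
```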
## Implementation Summary
### Files Created
1. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/killswitch.service.ts`
   - `killAgent(agentId)` - Kill single agent with full cleanup
   - `killAllAgents()` - Kill all active agents
   - Best-effort cleanup: Docker containers, git worktrees
   - Audit trail logging for all killswitch operations
2. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/killswitch.service.spec.ts`
   - 13 comprehensive tests covering all scenarios
   - 100% code coverage (statements, functions, lines)
   - 85% branch coverage
3. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/api/agents/agents-killswitch.controller.spec.ts`
   - 7 controller tests for killswitch endpoints
   - Full coverage of success and error paths
### Files Modified
1. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/killswitch/killswitch.module.ts`
   - Added KillswitchService provider
   - Imported SpawnerModule, GitModule, ValkeyModule
   - Exported KillswitchService for use in controllers
2. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/api/agents/agents.controller.ts`
   - Added POST /agents/:agentId/kill endpoint
   - Added POST /agents/kill-all endpoint
   - Integrated KillswitchService
3. `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/api/agents/agents.module.ts`
   - Imported KillswitchModule
### Test Results
- All 20 tests passing (13 service + 7 controller)
- Killswitch service: 100% statement, function, and line coverage; 85% branch coverage
- Error handling: Properly propagates errors from state transitions
- Resilience: Continues cleanup even if Docker or worktree cleanup fails
- Filtering: Only kills active agents (spawning/running states)
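
A kill-all operation over the filtered agents might aggregate per-agent outcomes along these lines. This is a hedged sketch, not the real `killAllAgents()`: the `KillAllSummary` shape, the `killAll` function, and the choice to keep going when one agent fails are assumptions. `Promise.allSettled` is the standard ES2020 API.

```typescript
// Hypothetical summary shape, invented for this example.
interface KillAllSummary {
  total: number;
  killed: number;
  failed: number;
  errors: string[];
}

// Kills every listed agent and reports how many succeeded. `killOne` is a
// caller-supplied function (for example, a bound single-agent kill method).
async function killAll(
  agentIds: readonly string[],
  killOne: (agentId: string) => Promise<void>,
): Promise<KillAllSummary> {
  // The async wrapper turns synchronous throws into rejections, and
  // allSettled waits for every attempt instead of stopping at the first failure.
  const outcomes = await Promise.allSettled(
    agentIds.map(async (id) => killOne(id)),
  );

  const errors: string[] = [];
  outcomes.forEach((outcome, index) => {
    if (outcome.status === "rejected") {
      const reason =
        outcome.reason instanceof Error
          ? outcome.reason.message
          : String(outcome.reason);
      errors.push(`${agentIds[index]}: ${reason}`);
    }
  });

  return {
    total: agentIds.length,
    killed: agentIds.length - errors.length,
    failed: errors.length,
    errors,
  };
}
```

Running the kills concurrently suits an emergency stop that must respond within seconds; a sequential loop would be simpler to reason about but slower with many agents.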