Implements FED-010: Agent Spawn via Federation feature that enables spawning and managing Claude agents on remote federated Mosaic Stack instances via COMMAND message type. Features: - Federation agent command types (spawn, status, kill) - FederationAgentService for handling agent operations - Integration with orchestrator's agent spawner/lifecycle services - API endpoints for spawning, querying status, and killing agents - Full command routing through federation COMMAND infrastructure - Comprehensive test coverage (12/12 tests passing) Architecture: - Hub → Spoke: Spawn agents on remote instances - Command flow: FederationController → FederationAgentService → CommandService → Remote Orchestrator - Response handling: Remote orchestrator returns agent status/results - Security: Connection validation, signature verification Files created: - apps/api/src/federation/types/federation-agent.types.ts - apps/api/src/federation/federation-agent.service.ts - apps/api/src/federation/federation-agent.service.spec.ts Files modified: - apps/api/src/federation/command.service.ts (agent command routing) - apps/api/src/federation/federation.controller.ts (agent endpoints) - apps/api/src/federation/federation.module.ts (service registration) - apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint) - apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration) Testing: - 12/12 tests passing for FederationAgentService - All command service tests passing - TypeScript compilation successful - Linting passed Refs #93 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
140 lines
4.3 KiB
Markdown
140 lines
4.3 KiB
Markdown
# Issue #184: [BLOCKER] Add authentication to coordinator integration endpoints
|
|
|
|
## Objective
|
|
|
|
Add authentication to coordinator integration endpoints to prevent unauthorized access. This is a critical security vulnerability that must be fixed before deployment.
|
|
|
|
## Approach
|
|
|
|
1. Identify all coordinator integration endpoints without authentication
|
|
2. Write security tests first (TDD - RED phase)
|
|
3. Implement authentication mechanism (JWT/bearer token or API key)
|
|
4. Verify all tests pass (GREEN phase)
|
|
5. Refactor if needed while maintaining test coverage
|
|
|
|
## Progress
|
|
|
|
- [x] Create scratchpad
|
|
- [x] Investigate coordinator endpoints
|
|
- [x] Investigate stitcher endpoints
|
|
- [x] Write security tests for unauthenticated endpoints (TDD - RED)
|
|
- [x] Implement authentication (TDD - GREEN)
|
|
- [x] Verify 85% minimum coverage (95.65% achieved)
|
|
- [x] All tests pass (25 new tests passing)
|
|
- [ ] Commit changes
|
|
- [ ] Update issue status
|
|
|
|
## Findings
|
|
|
|
### Unauthenticated Endpoints
|
|
|
|
1. **CoordinatorIntegrationController** (`/coordinator/*`)
|
|
- POST /coordinator/jobs - Create job from coordinator
|
|
- PATCH /coordinator/jobs/:id/status - Update job status
|
|
- PATCH /coordinator/jobs/:id/progress - Update job progress
|
|
- POST /coordinator/jobs/:id/complete - Mark job complete
|
|
- POST /coordinator/jobs/:id/fail - Mark job failed
|
|
- GET /coordinator/jobs/:id - Get job details
|
|
- GET /coordinator/health - Health check
|
|
|
|
2. **StitcherController** (`/stitcher/*`)
|
|
- POST /stitcher/webhook - Webhook from @mosaic bot
|
|
- POST /stitcher/dispatch - Manual job dispatch
|
|
|
|
### Authentication Mechanism
|
|
|
|
**Decision: API Key Authentication**
|
|
|
|
Reasons:
|
|
|
|
- Service-to-service communication (coordinator Python app → NestJS API)
|
|
- No user context needed
|
|
- Simpler than JWT for this use case
|
|
- Consistent with MOSAIC_API_TOKEN pattern already in use
|
|
|
|
Implementation:
|
|
|
|
- Create ApiKeyGuard that checks X-API-Key header
|
|
- Add COORDINATOR_API_KEY to .env.example
|
|
- Coordinator will send this key in X-API-Key header
|
|
- Guard validates key matches COORDINATOR_API_KEY env var
|
|
|
|
## Security Review Notes
|
|
|
|
### Authentication Mechanism: API Key Guard
|
|
|
|
**Implementation:** `/apps/api/src/common/guards/api-key.guard.ts`
|
|
|
|
**Security Features:**
|
|
|
|
1. **Constant-time comparison** - Uses `crypto.timingSafeEqual` to prevent timing attacks
|
|
2. **Header case-insensitivity** - Accepts X-API-Key, x-api-key, X-Api-Key variations
|
|
3. **Empty string validation** - Rejects empty API keys
|
|
4. **Configuration validation** - Fails fast if COORDINATOR_API_KEY is not configured
|
|
5. **Clear error messages** - Differentiates between missing, invalid, and unconfigured keys
|
|
|
|
**Protected Endpoints:**
|
|
|
|
- All CoordinatorIntegrationController endpoints (`/coordinator/*`)
|
|
- All StitcherController endpoints (`/stitcher/*`)
|
|
|
|
**Environment Variable:**
|
|
|
|
- `COORDINATOR_API_KEY` - Must be at least 32 characters (recommended: `openssl rand -base64 32`)
|
|
|
|
**Testing:**
|
|
|
|
- 8 tests for ApiKeyGuard (95.65% coverage)
|
|
- 10 tests for coordinator security
|
|
- 7 tests for stitcher security
|
|
- Total: 25 new security tests
|
|
|
|
**Attack Prevention:**
|
|
|
|
- Timing attacks: Prevented via constant-time comparison
|
|
- Unauthorized access: All endpoints require valid API key
|
|
- Empty/null keys: Explicitly rejected
|
|
- Configuration errors: Server fails to start if misconfigured
|
|
|
|
## Testing
|
|
|
|
### Test Plan
|
|
|
|
1. Security tests to verify authentication is required
|
|
2. Tests to verify valid credentials are accepted
|
|
3. Tests to verify invalid credentials are rejected
|
|
4. Integration tests for end-to-end flows
|
|
|
|
### Test Results
|
|
|
|
**ApiKeyGuard Tests:** 8/8 passing (95.65% coverage)
|
|
|
|
- ✅ Valid API key accepted
|
|
- ✅ Missing API key rejected
|
|
- ✅ Invalid API key rejected
|
|
- ✅ Unconfigured API key rejected
|
|
- ✅ Case-insensitive header handling
|
|
- ✅ Empty string rejection
|
|
- ✅ Timing attack prevention
|
|
|
|
**Coordinator Security Tests:** 10/10 passing
|
|
|
|
- ✅ All endpoints require authentication
|
|
- ✅ Valid API key allows access
|
|
- ✅ Invalid API key blocks access
|
|
|
|
**Stitcher Security Tests:** 7/7 passing
|
|
|
|
- ✅ All endpoints require authentication
|
|
- ✅ Valid API key allows access
|
|
- ✅ Invalid/empty API keys blocked
|
|
- ✅ Webhook submission protection
|
|
|
|
**Existing Tests:** No regressions introduced (1420 tests still passing)
|
|
|
|
## Notes
|
|
|
|
- Priority: CRITICAL SECURITY
|
|
- Impact: Prevents unauthorized access to coordinator integration
|
|
- Coverage requirement: Minimum 85%
|