# Issue #199: Implement rate limiting on webhook endpoints

## Objective

Implement rate limiting on webhook and public-facing API endpoints to mitigate DoS attacks and keep the system stable under high load.

## Approach

### TDD Implementation Plan

1. **RED**: Write failing tests for rate limiting
   - Test rate limit enforcement (429 status)
   - Test Retry-After header inclusion
   - Test per-IP rate limiting
   - Test per-API-key rate limiting
   - Test that legitimate requests are not blocked
   - Test storage mechanism (Redis/in-memory)

2. **GREEN**: Implement NestJS throttler
   - Install @nestjs/throttler package
   - Configure global rate limits
   - Configure per-endpoint rate limits
   - Add custom guards for per-API-key limiting
   - Integrate with Valkey (Redis) for distributed limiting
   - Add Retry-After headers to 429 responses

3. **REFACTOR**: Optimize and document
   - Extract configuration to environment variables
   - Add documentation
   - Ensure code quality

### Identified Webhook Endpoints

**Stitcher Module** (`apps/api/src/stitcher/stitcher.controller.ts`):

- `POST /stitcher/webhook` - Webhook endpoint for @mosaic bot
- `POST /stitcher/dispatch` - Manual job dispatch endpoint

**Coordinator Integration Module** (`apps/api/src/coordinator-integration/coordinator-integration.controller.ts`):

- `POST /coordinator/jobs` - Create a job from coordinator
- `PATCH /coordinator/jobs/:id/status` - Update job status
- `PATCH /coordinator/jobs/:id/progress` - Update job progress
- `POST /coordinator/jobs/:id/complete` - Mark job as complete
- `POST /coordinator/jobs/:id/fail` - Mark job as failed
- `GET /coordinator/jobs/:id` - Get job details
- `GET /coordinator/health` - Integration health check

### Rate Limit Configuration

**Proposed limits**:

- Global default: 100 requests per minute
- Webhook endpoints: 60 requests per minute per IP
- Coordinator endpoints: 100 requests per minute per API key
- Health endpoints: 300 requests per minute (higher for monitoring)

**Storage**: Use Valkey (Redis-compatible) for distributed rate limiting across multiple API instances.

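To make the proposed limits concrete, here is a minimal fixed-window counter in plain TypeScript. It is an illustrative sketch only, not the `@nestjs/throttler` implementation: `FixedWindowLimiter`, `PROPOSED_LIMITS`, `GLOBAL_RULE`, and the `scope:client` key format are hypothetical names, and the counter lives in process memory rather than in Valkey.

```typescript
// Hypothetical sketch: a fixed-window counter keyed by "<scope>:<client>".
// It only illustrates how the proposed limits behave; it is not the library's code.

interface LimitRule {
  limit: number; // maximum requests allowed per window
  windowMs: number; // window length in milliseconds
}

// Fallback rule, mirroring the proposed global default (100 requests per minute).
const GLOBAL_RULE: LimitRule = { limit: 100, windowMs: 60000 };

// Proposed limits from this section, each per 60-second window.
const PROPOSED_LIMITS: Record<string, LimitRule | undefined> = {
  webhook: { limit: 60, windowMs: 60000 },
  coordinator: { limit: 100, windowMs: 60000 },
  health: { limit: 300, windowMs: 60000 },
};

interface WindowState {
  count: number;
  windowStart: number;
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();

  // Returns true when the request is allowed, false when it should receive a 429.
  hit(scope: string, client: string, nowMs: number): boolean {
    const rule = PROPOSED_LIMITS[scope] ?? GLOBAL_RULE;
    const key = `${scope}:${client}`;
    const state = this.windows.get(key);

    // Start a fresh window on first contact or once the previous window has expired.
    if (state === undefined || nowMs - state.windowStart >= rule.windowMs) {
      this.windows.set(key, { count: 1, windowStart: nowMs });
      return true;
    }

    state.count += 1;
    return state.count <= rule.limit;
  }
}
```

A fixed window is the simplest scheme to reason about, but it lets a client send a burst at the end of one window and another at the start of the next. A sliding window smooths that out at the cost of more state per client.
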
### Technology Stack

- `@nestjs/throttler` - NestJS rate limiting module
- Valkey (already in project) - Redis-compatible cache for distributed rate limiting
- Custom guards for per-API-key limiting

## Progress

- [x] Create scratchpad
- [x] Identify webhook endpoints requiring rate limiting
- [x] Define rate limit configuration strategy
- [x] Write failing tests for rate limiting (RED phase - TDD)
- [x] Install @nestjs/throttler package
- [x] Implement ThrottlerModule configuration
- [x] Implement custom guards for per-API-key limiting
- [x] Implement ThrottlerValkeyStorageService for distributed rate limiting
- [x] Add rate limiting decorators to endpoints (GREEN phase - TDD)
- [x] Add environment variables for rate limiting configuration
- [x] Verify all tests pass (14/14 tests pass)
- [x] Commit changes
- [ ] Update issue #199

## Testing Plan

### Unit Tests

1. **Rate limit enforcement**
   - Verify 429 status code after exceeding limit
   - Verify requests within limit are allowed

2. **Retry-After header**
   - Verify header is present in 429 responses
   - Verify header value is correct

3. **Per-IP limiting**
   - Verify different IPs have independent limits
   - Verify same IP is rate limited

4. **Per-API-key limiting**
   - Verify different API keys have independent limits
   - Verify same API key is rate limited

5. **Storage mechanism**
   - Verify Redis/Valkey integration works
   - Verify fallback to in-memory if Redis unavailable

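For the "header value is correct" check, it helps to pin down what a correct value is. One reasonable definition, assuming a fixed window, is the number of whole seconds until the current window expires, rounded up. The helper names below (`retryAfterSeconds`, `buildRejection`) are hypothetical and not taken from the codebase.

```typescript
// Hypothetical helpers for reasoning about the expected Retry-After value.
// Assumes a fixed window that started at windowStartMs and lasts windowMs.

interface RejectedResponse {
  status: 429;
  headers: { "Retry-After": string };
}

// Seconds until the window expires, rounded up and never negative.
function retryAfterSeconds(windowStartMs: number, windowMs: number, nowMs: number): number {
  const remainingMs = windowStartMs + windowMs - nowMs;
  return Math.max(0, Math.ceil(remainingMs / 1000));
}

// Shape of the response a rate-limited client would see in this sketch.
function buildRejection(windowStartMs: number, windowMs: number, nowMs: number): RejectedResponse {
  return {
    status: 429,
    headers: { "Retry-After": String(retryAfterSeconds(windowStartMs, windowMs, nowMs)) },
  };
}
```

Rounding up matters: telling a client to retry after 0 seconds while 500 ms of the window remain would just produce another 429.
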
### Integration Tests

1. **E2E rate limiting**
   - Test actual HTTP requests hitting rate limits
   - Test rate limits reset after time window

## Environment Variables

```bash
# Rate limiting configuration
RATE_LIMIT_TTL=60                 # Time window in seconds
RATE_LIMIT_GLOBAL_LIMIT=100       # Global requests per window
RATE_LIMIT_WEBHOOK_LIMIT=60       # Webhook endpoint limit
RATE_LIMIT_COORDINATOR_LIMIT=100  # Coordinator endpoint limit
RATE_LIMIT_HEALTH_LIMIT=300       # Health endpoint limit
RATE_LIMIT_STORAGE=redis          # redis or memory
```

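A sketch of how these variables could be read and validated at startup follows. The variable names come from the block above; everything else (`loadRateLimitConfig`, `readPositiveInt`, the `RateLimitConfig` shape, and using the documented values as defaults) is an assumption for illustration. The environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical config loader; only the RATE_LIMIT_* names come from this document.

type StorageKind = "redis" | "memory";
type Env = Record<string, string | undefined>;

interface RateLimitConfig {
  ttlSeconds: number;
  globalLimit: number;
  webhookLimit: number;
  coordinatorLimit: number;
  healthLimit: number;
  storage: StorageKind;
}

// Parses a positive integer, falling back when the variable is unset or empty.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadRateLimitConfig(env: Env): RateLimitConfig {
  const storage = env["RATE_LIMIT_STORAGE"] ?? "redis";
  if (storage !== "redis" && storage !== "memory") {
    throw new Error(`RATE_LIMIT_STORAGE must be "redis" or "memory", got "${storage}"`);
  }
  // Defaults mirror the example values above (an assumption, not confirmed behavior).
  return {
    ttlSeconds: readPositiveInt(env, "RATE_LIMIT_TTL", 60),
    globalLimit: readPositiveInt(env, "RATE_LIMIT_GLOBAL_LIMIT", 100),
    webhookLimit: readPositiveInt(env, "RATE_LIMIT_WEBHOOK_LIMIT", 60),
    coordinatorLimit: readPositiveInt(env, "RATE_LIMIT_COORDINATOR_LIMIT", 100),
    healthLimit: readPositiveInt(env, "RATE_LIMIT_HEALTH_LIMIT", 300),
    storage,
  };
}
```

Failing fast on a malformed value is a deliberate choice here: a silently ignored `RATE_LIMIT_WEBHOOK_LIMIT=6O` would otherwise leave the endpoint on a limit nobody intended.
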
## Implementation Summary

### Files Created

1. `/home/localadmin/src/mosaic-stack/apps/api/src/common/throttler/throttler-api-key.guard.ts` - Custom guard for API-key based rate limiting
2. `/home/localadmin/src/mosaic-stack/apps/api/src/common/throttler/throttler-storage.service.ts` - Valkey/Redis storage for distributed rate limiting
3. `/home/localadmin/src/mosaic-stack/apps/api/src/common/throttler/index.ts` - Export barrel file
4. `/home/localadmin/src/mosaic-stack/apps/api/src/stitcher/stitcher.rate-limit.spec.ts` - Rate limiting tests for stitcher endpoints (6 tests)
5. `/home/localadmin/src/mosaic-stack/apps/api/src/coordinator-integration/coordinator-integration.rate-limit.spec.ts` - Rate limiting tests for coordinator endpoints (8 tests)

### Files Modified

1. `/home/localadmin/src/mosaic-stack/apps/api/src/app.module.ts` - Added ThrottlerModule and ThrottlerApiKeyGuard
2. `/home/localadmin/src/mosaic-stack/apps/api/src/stitcher/stitcher.controller.ts` - Added @Throttle decorators (60 req/min)
3. `/home/localadmin/src/mosaic-stack/apps/api/src/coordinator-integration/coordinator-integration.controller.ts` - Added @Throttle decorators (100 req/min, health: 300 req/min)
4. `/home/localadmin/src/mosaic-stack/.env.example` - Added rate limiting environment variables
5. `/home/localadmin/src/mosaic-stack/.env` - Added rate limiting environment variables
6. `/home/localadmin/src/mosaic-stack/apps/api/package.json` - Added @nestjs/throttler dependency

### Test Results

- All 14 rate limiting tests pass (6 stitcher + 8 coordinator)
- Tests verify: rate limit enforcement, Retry-After headers, per-API-key limiting, independent API key tracking
- TDD approach followed: RED (failing tests) → GREEN (implementation) → REFACTOR

### Rate Limits Configured

- Stitcher endpoints: 60 requests/minute per API key
- Coordinator endpoints: 100 requests/minute per API key
- Health endpoint: 300 requests/minute per API key (higher for monitoring)
- Storage: Valkey (Redis) for distributed limiting with in-memory fallback

## Notes

### Why @nestjs/throttler?

- Official NestJS package with good TypeScript support
- Supports Redis for distributed rate limiting
- Flexible per-route configuration
- Built-in guard system
- Active maintenance

### Security Considerations

- Rate limiting by IP can be bypassed by rotating IPs
- Use per-API-key limiting as the primary defense
- Log rate limit violations for monitoring
- Consider implementing progressive delays for repeated violations
- Ensure rate limiting doesn't block legitimate traffic

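The progressive-delay idea is not implemented yet. One possible shape, sketched under assumptions, is an exponential penalty that doubles with each consecutive violation and is capped. The function name `penaltySeconds`, the 60-second base, and the one-hour cap are all hypothetical choices, not values from this issue.

```typescript
// Hypothetical sketch of a progressive delay: the penalty doubles with each
// consecutive violation and is capped so a client is never locked out indefinitely.

function penaltySeconds(violations: number, baseSeconds: number = 60, maxSeconds: number = 3600): number {
  if (violations <= 0) {
    return 0; // no violations, no extra delay
  }
  const penalty = baseSeconds * Math.pow(2, violations - 1);
  return Math.min(penalty, maxSeconds);
}
```

With the assumed defaults, the first three violations give 60, 120, and 240 seconds. The violation count would need to live in the same shared store as the rate limit counters, and to expire after a quiet period, for this to work across API instances.
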
### Implementation Details

- Use `@Throttle()` decorator for per-endpoint limits
- Use `@SkipThrottle()` to exclude specific endpoints
- Custom ThrottlerGuard to extract API key from X-API-Key header
- Use Valkey connection from existing ValkeyModule

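The core of the custom guard is deciding which string identifies a client. In `@nestjs/throttler` the documented extension point for this is overriding the guard's `getTracker` method, whose exact signature varies by library version, so the sketch below keeps the logic framework-free. `RequestLike`, `resolveTracker`, the `apikey:`/`ip:` prefixes, and the IP fallback are assumptions for illustration, not the contents of `throttler-api-key.guard.ts`.

```typescript
// Hypothetical tracker resolution: prefer the X-API-Key header, fall back to the client IP.
// Node lowercases incoming header names, so the lookup uses "x-api-key".

interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
  ip?: string;
}

function resolveTracker(req: RequestLike): string {
  const header = req.headers["x-api-key"];
  // A header sent more than once may arrive as an array; use the first value.
  const apiKey = Array.isArray(header) ? header[0] : header;

  if (apiKey !== undefined && apiKey.trim() !== "") {
    return `apikey:${apiKey.trim()}`;
  }
  // Without a usable key, limit by IP so unauthenticated traffic is still bounded.
  return `ip:${req.ip ?? "unknown"}`;
}
```

Prefixing the tracker keeps API-key and IP counters from colliding in the shared store. If the tracker ends up in logs or in Valkey keys, hashing the API key first is worth considering.
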
## References

- [NestJS Throttler Documentation](https://docs.nestjs.com/security/rate-limiting)
- [OWASP Denial of Service Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Denial_of_Service_Cheat_Sheet.html)