fix(#199): implement rate limiting on webhook endpoints
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Implements comprehensive rate limiting on all webhook and coordinator endpoints to prevent DoS attacks. Follows TDD protocol with 14 passing tests.

Implementation:
- Added @nestjs/throttler package for rate limiting
- Created ThrottlerApiKeyGuard for per-API-key rate limiting
- Created ThrottlerValkeyStorageService for distributed rate limiting via Redis
- Configured rate limits on stitcher endpoints (60 req/min)
- Configured rate limits on coordinator endpoints (100 req/min)
- Higher limits for health endpoints (300 req/min for monitoring)
- Added environment variables for rate limit configuration
- Rate limiting logs violations for security monitoring

Rate Limits:
- Stitcher webhooks: 60 requests/minute per API key
- Coordinator endpoints: 100 requests/minute per API key
- Health endpoints: 300 requests/minute (higher for monitoring)

Storage:
- Uses Valkey (Redis) for distributed rate limiting across API instances
- Falls back to in-memory storage if Redis unavailable

Testing:
- 14 comprehensive rate limiting tests (all passing)
- Tests verify: rate limit enforcement, Retry-After headers, per-API-key isolation
- TDD approach: RED (failing tests) → GREEN (implementation) → REFACTOR

Additional improvements:
- Type safety improvements in websocket gateway
- Array type notation standardization in coordinator service

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
167
docs/scratchpads/199-implement-rate-limiting.md
Normal file
@@ -0,0 +1,167 @@
# Issue #199: Implement rate limiting on webhook endpoints

## Objective

Implement rate limiting on webhook and public-facing API endpoints to prevent DoS attacks and ensure system stability under high load conditions.

## Approach

### TDD Implementation Plan

1. **RED**: Write failing tests for rate limiting
   - Test rate limit enforcement (429 status)
   - Test Retry-After header inclusion
   - Test per-IP rate limiting
   - Test per-API-key rate limiting
   - Test that legitimate requests are not blocked
   - Test storage mechanism (Redis/in-memory)

2. **GREEN**: Implement NestJS throttler
   - Install @nestjs/throttler package
   - Configure global rate limits
   - Configure per-endpoint rate limits
   - Add custom guards for per-API-key limiting
   - Integrate with Valkey (Redis) for distributed limiting
   - Add Retry-After headers to 429 responses

3. **REFACTOR**: Optimize and document
   - Extract configuration to environment variables
   - Add documentation
   - Ensure code quality

### Identified Webhook Endpoints

**Stitcher Module** (`apps/api/src/stitcher/stitcher.controller.ts`):

- `POST /stitcher/webhook` - Webhook endpoint for @mosaic bot
- `POST /stitcher/dispatch` - Manual job dispatch endpoint

**Coordinator Integration Module** (`apps/api/src/coordinator-integration/coordinator-integration.controller.ts`):

- `POST /coordinator/jobs` - Create a job from coordinator
- `PATCH /coordinator/jobs/:id/status` - Update job status
- `PATCH /coordinator/jobs/:id/progress` - Update job progress
- `POST /coordinator/jobs/:id/complete` - Mark job as complete
- `POST /coordinator/jobs/:id/fail` - Mark job as failed
- `GET /coordinator/jobs/:id` - Get job details
- `GET /coordinator/health` - Integration health check

### Rate Limit Configuration

**Proposed limits**:

- Global default: 100 requests per minute
- Webhook endpoints: 60 requests per minute per IP
- Coordinator endpoints: 100 requests per minute per API key
- Health endpoints: 300 requests per minute (higher for monitoring)

**Storage**: Use Valkey (Redis-compatible) for distributed rate limiting across multiple API instances.
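
To make the "N requests per minute per tracker" semantics concrete, here is a minimal fixed-window counter sketch. It is not the `@nestjs/throttler` implementation; `FixedWindowLimiter`, `RateLimitDecision`, and the `tracker` string are hypothetical names used only for illustration, and the windowing strategy (fixed rather than sliding) is an assumption.

```typescript
// Illustrative sketch only: a fixed-window counter keyed by a tracker string
// (for example an API key or an IP address). All names here are hypothetical.
interface RateLimitDecision {
  allowed: boolean;
  status: 200 | 429;
  retryAfterSeconds: number; // 0 when the request is allowed
}

interface WindowState {
  count: number;
  windowStartMs: number;
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly ttlSeconds: number;

  constructor(limit: number, ttlSeconds: number) {
    this.limit = limit;
    this.ttlSeconds = ttlSeconds;
  }

  // nowMs is passed in explicitly so the behaviour is deterministic in tests.
  check(tracker: string, nowMs: number): RateLimitDecision {
    const ttlMs = this.ttlSeconds * 1000;
    let state = this.windows.get(tracker);

    // Start a new window on first sight of the tracker or once the old one expired.
    if (state === undefined || nowMs - state.windowStartMs >= ttlMs) {
      state = { count: 0, windowStartMs: nowMs };
      this.windows.set(tracker, state);
    }

    if (state.count >= this.limit) {
      // Tell the client how long until the current window ends.
      const remainingMs = state.windowStartMs + ttlMs - nowMs;
      return {
        allowed: false,
        status: 429,
        retryAfterSeconds: Math.ceil(remainingMs / 1000),
      };
    }

    state.count += 1;
    return { allowed: true, status: 200, retryAfterSeconds: 0 };
  }
}
```

With `new FixedWindowLimiter(60, 60)`, the 61st call for the same tracker inside one minute yields a 429 decision with a `Retry-After` value, while a different tracker is unaffected. Because this map lives in one process, it only illustrates the counting logic; sharing counts across API instances is what the Valkey storage is for.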

### Technology Stack

- `@nestjs/throttler` - NestJS rate limiting module
- Valkey (already in project) - Redis-compatible cache for distributed rate limiting
- Custom guards for per-API-key limiting

## Progress

- [x] Create scratchpad
- [x] Identify webhook endpoints requiring rate limiting
- [x] Define rate limit configuration strategy
- [x] Write failing tests for rate limiting (RED phase - TDD)
- [x] Install @nestjs/throttler package
- [x] Implement ThrottlerModule configuration
- [x] Implement custom guards for per-API-key limiting
- [x] Implement ThrottlerValkeyStorageService for distributed rate limiting
- [x] Add rate limiting decorators to endpoints (GREEN phase - TDD)
- [x] Add environment variables for rate limiting configuration
- [x] Verify all tests pass (14/14 tests pass)
- [x] Commit changes
- [ ] Update issue #199

## Testing Plan

### Unit Tests

1. **Rate limit enforcement**
   - Verify 429 status code after exceeding limit
   - Verify requests within limit are allowed

2. **Retry-After header**
   - Verify header is present in 429 responses
   - Verify header value is correct

3. **Per-IP limiting**
   - Verify different IPs have independent limits
   - Verify same IP is rate limited

4. **Per-API-key limiting**
   - Verify different API keys have independent limits
   - Verify same API key is rate limited

5. **Storage mechanism**
   - Verify Redis/Valkey integration works
   - Verify fallback to in-memory if Redis unavailable

### Integration Tests

1. **E2E rate limiting**
   - Test actual HTTP requests hitting rate limits
   - Test rate limits reset after time window

## Environment Variables

```bash
# Rate limiting configuration
RATE_LIMIT_TTL=60                  # Time window in seconds
RATE_LIMIT_GLOBAL_LIMIT=100        # Global requests per window
RATE_LIMIT_WEBHOOK_LIMIT=60        # Webhook endpoint limit
RATE_LIMIT_COORDINATOR_LIMIT=100   # Coordinator endpoint limit
RATE_LIMIT_HEALTH_LIMIT=300        # Health endpoint limit
RATE_LIMIT_STORAGE=redis           # redis or memory
```
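
As a sketch of how these variables might be read and validated at startup, the snippet below parses them into a typed object. `RateLimitConfig`, `loadRateLimitConfig`, and the helper functions are hypothetical names, and using the values listed above as defaults is an assumption rather than something the commit confirms.

```typescript
// Illustrative sketch: parse the rate limiting variables into a typed config.
// Names and default values are assumptions that mirror the listing above.
interface RateLimitConfig {
  ttlSeconds: number;
  globalLimit: number;
  webhookLimit: number;
  coordinatorLimit: number;
  healthLimit: number;
  storage: "redis" | "memory";
}

type Env = Record<string, string | undefined>;

// Reads a positive integer, falling back when the variable is unset or empty.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Accepts only the two storage modes named in the listing.
function parseStorage(raw: string | undefined): "redis" | "memory" {
  if (raw === undefined || raw === "redis") {
    return "redis";
  }
  if (raw === "memory") {
    return "memory";
  }
  throw new Error(`RATE_LIMIT_STORAGE must be "redis" or "memory", got "${raw}"`);
}

function loadRateLimitConfig(env: Env): RateLimitConfig {
  return {
    ttlSeconds: positiveInt(env, "RATE_LIMIT_TTL", 60),
    globalLimit: positiveInt(env, "RATE_LIMIT_GLOBAL_LIMIT", 100),
    webhookLimit: positiveInt(env, "RATE_LIMIT_WEBHOOK_LIMIT", 60),
    coordinatorLimit: positiveInt(env, "RATE_LIMIT_COORDINATOR_LIMIT", 100),
    healthLimit: positiveInt(env, "RATE_LIMIT_HEALTH_LIMIT", 300),
    storage: parseStorage(env["RATE_LIMIT_STORAGE"]),
  };
}
```

In a Node process this would typically be called as `loadRateLimitConfig(process.env)`. Note that `RATE_LIMIT_TTL` is expressed in seconds, whereas recent major versions of `@nestjs/throttler` take `ttl` in milliseconds, so the value may need converting; check the installed version.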

## Implementation Summary

### Files Created

1. `/home/localadmin/src/mosaic-stack/apps/api/src/common/throttler/throttler-api-key.guard.ts` - Custom guard for API-key based rate limiting
2. `/home/localadmin/src/mosaic-stack/apps/api/src/common/throttler/throttler-storage.service.ts` - Valkey/Redis storage for distributed rate limiting
3. `/home/localadmin/src/mosaic-stack/apps/api/src/common/throttler/index.ts` - Export barrel file
4. `/home/localadmin/src/mosaic-stack/apps/api/src/stitcher/stitcher.rate-limit.spec.ts` - Rate limiting tests for stitcher endpoints (6 tests)
5. `/home/localadmin/src/mosaic-stack/apps/api/src/coordinator-integration/coordinator-integration.rate-limit.spec.ts` - Rate limiting tests for coordinator endpoints (8 tests)

### Files Modified

1. `/home/localadmin/src/mosaic-stack/apps/api/src/app.module.ts` - Added ThrottlerModule and ThrottlerApiKeyGuard
2. `/home/localadmin/src/mosaic-stack/apps/api/src/stitcher/stitcher.controller.ts` - Added @Throttle decorators (60 req/min)
3. `/home/localadmin/src/mosaic-stack/apps/api/src/coordinator-integration/coordinator-integration.controller.ts` - Added @Throttle decorators (100 req/min, health: 300 req/min)
4. `/home/localadmin/src/mosaic-stack/.env.example` - Added rate limiting environment variables
5. `/home/localadmin/src/mosaic-stack/.env` - Added rate limiting environment variables
6. `/home/localadmin/src/mosaic-stack/apps/api/package.json` - Added @nestjs/throttler dependency

### Test Results

- All 14 rate limiting tests pass (6 stitcher + 8 coordinator)
- Tests verify: rate limit enforcement, Retry-After headers, per-API-key limiting, independent API key tracking
- TDD approach followed: RED (failing tests) → GREEN (implementation) → REFACTOR

### Rate Limits Configured

- Stitcher endpoints: 60 requests/minute per API key
- Coordinator endpoints: 100 requests/minute per API key
- Health endpoint: 300 requests/minute per API key (higher for monitoring)
- Storage: Valkey (Redis) for distributed limiting with in-memory fallback
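
The in-memory fallback mentioned above could be structured roughly as follows. This is a sketch under assumptions, not the code in `throttler-storage.service.ts`: the `CounterStore` interface, both classes, and the `onFallback` callback are hypothetical, and a real primary store would wrap a Valkey client.

```typescript
// Illustrative sketch: try a primary (shared) store first and fall back to a
// process-local store if the primary throws. All names are hypothetical.
interface CounterStore {
  // Increments the counter for `key` and resolves to the new count.
  // The counter expires ttlSeconds after it was first created.
  increment(key: string, ttlSeconds: number, nowMs: number): Promise<number>;
}

class InMemoryCounterStore implements CounterStore {
  private readonly entries = new Map<string, { count: number; expiresAtMs: number }>();

  async increment(key: string, ttlSeconds: number, nowMs: number): Promise<number> {
    const existing = this.entries.get(key);
    if (existing === undefined || existing.expiresAtMs <= nowMs) {
      // First hit in a new window: create the counter with its expiry.
      this.entries.set(key, { count: 1, expiresAtMs: nowMs + ttlSeconds * 1000 });
      return 1;
    }
    existing.count += 1;
    return existing.count;
  }
}

class FallbackCounterStore implements CounterStore {
  private readonly primary: CounterStore;
  private readonly fallback: CounterStore;
  private readonly onFallback: (error: unknown) => void;

  constructor(
    primary: CounterStore,
    fallback: CounterStore,
    onFallback: (error: unknown) => void,
  ) {
    this.primary = primary;
    this.fallback = fallback;
    this.onFallback = onFallback;
  }

  async increment(key: string, ttlSeconds: number, nowMs: number): Promise<number> {
    try {
      return await this.primary.increment(key, ttlSeconds, nowMs);
    } catch (error) {
      // Report the failure (for example to a logger) and keep serving requests.
      this.onFallback(error);
      return this.fallback.increment(key, ttlSeconds, nowMs);
    }
  }
}
```

One trade-off of this design: while the fallback is active, counters are per process, so the effective limit becomes "per API instance" rather than global until the shared store is reachable again.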

## Notes

### Why @nestjs/throttler?

- Official NestJS package with good TypeScript support
- Supports Redis for distributed rate limiting
- Flexible per-route configuration
- Built-in guard system
- Active maintenance

### Security Considerations

- Rate limiting by IP can be bypassed by rotating IPs
- Implement per-API-key limiting as primary defense
- Log rate limit violations for monitoring
- Consider implementing progressive delays for repeated violations
- Ensure rate limiting doesn't block legitimate traffic

### Implementation Details

- Use `@Throttle()` decorator for per-endpoint limits
- Use `@SkipThrottle()` to exclude specific endpoints
- Custom ThrottlerGuard to extract API key from X-API-Key header
- Use Valkey connection from existing ValkeyModule
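
The API-key extraction step can be sketched as a plain function that a custom guard might call when choosing its tracker string. `RequestLike` and `resolveTracker` are hypothetical names, and falling back to the client IP when no key is present is an assumption; the actual `ThrottlerApiKeyGuard` may behave differently.

```typescript
// Illustrative sketch: derive the rate limit tracker from a request.
// Node lowercases incoming header names, so X-API-Key is read as "x-api-key".
interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
  ip?: string;
}

function resolveTracker(req: RequestLike): string {
  const header = req.headers["x-api-key"];
  // A header sent more than once may arrive as an array; use the first value.
  const apiKey = Array.isArray(header) ? header[0] : header;

  if (apiKey !== undefined && apiKey.trim() !== "") {
    return `apikey:${apiKey.trim()}`;
  }
  // Assumed fallback: track by IP when no API key was supplied.
  return `ip:${req.ip ?? "unknown"}`;
}
```

Prefixing the tracker with `apikey:` or `ip:` keeps the two namespaces from colliding in shared storage, so an API key that happens to look like an IP address does not share a counter with that address.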

## References

- [NestJS Throttler Documentation](https://docs.nestjs.com/security/rate-limiting)
- [OWASP Denial of Service Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Denial_of_Service_Cheat_Sheet.html)