fix(#199): implement rate limiting on webhook endpoints

Implements comprehensive rate limiting on all webhook and coordinator endpoints to prevent DoS attacks. Follows TDD protocol with 14 passing tests.

Implementation:
- Added @nestjs/throttler package for rate limiting
- Created ThrottlerApiKeyGuard for per-API-key rate limiting
- Created ThrottlerValkeyStorageService for distributed rate limiting via Redis
- Configured rate limits on stitcher endpoints (60 req/min)
- Configured rate limits on coordinator endpoints (100 req/min)
- Higher limits for health endpoints (300 req/min for monitoring)
- Added environment variables for rate limit configuration
- Rate limiting logs violations for security monitoring

Rate Limits:
- Stitcher webhooks: 60 requests/minute per API key
- Coordinator endpoints: 100 requests/minute per API key
- Health endpoints: 300 requests/minute (higher for monitoring)

Storage:
- Uses Valkey (Redis) for distributed rate limiting across API instances
- Falls back to in-memory storage if Redis unavailable

Testing:
- 14 comprehensive rate limiting tests (all passing)
- Tests verify: rate limit enforcement, Retry-After headers, per-API-key isolation
- TDD approach: RED (failing tests) → GREEN (implementation) → REFACTOR

Additional improvements:
- Type safety improvements in websocket gateway
- Array type notation standardization in coordinator service

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 13:07:16 -06:00
parent 210b3d2e8f
commit 41d56dadf0
14 changed files with 990 additions and 11 deletions
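The per-API-key behavior described in the commit message (separate counters per key, a Retry-After value when a key is over its limit) can be illustrated with a minimal in-memory fixed-window limiter. This is only a sketch: `FixedWindowLimiter`, `Decision`, and `check` are hypothetical names, and the code is not the commit's `ThrottlerApiKeyGuard` or `ThrottlerValkeyStorageService`, whose internals are not shown here.

```typescript
// Hypothetical sketch of per-API-key fixed-window rate limiting.
// Not the commit's implementation; names are illustrative only.
interface Decision {
  allowed: boolean;
  retryAfterSeconds: number; // 0 when the request is allowed
}

class FixedWindowLimiter {
  private readonly limit: number;
  private readonly ttlSeconds: number;
  // One counter per API key, so keys are isolated from each other.
  private readonly windows = new Map<string, { count: number; resetAt: number }>();

  constructor(limit: number, ttlSeconds: number) {
    this.limit = limit;
    this.ttlSeconds = ttlSeconds;
  }

  check(apiKey: string, nowMs: number = Date.now()): Decision {
    const current = this.windows.get(apiKey);

    // No window yet, or the previous window has expired: start a new one.
    if (current === undefined || nowMs >= current.resetAt) {
      this.windows.set(apiKey, { count: 1, resetAt: nowMs + this.ttlSeconds * 1000 });
      return { allowed: true, retryAfterSeconds: 0 };
    }

    // Still inside the window and under the limit.
    if (current.count < this.limit) {
      current.count += 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }

    // Over the limit: report how long until the window resets,
    // which is what a Retry-After header would carry.
    const retryAfterSeconds = Math.ceil((current.resetAt - nowMs) / 1000);
    return { allowed: false, retryAfterSeconds };
  }
}
```

With `new FixedWindowLimiter(60, 60)`, the 61st call to `check` for the same key within one minute is rejected, while a different key is unaffected. A single-process map like this corresponds only to the in-memory fallback; sharing counters across API instances needs an external store such as Valkey.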
--- a/.env.example
+++ b/.env.example
@@ -170,6 +170,30 @@ GITEA_WEBHOOK_SECRET=REPLACE_WITH_RANDOM_WEBHOOK_SECRET
 # The coordinator service uses this key to authenticate with the API
 COORDINATOR_API_KEY=REPLACE_WITH_RANDOM_API_KEY_MINIMUM_32_CHARS

+# ======================
+# Rate Limiting
+# ======================
+# Rate limiting prevents DoS attacks on webhook and API endpoints
+# TTL is in seconds, limits are per TTL window
+
+# Global rate limit (applies to all endpoints unless overridden)
+RATE_LIMIT_TTL=60                    # Time window in seconds
+RATE_LIMIT_GLOBAL_LIMIT=100          # Requests per window
+
+# Webhook endpoints (/stitcher/webhook, /stitcher/dispatch)
+RATE_LIMIT_WEBHOOK_LIMIT=60          # Requests per minute
+
+# Coordinator endpoints (/coordinator/*)
+RATE_LIMIT_COORDINATOR_LIMIT=100     # Requests per minute
+
+# Health check endpoints (/coordinator/health)
+RATE_LIMIT_HEALTH_LIMIT=300          # Requests per minute (higher for monitoring)
+
+# Storage backend for rate limiting (redis or memory)
+# redis: Uses Valkey for distributed rate limiting (recommended for production)
+# memory: Uses in-memory storage (single instance only, for development)
+RATE_LIMIT_STORAGE=redis
+
 # ======================
 # Discord Bridge (Optional)
 # ======================
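A minimal sketch of how the `RATE_LIMIT_*` variables added above might be read and validated at startup. The function and type names (`loadRateLimitConfig`, `RateLimitConfig`, `positiveInt`, `parseStorage`) are hypothetical, and the fallback values simply mirror the numbers in `.env.example`; the commit's actual configuration code may differ. Note that `RATE_LIMIT_TTL` is documented in seconds, whereas newer releases of @nestjs/throttler expect `ttl` in milliseconds, so a conversion may be needed depending on the installed version.

```typescript
// Hypothetical config loader for the RATE_LIMIT_* variables.
// Defaults are assumptions copied from .env.example, not from the commit's code.
type StorageBackend = "redis" | "memory";

interface RateLimitConfig {
  ttlSeconds: number;
  globalLimit: number;
  webhookLimit: number;
  coordinatorLimit: number;
  healthLimit: number;
  storage: StorageBackend;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back when the variable is unset or empty.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Accept only the two documented storage backends.
function parseStorage(raw: string | undefined): StorageBackend {
  if (raw === undefined || raw === "redis") {
    return "redis";
  }
  if (raw === "memory") {
    return "memory";
  }
  throw new Error(`RATE_LIMIT_STORAGE must be "redis" or "memory", got "${raw}"`);
}

function loadRateLimitConfig(env: Env): RateLimitConfig {
  return {
    ttlSeconds: positiveInt(env, "RATE_LIMIT_TTL", 60),
    globalLimit: positiveInt(env, "RATE_LIMIT_GLOBAL_LIMIT", 100),
    webhookLimit: positiveInt(env, "RATE_LIMIT_WEBHOOK_LIMIT", 60),
    coordinatorLimit: positiveInt(env, "RATE_LIMIT_COORDINATOR_LIMIT", 100),
    healthLimit: positiveInt(env, "RATE_LIMIT_HEALTH_LIMIT", 300),
    storage: parseStorage(env["RATE_LIMIT_STORAGE"]),
  };
}
```

In a Node process this would be called as `loadRateLimitConfig(process.env)`. Failing fast on a malformed value (for example a limit of `0` or a misspelled storage backend) keeps a bad deployment from silently running without the intended limits.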