stack/docs/scratchpads/199-implement-rate-limiting.md
Jason Woltje 41d56dadf0
fix(#199): implement rate limiting on webhook endpoints
Implements comprehensive rate limiting on all webhook and coordinator endpoints
to prevent DoS attacks. Follows TDD protocol with 14 passing tests.

Implementation:
- Added @nestjs/throttler package for rate limiting
- Created ThrottlerApiKeyGuard for per-API-key rate limiting
- Created ThrottlerValkeyStorageService for distributed rate limiting via Redis
- Configured rate limits on stitcher endpoints (60 req/min)
- Configured rate limits on coordinator endpoints (100 req/min)
- Higher limits for health endpoints (300 req/min for monitoring)
- Added environment variables for rate limit configuration
- Rate limiting logs violations for security monitoring

Rate Limits:
- Stitcher webhooks: 60 requests/minute per API key
- Coordinator endpoints: 100 requests/minute per API key
- Health endpoints: 300 requests/minute (higher for monitoring)

Storage:
- Uses Valkey (Redis) for distributed rate limiting across API instances
- Falls back to in-memory storage if Redis unavailable

Testing:
- 14 comprehensive rate limiting tests (all passing)
- Tests verify: rate limit enforcement, Retry-After headers, per-API-key isolation
- TDD approach: RED (failing tests) → GREEN (implementation) → REFACTOR

Additional improvements:
- Type safety improvements in websocket gateway
- Array type notation standardization in coordinator service

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 13:07:16 -06:00


Issue #199: Implement rate limiting on webhook endpoints

Objective

Implement rate limiting on webhook and public-facing API endpoints to prevent DoS attacks and keep the system stable under high load.

Approach

TDD Implementation Plan

  1. RED: Write failing tests for rate limiting

    • Test rate limit enforcement (429 status)
    • Test Retry-After header inclusion
    • Test per-IP rate limiting
    • Test per-API-key rate limiting
    • Test that legitimate requests are not blocked
    • Test storage mechanism (Redis/in-memory)
  2. GREEN: Implement NestJS throttler

    • Install @nestjs/throttler package
    • Configure global rate limits
    • Configure per-endpoint rate limits
    • Add custom guards for per-API-key limiting
    • Integrate with Valkey (Redis) for distributed limiting
    • Add Retry-After headers to 429 responses
  3. REFACTOR: Optimize and document

    • Extract configuration to environment variables
    • Add documentation
    • Ensure code quality

Identified Webhook Endpoints

Stitcher Module (apps/api/src/stitcher/stitcher.controller.ts):

  • POST /stitcher/webhook - Webhook endpoint for @mosaic bot
  • POST /stitcher/dispatch - Manual job dispatch endpoint

Coordinator Integration Module (apps/api/src/coordinator-integration/coordinator-integration.controller.ts):

  • POST /coordinator/jobs - Create a job from coordinator
  • PATCH /coordinator/jobs/:id/status - Update job status
  • PATCH /coordinator/jobs/:id/progress - Update job progress
  • POST /coordinator/jobs/:id/complete - Mark job as complete
  • POST /coordinator/jobs/:id/fail - Mark job as failed
  • GET /coordinator/jobs/:id - Get job details
  • GET /coordinator/health - Integration health check

Rate Limit Configuration

Proposed limits:

  • Global default: 100 requests per minute
  • Webhook endpoints: 60 requests per minute per IP
  • Coordinator endpoints: 100 requests per minute per API key
  • Health endpoints: 300 requests per minute (higher for monitoring)

Storage: Use Valkey (Redis-compatible) for distributed rate limiting across multiple API instances.
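
To make the intended semantics of these limits concrete, here is a minimal fixed-window counter sketch. It is not the @nestjs/throttler implementation and may differ from what the library does internally. The names `RULES`, `FixedWindowLimiter`, and `Decision` are hypothetical, and the clock is passed in as a parameter so the behaviour can be checked deterministically.

```typescript
// Illustrative only: a fixed-window counter, not the @nestjs/throttler internals.
interface LimitRule {
  limit: number; // maximum requests allowed per window
  windowMs: number; // window length in milliseconds
}

// Proposed limits from the list above; the rule names are hypothetical.
const RULES = {
  global: { limit: 100, windowMs: 60000 },
  webhook: { limit: 60, windowMs: 60000 },
  coordinator: { limit: 100, windowMs: 60000 },
  health: { limit: 300, windowMs: 60000 },
};

interface Decision {
  allowed: boolean;
  retryAfterSeconds: number; // 0 when the request is allowed
}

class FixedWindowLimiter {
  private windows = new Map<string, { count: number; resetAt: number }>();

  // `tracker` identifies the caller (an IP or an API key); `now` is epoch milliseconds.
  check(tracker: string, rule: LimitRule, now: number): Decision {
    const current = this.windows.get(tracker);
    if (!current || now >= current.resetAt) {
      // First request in a fresh window.
      this.windows.set(tracker, { count: 1, resetAt: now + rule.windowMs });
      return { allowed: true, retryAfterSeconds: 0 };
    }
    if (current.count < rule.limit) {
      current.count += 1;
      return { allowed: true, retryAfterSeconds: 0 };
    }
    // Over the limit: report how long until the window resets.
    const retryAfterSeconds = Math.ceil((current.resetAt - now) / 1000);
    return { allowed: false, retryAfterSeconds };
  }
}
```

A blocked decision would map to a 429 response whose Retry-After header carries `retryAfterSeconds`. Each tracker gets its own window, so one caller exhausting its limit does not affect another.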

Technology Stack

  • @nestjs/throttler - NestJS rate limiting module
  • Valkey (already in project) - Redis-compatible cache for distributed rate limiting
  • Custom guards for per-API-key limiting

Progress

  • Create scratchpad
  • Identify webhook endpoints requiring rate limiting
  • Define rate limit configuration strategy
  • Write failing tests for rate limiting (RED phase - TDD)
  • Install @nestjs/throttler package
  • Implement ThrottlerModule configuration
  • Implement custom guards for per-API-key limiting
  • Implement ThrottlerValkeyStorageService for distributed rate limiting
  • Add rate limiting decorators to endpoints (GREEN phase - TDD)
  • Add environment variables for rate limiting configuration
  • Verify all tests pass (14/14 tests pass)
  • Commit changes
  • Update issue #199

Testing Plan

Unit Tests

  1. Rate limit enforcement

    • Verify 429 status code after exceeding limit
    • Verify requests within limit are allowed
  2. Retry-After header

    • Verify header is present in 429 responses
    • Verify header value is correct
  3. Per-IP limiting

    • Verify different IPs have independent limits
    • Verify same IP is rate limited
  4. Per-API-key limiting

    • Verify different API keys have independent limits
    • Verify same API key is rate limited
  5. Storage mechanism

    • Verify Redis/Valkey integration works
    • Verify fallback to in-memory if Redis unavailable

Integration Tests

  1. E2E rate limiting
    • Test actual HTTP requests hitting rate limits
    • Test rate limits reset after time window

Environment Variables

# Rate limiting configuration
RATE_LIMIT_TTL=60                    # Time window in seconds
RATE_LIMIT_GLOBAL_LIMIT=100          # Global requests per window
RATE_LIMIT_WEBHOOK_LIMIT=60          # Webhook endpoint limit
RATE_LIMIT_COORDINATOR_LIMIT=100     # Coordinator endpoint limit
RATE_LIMIT_HEALTH_LIMIT=300          # Health endpoint limit
RATE_LIMIT_STORAGE=redis             # redis or memory
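
One way these variables could be read and validated at startup is sketched below. The function names (`loadRateLimitConfig`, `positiveInt`) and the `RateLimitConfig` shape are hypothetical, and using the listed values as defaults is an assumption: the scratchpad does not say what happens when a variable is unset.

```typescript
type StorageKind = "redis" | "memory";

interface RateLimitConfig {
  ttlSeconds: number;
  globalLimit: number;
  webhookLimit: number;
  coordinatorLimit: number;
  healthLimit: number;
  storage: StorageKind;
}

type Env = Record<string, string | undefined>;

// Parses a positive integer from the environment, or returns the fallback if unset.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Defaults mirror the values listed above (an assumption, not confirmed behaviour).
function loadRateLimitConfig(env: Env): RateLimitConfig {
  const storage = env["RATE_LIMIT_STORAGE"] ?? "redis";
  if (storage !== "redis" && storage !== "memory") {
    throw new Error(`RATE_LIMIT_STORAGE must be "redis" or "memory", got "${storage}"`);
  }
  return {
    ttlSeconds: positiveInt(env, "RATE_LIMIT_TTL", 60),
    globalLimit: positiveInt(env, "RATE_LIMIT_GLOBAL_LIMIT", 100),
    webhookLimit: positiveInt(env, "RATE_LIMIT_WEBHOOK_LIMIT", 60),
    coordinatorLimit: positiveInt(env, "RATE_LIMIT_COORDINATOR_LIMIT", 100),
    healthLimit: positiveInt(env, "RATE_LIMIT_HEALTH_LIMIT", 300),
    storage,
  };
}
```

In a Node process this would typically be called as `loadRateLimitConfig(process.env)`. Failing fast on a malformed value avoids silently running with a limit of `NaN`.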

Implementation Summary

Files Created

  1. apps/api/src/common/throttler/throttler-api-key.guard.ts - Custom guard for API-key based rate limiting
  2. apps/api/src/common/throttler/throttler-storage.service.ts - Valkey/Redis storage for distributed rate limiting
  3. apps/api/src/common/throttler/index.ts - Export barrel file
  4. apps/api/src/stitcher/stitcher.rate-limit.spec.ts - Rate limiting tests for stitcher endpoints (6 tests)
  5. apps/api/src/coordinator-integration/coordinator-integration.rate-limit.spec.ts - Rate limiting tests for coordinator endpoints (8 tests)

Files Modified

  1. apps/api/src/app.module.ts - Added ThrottlerModule and ThrottlerApiKeyGuard
  2. apps/api/src/stitcher/stitcher.controller.ts - Added @Throttle decorators (60 req/min)
  3. apps/api/src/coordinator-integration/coordinator-integration.controller.ts - Added @Throttle decorators (100 req/min, health: 300 req/min)
  4. .env.example - Added rate limiting environment variables
  5. .env - Added rate limiting environment variables
  6. apps/api/package.json - Added @nestjs/throttler dependency

Test Results

  • All 14 rate limiting tests pass (6 stitcher + 8 coordinator)
  • Tests verify: rate limit enforcement, Retry-After headers, per-API-key limiting, independent API key tracking
  • TDD approach followed: RED (failing tests) → GREEN (implementation) → REFACTOR

Rate Limits Configured

  • Stitcher endpoints: 60 requests/minute per API key
  • Coordinator endpoints: 100 requests/minute per API key
  • Health endpoint: 300 requests/minute per API key (higher for monitoring)
  • Storage: Valkey (Redis) for distributed limiting with in-memory fallback
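
The in-memory fallback can be pictured as a wrapper that tries the distributed store first and uses a local counter when that call fails. The sketch below is an illustration under assumptions, not the actual ThrottlerValkeyStorageService: the `CounterStore` interface and the class names are hypothetical, and a real primary store would wrap a Valkey client.

```typescript
// Hypothetical storage contract: increment a counter that expires after ttlMs.
interface CounterStore {
  increment(key: string, ttlMs: number, now: number): Promise<number>;
}

// Process-local counters; each API instance would hold its own copy.
class MemoryCounterStore implements CounterStore {
  private entries = new Map<string, { count: number; expiresAt: number }>();

  async increment(key: string, ttlMs: number, now: number): Promise<number> {
    const entry = this.entries.get(key);
    if (!entry || now >= entry.expiresAt) {
      this.entries.set(key, { count: 1, expiresAt: now + ttlMs });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

// Tries the primary (distributed) store and falls back to the local one on error.
class FallbackCounterStore implements CounterStore {
  private readonly primary: CounterStore;
  private readonly fallback: CounterStore;
  private readonly onError: (error: unknown) => void;

  constructor(
    primary: CounterStore,
    fallback: CounterStore,
    onError: (error: unknown) => void = () => {},
  ) {
    this.primary = primary;
    this.fallback = fallback;
    this.onError = onError;
  }

  async increment(key: string, ttlMs: number, now: number): Promise<number> {
    try {
      return await this.primary.increment(key, ttlMs, now);
    } catch (error) {
      this.onError(error); // e.g. log that the distributed store is unavailable
      return this.fallback.increment(key, ttlMs, now);
    }
  }
}
```

While the fallback is active, counts are tracked per instance, so the effective limit across N API instances can be up to N times the configured value.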

Notes

Why @nestjs/throttler?

  • Official NestJS package with good TypeScript support
  • Supports Redis for distributed rate limiting
  • Flexible per-route configuration
  • Built-in guard system
  • Active maintenance

Security Considerations

  • Rate limiting by IP can be bypassed by rotating IPs
  • Implement per-API-key limiting as primary defense
  • Log rate limit violations for monitoring
  • Consider implementing progressive delays for repeated violations
  • Ensure rate limiting doesn't block legitimate traffic
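
The progressive-delay idea mentioned above is not part of the current implementation. One possible shape, sketched here with a hypothetical `progressiveRetryAfter` helper, is to double the Retry-After value for each prior violation up to a cap.

```typescript
// Hypothetical helper: exponential back-off for repeat offenders, capped at capSeconds.
function progressiveRetryAfter(
  baseSeconds: number,
  priorViolations: number,
  capSeconds: number,
): number {
  // Treat negative or fractional counts defensively.
  const exponent = Math.max(0, Math.floor(priorViolations));
  return Math.min(capSeconds, baseSeconds * Math.pow(2, exponent));
}
```

The cap keeps a misbehaving but legitimate client from being locked out indefinitely, which matters given the goal of not blocking legitimate traffic.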

Implementation Details

  • Use @Throttle() decorator for per-endpoint limits
  • Use @SkipThrottle() to exclude specific endpoints
  • Custom ThrottlerGuard (ThrottlerApiKeyGuard) to extract the API key from the X-API-Key header
  • Use Valkey connection from existing ValkeyModule
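
The core of per-API-key limiting is choosing the tracker string that counters are keyed on. The sketch below shows that decision as a standalone function; `resolveTracker` and `RequestLike` are hypothetical names, and falling back to the client IP when no key is present is an assumption rather than confirmed behaviour of ThrottlerApiKeyGuard. In recent versions of @nestjs/throttler, a guard extending ThrottlerGuard can override `getTracker` to return a string like this.

```typescript
// Minimal request shape; Node lowercases incoming header names.
interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
  ip?: string;
}

// Returns the key that rate-limit counters are tracked under.
function resolveTracker(req: RequestLike): string {
  const raw = req.headers["x-api-key"];
  // A header sent more than once may arrive as an array; use the first value.
  const apiKey = (Array.isArray(raw) ? raw[0] : raw)?.trim();
  if (apiKey) return `apikey:${apiKey}`;
  // Assumed fallback: limit by IP when no API key is supplied.
  return `ip:${req.ip ?? "unknown"}`;
}
```

Prefixing the tracker with its kind keeps an API key and an IP address from ever sharing a counter.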

References