Issue #199: Implement rate limiting on webhook endpoints
Objective
Implement rate limiting on webhook and public-facing API endpoints to prevent DoS attacks and ensure system stability under high load conditions.
Approach
TDD Implementation Plan
- RED: Write failing tests for rate limiting
  - Test rate limit enforcement (429 status)
  - Test Retry-After header inclusion
  - Test per-IP rate limiting
  - Test per-API-key rate limiting
  - Test that legitimate requests are not blocked
  - Test storage mechanism (Redis/in-memory)
- GREEN: Implement NestJS throttler
  - Install @nestjs/throttler package
  - Configure global rate limits
  - Configure per-endpoint rate limits
  - Add custom guards for per-API-key limiting
  - Integrate with Valkey (Redis) for distributed limiting
  - Add Retry-After headers to 429 responses
- REFACTOR: Optimize and document
  - Extract configuration to environment variables
  - Add documentation
  - Ensure code quality
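The behaviour the RED tests target (allow up to the limit, then reject with a retry hint until the window resets) can be sketched without any framework. This is a minimal in-memory fixed-window counter, not the @nestjs/throttler implementation; `FixedWindowLimiter`, `Decision`, and the injectable `now` clock are hypothetical names introduced only for illustration.

```typescript
// Illustrative sketch only: names and shapes here are assumptions,
// not part of @nestjs/throttler or the project's code.
interface Decision {
  allowed: boolean;
  remaining: number;
  retryAfterSeconds: number; // 0 when the request is allowed
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, { count: number; resetAt: number }>();
  private readonly limit: number;
  private readonly ttlSeconds: number;
  private readonly now: () => number;

  constructor(limit: number, ttlSeconds: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.ttlSeconds = ttlSeconds;
    this.now = now;
  }

  // Record one request for `key` (an IP or API key) and decide whether to allow it.
  hit(key: string): Decision {
    const t = this.now();
    let w = this.windows.get(key);
    if (w === undefined || t >= w.resetAt) {
      // First request for this key, or the previous window has expired.
      w = { count: 0, resetAt: t + this.ttlSeconds * 1000 };
      this.windows.set(key, w);
    }
    w.count += 1;
    if (w.count > this.limit) {
      const retryAfterSeconds = Math.max(1, Math.ceil((w.resetAt - t) / 1000));
      return { allowed: false, remaining: 0, retryAfterSeconds };
    }
    return { allowed: true, remaining: this.limit - w.count, retryAfterSeconds: 0 };
  }
}

// Usage with a frozen clock: limit of 2 requests per 60-second window.
const demoLimiter = new FixedWindowLimiter(2, 60, () => 0);
demoLimiter.hit("203.0.113.7");
demoLimiter.hit("203.0.113.7");
const demoThird = demoLimiter.hit("203.0.113.7");
// demoThird.allowed is false; a handler would answer 429 with
// Retry-After set to demoThird.retryAfterSeconds.
```

Injecting the clock keeps tests for window expiry deterministic, which is useful for the "legitimate requests are not blocked" and reset cases.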
Identified Webhook Endpoints
Stitcher Module (apps/api/src/stitcher/stitcher.controller.ts):
- POST /stitcher/webhook - Webhook endpoint for @mosaic bot
- POST /stitcher/dispatch - Manual job dispatch endpoint
Coordinator Integration Module (apps/api/src/coordinator-integration/coordinator-integration.controller.ts):
- POST /coordinator/jobs - Create a job from coordinator
- PATCH /coordinator/jobs/:id/status - Update job status
- PATCH /coordinator/jobs/:id/progress - Update job progress
- POST /coordinator/jobs/:id/complete - Mark job as complete
- POST /coordinator/jobs/:id/fail - Mark job as failed
- GET /coordinator/jobs/:id - Get job details
- GET /coordinator/health - Integration health check
Rate Limit Configuration
Proposed limits:
- Global default: 100 requests per minute
- Webhook endpoints: 60 requests per minute per IP
- Coordinator endpoints: 100 requests per minute per API key
- Health endpoints: 300 requests per minute (higher for monitoring)
Storage: Use Valkey (Redis-compatible) for distributed rate limiting across multiple API instances.
Technology Stack
- @nestjs/throttler - NestJS rate limiting module
- Valkey (already in project) - Redis-compatible cache for distributed rate limiting
- Custom guards for per-API-key limiting
Progress
- Create scratchpad
- Identify webhook endpoints requiring rate limiting
- Define rate limit configuration strategy
- Write failing tests for rate limiting (RED phase - TDD)
- Install @nestjs/throttler package
- Implement ThrottlerModule configuration
- Implement custom guards for per-API-key limiting
- Implement ThrottlerValkeyStorageService for distributed rate limiting
- Add rate limiting decorators to endpoints (GREEN phase - TDD)
- Add environment variables for rate limiting configuration
- Verify all tests pass (14/14 tests pass)
- Commit changes
- Update issue #199
Testing Plan
Unit Tests
- Rate limit enforcement
  - Verify 429 status code after exceeding limit
  - Verify requests within limit are allowed
- Retry-After header
  - Verify header is present in 429 responses
  - Verify header value is correct
- Per-IP limiting
  - Verify different IPs have independent limits
  - Verify same IP is rate limited
- Per-API-key limiting
  - Verify different API keys have independent limits
  - Verify same API key is rate limited
- Storage mechanism
  - Verify Redis/Valkey integration works
  - Verify fallback to in-memory if Redis unavailable
Integration Tests
- E2E rate limiting
  - Test actual HTTP requests hitting rate limits
  - Test rate limits reset after time window
Environment Variables
# Rate limiting configuration
RATE_LIMIT_TTL=60 # Time window in seconds
RATE_LIMIT_GLOBAL_LIMIT=100 # Global requests per window
RATE_LIMIT_WEBHOOK_LIMIT=60 # Webhook endpoint limit
RATE_LIMIT_COORDINATOR_LIMIT=100 # Coordinator endpoint limit
RATE_LIMIT_HEALTH_LIMIT=300 # Health endpoint limit
RATE_LIMIT_STORAGE=redis # redis or memory
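One way to consume these variables is to parse them once at startup and fail fast on bad values. The sketch below is an assumption about how that could look, not the project's actual loader: `RateLimitConfig`, `positiveInt`, and `loadRateLimitConfig` are hypothetical names, and the defaults simply mirror the values listed above.

```typescript
// Hypothetical config loader; only the variable names and default values
// come from the notes above.
interface RateLimitConfig {
  ttlSeconds: number;
  globalLimit: number;
  webhookLimit: number;
  coordinatorLimit: number;
  healthLimit: number;
  storage: "redis" | "memory";
}

type Env = Record<string, string | undefined>;

// Read a positive integer from the environment, or use the fallback when unset.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadRateLimitConfig(env: Env): RateLimitConfig {
  const rawStorage = env["RATE_LIMIT_STORAGE"] ?? "redis";
  let storage: "redis" | "memory";
  if (rawStorage === "redis") {
    storage = "redis";
  } else if (rawStorage === "memory") {
    storage = "memory";
  } else {
    throw new Error(`RATE_LIMIT_STORAGE must be "redis" or "memory", got "${rawStorage}"`);
  }
  return {
    ttlSeconds: positiveInt(env, "RATE_LIMIT_TTL", 60),
    globalLimit: positiveInt(env, "RATE_LIMIT_GLOBAL_LIMIT", 100),
    webhookLimit: positiveInt(env, "RATE_LIMIT_WEBHOOK_LIMIT", 60),
    coordinatorLimit: positiveInt(env, "RATE_LIMIT_COORDINATOR_LIMIT", 100),
    healthLimit: positiveInt(env, "RATE_LIMIT_HEALTH_LIMIT", 300),
    storage,
  };
}

// With nothing set, the documented defaults apply.
const defaultRateLimits = loadRateLimitConfig({});
```

In a Node process this would typically be called with `process.env`. Recent major versions of @nestjs/throttler express `ttl` in milliseconds, so a seconds value such as RATE_LIMIT_TTL may need converting; check the installed version.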
Implementation Summary
Files Created
- /home/localadmin/src/mosaic-stack/apps/api/src/common/throttler/throttler-api-key.guard.ts - Custom guard for API-key based rate limiting
- /home/localadmin/src/mosaic-stack/apps/api/src/common/throttler/throttler-storage.service.ts - Valkey/Redis storage for distributed rate limiting
- /home/localadmin/src/mosaic-stack/apps/api/src/common/throttler/index.ts - Export barrel file
- /home/localadmin/src/mosaic-stack/apps/api/src/stitcher/stitcher.rate-limit.spec.ts - Rate limiting tests for stitcher endpoints (6 tests)
- /home/localadmin/src/mosaic-stack/apps/api/src/coordinator-integration/coordinator-integration.rate-limit.spec.ts - Rate limiting tests for coordinator endpoints (8 tests)
Files Modified
- /home/localadmin/src/mosaic-stack/apps/api/src/app.module.ts - Added ThrottlerModule and ThrottlerApiKeyGuard
- /home/localadmin/src/mosaic-stack/apps/api/src/stitcher/stitcher.controller.ts - Added @Throttle decorators (60 req/min)
- /home/localadmin/src/mosaic-stack/apps/api/src/coordinator-integration/coordinator-integration.controller.ts - Added @Throttle decorators (100 req/min, health: 300 req/min)
- /home/localadmin/src/mosaic-stack/.env.example - Added rate limiting environment variables
- /home/localadmin/src/mosaic-stack/.env - Added rate limiting environment variables
- /home/localadmin/src/mosaic-stack/apps/api/package.json - Added @nestjs/throttler dependency
Test Results
- All 14 rate limiting tests pass (6 stitcher + 8 coordinator)
- Tests verify: rate limit enforcement, Retry-After headers, per-API-key limiting, independent API key tracking
- TDD approach followed: RED (failing tests) → GREEN (implementation) → REFACTOR
Rate Limits Configured
- Stitcher endpoints: 60 requests/minute per API key
- Coordinator endpoints: 100 requests/minute per API key
- Health endpoint: 300 requests/minute per API key (higher for monitoring)
- Storage: Valkey (Redis) for distributed limiting with in-memory fallback
Notes
Why @nestjs/throttler?
- Official NestJS package with good TypeScript support
- Supports Redis for distributed rate limiting
- Flexible per-route configuration
- Built-in guard system
- Active maintenance
Security Considerations
- Rate limiting by IP can be bypassed by rotating IPs
- Implement per-API-key limiting as primary defense
- Log rate limit violations for monitoring
- Consider implementing progressive delays for repeated violations
- Ensure rate limiting doesn't block legitimate traffic
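The progressive-delay idea above is not implemented in this change. As a rough sketch of what it could mean, the block time might double with each consecutive violation up to a cap. The function name `penaltySeconds` and the 60-second base and one-hour cap are assumptions chosen purely for illustration.

```typescript
// Hypothetical progressive penalty: doubles per repeated violation, capped.
// The base and cap values are illustrative assumptions, not project settings.
function penaltySeconds(violations: number, baseSeconds = 60, capSeconds = 3600): number {
  if (violations <= 0) return 0; // no violations, no penalty
  return Math.min(capSeconds, baseSeconds * 2 ** (violations - 1));
}

// First three violations with the default base: 60, 120, 240 seconds.
const penaltyLadder = [1, 2, 3].map((n) => penaltySeconds(n));
```

The cap matters for the last point in the list: without it, a misconfigured but legitimate client could be locked out for an unreasonably long time.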
Implementation Details
- Use @Throttle() decorator for per-endpoint limits
- Use @SkipThrottle() to exclude specific endpoints
- Custom ThrottlerGuard to extract API key from X-API-Key header
- Use Valkey connection from existing ValkeyModule
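The key-selection step of the custom guard can be illustrated as a plain function: prefer the X-API-Key header and fall back to the client IP when no key is sent. This is a framework-free sketch, not the code in throttler-api-key.guard.ts; `resolveTracker`, `HeaderMap`, and the `apikey:`/`ip:` prefixes are hypothetical. In @nestjs/throttler, logic like this would usually sit in an override of the guard's tracker method.

```typescript
// Illustrative only: names and key prefixes are assumptions.
// Node.js lowercases incoming header names, hence "x-api-key".
type HeaderMap = Record<string, string | string[] | undefined>;

function resolveTracker(headers: HeaderMap, ip: string): string {
  const raw = headers["x-api-key"];
  // A header sent more than once may arrive as an array; use the first value.
  const apiKey = Array.isArray(raw) ? raw[0] : raw;
  if (apiKey !== undefined && apiKey.trim() !== "") {
    return `apikey:${apiKey.trim()}`;
  }
  // No usable key: fall back to per-IP tracking.
  return `ip:${ip}`;
}

const keyedTracker = resolveTracker({ "x-api-key": "abc123" }, "203.0.113.7");
const anonymousTracker = resolveTracker({}, "203.0.113.7");
```

Prefixing the tracker keeps API-key counters and IP counters in separate namespaces in shared storage, so a key that happens to look like an IP address cannot collide with one.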