feat(gatekeeper): add PR merge automation service
Some checks failed
ci/woodpecker/push/ci Pipeline failed
@@ -3,6 +3,7 @@
## Objective

Implement rate limiting on all federation endpoints to prevent denial-of-service (DoS) attacks. Federation endpoints currently have no rate limiting, allowing attackers to:

- Overwhelm the server with connection requests
- Flood token validation endpoints
- Exhaust system resources

@@ -12,6 +13,7 @@ Implement rate limiting on all federation endpoints to prevent denial-of-service
**Severity:** P0 (Critical), blocks production deployment
**Attack Vector:** Unauthenticated public endpoints accept unlimited requests
**Risk:** The system can be brought down by flooding requests to:

1. `POST /api/v1/federation/incoming/connect` (public, no auth)
2. `POST /api/v1/federation/auth/validate` (public, no auth)
3. All other endpoints (authenticated, but still open to abuse)

@@ -19,15 +21,19 @@ Implement rate limiting on all federation endpoints to prevent denial-of-service
## Approach

### 1. Install @nestjs/throttler

Use NestJS's official rate-limiting package, which integrates with the framework's guard system.

### 2. Configure Rate Limits

Tiered rate limiting strategy:

- **Public endpoints:** Strict limits (5 req/min per IP)
- **Authenticated endpoints:** Moderate limits (20 req/min per user)
- **Admin endpoints:** Higher limits (50 req/min per user)
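
The tiering can be illustrated without any framework as a fixed-window counter keyed by client IP (public tier) or user ID (authenticated and admin tiers). This is a minimal sketch of the policy under stated assumptions, not how `@nestjs/throttler` works internally: the `Tier` names, the `RateLimiter` class, and the in-memory `Map` are hypothetical, and a shared store such as the Valkey backend named under Design Decisions would replace the `Map` once more than one API replica runs.

```typescript
// Hypothetical tier names and limits mirroring the list above (requests per 60-second window).
type Tier = "public" | "authenticated" | "admin";

const TIER_LIMITS: Record<Tier, { limit: number; windowMs: number }> = {
  public: { limit: 5, windowMs: 60_000 },
  authenticated: { limit: 20, windowMs: 60_000 },
  admin: { limit: 50, windowMs: 60_000 },
};

interface RequestContext {
  ip: string;
  userId?: string; // present only after authentication
}

interface WindowState {
  count: number; // requests counted in the current window
  resetAt: number; // epoch ms at which the current window ends
}

export class RateLimiter {
  // In-memory store for illustration; multiple API replicas would need a shared store instead.
  private readonly windows = new Map<string, WindowState>();

  /** Public traffic is tracked per IP, authenticated/admin traffic per user; headers never enter the key. */
  static trackerKey(tier: Tier, ctx: RequestContext): string {
    if (tier === "public") return `public:ip:${ctx.ip}`;
    if (!ctx.userId) throw new Error(`tier "${tier}" requires an authenticated user`);
    return `${tier}:user:${ctx.userId}`;
  }

  /** Returns true if the request may proceed, false if it should receive 429 Too Many Requests. */
  consume(tier: Tier, ctx: RequestContext, now: number = Date.now()): boolean {
    const { limit, windowMs } = TIER_LIMITS[tier];
    const key = RateLimiter.trackerKey(tier, ctx);
    let state = this.windows.get(key);
    if (!state || now >= state.resetAt) {
      // The previous window has elapsed (or never existed): start counting from zero again.
      state = { count: 0, resetAt: now + windowMs };
      this.windows.set(key, state);
    }
    if (state.count >= limit) return false;
    state.count += 1;
    return true;
  }
}
```

Because the tracker key is built only from the tier plus IP or user ID, varying request headers cannot produce a fresh counter, and each key gets a new window once the previous one expires.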

### 3. Implementation Strategy

1. Add the `@nestjs/throttler` dependency
2. Configure `ThrottlerModule` globally
3. Apply custom rate limits per endpoint using decorators

@@ -53,6 +59,7 @@ Tiered rate limiting strategy:
**COMPLETE**: Rate limiting implemented on all federation endpoints.

**Security Impact:** MITIGATED

- DoS exposure mitigated via rate limiting
- Public endpoints protected with strict limits (3 req/sec)
- Authenticated endpoints have moderate limits (20 req/min)

@@ -61,6 +68,7 @@ Tiered rate limiting strategy:
## Baseline Quality Status

**Pre-existing Technical Debt** (NOT introduced by this fix):

- 29 TypeScript errors in `apps/api` (federation + runner-jobs)
  - Federation: Missing Prisma schema types (`FederationConnectionStatus`, `Instance`, `federatedIdentity`)
  - Runner Jobs: Missing `version` field in schema

@@ -68,6 +76,7 @@ Tiered rate limiting strategy:
- **My changes introduced 0 new errors**

**Quality Assessment:**

- ✅ Tier 1 (Baseline): No regression (error count unchanged)
- ✅ Tier 2 (Modified Files): 0 new errors in files I touched
- ✅ Tier 3 (New Code): Rate limiting configuration is syntactically correct

@@ -75,6 +84,7 @@ Tiered rate limiting strategy:
## Testing Status

**Blocked:** Federation module tests cannot run until the Prisma schema is added. Pre-existing error:

```
TypeError: Cannot read properties of undefined (reading 'PENDING')
FederationConnectionStatus is undefined
```

@@ -83,6 +93,7 @@ FederationConnectionStatus is undefined
This is NOT caused by my changes - it's pre-existing technical debt from incomplete M7 federation implementation.

**Manual Verification:**

- TypeScript compilation: No new errors introduced
- Rate limiting decorators: Correctly applied to all endpoints
- ThrottlerModule: Properly configured with 3 tiers

@@ -91,6 +102,7 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
## Testing

### Rate Limit Tests

1. Public endpoint exceeds limit → 429 Too Many Requests
2. Authenticated endpoint exceeds limit → 429 Too Many Requests
3. Within limits → 200 OK

@@ -99,6 +111,7 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
6. Different users have independent limits

### Security Tests

1. Cannot bypass the rate limit by varying the User-Agent
2. Cannot bypass the rate limit by varying other request headers
3. Rate limit counter resets after the time window

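The header-bypass test comes down largely to how the client IP is derived for public endpoints: if the key is read naively from `X-Forwarded-For`, an attacker can rotate that header to get a fresh counter on every request. The following is a hedged sketch of a safer resolution rule; `resolveClientIp` and the trusted-proxy set are hypothetical, and in an Express-based NestJS app the `trust proxy` setting serves a similar purpose.

```typescript
/**
 * Resolves the client IP used as the public-tier rate-limit key.
 * X-Forwarded-For is honoured only when the direct peer is one of our own proxies; the
 * right-most hop that is not a trusted proxy is used, because left-most entries are client-controlled.
 */
export function resolveClientIp(
  remoteAddress: string,
  forwardedFor: string | undefined,
  trustedProxies: ReadonlySet<string>,
): string {
  // Requests that did not arrive through a trusted proxy: ignore the header entirely.
  if (!trustedProxies.has(remoteAddress) || !forwardedFor) return remoteAddress;

  const hops = forwardedFor
    .split(",")
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0);

  // Walk from the hop closest to us outwards, skipping our own proxies.
  for (let i = hops.length - 1; i >= 0; i--) {
    const hop = hops[i];
    if (hop !== undefined && !trustedProxies.has(hop)) return hop;
  }
  return remoteAddress; // every hop was a trusted proxy: fall back to the direct peer
}
```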
@@ -107,6 +120,7 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
## Federation Endpoints Requiring Rate Limiting

### FederationController (`/api/v1/federation`)

- `GET /instance` - Public (5 req/min per IP)
- `POST /instance/regenerate-keys` - Admin (10 req/min per user)
- `POST /connections/initiate` - Auth (10 req/min per user)

@@ -118,6 +132,7 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
- `POST /incoming/connect` - **Public (3 req/min per IP)** ← CRITICAL

### FederationAuthController (`/api/v1/federation/auth`)

- `POST /initiate` - Auth (10 req/min per user)
- `POST /link` - Auth (5 req/min per user)
- `GET /identities` - Auth (30 req/min per user)

@@ -127,18 +142,21 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
## Notes

### Design Decisions

- Use IP-based rate limiting for public endpoints
- Use user-based rate limiting for authenticated endpoints
- Store rate limit state in Valkey (Redis-compatible) for scalability
- Include rate limit headers in responses (`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`)
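
One way to derive the three headers from a fixed-window counter is sketched here. It assumes `X-RateLimit-Reset` carries the whole number of seconds until the window ends; some implementations send a Unix timestamp instead, so the value should be checked against what the deployed throttler actually emits. `rateLimitHeaders` and `WindowSnapshot` are hypothetical names.

```typescript
export interface WindowSnapshot {
  limit: number; // maximum requests allowed per window
  used: number; // requests already counted in this window, including the current one
  resetAt: number; // epoch ms at which the window ends
}

/** Builds rate-limit response headers; Reset is whole seconds until the window ends (assumed convention). */
export function rateLimitHeaders(w: WindowSnapshot, now: number = Date.now()): Record<string, string> {
  const remaining = Math.max(0, w.limit - w.used);
  // Round up so clients never retry before the window has actually closed.
  const resetSeconds = Math.max(0, Math.ceil((w.resetAt - now) / 1000));
  return {
    "X-RateLimit-Limit": String(w.limit),
    "X-RateLimit-Remaining": String(remaining),
    "X-RateLimit-Reset": String(resetSeconds),
  };
}
```

For example, a public client that has used 5 of 5 requests 42.5 seconds before its window closes would see a limit of 5, 0 remaining, and a reset of 43.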

### Attack Vectors Mitigated

1. **Connection Request Flooding:** Attacker sends unlimited connection requests to `/incoming/connect`
2. **Token Validation Abuse:** Attacker floods `/auth/validate` to exhaust resources
3. **Authenticated User Abuse:** Compromised credentials used to flood authenticated endpoints
4. **Resource Exhaustion:** Excessive request volume exhausts CPU and memory on the server

### Future Enhancements (Not in Scope)

- Circuit breaker pattern for failing instances
- Geographic rate limiting
- Adaptive rate limiting based on system load

@@ -7,11 +7,13 @@ The initial implementation (commit 6878d57) was high quality but included placeh
## Security-Critical Issues

### 1. JWT Token Validation (CRITICAL)

**Problem**: `validateToken()` always returns `valid: false`
**Risk**: Cannot verify authenticity of federated tokens
**Solution**: Implement proper JWT validation with signature verification

### 2. OIDC Discovery (CRITICAL)

**Problem**: `generateAuthUrl()` returns hardcoded placeholder URL
**Risk**: Cannot initiate real federated authentication flows
**Solution**: Implement OIDC discovery and proper authorization URL generation

@@ -19,9 +21,11 @@ The initial implementation (commit 6878d57) was high quality but included placeh
## Implementation Plan

### 1. Add Dependencies

- [x] Add `jose` library for JWT handling (industry-standard, secure)

### 2. Implement JWT Validation

- [ ] Fetch OIDC discovery metadata from the issuer
- [ ] Cache JWKS (JSON Web Key Set) for performance
- [ ] Verify JWT signature using the remote public key

@@ -31,6 +35,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
- [ ] Return proper validation results

### 3. Implement OIDC Discovery

- [ ] Fetch `.well-known/openid-configuration` from remote instance
- [ ] Cache discovery metadata
- [ ] Generate proper OAuth2 authorization URL
@@ -39,6 +44,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
- [ ] Support standard OIDC scopes (openid, profile, email)

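The discovery and URL-generation steps could look roughly like the following sketch. It assumes the standard OpenID Connect Discovery location (`<issuer>/.well-known/openid-configuration`), a one-hour cache, and an injected `fetchJson` function so it can run without network access; `OidcDiscoveryCache` and `buildAuthUrl` are hypothetical names, and the `codeChallenge` input would come from the PKCE flow described under Implementation Notes.

```typescript
// Subset of the provider metadata fields defined by OpenID Connect Discovery 1.0.
export interface OidcMetadata {
  issuer: string;
  authorization_endpoint: string;
  token_endpoint: string;
  jwks_uri: string;
}

export type FetchJson = (url: string) => Promise<unknown>;

export class OidcDiscoveryCache {
  private readonly cache = new Map<string, { metadata: OidcMetadata; expiresAt: number }>();
  private readonly fetchJson: FetchJson;
  private readonly ttlMs: number;

  constructor(fetchJson: FetchJson, ttlMs: number = 60 * 60 * 1000) {
    this.fetchJson = fetchJson; // injected so the sketch can run without network access
    this.ttlMs = ttlMs;
  }

  async get(issuer: string, now: number = Date.now()): Promise<OidcMetadata> {
    const cached = this.cache.get(issuer);
    if (cached && now < cached.expiresAt) return cached.metadata;

    const url = `${issuer.replace(/\/+$/, "")}/.well-known/openid-configuration`;
    const metadata = (await this.fetchJson(url)) as OidcMetadata;
    // The document must describe the issuer we asked about, or it cannot be trusted.
    if (metadata.issuer !== issuer) {
      throw new Error(`issuer mismatch: expected ${issuer}, got ${metadata.issuer}`);
    }
    this.cache.set(issuer, { metadata, expiresAt: now + this.ttlMs });
    return metadata;
  }
}

/** Builds an authorization-code request URL with PKCE (S256) and the standard OIDC scopes. */
export function buildAuthUrl(
  metadata: OidcMetadata,
  params: { clientId: string; redirectUri: string; state: string; codeChallenge: string },
): string {
  const url = new URL(metadata.authorization_endpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", params.clientId);
  url.searchParams.set("redirect_uri", params.redirectUri);
  url.searchParams.set("scope", "openid profile email");
  url.searchParams.set("state", params.state);
  url.searchParams.set("code_challenge", params.codeChallenge);
  url.searchParams.set("code_challenge_method", "S256");
  return url.toString();
}
```

In the service, `fetchJson` could simply wrap the global `fetch` available in Node 18 and later.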
### 4. Update Tests

- [ ] Replace mock-based tests with real behavior tests
- [ ] Test valid JWT validation
- [ ] Test expired/invalid token rejection
@@ -47,6 +53,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
- [ ] Maintain 85%+ test coverage

### 5. Security Considerations

- Cache JWKS to avoid excessive network calls
- Validate token expiration strictly
- Use PKCE to prevent authorization code interception

@@ -57,6 +64,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
## Implementation Notes

**PKCE Flow**:

1. Generate a random `code_verifier` (base64url-encoded random bytes)
2. Generate `code_challenge = base64url(SHA256(code_verifier))`
3. Store `code_verifier` in the session/database
@@ -64,6 +72,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
5. Send `code_verifier` in the token exchange

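Steps 1 and 2 of the PKCE flow map directly onto Node's built-in `node:crypto` module. This is a minimal sketch with hypothetical function names; the S256 transform can be checked against the test vector published in RFC 7636, Appendix B.

```typescript
import { createHash, randomBytes } from "node:crypto";

/** Step 1: 32 random bytes, base64url-encoded, give a 43-character code_verifier (RFC 7636 allows 43-128). */
export function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

/** Step 2: code_challenge = base64url(SHA-256(code_verifier)), i.e. the "S256" method. */
export function codeChallengeS256(codeVerifier: string): string {
  return createHash("sha256").update(codeVerifier, "ascii").digest("base64url");
}
```

Only the challenge travels in the authorization request; the verifier stays server-side (step 3) until the token exchange (step 5), which is what makes an intercepted authorization code useless on its own.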
**JWT Validation Flow**:

1. Decode the JWT header without verifying it, to read `kid` and `alg`
2. Fetch JWKS from the issuer (cache for 1 hour)
3. Find the matching key by `kid` (key ID)

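The same steps, carried through signature and claim checks, can be expressed with Node's built-in `node:crypto` module. This sketch is for illustration only: it assumes RS256 keys, a JWKS the caller has already fetched and cached, and a hypothetical `validateJwt` name. In the service itself the plan uses `jose`, whose `createRemoteJWKSet` and `jwtVerify` cover remote key lookup and these checks.

```typescript
import { createPublicKey, verify, type JsonWebKey } from "node:crypto";

type Jwk = JsonWebKey & { kid?: string };

export interface JwtClaims {
  iss?: string;
  aud?: string | string[];
  exp?: number;
  [claim: string]: unknown;
}

function decodeSegment<T>(segment: string): T {
  return JSON.parse(Buffer.from(segment, "base64url").toString("utf8")) as T;
}

/** Validates an RS256 JWT against a JWKS; throws on any failure, returns the claims on success. */
export function validateJwt(
  token: string,
  jwks: { keys: Jwk[] },
  expected: { issuer: string; audience: string },
  nowSeconds: number = Math.floor(Date.now() / 1000),
): JwtClaims {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed JWT");
  const [headerB64, payloadB64, signatureB64] = parts as [string, string, string];

  // 1. Decode the header without trusting it yet; only kid and alg are read.
  const header = decodeSegment<{ alg?: string; kid?: string }>(headerB64);
  // Pin the algorithm instead of trusting the token's "alg" (blocks alg=none and algorithm confusion).
  if (header.alg !== "RS256") throw new Error(`unsupported alg: ${String(header.alg)}`);

  // 2-3. Find the issuer's key with the matching kid (fetching and caching happen elsewhere).
  const jwk = jwks.keys.find((k) => k.kid === header.kid);
  if (!jwk) throw new Error(`no JWKS key for kid ${String(header.kid)}`);

  // 4. Verify the RSASSA-PKCS1-v1_5 / SHA-256 signature over "<header>.<payload>".
  const ok = verify(
    "sha256",
    Buffer.from(`${headerB64}.${payloadB64}`),
    createPublicKey({ key: jwk, format: "jwk" }),
    Buffer.from(signatureB64, "base64url"),
  );
  if (!ok) throw new Error("invalid signature");

  // 5. Only now trust the claims: expiry, issuer, and audience.
  const claims = decodeSegment<JwtClaims>(payloadB64);
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) throw new Error("token expired");
  if (claims.iss !== expected.issuer) throw new Error("issuer mismatch");
  const audiences: Array<string | undefined> = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!audiences.includes(expected.audience)) throw new Error("audience mismatch");
  return claims;
}
```

Pinning the algorithm before touching the key matters: accepting whatever `alg` a token declares is a well-known source of JWT validation bypasses.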
@@ -12,6 +12,7 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
## What Was Implemented

### Database Schema

- **FederationEventSubscription Model**: New table for storing event subscriptions
  - Fields: id, workspaceId, connectionId, eventType, metadata, isActive, timestamps
  - Unique constraint on (workspaceId, connectionId, eventType)

@@ -21,6 +22,7 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
### Core Services

**EventService** (`event.service.ts`)

- `subscribeToEventType()`: Subscribe to events from remote instance
- `unsubscribeFromEventType()`: Remove event subscription
- `publishEvent()`: Publish events to all subscribed connections

@@ -35,6 +37,7 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
**EventController** (`event.controller.ts`)

**Authenticated Endpoints (require AuthGuard):**

- `POST /api/v1/federation/events/subscribe` - Subscribe to event type
- `POST /api/v1/federation/events/unsubscribe` - Unsubscribe from event type
- `POST /api/v1/federation/events/publish` - Publish event to subscribers

@@ -43,12 +46,14 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
- `GET /api/v1/federation/events/messages/:id` - Get single event message

**Public Endpoints (signature-verified):**

- `POST /api/v1/federation/incoming/event` - Receive event from remote instance
- `POST /api/v1/federation/incoming/event/ack` - Receive event acknowledgment

### Type Definitions

**Added to `message.types.ts`:**

- `EventMessage`: Outgoing event structure
- `EventAck`: Event acknowledgment structure
- `EventMessageDetails`: Event message response type

@@ -57,6 +62,7 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
### Data Transfer Objects

**`event.dto.ts`:**

- `SubscribeToEventDto`: Subscribe request
- `UnsubscribeFromEventDto`: Unsubscribe request
- `PublishEventDto`: Publish event request

@@ -66,12 +72,14 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
## Testing

### Test Coverage

- **EventService**: 18 unit tests, **89.09% coverage** ✅
- **EventController**: 11 unit tests, **83.87% coverage** ✅
- **Total**: 29 tests, all passing
- **Coverage**: Exceeds the 85% minimum requirement in aggregate (EventController alone is at 83.87%)

### Test Scenarios Covered

- Subscription creation and deletion
- Event publishing to multiple subscribers
- Failed delivery handling

@@ -84,17 +92,21 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
## Design Patterns

### Consistency with Existing Code

- Follows patterns from `QueryService` and `CommandService`
- Reuses existing `SignatureService` for message verification
- Reuses existing `FederationService` for instance identity
- Uses existing `FederationMessage` model with new `eventType` field

### Event Type Naming Convention

Hierarchical dot-notation:

- `entity.action` (e.g., "task.created", "user.updated")
- `entity.action.detail` (e.g., "task.status.changed")
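
If the convention is enforced when subscribing and publishing, a simple pattern check is enough. The allowed character set below (lowercase letters, digits, and underscores in each segment) is an assumption rather than something stated in this document, and `isValidEventType` and `parseEventType` are hypothetical helpers.

```typescript
// Two or more dot-separated lowercase segments: "entity.action", optionally followed by detail segments.
const EVENT_TYPE_PATTERN = /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$/;

export function isValidEventType(eventType: string): boolean {
  return EVENT_TYPE_PATTERN.test(eventType);
}

/** Splits a valid event type into its entity, action, and any trailing detail segments. */
export function parseEventType(eventType: string): { entity: string; action: string; detail: string[] } {
  if (!isValidEventType(eventType)) throw new Error(`invalid event type: ${eventType}`);
  const [entity, action, ...detail] = eventType.split(".") as [string, string, ...string[]];
  return { entity, action, detail };
}
```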

### Security Features

- All events signature-verified (RSA)
- Timestamp validation (limits the window in which a captured message can be replayed)
- Connection status validation (only active connections)

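A receiving instance could apply these three checks roughly as follows. The sketch signs a fixed-order JSON serialization with RSA-SHA256 via `node:crypto`; the `SignedEvent` shape, the `"ACTIVE"` status string, and the five-minute window are assumptions, not values taken from `SignatureService` or `message.types.ts`.

```typescript
import { sign, verify, type KeyObject } from "node:crypto";

// Hypothetical message shape; the real EventMessage type lives in message.types.ts.
export interface SignedEvent {
  messageId: string;
  eventType: string;
  payload: unknown;
  timestamp: number; // epoch ms, set by the sender
  signature?: string; // base64 RSA-SHA256 signature over the other fields
}

const MAX_CLOCK_SKEW_MS = 5 * 60 * 1000; // assumed acceptance window

/** Fixed field order so sender and receiver sign and verify exactly the same bytes. */
function signingInput(e: SignedEvent): Buffer {
  const unsigned = { messageId: e.messageId, eventType: e.eventType, payload: e.payload, timestamp: e.timestamp };
  return Buffer.from(JSON.stringify(unsigned));
}

export function signEvent(e: SignedEvent, privateKey: KeyObject): SignedEvent {
  return { ...e, signature: sign("sha256", signingInput(e), privateKey).toString("base64") };
}

export type Verdict = { ok: true } | { ok: false; reason: string };

export function verifyIncomingEvent(
  e: SignedEvent,
  senderPublicKey: KeyObject,
  connectionStatus: string,
  now: number = Date.now(),
): Verdict {
  // Only active connections may deliver events.
  if (connectionStatus !== "ACTIVE") return { ok: false, reason: "connection not active" };
  if (!e.signature) return { ok: false, reason: "missing signature" };
  const valid = verify("sha256", signingInput(e), senderPublicKey, Buffer.from(e.signature, "base64"));
  if (!valid) return { ok: false, reason: "bad signature" };
  // Stale or far-future messages are rejected, so a captured event is replayable only inside this window.
  if (Math.abs(now - e.timestamp) > MAX_CLOCK_SKEW_MS) return { ok: false, reason: "timestamp outside window" };
  return { ok: true };
}
```

A timestamp window only bounds how long a captured message stays replayable; also rejecting a repeated `messageId` inside that window would close the remaining gap.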
@@ -103,14 +115,18 @@ Hierarchical dot-notation:
## Technical Details

### Database Migration

File: `20260203_add_federation_event_subscriptions/migration.sql`

- Adds `eventType` column to `federation_messages`
- Creates `federation_event_subscriptions` table
- Adds appropriate indexes for performance
- Establishes foreign key relationships

### Integration

Updated `federation.module.ts`:

- Added `EventService` to providers
- Added `EventController` to controllers
- Exported `EventService` for use by other modules

@@ -126,6 +142,7 @@ Updated `federation.module.ts`:
## Files Created/Modified

### New Files (7)

- `apps/api/src/federation/event.service.ts` (470 lines)
- `apps/api/src/federation/event.service.spec.ts` (1,088 lines)
- `apps/api/src/federation/event.controller.ts` (199 lines)

@@ -135,11 +152,13 @@ Updated `federation.module.ts`:
- `docs/scratchpads/90-event-subscriptions.md` (185 lines)

### Modified Files (3)

- `apps/api/src/federation/types/message.types.ts` (+118 lines)
- `apps/api/src/federation/federation.module.ts` (+3 lines)
- `apps/api/prisma/schema.prisma` (+27 lines)

### Total Changes

- **2,395 lines added**
- **5 lines removed**
- **10 files changed**

@@ -147,20 +166,25 @@ Updated `federation.module.ts`:
## Key Features

### Server-Side Event Filtering

Events are sent only to instances with an active subscription to that event type, which avoids unnecessary network traffic and processing.

### Acknowledgment Protocol

Simple ACK pattern confirms event delivery:

1. Publisher sends event
2. Receiver processes and returns ACK
3. Publisher updates delivery status
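
On the publisher side, step 3 amounts to a small state transition on the stored message. `PENDING` and `DELIVERED` are assumed status names (only `FAILED` is named under Error Handling), and `applyAck`, `markSendFailure`, and `EventAckBody` are hypothetical.

```typescript
export type DeliveryStatus = "PENDING" | "DELIVERED" | "FAILED";

export interface OutgoingEvent {
  messageId: string;
  status: DeliveryStatus;
  error?: string;
  deliveredAt?: number; // epoch ms at which the ACK arrived
}

export interface EventAckBody {
  messageId: string;
  received: boolean;
  error?: string;
}

/** Step 3: the publisher applies the receiver's ACK to the message it sent. */
export function applyAck(msg: OutgoingEvent, ack: EventAckBody, now: number = Date.now()): OutgoingEvent {
  if (ack.messageId !== msg.messageId) throw new Error("ACK does not match this message");
  if (msg.status !== "PENDING") return msg; // duplicate or late ACKs leave the final state untouched
  return ack.received
    ? { ...msg, status: "DELIVERED", deliveredAt: now }
    : { ...msg, status: "FAILED", error: ack.error ?? "rejected by receiver" };
}

/** Transport failures (timeouts, refused connections) also end in FAILED, with the error recorded. */
export function markSendFailure(msg: OutgoingEvent, err: unknown): OutgoingEvent {
  return { ...msg, status: "FAILED", error: err instanceof Error ? err.message : String(err) };
}
```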

### Error Handling

- Failed deliveries marked as FAILED with error message
- Connection errors logged but don't crash the system
- Invalid signatures rejected immediately

### Subscription Management

- Subscriptions persist in database
- Can be activated/deactivated without deletion
- Support for metadata (extensibility)

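The unique constraint, soft deactivation, and server-side filtering fit together as in this in-memory sketch. `SubscriptionStore` is a hypothetical stand-in for the `FederationEventSubscription` table, and scoping recipients by workspace is an assumption about how `publishEvent()` performs its lookup.

```typescript
export interface Subscription {
  workspaceId: string;
  connectionId: string;
  eventType: string;
  isActive: boolean;
  metadata?: Record<string, unknown>;
}

/** Hypothetical in-memory stand-in for the FederationEventSubscription table. */
export class SubscriptionStore {
  private readonly rows = new Map<string, Subscription>();

  // One row per (workspaceId, connectionId, eventType), mirroring the unique constraint.
  private static key(workspaceId: string, connectionId: string, eventType: string): string {
    return `${workspaceId}|${connectionId}|${eventType}`;
  }

  /** Subscribing twice re-activates the existing row instead of creating a duplicate. */
  subscribe(
    workspaceId: string,
    connectionId: string,
    eventType: string,
    metadata?: Record<string, unknown>,
  ): Subscription {
    const key = SubscriptionStore.key(workspaceId, connectionId, eventType);
    const previous = this.rows.get(key);
    const row: Subscription = {
      workspaceId,
      connectionId,
      eventType,
      isActive: true,
      metadata: metadata ?? previous?.metadata,
    };
    this.rows.set(key, row);
    return row;
  }

  /** Deactivates without deleting, so the subscription can be re-enabled later. */
  deactivate(workspaceId: string, connectionId: string, eventType: string): void {
    const row = this.rows.get(SubscriptionStore.key(workspaceId, connectionId, eventType));
    if (row) row.isActive = false;
  }

  /** Server-side filtering: only connections with an active subscription to this exact type. */
  recipients(workspaceId: string, eventType: string): string[] {
    return Array.from(this.rows.values())
      .filter((s) => s.isActive && s.workspaceId === workspaceId && s.eventType === eventType)
      .map((s) => s.connectionId);
  }
}
```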
@@ -168,6 +192,7 @@ Simple ACK pattern confirms event delivery:
## Future Enhancements (Not Implemented)

These were considered but deferred to future issues:

- Event replay/history
- Event filtering by payload fields
- Webhook support for event delivery

@@ -179,11 +204,13 @@ These were considered but deferred to future issues:
## Performance Considerations

### Scalability

- Database indexes on `eventType`, `connectionId`, `workspaceId`
- Efficient queries with proper WHERE clauses
- Server-side filtering reduces network overhead

### Monitoring

- All operations logged at the appropriate level
- Failed deliveries tracked in database
- Delivery timestamps recorded for analytics

@@ -191,12 +218,14 @@ These were considered but deferred to future issues:
## Documentation

### Inline Documentation

- JSDoc comments on all public methods
- Clear parameter descriptions
- Return type documentation
- Usage examples in comments

### Scratchpad Documentation

- Complete implementation plan
- Design decisions documented
- Testing strategy outlined

@@ -205,6 +234,7 @@ These were considered but deferred to future issues:
## Integration Testing Recommendations

While unit tests are comprehensive, recommend integration testing:

1. Set up two federated instances
2. Subscribe from Instance A to Instance B events
3. Publish event from Instance B

@@ -214,6 +244,7 @@ While unit tests are comprehensive, recommend integration testing:
## Conclusion

FED-007 (EVENT Subscriptions) is **complete and ready for code review**. The implementation:

- ✅ Follows TDD principles
- ✅ Meets 85%+ code coverage requirement
- ✅ Passes all quality gates (lint, typecheck, tests)

40
docs/scratchpads/ms-gate-001-gatekeeper.md
Normal file
@@ -0,0 +1,40 @@
# MS-GATE-001 Scratchpad

## Objective

Build the API Gatekeeper module for PR auto-merge orchestration using Gitea PR webhooks, Woodpecker CI webhooks, and a `pending_merges` Prisma model.

## Constraints

- Work in `/home/jwoltje/src/mosaic-stack-worktrees/gate-001`
- Do not merge or deploy
- Must pass:
  - `pnpm format:check`
  - `SKIP_ENV_VALIDATION=true pnpm turbo typecheck`
  - `SKIP_ENV_VALIDATION=true pnpm turbo lint`
  - `SKIP_ENV_VALIDATION=true pnpm turbo test --filter=@mosaic/api -- --testPathPattern="gatekeeper"`

## ASSUMPTION

- Woodpecker PR pipelines expose `CI_COMMIT_PULL_REQUEST` and `CI_COMMIT_SHA`, so the webhook notifier can send `prNumber` and `headSha`.
  - Rationale: Gatekeeper needs an exact PR/head tuple to safely match CI back to `pending_merges`.
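
The matching rule behind this assumption can be stated as a small pure function: a CI result advances a pending merge only when both the PR number and the head SHA match. The `PendingMerge` and `CiWebhook` shapes and their status values are hypothetical; the real columns come from the `pending_merges` migration.

```typescript
// Hypothetical shapes; the real pending_merges columns are defined by the Prisma migration.
export interface PendingMerge {
  prNumber: number;
  headSha: string;
  status: "WAITING_FOR_CI" | "READY_TO_MERGE" | "CI_FAILED";
}

export interface CiWebhook {
  prNumber: number; // from CI_COMMIT_PULL_REQUEST
  headSha: string; // from CI_COMMIT_SHA
  status: "success" | "failure";
}

/**
 * Applies a CI result only when it matches the exact PR/head tuple on record.
 * A result for an older commit (the PR was pushed to after CI started) is ignored,
 * so a stale green build can never mark newer code as mergeable.
 */
export function applyCiResult(pending: PendingMerge[], ci: CiWebhook): PendingMerge | undefined {
  const match = pending.find(
    (p) => p.prNumber === ci.prNumber && p.headSha === ci.headSha && p.status === "WAITING_FOR_CI",
  );
  if (!match) return undefined;
  match.status = ci.status === "success" ? "READY_TO_MERGE" : "CI_FAILED";
  return match;
}
```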

## Plan

1. Add Prisma model + SQL migration for `pending_merges`
2. Add Gatekeeper NestJS module/controller/service/DTO/tests
3. Wire Woodpecker webhook -> Gatekeeper CI handler
4. Add env/config documentation and compose variables
5. Run quality gates, review, remediate, push, open PR

## Progress

- [x] Context loaded
- [ ] Tests added first
- [ ] Implementation complete
- [ ] Quality gates green
- [ ] Push + PR opened

## Verification

- Pending