feat(gatekeeper): add PR merge automation service
Some checks failed
ci/woodpecker/push/ci Pipeline failed

2026-03-10 21:35:11 -05:00
parent 3289677056
commit 6e2b9a307e
27 changed files with 1089 additions and 64 deletions

@@ -94,6 +94,21 @@ OIDC_REDIRECT_URI=http://localhost:3001/auth/oauth2/callback/authentik
See [Authentik Setup](2-authentik.md) for complete OIDC configuration.
## Webhooks and Merge Automation
```bash
# Gitea webhook validation secret for /api/gatekeeper/webhook/gitea
GITEA_WEBHOOK_SECRET=your-random-webhook-secret
# Personal access token used by Gatekeeper to comment on and merge PRs
GITEA_API_TOKEN=your-gitea-api-token
# Master switch for the Gatekeeper auto-merge workflow
GATEKEEPER_ENABLED=true
```
Use a dedicated Gitea token with the minimum repository scope needed to comment on pull requests and perform merges.
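`GITEA_WEBHOOK_SECRET` is typically used to check that each incoming webhook really came from Gitea. The sketch below verifies the raw request body against the hex HMAC-SHA256 signature that Gitea sends in its `X-Gitea-Signature` header. It is a minimal sketch: the helper name `verifyGiteaSignature` is hypothetical, and the Gatekeeper controller may wire verification differently.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/**
 * Hypothetical helper: returns true when `signatureHex` (the value of the
 * X-Gitea-Signature header) equals hex(HMAC-SHA256(secret, rawBody)).
 * Use the raw, unparsed body; re-serialized JSON may not match byte-for-byte.
 */
function verifyGiteaSignature(
  rawBody: string | Buffer,
  signatureHex: string | undefined,
  secret: string,
): boolean {
  if (!signatureHex || !secret) {
    return false;
  }
  // Buffer.from(..., "hex") silently drops invalid characters, so validate first.
  if (!/^[0-9a-f]+$/i.test(signatureHex)) {
    return false;
  }
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) {
    return false;
  }
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

In NestJS, getting the unparsed body usually means enabling the `rawBody: true` option when the application is created.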
## Cache and Storage
### Valkey (Redis-compatible)

@@ -13,11 +13,11 @@ Images are tagged based on branch and event type:
### Tag Meanings
| Tag | Purpose | Stability |
| -------------------------- | -------------------------------- | --------- |
| `latest` | Current build from `main` | Latest |
| `v*` (e.g., `v1.0.0`) | Versioned release | Immutable |
| `{sha}` (e.g., `658ec077`) | Specific commit for traceability | Immutable |
## Retention Policy Configuration

@@ -3,6 +3,7 @@
## Objective
Implement rate limiting on all federation endpoints to prevent denial-of-service (DoS) attacks. Federation endpoints currently have no rate limiting, allowing attackers to:
- Overwhelm the server with connection requests
- Flood token validation endpoints
- Exhaust system resources
@@ -12,6 +13,7 @@ Implement rate limiting on all federation endpoints to prevent denial-of-service
**Severity:** P0 (Critical) - Blocks production deployment
**Attack Vector:** Unauthenticated public endpoints allow unlimited requests
**Risk:** System can be brought down by flooding requests to:
1. `POST /api/v1/federation/incoming/connect` (Public, no auth)
2. `POST /api/v1/federation/auth/validate` (Public, no auth)
3. All other endpoints (authenticated, but can be abused)
@@ -19,15 +21,19 @@ Implement rate limiting on all federation endpoints to prevent denial-of-service
## Approach
### 1. Install @nestjs/throttler
Use NestJS's official rate limiting package which integrates with the framework's guard system.
### 2. Configure Rate Limits
Tiered rate limiting strategy:
- **Public endpoints:** Strict limits (5 req/min per IP)
- **Authenticated endpoints:** Moderate limits (20 req/min per user)
- **Admin endpoints:** Higher limits (50 req/min per user)
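The tiers above can be modelled as a fixed-window counter keyed by client (an IP or a user ID). The sketch below is a conceptual, in-memory illustration of that strategy. It is not the `@nestjs/throttler` API, and the `Tier` names and injectable clock are assumptions added for illustration. A real deployment would keep the counters in shared storage such as Valkey, so the limits hold across API replicas.

```typescript
type Tier = "public" | "authenticated" | "admin";

// Limits from the tiered strategy: requests allowed per 60-second window.
const TIER_LIMITS: Record<Tier, { limit: number; windowMs: number }> = {
  public: { limit: 5, windowMs: 60_000 },
  authenticated: { limit: 20, windowMs: 60_000 },
  admin: { limit: 50, windowMs: 60_000 },
};

interface WindowState {
  windowStart: number;
  count: number;
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();

  // The clock is injectable so tests can simulate the passage of time.
  constructor(private readonly now: () => number = Date.now) {}

  /** Returns true if the request is allowed, false if it should get a 429. */
  tryConsume(tier: Tier, clientKey: string): boolean {
    const { limit, windowMs } = TIER_LIMITS[tier];
    const key = `${tier}:${clientKey}`;
    const t = this.now();
    const state = this.windows.get(key);

    // Start a fresh window if none exists or the previous one has expired.
    if (!state || t - state.windowStart >= windowMs) {
      this.windows.set(key, { windowStart: t, count: 1 });
      return true;
    }
    if (state.count < limit) {
      state.count += 1;
      return true;
    }
    return false;
  }
}
```

With this scheme, a sixth public request from one IP within the same minute is rejected. A different IP keeps its own independent counter.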
### 3. Implementation Strategy
1. Add `@nestjs/throttler` dependency
2. Configure ThrottlerModule globally
3. Apply custom rate limits per endpoint using decorators
@@ -53,6 +59,7 @@ Tiered rate limiting strategy:
**COMPLETE** - Rate limiting successfully implemented on all federation endpoints.
**Security Impact:** MITIGATED
- DoS vulnerability eliminated via rate limiting
- Public endpoints protected with strict limits (3 req/sec)
- Authenticated endpoints have moderate limits (20 req/min)
@@ -61,6 +68,7 @@ Tiered rate limiting strategy:
## Baseline Quality Status
**Pre-existing Technical Debt** (NOT introduced by this fix):
- 29 TypeScript errors in apps/api (federation + runner-jobs)
- Federation: Missing Prisma schema types (`FederationConnectionStatus`, `Instance`, `federatedIdentity`)
- Runner Jobs: Missing `version` field in schema
@@ -68,6 +76,7 @@ Tiered rate limiting strategy:
- **My changes introduced 0 new errors**
**Quality Assessment:**
- ✅ Tier 1 (Baseline): No regression (error count unchanged)
- ✅ Tier 2 (Modified Files): 0 new errors in files I touched
- ✅ Tier 3 (New Code): Rate limiting configuration is syntactically correct
@@ -75,6 +84,7 @@ Tiered rate limiting strategy:
## Testing Status
**Blocked:** Federation module tests cannot run until Prisma schema is added. Pre-existing error:
```
TypeError: Cannot read properties of undefined (reading 'PENDING')
FederationConnectionStatus is undefined
@@ -83,6 +93,7 @@ FederationConnectionStatus is undefined
This is NOT caused by my changes - it's pre-existing technical debt from the incomplete M7 federation implementation.
**Manual Verification:**
- TypeScript compilation: No new errors introduced
- Rate limiting decorators: Correctly applied to all endpoints
- ThrottlerModule: Properly configured with 3 tiers
@@ -91,6 +102,7 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
## Testing
### Rate Limit Tests
1. Public endpoint exceeds limit → 429 Too Many Requests
2. Authenticated endpoint exceeds limit → 429 Too Many Requests
3. Within limits → 200 OK
@@ -99,6 +111,7 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
6. Different users have independent limits
### Security Tests
1. Cannot bypass rate limit with different user agents
2. Cannot bypass rate limit with different headers
3. Rate limit counter resets after time window
@@ -107,6 +120,7 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
## Federation Endpoints Requiring Rate Limiting
### FederationController (`/api/v1/federation`)
- `GET /instance` - Public (5 req/min per IP)
- `POST /instance/regenerate-keys` - Admin (10 req/min per user)
- `POST /connections/initiate` - Auth (10 req/min per user)
@@ -118,6 +132,7 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
- `POST /incoming/connect` - **Public (3 req/min per IP)** ← CRITICAL
### FederationAuthController (`/api/v1/federation/auth`)
- `POST /initiate` - Auth (10 req/min per user)
- `POST /link` - Auth (5 req/min per user)
- `GET /identities` - Auth (30 req/min per user)
@@ -127,18 +142,21 @@ This is NOT caused by my changes - it's pre-existing technical debt from incompl
## Notes
### Design Decisions
- Use IP-based rate limiting for public endpoints
- Use user-based rate limiting for authenticated endpoints
- Store rate limit state in Valkey (Redis-compatible) for scalability
- Include rate limit headers in responses (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset)
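These design decisions combine two choices: which identity to count against, and which headers to report. The minimal sketch below assumes a hypothetical `RequestContext` shape that carries the connection IP and an optional authenticated user ID. The key is built only from those values, never from `User-Agent` or other client-supplied headers, which is what the "cannot bypass with different user agents/headers" security tests rely on. Reporting `X-RateLimit-Reset` as seconds until the window resets is also an assumption; some APIs send a Unix timestamp instead.

```typescript
interface RequestContext {
  ip: string; // connection address (or one derived via trusted-proxy config), not a raw header
  userId?: string; // set by the auth guard on authenticated routes
  userAgent?: string; // deliberately ignored when building the key
}

/** Public routes count per IP; authenticated routes count per user. */
function throttleKey(ctx: RequestContext): string {
  return ctx.userId !== undefined ? `user:${ctx.userId}` : `ip:${ctx.ip}`;
}

/** Builds X-RateLimit-* headers for a fixed window. */
function rateLimitHeaders(
  limit: number,
  used: number,
  windowStartMs: number,
  windowMs: number,
  nowMs: number,
): Record<string, string> {
  const remaining = Math.max(0, limit - used);
  const resetSeconds = Math.max(0, Math.ceil((windowStartMs + windowMs - nowMs) / 1000));
  return {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(remaining),
    "X-RateLimit-Reset": String(resetSeconds),
  };
}
```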
### Attack Vectors Mitigated
1. **Connection Request Flooding:** Attacker sends unlimited connection requests to `/incoming/connect`
2. **Token Validation Abuse:** Attacker floods `/auth/validate` to exhaust resources
3. **Authenticated User Abuse:** Compromised credentials used to flood authenticated endpoints
4. **Resource Exhaustion:** Prevents CPU/memory exhaustion from processing excessive requests
### Future Enhancements (Not in Scope)
- Circuit breaker pattern for failing instances
- Geographic rate limiting
- Adaptive rate limiting based on system load

@@ -7,11 +7,13 @@ The initial implementation (commit 6878d57) was high quality but included placeh
## Security-Critical Issues
### 1. JWT Token Validation (CRITICAL)
**Problem**: `validateToken()` always returns `valid: false`
**Risk**: Cannot verify authenticity of federated tokens
**Solution**: Implement proper JWT validation with signature verification
### 2. OIDC Discovery (CRITICAL)
**Problem**: `generateAuthUrl()` returns hardcoded placeholder URL
**Risk**: Cannot initiate real federated authentication flows
**Solution**: Implement OIDC discovery and proper authorization URL generation
@@ -19,9 +21,11 @@ The initial implementation (commit 6878d57) was high quality but included placeh
## Implementation Plan
### 1. Add Dependencies
- [x] Add `jose` library for JWT handling (industry-standard, secure)
### 2. Implement JWT Validation
- [ ] Fetch OIDC discovery metadata from issuer
- [ ] Cache JWKS (JSON Web Key Set) for performance
- [ ] Verify JWT signature using remote public key
@@ -31,6 +35,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
- [ ] Return proper validation results
### 3. Implement OIDC Discovery
- [ ] Fetch `.well-known/openid-configuration` from remote instance
- [ ] Cache discovery metadata
- [ ] Generate proper OAuth2 authorization URL
@@ -39,6 +44,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
- [ ] Support standard OIDC scopes (openid, profile, email)
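Once the discovery document is cached, generating the authorization URL mostly comes down to assembling a query string. The minimal sketch below assumes the metadata has already been fetched from `.well-known/openid-configuration`, and that `clientId`, `redirectUri` and `state` come from the connection record. The `AuthUrlParams` shape is hypothetical, but the query parameter names are the standard OAuth 2.0 / OIDC ones.

```typescript
interface OidcDiscovery {
  issuer: string;
  authorization_endpoint: string;
  jwks_uri: string;
}

// Hypothetical input shape for illustration.
interface AuthUrlParams {
  clientId: string;
  redirectUri: string;
  state: string; // CSRF protection, checked again on the callback
  codeChallenge: string; // base64url(SHA256(code_verifier))
}

function buildAuthorizationUrl(discovery: OidcDiscovery, p: AuthUrlParams): string {
  // Using URL/searchParams keeps any query string already on the endpoint and
  // handles percent-encoding of redirect URIs and scopes.
  const url = new URL(discovery.authorization_endpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", p.clientId);
  url.searchParams.set("redirect_uri", p.redirectUri);
  url.searchParams.set("scope", "openid profile email");
  url.searchParams.set("state", p.state);
  url.searchParams.set("code_challenge", p.codeChallenge);
  url.searchParams.set("code_challenge_method", "S256");
  return url.toString();
}
```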
### 4. Update Tests
- [ ] Replace mock-based tests with real behavior tests
- [ ] Test valid JWT validation
- [ ] Test expired/invalid token rejection
@@ -47,6 +53,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
- [ ] Maintain 85%+ test coverage
### 5. Security Considerations
- Cache JWKS to avoid excessive network calls
- Validate token expiration strictly
- Use PKCE to prevent authorization code interception
@@ -57,6 +64,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
## Implementation Notes
**PKCE Flow**:
1. Generate random code_verifier (base64url-encoded random bytes)
2. Generate code_challenge = base64url(SHA256(code_verifier))
3. Store code_verifier in session/database
@@ -64,6 +72,7 @@ The initial implementation (commit 6878d57) was high quality but included placeh
5. Send code_verifier in token exchange
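Steps 1 and 2 of the PKCE flow can be sketched with Node's built-in `crypto` module. This minimal illustration assumes the `S256` challenge method from RFC 7636. The function names are hypothetical, and storing the verifier (step 3) is left out.

```typescript
import { createHash, randomBytes } from "node:crypto";

/** Step 1: 32 random bytes, base64url-encoded without padding (43 characters). */
function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

/** Step 2: code_challenge = base64url(SHA256(code_verifier)), sent with method S256. */
function codeChallengeFor(codeVerifier: string): string {
  return createHash("sha256").update(codeVerifier).digest().toString("base64url");
}
```

For the RFC 7636 Appendix B verifier `dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk`, `codeChallengeFor` yields `E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM`. The verifier itself, not the challenge, is what goes into the token exchange in step 5.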
**JWT Validation Flow**:
1. Parse JWT without verification to get header
2. Fetch JWKS from issuer (cache for 1 hour)
3. Find matching key by kid (key ID)
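Steps 1 and 3 of the JWT validation flow amount to reading the unverified header and picking the JWK whose `kid` matches. The stdlib-only sketch below shows just that selection logic. The signature check itself should be left to `jose`, for example its `createRemoteJWKSet` and `jwtVerify` helpers. The `Jwk` and `JwtHeader` interfaces here are simplified assumptions.

```typescript
import { Buffer } from "node:buffer";

interface Jwk {
  kid?: string;
  kty: string;
  alg?: string;
  [param: string]: unknown;
}

interface JwtHeader {
  alg: string;
  kid?: string;
  typ?: string;
}

/** Step 1: decode the JOSE header WITHOUT verifying the signature. */
function decodeUnverifiedHeader(token: string): JwtHeader {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Malformed JWT: expected three dot-separated segments");
  }
  const header = JSON.parse(Buffer.from(parts[0], "base64url").toString("utf8")) as JwtHeader;
  // Never accept unsigned tokens, whatever the header claims.
  if (typeof header.alg !== "string" || header.alg === "none") {
    throw new Error("Unsupported or missing alg");
  }
  return header;
}

/** Step 3: find the key in the (cached) JWKS whose kid matches the header. */
function selectKey(header: JwtHeader, jwks: { keys: Jwk[] }): Jwk {
  const key = jwks.keys.find((k) => k.kid !== undefined && k.kid === header.kid);
  if (!key) {
    throw new Error(`No JWKS key matches kid "${header.kid ?? "(none)"}"`);
  }
  return key;
}
```

An unknown `kid` is also the usual signal to refresh the cached JWKS once before rejecting, since the issuer may have rotated its keys.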

@@ -12,6 +12,7 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
## What Was Implemented
### Database Schema
- **FederationEventSubscription Model**: New table for storing event subscriptions
- Fields: id, workspaceId, connectionId, eventType, metadata, isActive, timestamps
- Unique constraint on (workspaceId, connectionId, eventType)
@@ -21,6 +22,7 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
### Core Services
**EventService** (`event.service.ts`)
- `subscribeToEventType()`: Subscribe to events from remote instance
- `unsubscribeFromEventType()`: Remove event subscription
- `publishEvent()`: Publish events to all subscribed connections
@@ -35,6 +37,7 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
**EventController** (`event.controller.ts`)
**Authenticated Endpoints (require AuthGuard):**
- `POST /api/v1/federation/events/subscribe` - Subscribe to event type
- `POST /api/v1/federation/events/unsubscribe` - Unsubscribe from event type
- `POST /api/v1/federation/events/publish` - Publish event to subscribers
@@ -43,12 +46,14 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
- `GET /api/v1/federation/events/messages/:id` - Get single event message
**Public Endpoints (signature-verified):**
- `POST /api/v1/federation/incoming/event` - Receive event from remote instance
- `POST /api/v1/federation/incoming/event/ack` - Receive event acknowledgment
### Type Definitions
**Added to `message.types.ts`:**
- `EventMessage`: Outgoing event structure
- `EventAck`: Event acknowledgment structure
- `EventMessageDetails`: Event message response type
@@ -57,6 +62,7 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
### Data Transfer Objects
**event.dto.ts:**
- `SubscribeToEventDto`: Subscribe request
- `UnsubscribeFromEventDto`: Unsubscribe request
- `PublishEventDto`: Publish event request
@@ -66,12 +72,14 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
## Testing
### Test Coverage
- **EventService**: 18 unit tests, **89.09% coverage**
- **EventController**: 11 unit tests, **83.87% coverage**
- **Total**: 29 tests, all passing
- **Coverage**: EventService exceeds the 85% minimum requirement; EventController (83.87%) falls slightly short of it
### Test Scenarios Covered
- Subscription creation and deletion
- Event publishing to multiple subscribers
- Failed delivery handling
@@ -84,17 +92,21 @@ Successfully implemented EVENT message type for federation, enabling pub/sub eve
## Design Patterns
### Consistency with Existing Code
- Follows patterns from `QueryService` and `CommandService`
- Reuses existing `SignatureService` for message verification
- Reuses existing `FederationService` for instance identity
- Uses existing `FederationMessage` model with new `eventType` field
### Event Type Naming Convention
Hierarchical dot-notation:
- `entity.action` (e.g., "task.created", "user.updated")
- `entity.action.detail` (e.g., "task.status.changed")
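This convention could be enforced with a simple pattern check before a subscription is stored. The regular expression below is an assumption derived from the examples: two or three dot-separated segments, each starting with a lowercase letter. It is not a rule stated elsewhere in the codebase.

```typescript
// Hypothetical rule: 2 or 3 segments, each a lowercase letter followed by letters/digits.
const EVENT_TYPE_PATTERN = /^[a-z][a-zA-Z0-9]*(\.[a-z][a-zA-Z0-9]*){1,2}$/;

function isValidEventType(eventType: string): boolean {
  return EVENT_TYPE_PATTERN.test(eventType);
}
```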
### Security Features
- All events signature-verified (RSA)
- Timestamp validation (prevents replay attacks)
- Connection status validation (only active connections)
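Timestamp validation rejects signed messages whose timestamp falls outside a tolerance window around the receiver's clock. This is a minimal sketch: the five-minute `MAX_SKEW_MS` value and the millisecond-epoch timestamp format are assumptions. A window check only bounds replays in time. Rejecting a replay inside the window would additionally require remembering message IDs that have already been seen.

```typescript
// Assumed tolerance: accept messages up to 5 minutes old or 5 minutes ahead.
const MAX_SKEW_MS = 5 * 60 * 1000;

/**
 * Returns true if a message timestamp (ms since epoch) is within the allowed
 * skew of the receiver's clock. The signature must cover the timestamp;
 * otherwise an attacker could simply rewrite it.
 */
function isTimestampFresh(messageTimestampMs: number, nowMs: number = Date.now()): boolean {
  if (!Number.isFinite(messageTimestampMs)) {
    return false;
  }
  return Math.abs(nowMs - messageTimestampMs) <= MAX_SKEW_MS;
}
```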
@@ -103,14 +115,18 @@ Hierarchical dot-notation:
## Technical Details
### Database Migration
File: `20260203_add_federation_event_subscriptions/migration.sql`
- Adds `eventType` column to `federation_messages`
- Creates `federation_event_subscriptions` table
- Adds appropriate indexes for performance
- Establishes foreign key relationships
### Integration
Updated `federation.module.ts`:
- Added `EventService` to providers
- Added `EventController` to controllers
- Exported `EventService` for use by other modules
@@ -126,6 +142,7 @@ Updated `federation.module.ts`:
## Files Created/Modified
### New Files (7)
- `apps/api/src/federation/event.service.ts` (470 lines)
- `apps/api/src/federation/event.service.spec.ts` (1,088 lines)
- `apps/api/src/federation/event.controller.ts` (199 lines)
@@ -135,11 +152,13 @@ Updated `federation.module.ts`:
- `docs/scratchpads/90-event-subscriptions.md` (185 lines)
### Modified Files (3)
- `apps/api/src/federation/types/message.types.ts` (+118 lines)
- `apps/api/src/federation/federation.module.ts` (+3 lines)
- `apps/api/prisma/schema.prisma` (+27 lines)
### Total Changes
- **2,395 lines added**
- **5 lines removed**
- **10 files changed**
@@ -147,20 +166,25 @@ Updated `federation.module.ts`:
## Key Features
### Server-Side Event Filtering
Events are only sent to instances with active subscriptions for that event type. This prevents unnecessary network traffic and processing.
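Server-side filtering reduces to selecting the active subscriptions that match the workspace and event type before any network call is made. This minimal in-memory sketch uses a subset of the subscription fields listed under Database Schema. In the service, the equivalent would be a database query against `federation_event_subscriptions` rather than an array filter.

```typescript
interface EventSubscription {
  id: string;
  workspaceId: string;
  connectionId: string;
  eventType: string;
  isActive: boolean;
}

/** Returns the distinct connection IDs that should receive this event. */
function recipientsFor(
  subscriptions: readonly EventSubscription[],
  workspaceId: string,
  eventType: string,
): string[] {
  const ids = subscriptions
    .filter((s) => s.isActive && s.workspaceId === workspaceId && s.eventType === eventType)
    .map((s) => s.connectionId);
  // The unique constraint should already prevent duplicates; dedupe defensively.
  return Array.from(new Set(ids));
}
```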
### Acknowledgment Protocol
Simple ACK pattern confirms event delivery:
1. Publisher sends event
2. Receiver processes and returns ACK
3. Publisher updates delivery status
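On the publisher side, the three-step exchange can be modelled as a small status transition. This is a minimal sketch. Only the `FAILED` status comes from the text. `PENDING` and `DELIVERED`, along with the `DeliveryRecord` and `EventAckLike` shapes, are hypothetical placeholders for whatever `FederationMessage` and `EventAck` actually contain.

```typescript
type DeliveryStatus = "PENDING" | "DELIVERED" | "FAILED";

interface DeliveryRecord {
  messageId: string;
  status: DeliveryStatus;
  error?: string;
  deliveredAt?: Date;
}

interface EventAckLike {
  messageId: string;
  received: boolean;
  error?: string;
}

/** Step 3: the publisher applies the receiver's ACK to its delivery record. */
function applyAck(record: DeliveryRecord, ack: EventAckLike, now: Date = new Date()): DeliveryRecord {
  if (ack.messageId !== record.messageId) {
    throw new Error("ACK does not match this message");
  }
  // Ignore duplicate or late ACKs once a final status has been recorded.
  if (record.status !== "PENDING") {
    return record;
  }
  if (ack.received) {
    return { ...record, status: "DELIVERED", deliveredAt: now };
  }
  return { ...record, status: "FAILED", error: ack.error ?? "Receiver rejected event" };
}
```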
### Error Handling
- Failed deliveries marked as FAILED with error message
- Connection errors logged but don't crash the system
- Invalid signatures rejected immediately
### Subscription Management
- Subscriptions persist in database
- Can be activated/deactivated without deletion
- Support for metadata (extensibility)
@@ -168,6 +192,7 @@ Simple ACK pattern confirms event delivery:
## Future Enhancements (Not Implemented)
These were considered but deferred to future issues:
- Event replay/history
- Event filtering by payload fields
- Webhook support for event delivery
@@ -179,11 +204,13 @@ These were considered but deferred to future issues:
## Performance Considerations
### Scalability
- Database indexes on eventType, connectionId, workspaceId
- Efficient queries with proper WHERE clauses
- Server-side filtering reduces network overhead
### Monitoring
- All operations logged with appropriate level
- Failed deliveries tracked in database
- Delivery timestamps recorded for analytics
@@ -191,12 +218,14 @@ These were considered but deferred to future issues:
## Documentation
### Inline Documentation
- JSDoc comments on all public methods
- Clear parameter descriptions
- Return type documentation
- Usage examples in comments
### Scratchpad Documentation
- Complete implementation plan
- Design decisions documented
- Testing strategy outlined
@@ -205,6 +234,7 @@ These were considered but deferred to future issues:
## Integration Testing Recommendations
While the unit tests are comprehensive, integration testing is still recommended:
1. Set up two federated instances
2. Subscribe from Instance A to Instance B events
3. Publish event from Instance B
@@ -214,6 +244,7 @@ While unit tests are comprehensive, recommend integration testing:
## Conclusion
FED-007 (EVENT Subscriptions) is **complete and ready for code review**. The implementation:
- ✅ Follows TDD principles
- ✅ Meets 85%+ code coverage requirement
- ✅ Passes all quality gates (lint, typecheck, tests)

@@ -0,0 +1,40 @@
# MS-GATE-001 Scratchpad
## Objective
Build the API Gatekeeper module for PR auto-merge orchestration using Gitea PR webhooks, Woodpecker CI webhooks, and a `pending_merges` Prisma model.
## Constraints
- Work in `/home/jwoltje/src/mosaic-stack-worktrees/gate-001`
- Do not merge or deploy
- Must pass:
- `pnpm format:check`
- `SKIP_ENV_VALIDATION=true pnpm turbo typecheck`
- `SKIP_ENV_VALIDATION=true pnpm turbo lint`
- `SKIP_ENV_VALIDATION=true pnpm turbo test --filter=@mosaic/api -- --testPathPattern="gatekeeper"`
## ASSUMPTION
- Woodpecker PR pipelines expose `CI_COMMIT_PULL_REQUEST` and `CI_COMMIT_SHA`, so the webhook notifier can send `prNumber` and `headSha`.
- Rationale: Gatekeeper needs an exact PR/head tuple to safely match CI back to `pending_merges`.
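The assumption matters because a CI result is only trustworthy for the exact commit that was tested. The sketch below shows that matching rule against a hypothetical in-memory stand-in for `pending_merges` rows; the `PendingMerge` fields and status names are assumptions, not the Prisma model from this commit. A result is applied only when repository, `prNumber`, and `headSha` all match, so a stale pipeline for an older push cannot mark a PR as ready.

```typescript
interface PendingMerge {
  repo: string; // e.g. "owner/name"; PR numbers are only unique per repository
  prNumber: number;
  headSha: string;
  status: "awaiting_ci" | "ready" | "ci_failed";
}

interface CiResult {
  repo: string;
  prNumber: number; // from CI_COMMIT_PULL_REQUEST
  headSha: string; // from CI_COMMIT_SHA
  success: boolean;
}

/** Applies a CI result only to the pending merge with the exact PR/head tuple. */
function applyCiResult(pending: PendingMerge[], ci: CiResult): PendingMerge | undefined {
  const match = pending.find(
    (p) =>
      p.repo === ci.repo &&
      p.prNumber === ci.prNumber &&
      p.headSha.toLowerCase() === ci.headSha.toLowerCase() &&
      p.status === "awaiting_ci",
  );
  if (!match) {
    return undefined; // Unknown PR, or CI ran on a commit that is no longer the head.
  }
  match.status = ci.success ? "ready" : "ci_failed";
  return match;
}
```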
## Plan
1. Add Prisma model + SQL migration for `pending_merges`
2. Add Gatekeeper NestJS module/controller/service/DTO/tests
3. Wire Woodpecker webhook -> Gatekeeper CI handler
4. Add env/config documentation and compose variables
5. Run quality gates, review, remediate, push, open PR
## Progress
- [x] Context loaded
- [ ] Tests added first
- [ ] Implementation complete
- [ ] Quality gates green
- [ ] Push + PR opened
## Verification
- Pending