Add CoordinatorIntegrationModule providing REST API endpoints for the Python coordinator to communicate with the NestJS API infrastructure:

- POST /coordinator/jobs - Create job from coordinator webhook events
- PATCH /coordinator/jobs/:id/status - Update job status (PENDING -> RUNNING)
- PATCH /coordinator/jobs/:id/progress - Update job progress percentage
- POST /coordinator/jobs/:id/complete - Mark job complete with results
- POST /coordinator/jobs/:id/fail - Mark job failed with gate results
- GET /coordinator/jobs/:id - Get job details with events and steps
- GET /coordinator/health - Integration health check

Integration features:

- Job creation dispatches to BullMQ queues
- Status updates emit JobEvents for audit logging
- Completion/failure events broadcast via Herald to Discord
- Status transition validation (PENDING -> QUEUED -> RUNNING -> COMPLETED/FAILED)
- Health check includes BullMQ connection status and queue counts

Also adds JOB_PROGRESS event type to event-types.ts for progress tracking.

Fixes #176

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

103 lines
3.1 KiB
Markdown
# Issue #176: Coordinator Integration

## Objective

Integrate the M4.2 infrastructure (NestJS API) with the M4.1 coordinator (Python FastAPI) so that the coordinator can create jobs, report their status, and trigger notifications through the NestJS API.

## Architecture Analysis

### M4.1 Coordinator (Python)

- FastAPI application at `apps/coordinator`
- Handles Gitea webhooks, queue management, agent orchestration
- Uses file-based JSON queue for persistence
- Has QueueManager, Coordinator, and OrchestrationLoop classes
- Exposes `/webhook/gitea` and `/health` endpoints

### M4.2 Infrastructure (NestJS)

- StitcherModule: Workflow orchestration, webhook handling, job dispatch
- RunnerJobsModule: CRUD for RunnerJob entities, BullMQ integration
- JobEventsModule: Event tracking and audit logging
- JobStepsModule: Step tracking for jobs
- HeraldModule: Status broadcasting to Discord
- BullMqModule: Queue infrastructure with Valkey backend
- BridgeModule: Discord integration

## Integration Design

### Flow 1: Webhook -> Job Creation

```
Gitea -> Coordinator (Python) -> NestJS API -> RunnerJob + BullMQ
                              ^
                              | HTTP POST /api/coordinator/jobs
```

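
As a rough illustration of the coordinator side of Flow 1, the sketch below builds (but does not send) the job-creation request using only the standard library. The base URL, the helper name `build_create_job_request`, and every payload field are assumptions made for the example; the real request shape is whatever the DTOs in the NestJS module define, and the coordinator itself uses `httpx.AsyncClient` rather than `urllib`.

```python
import json
from urllib.request import Request


def build_create_job_request(base_url: str, payload: dict) -> Request:
    """Build the POST request that asks the NestJS API to create a job.

    The request is only constructed here, never sent, so the sketch can be
    inspected without a running API.
    """
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/api/coordinator/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: these field names are placeholders, not the real DTO.
example_payload = {"type": "code-task", "issueNumber": 176, "repository": "example/repo"}

# Hypothetical base URL for a locally running API.
request = build_create_job_request("http://localhost:3000/", example_payload)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to unit test before any network call is involved.
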
### Flow 2: Job Status Updates

```
Coordinator (Python) -> NestJS API -> JobEvent -> Herald -> Discord
                     ^
                     | HTTP PATCH /api/coordinator/jobs/:id/status
```

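
The commit message for this change lists status transition validation along the chain PENDING -> QUEUED -> RUNNING -> COMPLETED/FAILED. A minimal sketch of such a check is shown below. The transition table is an assumption: it allows only the linear chain and treats COMPLETED and FAILED as terminal, whereas the actual service may permit additional transitions (for example failing a job before it runs).

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "PENDING"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Assumed transition table: the linear chain only, with two terminal states.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.QUEUED},
    JobStatus.QUEUED: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}


def can_transition(current: JobStatus, target: JobStatus) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS[current]


def apply_transition(current: JobStatus, target: JobStatus) -> JobStatus:
    """Return the new status, or raise ValueError for a disallowed move."""
    if not can_transition(current, target):
        raise ValueError(
            f"Invalid status transition: {current.value} -> {target.value}"
        )
    return target
```

A status endpoint would typically reject a disallowed move with a client error instead of silently applying it, which keeps the audit trail of JobEvents consistent.
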
### Flow 3: Job Completion

```
Coordinator (Python) -> NestJS API -> Complete RunnerJob -> Herald broadcast
                     ^
                     | HTTP POST /api/coordinator/jobs/:id/complete
```

## Implementation Plan

### 1. Create Coordinator Integration Module

- `apps/api/src/coordinator-integration/`
  - `coordinator-integration.module.ts` - NestJS module
  - `coordinator-integration.controller.ts` - REST endpoints for Python coordinator
  - `coordinator-integration.service.ts` - Business logic
  - `dto/` - DTOs for coordinator communication
  - `interfaces/` - Type definitions

### 2. Endpoints for Python Coordinator

- `POST /api/coordinator/jobs` - Create job from coordinator
- `PATCH /api/coordinator/jobs/:id/status` - Update job status
- `POST /api/coordinator/jobs/:id/complete` - Mark job complete
- `POST /api/coordinator/jobs/:id/fail` - Mark job failed
- `GET /api/coordinator/health` - Integration health check

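
On the coordinator side, the endpoint list above could be captured in a small route table so that methods and paths live in one place. This is only a sketch: the action names (`create`, `status`, and so on) and the helper `resolve_route` are hypothetical, and only the methods and paths come from the list.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Methods and path templates taken from the endpoint list; the keys are
# hypothetical action names chosen for this example.
ROUTES = {
    "create": ("POST", "/api/coordinator/jobs"),
    "status": ("PATCH", "/api/coordinator/jobs/{job_id}/status"),
    "complete": ("POST", "/api/coordinator/jobs/{job_id}/complete"),
    "fail": ("POST", "/api/coordinator/jobs/{job_id}/fail"),
    "health": ("GET", "/api/coordinator/health"),
}


def resolve_route(action: str, job_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP method, path) for an action, filling in the job id if needed."""
    method, template = ROUTES[action]
    if "{job_id}" not in template:
        return method, template
    if not job_id:
        raise ValueError(f"Action '{action}' requires a job id")
    # Percent-encode the id so it cannot alter the path structure.
    return method, template.format(job_id=quote(job_id, safe=""))
```

For example, `resolve_route("status", "abc-123")` yields the `PATCH` method and the path `/api/coordinator/jobs/abc-123/status`.
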
### 3. Event Bridging

- When coordinator reports progress -> emit JobEvent
- When coordinator completes -> update RunnerJob + emit completion event
- Herald subscribes and broadcasts to Discord

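
The bridging steps above can be sketched with a tiny in-memory publish/subscribe model. Everything here is illustrative: `EventBus`, `CoordinatorBridge`, the `JOB_COMPLETED` event name, and the payload keys are assumptions for the example, and the Herald subscriber is reduced to a function that appends a message to a list instead of posting to Discord. Only the `JOB_PROGRESS` event type name comes from the commit message.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class JobEvent:
    job_id: str
    type: str
    payload: dict = field(default_factory=dict)


class EventBus:
    """Minimal synchronous pub/sub keyed by event type."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[JobEvent], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[JobEvent], None]) -> None:
        self._subscribers[event_type].append(handler)

    def emit(self, event: JobEvent) -> None:
        for handler in self._subscribers[event.type]:
            handler(event)


class CoordinatorBridge:
    """Turns coordinator reports into job state changes and events."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.job_status: Dict[str, str] = {}  # stand-in for RunnerJob records

    def report_progress(self, job_id: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.bus.emit(JobEvent(job_id, "JOB_PROGRESS", {"progressPercent": percent}))

    def complete(self, job_id: str, result: dict) -> None:
        # Update the job first, then announce the change.
        self.job_status[job_id] = "COMPLETED"
        self.bus.emit(JobEvent(job_id, "JOB_COMPLETED", {"result": result}))


# Usage: a stand-in "Herald" subscribes to completion events only.
bus = EventBus()
broadcasts: List[str] = []
bus.subscribe("JOB_COMPLETED", lambda e: broadcasts.append(f"Job {e.job_id} completed"))

bridge = CoordinatorBridge(bus)
bridge.report_progress("job-1", 50)   # emitted, but nobody is subscribed
bridge.complete("job-1", {"ok": True})
```

Updating the job before emitting the event means a subscriber that reads the job back sees the new status.
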
## TDD Approach

1. Write tests for CoordinatorIntegrationService
2. Write tests for CoordinatorIntegrationController
3. Implement minimal code to pass tests
4. Refactor

## Progress

- [x] Analyze coordinator structure
- [x] Analyze M4.2 infrastructure
- [x] Design integration layer
- [x] Write failing tests for service
- [x] Implement service
- [x] Write failing tests for controller
- [x] Implement controller
- [x] Add DTOs and interfaces
- [x] Run quality gates
- [x] Commit

## Notes

- The Python coordinator uses `httpx.AsyncClient` for HTTP calls
- API auth can be handled via a shared secret (API key)
- Events follow the established patterns from the job-events module

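
One way the shared-secret idea from the notes could look is sketched below. The header name `X-API-Key` and both helper functions are assumptions, not part of the implemented module, and in the real system the check would run on the NestJS side rather than in Python. The comparison uses `hmac.compare_digest` so that it runs in constant time.

```python
import hmac

# Hypothetical header name; the real integration may use a different one.
API_KEY_HEADER = "X-API-Key"


def auth_headers(api_key: str) -> dict:
    """Headers the coordinator would attach to every request."""
    return {API_KEY_HEADER: api_key}


def is_authorized(headers: dict, expected_key: str) -> bool:
    """Check the presented key against the configured shared secret."""
    if not expected_key:
        # Refuse everything when no secret is configured.
        return False
    provided = headers.get(API_KEY_HEADER, "")
    return hmac.compare_digest(provided.encode("utf-8"), expected_key.encode("utf-8"))
```

With `httpx`, a dictionary like the one returned by `auth_headers` can be passed as the `headers` argument when constructing the `AsyncClient`, so each request carries the key without repeating it at every call site.
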