stack/docs/scratchpads/176-coordinator-integration.md
Jason Woltje 5a51ee8c30 feat(#176): Integrate M4.2 infrastructure with M4.1 coordinator
Add CoordinatorIntegrationModule providing REST API endpoints for the Python
coordinator to communicate with the NestJS API infrastructure:

- POST /coordinator/jobs - Create job from coordinator webhook events
- PATCH /coordinator/jobs/:id/status - Update job status (PENDING -> RUNNING)
- PATCH /coordinator/jobs/:id/progress - Update job progress percentage
- POST /coordinator/jobs/:id/complete - Mark job complete with results
- POST /coordinator/jobs/:id/fail - Mark job failed with gate results
- GET /coordinator/jobs/:id - Get job details with events and steps
- GET /coordinator/health - Integration health check

Integration features:
- Job creation dispatches to BullMQ queues
- Status updates emit JobEvents for audit logging
- Completion/failure events broadcast via Herald to Discord
- Status transition validation (PENDING -> QUEUED -> RUNNING -> COMPLETED/FAILED)
- Health check includes BullMQ connection status and queue counts

Also adds JOB_PROGRESS event type to event-types.ts for progress tracking.

Fixes #176

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 21:54:34 -06:00

Issue #176: Coordinator Integration

Objective

Integrate the M4.2 infrastructure (NestJS API) with the M4.1 coordinator (Python FastAPI) so that the coordinator can create jobs, report status changes, and mark jobs complete or failed through the NestJS API.

Architecture Analysis

M4.1 Coordinator (Python)

  • FastAPI application at apps/coordinator
  • Handles Gitea webhooks, queue management, agent orchestration
  • Uses file-based JSON queue for persistence
  • Has QueueManager, Coordinator, and OrchestrationLoop classes
  • Exposes /webhook/gitea and /health endpoints

M4.2 Infrastructure (NestJS)

  • StitcherModule: Workflow orchestration, webhook handling, job dispatch
  • RunnerJobsModule: CRUD for RunnerJob entities, BullMQ integration
  • JobEventsModule: Event tracking and audit logging
  • JobStepsModule: Step tracking for jobs
  • HeraldModule: Status broadcasting to Discord
  • BullMqModule: Queue infrastructure with Valkey backend
  • BridgeModule: Discord integration

Integration Design

Flow 1: Webhook -> Job Creation

Gitea -> Coordinator (Python) -> NestJS API -> RunnerJob + BullMQ
                              ^
                              | HTTP POST /api/coordinator/jobs

Flow 2: Job Status Updates

Coordinator (Python) -> NestJS API -> JobEvent -> Herald -> Discord
                      ^
                      | HTTP PATCH /api/coordinator/jobs/:id/status
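
The status endpoint in Flow 2 needs a transition check before the RunnerJob is updated. A minimal sketch of that check, assuming the PENDING -> QUEUED -> RUNNING -> COMPLETED/FAILED chain from the commit message plus the direct PENDING -> RUNNING update it also mentions. The names `JobStatus`, `ALLOWED_TRANSITIONS`, `canTransition`, and `assertTransition` are hypothetical, and the source does not say whether a job may fail before it reaches RUNNING, so this sketch does not allow it:

```typescript
// Hypothetical status type; the real RunnerJob status enum may differ.
type JobStatus = "PENDING" | "QUEUED" | "RUNNING" | "COMPLETED" | "FAILED";

// Assumed transition table: each status maps to the statuses it may move to.
const ALLOWED_TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
  PENDING: ["QUEUED", "RUNNING"],
  QUEUED: ["RUNNING"],
  RUNNING: ["COMPLETED", "FAILED"],
  COMPLETED: [], // terminal state
  FAILED: [], // terminal state
};

// Returns true when moving from `from` to `to` is listed in the table.
function canTransition(from: JobStatus, to: JobStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Throws on an unlisted transition so the caller can map it to an HTTP error.
function assertTransition(from: JobStatus, to: JobStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid status transition: ${from} -> ${to}`);
  }
}
```

In a service method, `assertTransition(job.status, dto.status)` would run before the update, so a rejected transition never reaches the database or emits a JobEvent.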

Flow 3: Job Completion

Coordinator (Python) -> NestJS API -> Complete RunnerJob -> Herald broadcast
                      ^
                      | HTTP POST /api/coordinator/jobs/:id/complete

Implementation Plan

1. Create Coordinator Integration Module

  • apps/api/src/coordinator-integration/
    • coordinator-integration.module.ts - NestJS module
    • coordinator-integration.controller.ts - REST endpoints for Python coordinator
    • coordinator-integration.service.ts - Business logic
    • dto/ - DTOs for coordinator communication
    • interfaces/ - Type definitions

2. Endpoints for Python Coordinator

  • POST /api/coordinator/jobs - Create job from coordinator
  • PATCH /api/coordinator/jobs/:id/status - Update job status
  • POST /api/coordinator/jobs/:id/complete - Mark job complete
  • POST /api/coordinator/jobs/:id/fail - Mark job failed
  • GET /api/coordinator/health - Integration health check
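
The endpoint list above can be captured as a small typed route table so that callers and tests never hand-build paths. This is a sketch only: `RouteDescriptor`, `coordinatorRoutes`, and `BASE` are hypothetical names, and request bodies are left out because the DTO shapes are not given here.

```typescript
type HttpMethod = "GET" | "POST" | "PATCH";

interface RouteDescriptor {
  method: HttpMethod;
  path: string;
}

// Prefix as written in the plan above.
const BASE = "/api/coordinator";

// Builds the job-scoped path, escaping the id so it stays a single path segment.
function jobPath(id: string, suffix: string): string {
  return `${BASE}/jobs/${encodeURIComponent(id)}${suffix}`;
}

const coordinatorRoutes = {
  createJob: (): RouteDescriptor => ({ method: "POST", path: `${BASE}/jobs` }),
  updateStatus: (id: string): RouteDescriptor => ({ method: "PATCH", path: jobPath(id, "/status") }),
  completeJob: (id: string): RouteDescriptor => ({ method: "POST", path: jobPath(id, "/complete") }),
  failJob: (id: string): RouteDescriptor => ({ method: "POST", path: jobPath(id, "/fail") }),
  health: (): RouteDescriptor => ({ method: "GET", path: `${BASE}/health` }),
};
```

For example, `coordinatorRoutes.updateStatus("42")` describes a PATCH to `/api/coordinator/jobs/42/status`.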

3. Event Bridging

  • When coordinator reports progress -> emit JobEvent
  • When coordinator completes -> update RunnerJob + emit completion event
  • Herald subscribes and broadcasts to Discord
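
The bridging rules above amount to a publish/subscribe hop between the integration service and Herald. Below is a self-contained sketch that uses a tiny in-memory bus in place of the real JobEventsModule and HeraldModule. `JOB_PROGRESS` comes from the commit message. Everything else (`JOB_COMPLETED`, `JobEvent`, `EventBus`, `reportProgress`, `reportCompletion`, `attachHerald`, the message text, and clamping progress to 0-100) is an assumption made for illustration:

```typescript
interface JobEvent {
  jobId: string;
  type: "JOB_PROGRESS" | "JOB_COMPLETED"; // JOB_COMPLETED is a hypothetical name
  payload: Record<string, unknown>;
}

type Listener = (event: JobEvent) => void;

// Minimal synchronous pub/sub standing in for the real event infrastructure.
class EventBus {
  private listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  emit(event: JobEvent): void {
    this.listeners.forEach((listener) => listener(event));
  }
}

// Coordinator reports progress -> emit a JobEvent (percentage clamped to 0-100).
function reportProgress(bus: EventBus, jobId: string, percent: number): JobEvent {
  const clamped = Math.min(100, Math.max(0, Math.round(percent)));
  const event: JobEvent = { jobId, type: "JOB_PROGRESS", payload: { percent: clamped } };
  bus.emit(event);
  return event;
}

// Coordinator completes -> emit a completion event (the RunnerJob update is omitted here).
function reportCompletion(bus: EventBus, jobId: string): JobEvent {
  const event: JobEvent = { jobId, type: "JOB_COMPLETED", payload: {} };
  bus.emit(event);
  return event;
}

// Stand-in for Herald: forwards only completion events to an outbox (Discord in the real system).
function attachHerald(bus: EventBus, outbox: string[]): void {
  bus.subscribe((event) => {
    if (event.type === "JOB_COMPLETED") {
      outbox.push(`Job ${event.jobId} completed`);
    }
  });
}
```

Keeping Herald as a subscriber means the integration service only emits events and never needs to know about Discord.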

TDD Approach

  1. Write tests for CoordinatorIntegrationService
  2. Write tests for CoordinatorIntegrationController
  3. Implement minimal code to pass tests
  4. Refactor

Progress

  • Analyze coordinator structure
  • Analyze M4.2 infrastructure
  • Design integration layer
  • Write failing tests for service
  • Implement service
  • Write failing tests for controller
  • Implement controller
  • Add DTOs and interfaces
  • Run quality gates
  • Commit

Notes

  • The Python coordinator uses httpx.AsyncClient for HTTP calls
  • API auth can be handled via a shared secret (API key) sent by the coordinator with each request
  • Events follow established patterns from job-events module
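
The shared-secret note can be sketched as a header check whose comparison does not stop at the first mismatching character. The header name `x-api-key`, the function names, and passing the expected key in explicitly are all assumptions, not the project's actual guard. In a NestJS app this logic would typically live in a guard, and Node's `crypto.timingSafeEqual` is the usual choice for the comparison. The manual loop here only keeps the sketch dependency-free.

```typescript
// Compares two strings without returning early on the first differing character.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length; // a length mismatch alone makes the result false
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const ca = i < a.length ? a.charCodeAt(i) : 0;
    const cb = i < b.length ? b.charCodeAt(i) : 0;
    diff |= ca ^ cb;
  }
  return diff === 0;
}

// Hypothetical check: header keys are assumed to be lower-cased already.
function isAuthorized(
  headers: Record<string, string | undefined>,
  expectedKey: string,
): boolean {
  const provided = headers["x-api-key"];
  // Reject when either side is missing so an unset secret never authorizes anyone.
  if (!provided || !expectedKey) {
    return false;
  }
  return constantTimeEqual(provided, expectedKey);
}
```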