Commit Graph

11 Commits

Author SHA1 Message Date
4545c6dc7a fix(api,orchestrator): fix dependency injection and Docker build issues
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
API:
- Add AuthModule import to RunnerJobsModule
- Fixes: Nest can't resolve dependencies of AuthGuard

Orchestrator:
- Remove --prod flag from dependency installation
- Copy full node_modules tree to production stage
- Align Dockerfile with API pattern for monorepo builds
- Fixes: Cannot find module '@nestjs/core'

Both services now match the working API Dockerfile pattern.
2026-02-08 21:59:19 -06:00
Jason Woltje
519093f42e fix(tests): Correct pipeline test failures (#239)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Fixes 4 test failures identified in pipeline run 239:

1. RunnerJobsService cancel tests:
   - Use updateMany mock instead of update (service uses optimistic locking)
   - Add version field to mock objects
   - Use mockResolvedValueOnce for sequential findUnique calls

2. ActivityService error handling tests:
   - Update tests to expect null return (fire-and-forget pattern)
   - Activity logging now returns null on DB errors per security fix

3. SecretScannerService unreadable file test:
   - Handle root user case where chmod 0o000 doesn't prevent reads
   - Test now adapts expectations based on runtime permissions

Quality gates: lint ✓ typecheck ✓ tests ✓
- @mosaic/orchestrator: 612 tests passing
- @mosaic/web: 650 tests passing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 11:57:47 -06:00
Jason Woltje
880919c77e fix(#338): Add tests to verify runner jobs interval cleanup
- Add test verifying clearInterval is called in finally block
- Add test verifying interval is cleared even when stream throws error
- Prevents memory leaks from leaked intervals

The clearInterval was already present in the codebase at line 409 of
runner-jobs.service.ts. These tests provide explicit verification
of the cleanup behavior.

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 18:54:52 -06:00
a1973e6419 Fix QA validation issues and add M7.1 security fixes (#318)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-02-04 03:08:09 +00:00
Jason Woltje
ef25167c24 fix(#196): fix race condition in job status updates
Implemented optimistic locking with version field and SELECT FOR UPDATE
transactions to prevent data corruption from concurrent job status updates.

Changes:
- Added version field to RunnerJob schema for optimistic locking
- Created migration 20260202_add_runner_job_version_for_concurrency
- Implemented ConcurrentUpdateException for conflict detection
- Updated RunnerJobsService methods with optimistic locking:
  * updateStatus() - with version checking and retry logic
  * updateProgress() - with version checking and retry logic
  * cancel() - with version checking and retry logic
- Updated CoordinatorIntegrationService with SELECT FOR UPDATE:
  * updateJobStatus() - transaction with row locking
  * completeJob() - transaction with row locking
  * failJob() - transaction with row locking
  * updateJobProgress() - optimistic locking
- Added retry mechanism (3 attempts) with exponential backoff
- Added comprehensive concurrency tests (10 tests, all passing)
- Updated existing test mocks to support updateMany

Test Results:
- All 10 concurrency tests passing ✓
- Tests cover concurrent status updates, progress updates, completions,
  cancellations, retry logic, and exponential backoff

This fix prevents race conditions that could cause:
- Lost job results (double completion)
- Lost progress updates
- Invalid status transitions
- Data corruption under concurrent access

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 12:51:17 -06:00
Jason Woltje
a3b48dd631 fix(#187): implement server-side SSE error recovery
Server-side improvements (ALL 27/27 TESTS PASSING):
- Add streamEventsFrom() method with lastEventId parameter for resuming streams
- Include event IDs in SSE messages (id: event-123) for reconnection support
- Send retry interval header (retry: 3000ms) to clients
- Classify errors as retryable vs non-retryable
- Handle transient errors gracefully with retry logic
- Support Last-Event-ID header in controller for automatic reconnection

Files modified:
- apps/api/src/runner-jobs/runner-jobs.service.ts (new streamEventsFrom method)
- apps/api/src/runner-jobs/runner-jobs.controller.ts (Last-Event-ID header support)
- apps/api/src/runner-jobs/runner-jobs.service.spec.ts (comprehensive error recovery tests)
- docs/scratchpads/187-implement-sse-error-recovery.md (implementation notes)

This ensures robust real-time updates with automatic recovery from network issues.
Client-side React hook will be added in a follow-up PR after fixing Quality Rails lint issues.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 12:41:12 -06:00
Jason Woltje
7101864a15 fix(#189): add composite database index for job_events table
Add composite index [jobId, timestamp] to improve query performance
for the most common job_events access patterns.

Changes:
- Add @@index([jobId, timestamp]) to JobEvent model in schema.prisma
- Create migration 20260202122655_add_job_events_composite_index
- Add performance tests to validate index effectiveness
- Document index design rationale in scratchpad
- Fix lint errors in api-key.guard, herald.service, runner-jobs.service

Rationale:
The composite index [jobId, timestamp] optimizes the dominant query
pattern used across all services:
- JobEventsService.getEventsByJobId (WHERE jobId, ORDER BY timestamp)
- RunnerJobsService.streamEvents (WHERE jobId + timestamp range)
- RunnerJobsService.findOne (implicit jobId filter + timestamp order)

This index provides:
- Fast filtering by jobId (highly selective)
- Efficient timestamp-based ordering
- Optimal support for timestamp range queries
- Backward compatibility with jobId-only queries

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 12:30:19 -06:00
3cdcbf6774 feat(#175): Implement E2E test harness
- Create comprehensive E2E test suite for job orchestration
- Add test fixtures for Discord, BullMQ, and Prisma mocks
- Implement 9 end-to-end test scenarios covering:
  * Happy path: webhook → job → step execution → completion
  * Event emission throughout job lifecycle
  * Step failure and retry handling
  * Job failure after max retries
  * Discord command parsing and job creation
  * WebSocket status updates integration
  * Job cancellation workflow
  * Job retry mechanism
  * Progress percentage tracking

- Add helper methods to services for simplified testing:
  * JobStepsService: start(), complete(), fail(), findByJob()
  * RunnerJobsService: updateStatus(), updateProgress()
  * JobEventsService: findByJob()

- Configure vitest.e2e.config.ts for E2E test execution
- All 9 E2E tests passing
- All 1405 unit tests passing
- Quality gates: typecheck, lint, build all passing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 21:44:04 -06:00
8f3949e388 feat(#174): Implement SSE endpoint for CLI consumers
Add Server-Sent Events (SSE) endpoint for streaming job events to CLI
consumers who prefer HTTP streaming over WebSocket.

Endpoint: GET /runner-jobs/:id/events/stream

Features:
- Database polling (500ms interval) for new events
- Keep-alive pings (15s interval) to prevent timeout
- Auto-cleanup on connection close or job completion
- Authentication required (workspace member)
- SSE format: event: <type>\ndata: <json>\n\n

Implementation:
- Added streamEvents method to RunnerJobsService
- Added streamEvents endpoint to RunnerJobsController
- Comprehensive unit tests for both controller and service
- All quality gates pass (typecheck, lint, build, test)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 21:33:33 -06:00
e689a1379c feat(#171): Implement chat command parsing
Add command parsing layer for chat integration (Discord, Mattermost, Slack).

Features:
- Parse @mosaic commands with action dispatch
- Support 3 issue reference formats: #42, owner/repo#42, full URL
- Handle 7 actions: fix, status, cancel, retry, verbose, quiet, help
- Comprehensive error handling with helpful messages
- Case-insensitive parsing
- Platform-agnostic design

Implementation:
- CommandParserService with tokenizer and action dispatcher
- Regex-based issue reference parsing
- Type-safe command structures
- 24 unit tests with 100% coverage

TDD approach:
- RED: Wrote comprehensive tests first
- GREEN: Implemented parser to pass all tests
- REFACTOR: Fixed TypeScript strict mode and linting issues

Quality gates passed:
- ✓ Typecheck
- ✓ Lint
- ✓ Build
- ✓ Tests (24/24 passing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 21:32:53 -06:00
7102b4a1d2 feat(#167): Implement Runner jobs CRUD and queue submission
Implements runner-jobs module for job lifecycle management and queue submission.

Changes:
- Created RunnerJobsModule with service, controller, and DTOs
- Implemented job creation with BullMQ queue submission
- Implemented job listing with filters (status, type, agentTaskId)
- Implemented job detail retrieval with steps and events
- Implemented cancel operation for pending/queued jobs
- Implemented retry operation for failed jobs
- Added comprehensive unit tests (24 tests, 100% coverage)
- Integrated with BullMQ for async job processing
- Integrated with Prisma for database operations
- Followed existing CRUD patterns from tasks/events modules

API Endpoints:
- POST /runner-jobs - Create and queue a new job
- GET /runner-jobs - List jobs (with filters)
- GET /runner-jobs/:id - Get job details
- POST /runner-jobs/:id/cancel - Cancel a running job
- POST /runner-jobs/:id/retry - Retry a failed job

Quality Gates:
- Typecheck:  PASSED
- Lint:  PASSED
- Build:  PASSED
- Tests:  PASSED (24/24 tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 21:09:03 -06:00