# Issue #185: Fix silent error swallowing in Herald broadcasting ## Objective Fix silent error swallowing in Herald broadcasting to ensure errors are properly logged, propagated, and surfaced. This is a BLOCKER for monitoring and debugging - silent errors prevent proper system observability. ## Problem Analysis ### Location of Issue File: `/home/localadmin/src/mosaic-stack/apps/api/src/herald/herald.service.ts` Lines 102-104: ```typescript } catch (error) { this.logger.error(`Failed to broadcast event for job ${jobId}:`, error); } ``` ### The Problem The `broadcastJobEvent` method has a try-catch block that: 1. Logs the error (good) 2. **Swallows the error completely** (bad) - returns void without throwing 3. Prevents callers from knowing if broadcasting failed 4. Makes debugging and monitoring impossible ### Impact - Callers like `CoordinatorIntegrationService` have no way to know if Herald broadcasting failed - Silent failures prevent proper error tracking and alerting - No way to implement retry logic or fallback mechanisms - Violates observability best practices ## Approach ### TDD Protocol 1. **RED** - Write failing tests for error scenarios 2. **GREEN** - Implement proper error handling 3. **REFACTOR** - Clean up and ensure coverage ### Solution Design #### Option 1: Propagate Errors (CHOSEN) - Throw errors after logging them - Let callers decide how to handle (retry, ignore, alert) - Add context to errors for better debugging - **Pros**: Explicit error handling, better observability - **Cons**: Breaking change for callers #### Option 2: Return Error Result - Return `{ success: boolean, error?: Error }` - Callers can check result - **Pros**: Non-breaking - **Cons**: Easy to ignore, not idiomatic for async operations **Decision**: Go with Option 1 (propagate errors) because: - This is version 0.0.x, breaking changes acceptable - Explicit error handling is better for system reliability - Forces proper error handling at call sites ### Implementation Steps 1. Add test for database errors in `broadcastJobEvent` 2. Add test for Discord send failures 3. Add test for error context preservation 4. Remove error swallowing from try-catch 5. Add custom error class for Herald failures 6. Update error logging to include full context 7. Verify all tests pass ## Progress - [x] Create scratchpad - [x] Analyze the problem - [x] Design solution - [x] RED: Write failing tests (4 new error handling tests) - [x] GREEN: Implement error propagation - [x] GREEN: Update error logging with context - [x] REFACTOR: Add coverage tests for formatting methods - [x] Run test coverage verification (96.1% - exceeds 85% requirement) - [x] Commit changes ## Testing Strategy ### Test Cases to Add 1. **Database failure during job lookup** - Mock Prisma to throw error - Verify error is propagated with context 2. **Discord send failure** - Mock Discord service to reject - Verify error is propagated with context 3. **Error context preservation** - Verify jobId and event type are included in error - Verify original error is preserved 4. **Successful broadcast still works** - Ensure existing tests still pass - No regression in happy path ### Coverage Target - Minimum 85% coverage (project requirement) - Focus on error paths and edge cases ## Results ### Tests Added 1. **Database failure test** - Verifies errors propagate when job lookup fails 2. **Discord send failure test** - Verifies errors propagate when message sending fails 3. **Job events fetch failure test** - Verifies errors propagate when fetching events fails 4. **Error context test** - Verifies original error is preserved 5. **Coverage tests** - 7 additional tests for formatting methods to reach 96.1% coverage ### Coverage Achieved - **96.1% statement coverage** (target: 85%) ✅ - **78.43% branch coverage** - **100% function coverage** - **25 tests total** (18 existing + 7 new) ### Changes Made **File: `/home/localadmin/src/mosaic-stack/apps/api/src/herald/herald.service.ts`** - Lines 102-110: Enhanced error logging with event type context - Line 110: Added `throw error;` to propagate errors instead of swallowing them **File: `/home/localadmin/src/mosaic-stack/apps/api/src/herald/herald.service.spec.ts`** - Added 4 error handling tests (lines 328-454) - Added 7 coverage tests for formatting methods ## Notes ### Related Code - `CoordinatorIntegrationService` calls `broadcastJobEvent` at lines 148, 249 - No error handling at call sites (assumes success) - **Follow-up required**: Update callers to handle errors properly (separate issue) ### Impact of Changes **BREAKING CHANGE**: This is a breaking change for callers of `broadcastJobEvent`, but acceptable because: 1. Project is at version 0.0.x (pre-release) 2. Improves system reliability and observability 3. Forces explicit error handling at call sites 4. Only 2 call sites in the codebase to update ### Custom Error Class ```typescript export class HeraldBroadcastError extends Error { constructor( message: string, public readonly jobId: string, public readonly eventType: string, public readonly cause: Error ) { super(message); this.name = "HeraldBroadcastError"; } } ``` ### Migration Path 1. Fix Herald service first (this issue) 2. Update callers to handle errors (follow-up issue) 3. Add retry logic if needed (follow-up issue)