Implements FED-010: Agent Spawn via Federation feature that enables spawning and managing Claude agents on remote federated Mosaic Stack instances via COMMAND message type. Features: - Federation agent command types (spawn, status, kill) - FederationAgentService for handling agent operations - Integration with orchestrator's agent spawner/lifecycle services - API endpoints for spawning, querying status, and killing agents - Full command routing through federation COMMAND infrastructure - Comprehensive test coverage (12/12 tests passing) Architecture: - Hub → Spoke: Spawn agents on remote instances - Command flow: FederationController → FederationAgentService → CommandService → Remote Orchestrator - Response handling: Remote orchestrator returns agent status/results - Security: Connection validation, signature verification Files created: - apps/api/src/federation/types/federation-agent.types.ts - apps/api/src/federation/federation-agent.service.ts - apps/api/src/federation/federation-agent.service.spec.ts Files modified: - apps/api/src/federation/command.service.ts (agent command routing) - apps/api/src/federation/federation.controller.ts (agent endpoints) - apps/api/src/federation/federation.module.ts (service registration) - apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint) - apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration) Testing: - 12/12 tests passing for FederationAgentService - All command service tests passing - TypeScript compilation successful - Linting passed Refs #93 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
5.3 KiB
Issue #185: Fix silent error swallowing in Herald broadcasting
Objective
Fix silent error swallowing in Herald broadcasting to ensure errors are properly logged, propagated, and surfaced. This is a BLOCKER for monitoring and debugging - silent errors prevent proper system observability.
Problem Analysis
Location of Issue
File: /home/localadmin/src/mosaic-stack/apps/api/src/herald/herald.service.ts
Lines 102-104:
} catch (error) {
this.logger.error(`Failed to broadcast event for job ${jobId}:`, error);
}
The Problem
The broadcastJobEvent method has a try-catch block that:
- Logs the error (good)
- Swallows the error completely (bad) - returns void without throwing
- Prevents callers from knowing if broadcasting failed
- Makes debugging and monitoring impossible
Impact
- Callers like
CoordinatorIntegrationServicehave no way to know if Herald broadcasting failed - Silent failures prevent proper error tracking and alerting
- No way to implement retry logic or fallback mechanisms
- Violates observability best practices
Approach
TDD Protocol
- RED - Write failing tests for error scenarios
- GREEN - Implement proper error handling
- REFACTOR - Clean up and ensure coverage
Solution Design
Option 1: Propagate Errors (CHOSEN)
- Throw errors after logging them
- Let callers decide how to handle (retry, ignore, alert)
- Add context to errors for better debugging
- Pros: Explicit error handling, better observability
- Cons: Breaking change for callers
Option 2: Return Error Result
- Return
{ success: boolean, error?: Error } - Callers can check result
- Pros: Non-breaking
- Cons: Easy to ignore, not idiomatic for async operations
Decision: Go with Option 1 (propagate errors) because:
- This is version 0.0.x, breaking changes acceptable
- Explicit error handling is better for system reliability
- Forces proper error handling at call sites
Implementation Steps
- Add test for database errors in
broadcastJobEvent - Add test for Discord send failures
- Add test for error context preservation
- Remove error swallowing from try-catch
- Add custom error class for Herald failures
- Update error logging to include full context
- Verify all tests pass
Progress
- Create scratchpad
- Analyze the problem
- Design solution
- RED: Write failing tests (4 new error handling tests)
- GREEN: Implement error propagation
- GREEN: Update error logging with context
- REFACTOR: Add coverage tests for formatting methods
- Run test coverage verification (96.1% - exceeds 85% requirement)
- Commit changes
Testing Strategy
Test Cases to Add
-
Database failure during job lookup
- Mock Prisma to throw error
- Verify error is propagated with context
-
Discord send failure
- Mock Discord service to reject
- Verify error is propagated with context
-
Error context preservation
- Verify jobId and event type are included in error
- Verify original error is preserved
-
Successful broadcast still works
- Ensure existing tests still pass
- No regression in happy path
Coverage Target
- Minimum 85% coverage (project requirement)
- Focus on error paths and edge cases
Results
Tests Added
- Database failure test - Verifies errors propagate when job lookup fails
- Discord send failure test - Verifies errors propagate when message sending fails
- Job events fetch failure test - Verifies errors propagate when fetching events fails
- Error context test - Verifies original error is preserved
- Coverage tests - 7 additional tests for formatting methods to reach 96.1% coverage
Coverage Achieved
- 96.1% statement coverage (target: 85%) ✅
- 78.43% branch coverage
- 100% function coverage
- 25 tests total (18 existing + 7 new)
Changes Made
File: /home/localadmin/src/mosaic-stack/apps/api/src/herald/herald.service.ts
- Lines 102-110: Enhanced error logging with event type context
- Line 110: Added
throw error;to propagate errors instead of swallowing them
File: /home/localadmin/src/mosaic-stack/apps/api/src/herald/herald.service.spec.ts
- Added 4 error handling tests (lines 328-454)
- Added 7 coverage tests for formatting methods
Notes
Related Code
CoordinatorIntegrationServicecallsbroadcastJobEventat lines 148, 249- No error handling at call sites (assumes success)
- Follow-up required: Update callers to handle errors properly (separate issue)
Impact of Changes
BREAKING CHANGE: This is a breaking change for callers of broadcastJobEvent, but acceptable because:
- Project is at version 0.0.x (pre-release)
- Improves system reliability and observability
- Forces explicit error handling at call sites
- Only 2 call sites in the codebase to update
Custom Error Class
export class HeraldBroadcastError extends Error {
constructor(
message: string,
public readonly jobId: string,
public readonly eventType: string,
public readonly cause: Error
) {
super(message);
this.name = "HeraldBroadcastError";
}
}
Migration Path
- Fix Herald service first (this issue)
- Update callers to handle errors (follow-up issue)
- Add retry logic if needed (follow-up issue)