stack/docs/scratchpads/281-fix-broad-exception-catching.md
Jason Woltje f53f310061 fix(#281): Fix broad exception catching hiding system errors
Replaced broad try-catch blocks with targeted error handling that only
catches expected business logic errors (CommandProcessingError subclasses).
System errors (OOM, DB failures, network issues) now propagate correctly
for proper debugging and monitoring.

Changes:
- Created CommandProcessingError hierarchy for business logic errors
  - UnknownCommandTypeError for invalid command types
  - AgentCommandError for orchestrator communication failures
  - InvalidCommandPayloadError for payload validation
- Updated command.service.ts to only catch CommandProcessingError
- Updated federation-agent.service.ts to throw appropriate error types
- Added comprehensive tests for both business and system error scenarios
- System errors now include structured logging with context
- All 286 federation tests pass

Impact:
- Debugging is now possible for system failures
- System errors properly trigger monitoring/alerting
- Business logic errors handled gracefully with error responses
- No more masking of critical issues like OOM or DB failures

Fixes #281

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 20:57:51 -06:00


Issue #281: Fix broad exception catching hiding system errors

Objective

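Narrow the broad try-catch blocks in command.service.ts. They currently catch all errors, including system failures (OOM, DB failures, etc.), and return them as ordinary command errors, so the underlying failure is hidden from debugging and monitoring.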

Location

apps/api/src/federation/command.service.ts:168-194

Problem

The current implementation wraps command processing in a single broad try-catch block, so critical system errors are reported as business logic failures. The real cause never surfaces in debugging or monitoring, which can hide serious issues such as:

  • Out of memory errors
  • Database connection failures
  • Network failures
  • Module loading failures
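
To make the failure mode concrete, here is a minimal sketch of the anti-pattern. It is not the code from command.service.ts, which this document does not reproduce: the `CommandResponse` shape and the `handleBroadly` function are hypothetical, and the handler is kept synchronous for brevity.

```typescript
// Hypothetical response shape; the real service's types are not shown in this document.
interface CommandResponse {
  success: boolean;
  error?: string;
}

// Anti-pattern: every failure, including system failures, is flattened
// into an ordinary error response.
function handleBroadly(run: () => void): CommandResponse {
  try {
    run();
    return { success: true };
  } catch (error) {
    // A dropped database connection lands here as well and is reported
    // to the caller as if the command itself had been rejected.
    const message = error instanceof Error ? error.message : String(error);
    return { success: false, error: message };
  }
}

// Simulate a system failure inside command processing.
const masked = handleBroadly(() => {
  throw new Error("database connection lost");
});

console.log(masked);
```

The caller receives a normal-looking failure response, nothing is rethrown, and no alerting path is triggered.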

Approach

  1. Define specific error types for expected business logic errors
  2. Only catch expected errors (e.g., module not found, command validation failures)
  3. Let system errors (OOM, DB failures, network issues) propagate naturally
  4. Add structured logging for business logic errors
  5. Add comprehensive tests for both business and system error scenarios
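
Steps 1 to 3 can be sketched as follows. The class names come from the commit message for this issue; the constructor signatures, the `CommandResponse` shape, and the synchronous `processCommand` wrapper are assumptions made for illustration, not the actual service code.

```typescript
// Hypothetical response shape; the real types in command.service.ts are not shown here.
interface CommandResponse {
  success: boolean;
  error?: string;
}

// Base class for expected business logic failures.
class CommandProcessingError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps instanceof checks working even if the code is compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

// Assumed constructor signature: takes the offending command type.
class UnknownCommandTypeError extends CommandProcessingError {
  constructor(commandType: string) {
    super(`Unknown command type: ${commandType}`);
  }
}

class InvalidCommandPayloadError extends CommandProcessingError {}
class AgentCommandError extends CommandProcessingError {}

function processCommand(run: () => void): CommandResponse {
  try {
    run();
    return { success: true };
  } catch (error) {
    // Only expected business errors become error responses.
    if (error instanceof CommandProcessingError) {
      return { success: false, error: error.message };
    }
    // Everything else (OOM, DB failures, network issues) propagates.
    throw error;
  }
}

const rejected = processCommand(() => {
  throw new UnknownCommandTypeError("agent.teleport");
});

console.log(rejected);
```

Checking against the base class means a new business error type is handled as soon as it extends `CommandProcessingError`, without touching the catch block.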

Implementation Plan

  • Create custom error classes for expected business errors
  • Update handleIncomingCommand to only catch expected errors
  • Add structured logging with context for error events
  • Write tests for business logic errors (should be caught)
  • Write tests for system errors (should propagate)
  • Verify all tests pass
  • Run quality gates (lint, typecheck, build)
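
The structured-logging step could look roughly like the sketch below. The context fields (`commandType`, `messageId`), the `toLogEntry` helper, and the choice of `warn` for business errors versus `error` for everything else are all assumptions; the real service may log different fields through its own logger.

```typescript
// Minimal stand-in for the base business error class named in the commit message.
class CommandProcessingError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

// Hypothetical context carried alongside each logged error.
interface CommandLogContext {
  commandType: string;
  messageId: string;
}

interface ErrorLogEntry extends CommandLogContext {
  level: "warn" | "error";
  errorName: string;
  message: string;
}

// Build a structured entry instead of logging a bare message string.
function toLogEntry(error: Error, context: CommandLogContext): ErrorLogEntry {
  const expected = error instanceof CommandProcessingError;
  return {
    level: expected ? "warn" : "error",
    errorName: error.name,
    message: error.message,
    ...context,
  };
}

const context: CommandLogContext = { commandType: "agent.spawn", messageId: "msg-1" };
const businessEntry = toLogEntry(new CommandProcessingError("Invalid payload"), context);
const systemEntry = toLogEntry(new Error("database connection lost"), context);

console.log(JSON.stringify(businessEntry));
console.log(JSON.stringify(systemEntry));
```

Keeping the entry a plain object makes it easy to assert on in tests and to hand to whichever logger the project uses.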

Testing

  • Test business logic errors are caught and handled gracefully
  • Test system errors propagate correctly
  • Test error logging includes appropriate context
  • Maintain 85%+ coverage
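
The first two test cases can be sketched without a test framework, as below. In the project they would presumably be written with its existing test runner; `processCommand` and `captureThrown` are hypothetical helpers that stand in for the real handler and for the runner's throw assertions.

```typescript
// Minimal stand-in for the base business error class named in the commit message.
class CommandProcessingError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

// Hypothetical handler under test: catches business errors, rethrows the rest.
function processCommand(run: () => void): { success: boolean; error?: string } {
  try {
    run();
    return { success: true };
  } catch (error) {
    if (error instanceof CommandProcessingError) {
      return { success: false, error: error.message };
    }
    throw error;
  }
}

// Returns whatever fn throws, or undefined if it completes normally.
function captureThrown(fn: () => unknown): unknown {
  try {
    fn();
  } catch (error) {
    return error;
  }
  return undefined;
}

// Case 1: a business logic error is caught and turned into an error response.
const handled = processCommand(() => {
  throw new CommandProcessingError("Invalid payload");
});
const businessErrorHandled = handled.success === false && handled.error === "Invalid payload";

// Case 2: a system error propagates unchanged, as the same object.
const systemError = new Error("database connection lost");
const thrown = captureThrown(() =>
  processCommand(() => {
    throw systemError;
  }),
);
const systemErrorPropagated = thrown === systemError;

console.log({ businessErrorHandled, systemErrorPropagated });
```

Comparing by identity in the second case checks that the original error, with its stack trace, reaches the caller and is not wrapped or replaced.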

Notes

  • This is a P0 security issue - proper error handling is critical for production debugging
  • Follow patterns from other federation services
  • Ensure backward compatibility with existing error handling flows