Jason Woltje
6de631cd07
feat( #313 ): Implement FastAPI and agent tracing instrumentation
...
ci/woodpecker/push/woodpecker Pipeline failed
Add comprehensive OpenTelemetry distributed tracing to the coordinator
FastAPI service with automatic request tracing and custom decorators.
Implementation:
- Created src/telemetry.py: OTEL SDK initialization with OTLP exporter
- Created src/tracing_decorators.py: @trace_agent_operation and
@trace_tool_execution decorators with sync/async support
- Integrated FastAPI auto-instrumentation in src/main.py
- Added tracing to coordinator operations in src/coordinator.py
- Environment-based configuration (OTEL_ENABLED, endpoint, sampling)
Features:
- Automatic HTTP request/response tracing via FastAPIInstrumentor
- Custom span enrichment with agent context (issue_id, agent_type)
- Graceful degradation when telemetry disabled
- Proper exception recording and status management
- Resource attributes (service.name, service.version, deployment.env)
- Configurable sampling ratio (0.0-1.0, defaults to 1.0)
Testing:
- 25 comprehensive tests (17 telemetry, 8 decorators)
- Coverage: 90-91% (exceeds 85% requirement)
- All tests passing, no regressions
Quality:
- Zero linting errors (ruff)
- Zero type checking errors (mypy)
- Security review approved (no vulnerabilities)
- Follows OTEL semantic conventions
- Proper error handling and resource cleanup
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
2026-02-04 14:25:48 -06:00
Jason Woltje
5d683d401e
fix( #121 ): Remediate security issues from ORCH-121 review
...
ci/woodpecker/push/woodpecker Pipeline failed
Priority Fixes (Required Before Production):
H3: Add rate limiting to webhook endpoint
- Added slowapi library for FastAPI rate limiting
- Implemented per-IP rate limiting (100 req/min) on webhook endpoint
- Added global rate limiting support via slowapi
M4: Add subprocess timeouts to all gates
- Added timeout=300 (5 minutes) to all subprocess.run() calls in gates
- Implemented proper TimeoutExpired exception handling
- Removed dead CalledProcessError handlers (check=False makes them unreachable)
M2: Add input validation on QualityCheckRequest
- Validate files array size (max 1000 files)
- Validate file paths (no path traversal, no null bytes, no absolute paths)
- Validate diff summary size (max 10KB)
- Validate taskId and agentId format (non-empty)
Additional Fixes:
H1: Fix coverage.json path resolution
- Use absolute paths resolved from project root
- Validate path is within project boundaries (prevent path traversal)
Code Review Cleanup:
- Moved imports to module level in quality_orchestrator.py
- Refactored mock detection logic into separate helper methods
- Removed dead subprocess.CalledProcessError exception handlers from all gates
Testing:
- Added comprehensive tests for all security fixes
- All 339 coordinator tests pass
- All 447 orchestrator tests pass
- Followed TDD principles (RED-GREEN-REFACTOR)
Security Impact:
- Prevents webhook DoS attacks via rate limiting
- Prevents hung processes via subprocess timeouts
- Prevents path traversal attacks via input validation
- Prevents malformed input attacks via comprehensive validation
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
2026-02-04 11:50:05 -06:00
88953fc998
feat( #160 ): Implement basic orchestration loop
...
Implements the Coordinator class with main orchestration loop:
- Async loop architecture with configurable poll interval
- process_queue() method gets next ready issue and spawns agent (stub)
- Graceful shutdown handling with stop() method
- Error handling that allows loop to continue after failures
- Logging for all actions (start, stop, processing, errors)
- Integration with QueueManager from #159
- Active agent tracking for future agent management
Configuration settings added:
- COORDINATOR_POLL_INTERVAL (default: 5.0s)
- COORDINATOR_MAX_CONCURRENT_AGENTS (default: 10)
- COORDINATOR_ENABLED (default: true)
Tests: 27 new tests covering all acceptance criteria
Coverage: 92% overall (100% for coordinator.py)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com >
2026-02-01 18:03:12 -06:00
e23c09f1f2
feat( #157 ): Set up webhook receiver endpoint
...
Implement FastAPI webhook receiver for Gitea issue assignment events
with HMAC SHA256 signature verification and event routing.
Implementation details:
- FastAPI application with /webhook/gitea POST endpoint
- HMAC SHA256 signature verification in security.py
- Event routing for assigned, unassigned, closed actions
- Comprehensive logging for all webhook events
- Health check endpoint at /health
- Docker containerization with health checks
- 91% test coverage (exceeds 85% requirement)
TDD workflow followed:
- Wrote 16 tests first (RED phase)
- Implemented features to pass tests (GREEN phase)
- All tests passing with 91% coverage
- Type checking with mypy: success
- Linting with ruff: success
Files created:
- apps/coordinator/src/main.py - FastAPI application
- apps/coordinator/src/webhook.py - Webhook handlers
- apps/coordinator/src/security.py - HMAC verification
- apps/coordinator/src/config.py - Configuration management
- apps/coordinator/tests/ - Comprehensive test suite
- apps/coordinator/Dockerfile - Production container
- apps/coordinator/pyproject.toml - Python project config
Configuration:
- Updated .env.example with GITEA_WEBHOOK_SECRET
- Updated docker-compose.yml with coordinator service
Testing:
- 16 unit and integration tests
- Security tests for signature verification
- Event handler tests for all supported actions
- Health check endpoint tests
- All tests passing with 91% coverage
This unblocks issue #158 (issue parser).
Fixes #157
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
2026-02-01 17:41:46 -06:00