Commit Graph

426 Commits

Author SHA1 Message Date
Jason Woltje
7f3cd17488 fix(#338): Add structured logging for embedding failures
- Replace console.error with NestJS Logger
- Include entry ID and workspace ID in error context
- Easier to track and debug embedding issues

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:26:30 -06:00
Jason Woltje
6c88e2b96d fix(#338): Don't instantiate OpenAI client with missing API key
- Skip client initialization when OPENAI_API_KEY not configured
- Set openai property to null instead of creating with dummy key
- Methods return gracefully when embeddings not available
- Updated tests to verify client is not instantiated without key

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:21:17 -06:00
Jason Woltje
8d542609ff test(#337): Add workspaceId verification tests for multi-tenant isolation
- Verify tasks.service includes workspaceId in all queries
- Verify knowledge.service includes workspaceId in all queries
- Verify projects.service includes workspaceId in all queries
- Verify events.service includes workspaceId in all queries
- Add 39 tests covering create, findAll, findOne, update, remove operations
- Document security concern: findAll accepts empty query without workspaceId
- Ensures tenant isolation is maintained at query level

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:14:46 -06:00
Jason Woltje
721d6d15c5 chore: Add orchestrator report directory to .gitignore
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
QA automation reports in docs/reports/qa-automation/ are ephemeral and
should not be committed. They are cleaned up by the orchestrator after
task completion.
2026-02-05 16:12:15 -06:00
Jason Woltje
3055bd2d85 fix(#337): Fix boolean logic bug in ReactFlowEditor (use || instead of ??)
- Nullish coalescing (??) doesn't work with booleans as expected
- When readOnly=false, ?? never evaluates right side (!selectedNode)
- Changed to logical OR (||) for correct disabled state calculation
- Added comprehensive tests verifying the fix:
  * readOnly=false with no selection: editing disabled
  * readOnly=false with selection: editing enabled
  * readOnly=true: editing always disabled
- Removed unused eslint-disable directive

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:08:55 -06:00
Jason Woltje
c30b4b1cc2 fix(#337): Replace hardcoded OIDC values in federation with env vars
- Use OIDC_ISSUER and OIDC_CLIENT_ID from environment for JWT validation
- Federation OIDC properly configured from environment variables
- Fail fast with clear error when OIDC config is missing
- Handle trailing slash normalization for issuer URL
- Add tests verifying env var usage and missing config error handling

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:03:09 -06:00
Jason Woltje
7cb7a4f543 fix(#337): Sanitize OAuth callback error parameter to prevent open redirect
- Validate error against allowlist of OAuth error codes
- Unknown errors map to generic message
- Encode all URL parameters

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:58:14 -06:00
Jason Woltje
45a795d29e chore: Close MS-SEC-001 investigation - reporting anomaly confirmed
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Verified implementation: 276 lines (guard + tests + docs).
The 0.3K token usage was a reporting bug, not incomplete work.
2026-02-05 15:55:50 -06:00
Jason Woltje
6552edaa11 fix(#337): Add Zod validation for Redis deserialization
- Created Zod schemas for TaskState, AgentState, and OrchestratorEvent
- Added ValkeyValidationError class for detailed error context
- Validate task and agent state data after JSON.parse
- Validate events in subscribeToEvents handler
- Corrupted/tampered data now rejected with clear errors including:
  - Key name for context
  - Data snippet (truncated to 100 chars)
  - Underlying Zod validation error
- Prevents silent propagation of invalid data (SEC-ORCH-6)
- Added 20 new tests for validation scenarios

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:54:48 -06:00
Jason Woltje
6a4f58dc1c fix(#337): Replace blocking KEYS command with SCAN in Valkey client
- Use SCAN with cursor for non-blocking iteration
- Prevents Redis DoS under high key counts
- Same API, safer implementation

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:49:08 -06:00
Jason Woltje
6d6ef1d151 fix(#337): Add API key authentication for orchestrator-coordinator communication
- Add COORDINATOR_API_KEY config option to orchestrator.config.ts
- Include X-API-Key header in coordinator requests when configured
- Log security warning if COORDINATOR_API_KEY not configured in production
- Log security warning if coordinator URL uses HTTP in production
- Add tests verifying API key inclusion in requests and warning behavior

Refs #337
2026-02-05 15:46:03 -06:00
Jason Woltje
949d0d0ead fix(#337): Enable Docker sandbox by default and warn when disabled
- Sandbox now enabled by default for security
- Logs prominent warning when explicitly disabled
- Agents run in containers unless SANDBOX_ENABLED=false

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:43:00 -06:00
Jason Woltje
65df2bbdd3 feat: Bootstrap orchestrator learnings with investigation queue
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
MS-SEC-001 shows -98% variance (15K→0.3K) - flagged for investigation.
Possible causes: auth pre-existed, trivial decorator, or reporting error.
2026-02-05 15:40:35 -06:00
Jason Woltje
7e983e2455 fix(#337): Validate OIDC configuration at startup, fail fast if missing
- Add OIDC_ENABLED environment variable to control OIDC authentication
- Validate required OIDC env vars (OIDC_ISSUER, OIDC_CLIENT_ID, OIDC_CLIENT_SECRET)
  are present when OIDC is enabled
- Validate OIDC_ISSUER ends with trailing slash for correct discovery URL
- Throw descriptive error at startup if configuration is invalid
- Skip OIDC plugin registration when OIDC is disabled
- Add comprehensive tests for validation logic (17 test cases)

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:39:47 -06:00
Jason Woltje
e237c40482 fix(#337): Propagate database errors from guards instead of masking as access denied
SEC-API-2: WorkspaceGuard now propagates database errors as 500s instead of
returning "access denied". Only Prisma P2025 (record not found) is treated
as "user not a member".

SEC-API-3: PermissionGuard now propagates database errors as 500s instead of
returning null role (which caused permission denied). Only Prisma P2025 is
treated as "not a member".

This prevents connection timeouts, pool exhaustion, and other infrastructure
errors from being misreported to users as authorization failures.

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:35:11 -06:00
Jason Woltje
6bb9846cde fix(#337): Return error state from secret scanner on scan failures
- Add scanError field and scannedSuccessfully flag to SecretScanResult
- File read errors no longer falsely report as "clean"
- Callers can distinguish clean files from scan failures
- Update getScanSummary to track filesWithErrors count
- SecretsDetectedError now reports files that couldn't be scanned
- Add tests verifying error handling behavior for file access issues

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:30:06 -06:00
Jason Woltje
aa14b580b3 fix(#337): Sanitize HTML before wiki-link processing in WikiLinkRenderer
- Apply DOMPurify to entire HTML input before parseWikiLinks()
- Prevents stored XSS via knowledge entry content (SEC-WEB-2)
- Allow safe formatting tags (p, strong, em, etc.) but strip scripts, iframes, event handlers
- Update tests to reflect new sanitization behavior

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:25:57 -06:00
Jason Woltje
000145af96 fix(SEC-ORCH-2): Add API key authentication to orchestrator API
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Add OrchestratorApiKeyGuard to protect agent management endpoints (spawn,
kill, kill-all, status) from unauthorized access. Uses X-API-Key header
with constant-time comparison to prevent timing attacks.

- Create apps/orchestrator/src/common/guards/api-key.guard.ts
- Add comprehensive tests for all guard scenarios
- Apply guard to AgentsController (controller-level protection)
- Document ORCHESTRATOR_API_KEY in .env.example files
- Health endpoints remain unauthenticated for monitoring

Security: Prevents unauthorized users from draining API credits or
killing all agents via unprotected endpoints.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:18:15 -06:00
Jason Woltje
c74b6b13d1 chore: Start MS-SEC-001 (orchestrator API auth)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
2026-02-05 15:14:19 -06:00
Jason Woltje
630f946718 chore(orchestrator): Bootstrap tasks.md from review report
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Parsed 124 findings into 44 tasks across 2 phases (critical + high).
Estimated total: ~400K tokens.

Issues created:
- #337: Phase 1 Critical Security (14 tasks)
- #338: Phase 2 High Priority (30 tasks)
- #339: Phase 3 Medium (deferred)
- #340: Phase 4 Low (deferred)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:13:48 -06:00
Jason Woltje
9dfbf8cf61 chore: Remove pre-created task files, add review reports
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Delete docs/tasks.md (let orchestrator bootstrap from scratch)
- Delete docs/claude/task-tracking.md (superseded by universal guide)
- Add codebase review reports for orchestrator to parse

Tests orchestrator's autonomous bootstrap capability.
2026-02-05 15:08:02 -06:00
Jason Woltje
b56bef0747 feat: Set up security remediation task tracking
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Update CLAUDE.md to point to universal orchestrator guide
- Add docs/tasks.md with 28 tasks across 4 phases:
  - Phase 1: Critical Security (MS-SEC-001 to MS-SEC-010)
  - Phase 2: High Security (MS-HIGH-001 to MS-HIGH-006)
  - Phase 3: Code Quality (MS-CQ-001 to MS-CQ-007)
  - Phase 4: Test Coverage (MS-TEST-001 to MS-TEST-005)
- Add project-specific task-tracking.md reference

Based on comprehensive codebase review (124 findings).
2026-02-05 14:58:52 -06:00
bbc211f56e Merge pull request 'feat(#329): Add usage budget management and cost governance' (#336) from feature/329-usage-budget into develop
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Reviewed-on: #336
2026-02-05 20:37:51 +00:00
6b63ca3e07 Merge branch 'develop' into feature/329-usage-budget
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
2026-02-05 20:37:17 +00:00
c22bde16cd Merge pull request 'feat(#101): Add Task Progress widget for orchestrator monitoring' (#335) from feature/101-task-progress-ui into develop
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
Reviewed-on: #335
2026-02-05 19:33:41 +00:00
4e4454b0ca Merge branch 'develop' into feature/101-task-progress-ui
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
ci/woodpecker/pr/woodpecker Pipeline is pending
2026-02-05 19:33:33 +00:00
670809afdb Merge pull request 'test(#229): Add performance test suite for orchestrator' (#334) from feature/229-performance-testing into develop
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
Reviewed-on: #334
2026-02-05 19:33:16 +00:00
7bc37fc513 Merge branch 'develop' into feature/229-performance-testing
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
ci/woodpecker/pr/woodpecker Pipeline is pending
2026-02-05 19:33:06 +00:00
dc4857b167 Merge pull request 'docs(#230): Comprehensive orchestrator documentation' (#333) from feature/230-documentation into develop
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
Reviewed-on: #333
2026-02-05 19:32:55 +00:00
8f2afcd022 Merge branch 'develop' into feature/230-documentation
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline is pending
ci/woodpecker/push/woodpecker Pipeline failed
2026-02-05 19:32:40 +00:00
0f0488856f Merge pull request 'test(#226,#227,#228): Add E2E integration tests for agent orchestration' (#332) from feature/226-e2e-agent-lifecycle into develop
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
Reviewed-on: #332
2026-02-05 19:32:31 +00:00
a8828cb53e Merge branch 'develop' into feature/226-e2e-agent-lifecycle
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
2026-02-05 19:32:23 +00:00
25bed45411 Merge pull request '[ORCH-134] Update root documentation' (#331) from feature/235-update-root-docs into develop
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Reviewed-on: #331
2026-02-05 19:32:15 +00:00
02cd6d4815 Merge branch 'develop' into feature/235-update-root-docs
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
2026-02-05 19:32:09 +00:00
9e89fa320a Merge pull request '[ORCH-132] Connect agent dashboard to real API' (#330) from feature/233-agent-dashboard-api into develop
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Reviewed-on: #330
2026-02-05 19:32:00 +00:00
Jason Woltje
c68b541b6f fix(#226): Remediate code review findings for E2E tests
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
- Fix CRITICAL: Remove unused imports (Test, TestingModule, CleanupService)
- Fix CRITICAL: Remove unused mockValkeyService declaration
- Fix IMPORTANT: Rename misleading test describe/names to match actual behavior
- Fix IMPORTANT: Verify spawned agents exist before kill-all assertion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:26:21 -06:00
Jason Woltje
5a0f090cc5 fix(#230): Correct documentation errors from code review
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
- Fix CRITICAL: Correct 5 environment variable names to match actual config
  (VALKEY_HOST not ORCHESTRATOR_VALKEY_HOST, CLAUDE_API_KEY not ORCHESTRATOR_CLAUDE_API_KEY, etc.)
- Fix CRITICAL: Correct quality gate profiles table to match actual gate-config service
  (minimal = tests only, not typecheck+lint; add agent type defaults)
- Fix IMPORTANT: Add missing gateProfile optional field to spawn request docs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:24:54 -06:00
Jason Woltje
0796cbc744 fix(#229): Remediate code review findings for performance tests
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
- Fix CRITICAL: Increase single-spawn threshold from 10ms to 50ms (CI flakiness)
- Fix CRITICAL: Replace no-op validation test with real backoff scale tests
- Fix IMPORTANT: Add warmup iterations before all timed measurements
- Fix IMPORTANT: Increase scan position ratio tolerance to 10x for sub-ms noise
- Refactored queue perf tests to use actual service methods (calculateBackoffDelay)
- Helper function to reduce spawn request duplication

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:23:19 -06:00
Jason Woltje
92ae8097df fix(#101): Remediate code review findings for TaskProgressWidget
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
- Fix CRITICAL: Replace .sort() state mutation with [...tasks].sort()
- Fix CRITICAL: Replace PDA-unfriendly red colors with calm amber tones
- Fix IMPORTANT: Add TaskProgressWidget + ActiveProjectsWidget to WidgetComponentType
- Fix IMPORTANT: Add tests for interval cleanup, HTTP error responses, slice limit
- 3 new tests added (10 total)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:19:57 -06:00
Jason Woltje
2cb3fe8f5a fix(#329): Harden BudgetService against security review findings
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
- Fix CRITICAL: Unbounded memory growth via daily record purging
- Fix CRITICAL: Negative/NaN/Infinity token bypass via input clamping
- Fix HIGH: TOCTOU race via atomic trySpawnAgent() method
- Fix HIGH: Phantom agent leak via Set<string> ID tracking (not counter)
- Fix HIGH: isAgentOverBudget now scoped to today only
- Fix HIGH: Config validation clamps invalid values to safe defaults
- Fix MEDIUM: Wire BudgetModule into AppModule
- Fix MEDIUM: Sanitize agentId in log output to prevent log injection
- Fix MEDIUM: Use Date objects for timezone-safe comparisons
- Fix MEDIUM: Reject empty agentId/taskId in recordUsage
- Add tests for negative tokens, NaN, Infinity, empty IDs, config edge cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:15:33 -06:00
Jason Woltje
22dc964503 feat(#329): Add usage budget management and cost governance
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Implement BudgetService for tracking and enforcing agent usage limits:
- Daily token limit tracking (default 10M tokens)
- Per-agent token limit enforcement (default 2M tokens)
- Maximum concurrent agent cap (default 10)
- Task duration limits (default 120 minutes)
- Hard/soft limit enforcement modes
- Real-time usage summaries with budget status
  (within_budget/approaching_limit/at_limit/exceeded)
- Per-agent usage breakdown with percentage calculations

Includes BudgetModule for NestJS DI and 23 unit tests.

Fixes #329

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:00:26 -06:00
Jason Woltje
e7f277ff0c feat(#101): Add Task Progress widget for orchestrator task monitoring
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Create TaskProgressWidget showing live agent task execution progress:
- Fetches from orchestrator /agents API with 15s auto-refresh
- Shows stats (total/active/done/stopped), sorted task list
- Agent type badges (worker/reviewer/tester)
- Elapsed time tracking, error display
- Dark mode support, PDA-friendly language
- Registered in WidgetRegistry for dashboard use

Includes 7 unit tests covering all states.

Fixes #101

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:57:10 -06:00
Jason Woltje
b93f4c59ce test(#229): Add performance test suite for orchestrator
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Add 14 performance benchmarks across 3 test files:
- Spawner throughput: single/sequential/concurrent spawn latency,
  session lookup, list performance, memory efficiency
- Queue service: backoff calculation throughput, validation perf
- Secret scanner: content scanning throughput, pattern scalability

Adds test:perf script to package.json.

Fixes #229

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:52:30 -06:00
Jason Woltje
751005391b docs(#230): Comprehensive orchestrator documentation
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Update README with complete API reference, module architecture tree,
service catalog, Valkey state keys, quality gate profiles, and
configuration reference.

Fixes #230

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:49:54 -06:00
Jason Woltje
c8c81fc437 test(#226,#227,#228): Add E2E integration tests for agent orchestration
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Add comprehensive E2E test suites covering:
- Full agent lifecycle (spawn → running → completed/failed) - 7 tests
- Killswitch emergency stop mechanism (single/all/partial) - 5 tests
- Concurrent agent spawning and isolation - 5 tests

Includes vitest config for integration test runner with 30s timeout.

Fixes #226
Fixes #227
Fixes #228

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:46:44 -06:00
Jason Woltje
dd954ffee3 docs(#235): Update README with orchestration layer information
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
- Add orchestrator and coordinator to deployment list
- Update project structure with agent orchestration apps
- Add Agent Orchestration Layer section with architecture overview
- Update implementation status to reflect M6 milestone completion
- Document test coverage (2168+ tests passing)

Fixes #235

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 12:33:43 -06:00
Jason Woltje
27bbbe79df feat(#233): Connect agent dashboard to real orchestrator API
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
- Add GET /agents endpoint to orchestrator controller
- Update AgentStatusWidget to fetch from real API instead of mock data
- Add comprehensive tests for listAgents endpoint
- Auto-refresh agent list every 30 seconds
- Display agent status with proper icons and formatting
- Show error states when API is unavailable

Fixes #233

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 12:31:07 -06:00
Jason Woltje
06fa8f7402 chore: Remove old QA reports and milestone status files
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Remove 661 outdated files:
- 634 QA automation reports from docs/reports/qa-automation/
- 27 old milestone completion and status tracking files

Preserved core documentation structure and active project reports.
2026-02-05 11:25:00 -06:00
Jason Woltje
6de631cd07 feat(#313): Implement FastAPI and agent tracing instrumentation
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Add comprehensive OpenTelemetry distributed tracing to the coordinator
FastAPI service with automatic request tracing and custom decorators.

Implementation:
- Created src/telemetry.py: OTEL SDK initialization with OTLP exporter
- Created src/tracing_decorators.py: @trace_agent_operation and
  @trace_tool_execution decorators with sync/async support
- Integrated FastAPI auto-instrumentation in src/main.py
- Added tracing to coordinator operations in src/coordinator.py
- Environment-based configuration (OTEL_ENABLED, endpoint, sampling)

Features:
- Automatic HTTP request/response tracing via FastAPIInstrumentor
- Custom span enrichment with agent context (issue_id, agent_type)
- Graceful degradation when telemetry disabled
- Proper exception recording and status management
- Resource attributes (service.name, service.version, deployment.env)
- Configurable sampling ratio (0.0-1.0, defaults to 1.0)

Testing:
- 25 comprehensive tests (17 telemetry, 8 decorators)
- Coverage: 90-91% (exceeds 85% requirement)
- All tests passing, no regressions

Quality:
- Zero linting errors (ruff)
- Zero type checking errors (mypy)
- Security review approved (no vulnerabilities)
- Follows OTEL semantic conventions
- Proper error handling and resource cleanup

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 14:25:48 -06:00
Jason Woltje
b836940b89 feat(#309): Add LLM usage tracking and analytics
Implements comprehensive LLM usage tracking with analytics endpoints.

Implementation:
- Added LlmUsageLog model to Prisma schema
- Created llm-usage module with service, controller, and DTOs
- Added tracking for token usage, costs, and durations
- Implemented analytics aggregation by provider, model, and task type
- Added filtering by workspace, provider, model, user, and date range

Testing:
- 20 unit tests with 90.8% coverage (exceeds 85% requirement)
- Tests for service and controller with full error handling
- Tests use Vitest following project conventions

API Endpoints:
- GET /api/llm-usage/analytics - Aggregated usage analytics
- GET /api/llm-usage/by-workspace/:workspaceId - Workspace usage logs
- GET /api/llm-usage/by-workspace/:workspaceId/provider/:provider - Provider logs
- GET /api/llm-usage/by-workspace/:workspaceId/model/:model - Model logs

Database:
- LlmUsageLog table with indexes for efficient queries
- Relations to User, Workspace, and LlmProviderInstance
- Ready for migration with: pnpm prisma migrate dev

Refs #309

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 13:41:45 -06:00