Commit Graph

289 Commits

Author SHA1 Message Date
Jason Woltje
1c79da70a6 fix(#338): Handle non-OK responses in ActiveProjectsWidget
- Add error state tracking for both projects and agents API calls
- Show error UI (amber alert icon + message) when fetch fails
- Clear data on error to avoid showing stale information
- Added tests for error handling: API failures, network errors

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 17:50:18 -06:00
Jason Woltje
1a15c12c56 fix(#338): Implement optimistic rollback on Kanban drag-drop errors
- Store previous state before PATCH request
- Apply optimistic update immediately on drag
- Rollback UI to original position on API error
- Show error toast notification on failure
- Add comprehensive tests for optimistic updates and rollback

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 17:45:26 -06:00
Jason Woltje
dd46025d60 fix(#338): Enforce WSS in production and add connect_error handling
- Add validateWebSocketSecurity() to warn when using ws:// in production
- Add connect_error event handler to capture connection failures
- Expose connectionError state to consumers via hook and provider
- Add comprehensive tests for WSS enforcement and error handling

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 17:31:26 -06:00
Jason Woltje
63a622cbef fix(#338): Log auth errors and distinguish backend down from logged out
- Add error logging for auth check failures in development mode
- Distinguish network/backend errors from normal unauthenticated state
- Expose authError state to UI (network | backend | null)
- Add comprehensive tests for error handling scenarios

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 17:23:07 -06:00
Jason Woltje
587272e2d0 fix(#338): Gate mock data behind NODE_ENV check
- Create ComingSoon component for production placeholders
- Federation connections page shows Coming Soon in production
- Workspaces settings page shows Coming Soon in production
- Teams page shows Coming Soon in production
- Add comprehensive tests for environment-based rendering

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 17:15:35 -06:00
Jason Woltje
344e5df3bb fix(#338): Route all state-changing fetch() calls through API client
- Replace raw fetch() with apiPost/apiPatch/apiDelete in:
  - ImportExportActions.tsx: POST for file imports
  - KanbanBoard.tsx: PATCH for task status updates
  - ActiveProjectsWidget.tsx: POST for widget data fetches
  - useLayouts.ts: POST/PATCH/DELETE for layout management
- Add apiPostFormData() method to API client for FormData uploads
- Ensures CSRF token is included in all state-changing requests
- Update tests to mock CSRF token fetch for API client usage

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 17:06:23 -06:00
Jason Woltje
5ae07f7a84 fix(#338): Validate DEFAULT_WORKSPACE_ID as UUID
- Add federation.config.ts with UUID v4 validation for DEFAULT_WORKSPACE_ID
- Validate at module initialization (fail fast if misconfigured)
- Replace hardcoded "default" fallback with proper validation
- Add 18 tests covering valid UUIDs, invalid formats, and missing values
- Clear error messages with expected UUID format

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:55:48 -06:00
Jason Woltje
970cc9f606 fix(#338): Add rate limiting and logging to auth catch-all route
- Apply restrictive rate limits (10 req/min) to prevent brute-force attacks
- Log requests with path and client IP for monitoring and debugging
- Extract client IP handling for proxy setups (X-Forwarded-For)
- Add comprehensive tests for rate limiting and logging behavior

Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:49:06 -06:00
Jason Woltje
06de72a355 fix(#338): Implement proper system admin role separate from workspace ownership
- Replace workspace ownership check with explicit SYSTEM_ADMIN_IDS env var
- System admin access is now explicit and configurable via environment
- Workspace owners no longer automatically get system admin privileges
- Add 15 unit tests verifying security separation
- Add SYSTEM_ADMIN_IDS documentation to .env.example

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:44:50 -06:00
Jason Woltje
7ae92f3e1c fix(#338): Log ERROR on rate limiter fallback and track degraded mode
- Log at ERROR level when falling back to in-memory storage
- Track and expose degraded mode status for health checks
- Add isUsingFallback() method to check fallback state
- Add getHealthStatus() method for health check endpoints
- Add comprehensive tests for fallback behavior and health status

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:39:55 -06:00
Jason Woltje
7390cac2cc fix(#338): Bind CSRF token to user session with HMAC
- Token now includes HMAC binding to session ID
- Validates session binding on verification
- Adds CSRF_SECRET configuration requirement
- Requires authentication for CSRF token endpoint
- 51 new tests covering session binding security

Security: CSRF tokens are now cryptographically tied to user sessions,
preventing token reuse across sessions and mitigating session fixation
attacks.

Token format: {random_part}:{hmac(random_part + user_id, secret)}

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:33:22 -06:00
Jason Woltje
7f3cd17488 fix(#338): Add structured logging for embedding failures
- Replace console.error with NestJS Logger
- Include entry ID and workspace ID in error context
- Easier to track and debug embedding issues

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:26:30 -06:00
Jason Woltje
6c88e2b96d fix(#338): Don't instantiate OpenAI client with missing API key
- Skip client initialization when OPENAI_API_KEY not configured
- Set openai property to null instead of creating with dummy key
- Methods return gracefully when embeddings not available
- Updated tests to verify client is not instantiated without key

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:21:17 -06:00
Jason Woltje
8d542609ff test(#337): Add workspaceId verification tests for multi-tenant isolation
- Verify tasks.service includes workspaceId in all queries
- Verify knowledge.service includes workspaceId in all queries
- Verify projects.service includes workspaceId in all queries
- Verify events.service includes workspaceId in all queries
- Add 39 tests covering create, findAll, findOne, update, remove operations
- Document security concern: findAll accepts empty query without workspaceId
- Ensures tenant isolation is maintained at query level

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:14:46 -06:00
Jason Woltje
3055bd2d85 fix(#337): Fix boolean logic bug in ReactFlowEditor (use || instead of ??)
- Nullish coalescing (??) doesn't work with booleans as expected
- When readOnly=false, ?? never evaluates right side (!selectedNode)
- Changed to logical OR (||) for correct disabled state calculation
- Added comprehensive tests verifying the fix:
  * readOnly=false with no selection: editing disabled
  * readOnly=false with selection: editing enabled
  * readOnly=true: editing always disabled
- Removed unused eslint-disable directive

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:08:55 -06:00
Jason Woltje
c30b4b1cc2 fix(#337): Replace hardcoded OIDC values in federation with env vars
- Use OIDC_ISSUER and OIDC_CLIENT_ID from environment for JWT validation
- Federation OIDC properly configured from environment variables
- Fail fast with clear error when OIDC config is missing
- Handle trailing slash normalization for issuer URL
- Add tests verifying env var usage and missing config error handling

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 16:03:09 -06:00
Jason Woltje
7cb7a4f543 fix(#337): Sanitize OAuth callback error parameter to prevent open redirect
- Validate error against allowlist of OAuth error codes
- Unknown errors map to generic message
- Encode all URL parameters

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:58:14 -06:00
Jason Woltje
6552edaa11 fix(#337): Add Zod validation for Redis deserialization
- Created Zod schemas for TaskState, AgentState, and OrchestratorEvent
- Added ValkeyValidationError class for detailed error context
- Validate task and agent state data after JSON.parse
- Validate events in subscribeToEvents handler
- Corrupted/tampered data now rejected with clear errors including:
  - Key name for context
  - Data snippet (truncated to 100 chars)
  - Underlying Zod validation error
- Prevents silent propagation of invalid data (SEC-ORCH-6)
- Added 20 new tests for validation scenarios

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:54:48 -06:00
Jason Woltje
6a4f58dc1c fix(#337): Replace blocking KEYS command with SCAN in Valkey client
- Use SCAN with cursor for non-blocking iteration
- Prevents Redis DoS under high key counts
- Same API, safer implementation

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:49:08 -06:00
Jason Woltje
6d6ef1d151 fix(#337): Add API key authentication for orchestrator-coordinator communication
- Add COORDINATOR_API_KEY config option to orchestrator.config.ts
- Include X-API-Key header in coordinator requests when configured
- Log security warning if COORDINATOR_API_KEY not configured in production
- Log security warning if coordinator URL uses HTTP in production
- Add tests verifying API key inclusion in requests and warning behavior

Refs #337
2026-02-05 15:46:03 -06:00
Jason Woltje
949d0d0ead fix(#337): Enable Docker sandbox by default and warn when disabled
- Sandbox now enabled by default for security
- Logs prominent warning when explicitly disabled
- Agents run in containers unless SANDBOX_ENABLED=false

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:43:00 -06:00
Jason Woltje
7e983e2455 fix(#337): Validate OIDC configuration at startup, fail fast if missing
- Add OIDC_ENABLED environment variable to control OIDC authentication
- Validate required OIDC env vars (OIDC_ISSUER, OIDC_CLIENT_ID, OIDC_CLIENT_SECRET)
  are present when OIDC is enabled
- Validate OIDC_ISSUER ends with trailing slash for correct discovery URL
- Throw descriptive error at startup if configuration is invalid
- Skip OIDC plugin registration when OIDC is disabled
- Add comprehensive tests for validation logic (17 test cases)

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:39:47 -06:00
Jason Woltje
e237c40482 fix(#337): Propagate database errors from guards instead of masking as access denied
SEC-API-2: WorkspaceGuard now propagates database errors as 500s instead of
returning "access denied". Only Prisma P2025 (record not found) is treated
as "user not a member".

SEC-API-3: PermissionGuard now propagates database errors as 500s instead of
returning null role (which caused permission denied). Only Prisma P2025 is
treated as "not a member".

This prevents connection timeouts, pool exhaustion, and other infrastructure
errors from being misreported to users as authorization failures.

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:35:11 -06:00
Jason Woltje
6bb9846cde fix(#337): Return error state from secret scanner on scan failures
- Add scanError field and scannedSuccessfully flag to SecretScanResult
- File read errors no longer falsely report as "clean"
- Callers can distinguish clean files from scan failures
- Update getScanSummary to track filesWithErrors count
- SecretsDetectedError now reports files that couldn't be scanned
- Add tests verifying error handling behavior for file access issues

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:30:06 -06:00
Jason Woltje
aa14b580b3 fix(#337): Sanitize HTML before wiki-link processing in WikiLinkRenderer
- Apply DOMPurify to entire HTML input before parseWikiLinks()
- Prevents stored XSS via knowledge entry content (SEC-WEB-2)
- Allow safe formatting tags (p, strong, em, etc.) but strip scripts, iframes, event handlers
- Update tests to reflect new sanitization behavior

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:25:57 -06:00
Jason Woltje
000145af96 fix(SEC-ORCH-2): Add API key authentication to orchestrator API
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Add OrchestratorApiKeyGuard to protect agent management endpoints (spawn,
kill, kill-all, status) from unauthorized access. Uses X-API-Key header
with constant-time comparison to prevent timing attacks.

- Create apps/orchestrator/src/common/guards/api-key.guard.ts
- Add comprehensive tests for all guard scenarios
- Apply guard to AgentsController (controller-level protection)
- Document ORCHESTRATOR_API_KEY in .env.example files
- Health endpoints remain unauthenticated for monitoring

Security: Prevents unauthorized users from draining API credits or
killing all agents via unprotected endpoints.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:18:15 -06:00
6b63ca3e07 Merge branch 'develop' into feature/329-usage-budget
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
2026-02-05 20:37:17 +00:00
4e4454b0ca Merge branch 'develop' into feature/101-task-progress-ui
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
ci/woodpecker/pr/woodpecker Pipeline is pending
2026-02-05 19:33:33 +00:00
7bc37fc513 Merge branch 'develop' into feature/229-performance-testing
Some checks are pending
ci/woodpecker/push/woodpecker Pipeline is pending
ci/woodpecker/pr/woodpecker Pipeline is pending
2026-02-05 19:33:06 +00:00
8f2afcd022 Merge branch 'develop' into feature/230-documentation
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline is pending
ci/woodpecker/push/woodpecker Pipeline failed
2026-02-05 19:32:40 +00:00
a8828cb53e Merge branch 'develop' into feature/226-e2e-agent-lifecycle
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
2026-02-05 19:32:23 +00:00
Jason Woltje
c68b541b6f fix(#226): Remediate code review findings for E2E tests
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
- Fix CRITICAL: Remove unused imports (Test, TestingModule, CleanupService)
- Fix CRITICAL: Remove unused mockValkeyService declaration
- Fix IMPORTANT: Rename misleading test describe/names to match actual behavior
- Fix IMPORTANT: Verify spawned agents exist before kill-all assertion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:26:21 -06:00
Jason Woltje
5a0f090cc5 fix(#230): Correct documentation errors from code review
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
- Fix CRITICAL: Correct 5 environment variable names to match actual config
  (VALKEY_HOST not ORCHESTRATOR_VALKEY_HOST, CLAUDE_API_KEY not ORCHESTRATOR_CLAUDE_API_KEY, etc.)
- Fix CRITICAL: Correct quality gate profiles table to match actual gate-config service
  (minimal = tests only, not typecheck+lint; add agent type defaults)
- Fix IMPORTANT: Add missing gateProfile optional field to spawn request docs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:24:54 -06:00
Jason Woltje
0796cbc744 fix(#229): Remediate code review findings for performance tests
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
- Fix CRITICAL: Increase single-spawn threshold from 10ms to 50ms (CI flakiness)
- Fix CRITICAL: Replace no-op validation test with real backoff scale tests
- Fix IMPORTANT: Add warmup iterations before all timed measurements
- Fix IMPORTANT: Increase scan position ratio tolerance to 10x for sub-ms noise
- Refactored queue perf tests to use actual service methods (calculateBackoffDelay)
- Helper function to reduce spawn request duplication

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:23:19 -06:00
Jason Woltje
92ae8097df fix(#101): Remediate code review findings for TaskProgressWidget
Some checks failed
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/push/woodpecker Pipeline failed
- Fix CRITICAL: Replace .sort() state mutation with [...tasks].sort()
- Fix CRITICAL: Replace PDA-unfriendly red colors with calm amber tones
- Fix IMPORTANT: Add TaskProgressWidget + ActiveProjectsWidget to WidgetComponentType
- Fix IMPORTANT: Add tests for interval cleanup, HTTP error responses, slice limit
- 3 new tests added (10 total)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:19:57 -06:00
Jason Woltje
2cb3fe8f5a fix(#329): Harden BudgetService against security review findings
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
- Fix CRITICAL: Unbounded memory growth via daily record purging
- Fix CRITICAL: Negative/NaN/Infinity token bypass via input clamping
- Fix HIGH: TOCTOU race via atomic trySpawnAgent() method
- Fix HIGH: Phantom agent leak via Set<string> ID tracking (not counter)
- Fix HIGH: isAgentOverBudget now scoped to today only
- Fix HIGH: Config validation clamps invalid values to safe defaults
- Fix MEDIUM: Wire BudgetModule into AppModule
- Fix MEDIUM: Sanitize agentId in log output to prevent log injection
- Fix MEDIUM: Use Date objects for timezone-safe comparisons
- Fix MEDIUM: Reject empty agentId/taskId in recordUsage
- Add tests for negative tokens, NaN, Infinity, empty IDs, config edge cases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:15:33 -06:00
Jason Woltje
22dc964503 feat(#329): Add usage budget management and cost governance
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Implement BudgetService for tracking and enforcing agent usage limits:
- Daily token limit tracking (default 10M tokens)
- Per-agent token limit enforcement (default 2M tokens)
- Maximum concurrent agent cap (default 10)
- Task duration limits (default 120 minutes)
- Hard/soft limit enforcement modes
- Real-time usage summaries with budget status
  (within_budget/approaching_limit/at_limit/exceeded)
- Per-agent usage breakdown with percentage calculations

Includes BudgetModule for NestJS DI and 23 unit tests.

Fixes #329

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 13:00:26 -06:00
Jason Woltje
e7f277ff0c feat(#101): Add Task Progress widget for orchestrator task monitoring
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Create TaskProgressWidget showing live agent task execution progress:
- Fetches from orchestrator /agents API with 15s auto-refresh
- Shows stats (total/active/done/stopped), sorted task list
- Agent type badges (worker/reviewer/tester)
- Elapsed time tracking, error display
- Dark mode support, PDA-friendly language
- Registered in WidgetRegistry for dashboard use

Includes 7 unit tests covering all states.

Fixes #101

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:57:10 -06:00
Jason Woltje
b93f4c59ce test(#229): Add performance test suite for orchestrator
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Add 14 performance benchmarks across 3 test files:
- Spawner throughput: single/sequential/concurrent spawn latency,
  session lookup, list performance, memory efficiency
- Queue service: backoff calculation throughput, validation perf
- Secret scanner: content scanning throughput, pattern scalability

Adds test:perf script to package.json.

Fixes #229

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:52:30 -06:00
Jason Woltje
751005391b docs(#230): Comprehensive orchestrator documentation
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Update README with complete API reference, module architecture tree,
service catalog, Valkey state keys, quality gate profiles, and
configuration reference.

Fixes #230

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:49:54 -06:00
Jason Woltje
c8c81fc437 test(#226,#227,#228): Add E2E integration tests for agent orchestration
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Add comprehensive E2E test suites covering:
- Full agent lifecycle (spawn → running → completed/failed) - 7 tests
- Killswitch emergency stop mechanism (single/all/partial) - 5 tests
- Concurrent agent spawning and isolation - 5 tests

Includes vitest config for integration test runner with 30s timeout.

Fixes #226
Fixes #227
Fixes #228

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:46:44 -06:00
Jason Woltje
27bbbe79df feat(#233): Connect agent dashboard to real orchestrator API
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
- Add GET /agents endpoint to orchestrator controller
- Update AgentStatusWidget to fetch from real API instead of mock data
- Add comprehensive tests for listAgents endpoint
- Auto-refresh agent list every 30 seconds
- Display agent status with proper icons and formatting
- Show error states when API is unavailable

Fixes #233

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 12:31:07 -06:00
Jason Woltje
6de631cd07 feat(#313): Implement FastAPI and agent tracing instrumentation
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Add comprehensive OpenTelemetry distributed tracing to the coordinator
FastAPI service with automatic request tracing and custom decorators.

Implementation:
- Created src/telemetry.py: OTEL SDK initialization with OTLP exporter
- Created src/tracing_decorators.py: @trace_agent_operation and
  @trace_tool_execution decorators with sync/async support
- Integrated FastAPI auto-instrumentation in src/main.py
- Added tracing to coordinator operations in src/coordinator.py
- Environment-based configuration (OTEL_ENABLED, endpoint, sampling)

Features:
- Automatic HTTP request/response tracing via FastAPIInstrumentor
- Custom span enrichment with agent context (issue_id, agent_type)
- Graceful degradation when telemetry disabled
- Proper exception recording and status management
- Resource attributes (service.name, service.version, deployment.env)
- Configurable sampling ratio (0.0-1.0, defaults to 1.0)

Testing:
- 25 comprehensive tests (17 telemetry, 8 decorators)
- Coverage: 90-91% (exceeds 85% requirement)
- All tests passing, no regressions

Quality:
- Zero linting errors (ruff)
- Zero type checking errors (mypy)
- Security review approved (no vulnerabilities)
- Follows OTEL semantic conventions
- Proper error handling and resource cleanup

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 14:25:48 -06:00
Jason Woltje
b836940b89 feat(#309): Add LLM usage tracking and analytics
Implements comprehensive LLM usage tracking with analytics endpoints.

Implementation:
- Added LlmUsageLog model to Prisma schema
- Created llm-usage module with service, controller, and DTOs
- Added tracking for token usage, costs, and durations
- Implemented analytics aggregation by provider, model, and task type
- Added filtering by workspace, provider, model, user, and date range

Testing:
- 20 unit tests with 90.8% coverage (exceeds 85% requirement)
- Tests for service and controller with full error handling
- Tests use Vitest following project conventions

API Endpoints:
- GET /api/llm-usage/analytics - Aggregated usage analytics
- GET /api/llm-usage/by-workspace/:workspaceId - Workspace usage logs
- GET /api/llm-usage/by-workspace/:workspaceId/provider/:provider - Provider logs
- GET /api/llm-usage/by-workspace/:workspaceId/model/:model - Model logs

Database:
- LlmUsageLog table with indexes for efficient queries
- Relations to User, Workspace, and LlmProviderInstance
- Ready for migration with: pnpm prisma migrate dev

Refs #309

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 13:41:45 -06:00
Jason Woltje
6516843612 feat(#312): Implement core OpenTelemetry infrastructure
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Complete the telemetry module with all acceptance criteria:

- Add service.version resource attribute from package.json
- Add deployment.environment resource attribute from env vars
- Add trace sampling configuration with OTEL_TRACES_SAMPLER_ARG
- Implement ParentBasedSampler for consistent distributed tracing
- Add comprehensive tests for SpanContextService (15 tests)
- Add comprehensive tests for LlmTelemetryDecorator (29 tests)
- Fix type safety issues (JSON.parse typing, template literals)
- Add security linter exception for package.json read

Test coverage: 74 tests passing, 85%+ coverage on telemetry module.

Fixes #312

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 12:52:20 -06:00
Jason Woltje
5d683d401e fix(#121): Remediate security issues from ORCH-121 review
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Priority Fixes (Required Before Production):

H3: Add rate limiting to webhook endpoint
- Added slowapi library for FastAPI rate limiting
- Implemented per-IP rate limiting (100 req/min) on webhook endpoint
- Added global rate limiting support via slowapi

M4: Add subprocess timeouts to all gates
- Added timeout=300 (5 minutes) to all subprocess.run() calls in gates
- Implemented proper TimeoutExpired exception handling
- Removed dead CalledProcessError handlers (check=False makes them unreachable)

M2: Add input validation on QualityCheckRequest
- Validate files array size (max 1000 files)
- Validate file paths (no path traversal, no null bytes, no absolute paths)
- Validate diff summary size (max 10KB)
- Validate taskId and agentId format (non-empty)

Additional Fixes:

H1: Fix coverage.json path resolution
- Use absolute paths resolved from project root
- Validate path is within project boundaries (prevent path traversal)

Code Review Cleanup:
- Moved imports to module level in quality_orchestrator.py
- Refactored mock detection logic into separate helper methods
- Removed dead subprocess.CalledProcessError exception handlers from all gates

Testing:
- Added comprehensive tests for all security fixes
- All 339 coordinator tests pass
- All 447 orchestrator tests pass
- Followed TDD principles (RED-GREEN-REFACTOR)

Security Impact:
- Prevents webhook DoS attacks via rate limiting
- Prevents hung processes via subprocess timeouts
- Prevents path traversal attacks via input validation
- Prevents malformed input attacks via comprehensive validation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 11:50:05 -06:00
3a98b78661 fix: Complete CSRF protection implementation
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
ci/woodpecker/pr/woodpecker Pipeline failed
Closes three CSRF security gaps identified in code review:

1. Added X-CSRF-Token and X-Workspace-Id to CORS allowed headers
   - Updated apps/api/src/main.ts to accept CSRF token headers

2. Integrated CSRF token handling in web client
   - Added fetchCsrfToken() to fetch token from API
   - Store token in memory (not localStorage for security)
   - Automatically include X-CSRF-Token in POST/PUT/PATCH/DELETE
   - Implement automatic token refresh on 403 CSRF errors
   - Added comprehensive test coverage for CSRF functionality

3. Applied CSRF Guard globally
   - Added CsrfGuard as APP_GUARD in app.module.ts
   - Verified @SkipCsrf() decorator works for exempted endpoints

All tests passing. CSRF protection now enforced application-wide.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-04 07:12:42 -06:00
e57271c278 fix(#201): Enhance WikiLink XSS protection with comprehensive validation
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Added defense-in-depth security layers for wiki-link rendering:

Slug Validation (isValidWikiLinkSlug):
- Reject empty slugs
- Block dangerous protocols: javascript:, data:, vbscript:, file:, about:, blob:
- Block URL-encoded dangerous protocols (e.g., %6A%61%76%61... = javascript)
- Block HTML tags in slugs
- Block HTML entities in slugs
- Only allow safe characters: a-z, A-Z, 0-9, -, _, ., /

Display Text Sanitization (DOMPurify):
- Strip all HTML tags from display text
- ALLOWED_TAGS: [] (no HTML allowed)
- KEEP_CONTENT: true (preserves text content)
- Prevents event handler injection
- Prevents iframe/object/embed injection

Comprehensive XSS Testing:
- 11 new attack vector tests
- javascript: URLs - blocked
- data: URLs - blocked
- vbscript: URLs - blocked
- Event handlers (onerror, onclick) - removed
- iframe/object/embed - removed
- SVG with scripts - removed
- HTML entity bypass - blocked
- URL-encoded protocols - blocked
- All 25 tests passing (14 existing + 11 new)

Files modified:
- apps/web/src/components/knowledge/WikiLinkRenderer.tsx
- apps/web/src/components/knowledge/__tests__/WikiLinkRenderer.test.tsx

Fixes #201

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 22:59:41 -06:00
f87a28ac55 fix(#200): Enhance Mermaid XSS protection with DOMPurify
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Added defense-in-depth security layers for Mermaid rendering:

DOMPurify SVG Sanitization:
- Sanitize SVG output after mermaid.render()
- Remove script tags, iframes, objects, embeds
- Remove event handlers (onerror, onclick, onload, etc.)
- Use SVG profile for allowed elements

Label Sanitization:
- Added sanitizeMermaidLabel() function
- Remove HTML tags from all labels
- Remove dangerous protocols (javascript:, data:, vbscript:)
- Remove control characters
- Escape Mermaid special characters
- Truncate to 200 chars for DoS prevention
- Applied to all node labels in diagrams

Comprehensive XSS Testing:
- 15 test cases covering all attack vectors
- Script tag injection variants
- Event handler injection
- JavaScript/data URL injection
- SVG with embedded scripts
- HTML entity bypass attempts
- All tests passing

Files modified:
- apps/web/src/components/mindmap/MermaidViewer.tsx
- apps/web/src/components/mindmap/hooks/useGraphData.ts
- apps/web/src/components/mindmap/MermaidViewer.test.tsx (new)

Fixes #200

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 22:55:57 -06:00
9582d9a265 fix(#298): Fix async response handling in dashboard
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Replaced setTimeout hacks with proper polling mechanism:
- Added pollForQueryResponse() function with configurable polling interval
- Polls every 500ms with 30s timeout
- Properly handles DELIVERED and FAILED message states
- Throws errors for failures and timeouts

Updated dashboard to use polling instead of arbitrary delays:
- Removed setTimeout(resolve, 1000) hacks
- Added proper async/await for query responses
- Improved response data parsing for new query format
- Better error handling via polling exceptions

This fixes race conditions and unreliable data loading.

Fixes #298

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 22:51:25 -06:00