Commit Graph

18 Commits

Author SHA1 Message Date
Jason Woltje
6fd8e85266 fix(orchestrator): make provider-aware Claude key startup requirements
All checks were successful
ci/woodpecker/push/infra Pipeline was successful
ci/woodpecker/push/orchestrator Pipeline was successful
2026-02-17 17:15:42 -06:00
Jason Woltje
3258cd4f4d feat(orchestrator): add SSE events, queue controls, and mosaic rails sync 2026-02-17 15:39:15 -06:00
281c7ab39b fix(orchestrator): resolve DockerSandboxService DI failure on startup
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Add explicit @Inject("DOCKER_CLIENT") token to the Docker constructor
parameter in DockerSandboxService. The @Optional() decorator alone was
not suppressing the NestJS resolution error for the external dockerode
class, causing the orchestrator container to crash on startup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 21:22:52 -06:00
709499c167 fix(api,orchestrator): fix remaining dependency injection issues
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
API:
- Add AuthModule import to JobEventsModule
- Add AuthModule import to JobStepsModule
- Fixes: AuthGuard dependency resolution in job modules

Orchestrator:
- Add @Optional() decorator to docker parameter in DockerSandboxService
- Fixes: NestJS trying to inject Docker class as dependency

All modules using AuthGuard must import AuthModule.
Docker parameter is optional for testing, needs @Optional() decorator.
2026-02-08 22:24:37 -06:00
Jason Woltje
6934d9261c fix(SEC-ORCH-30): Add unique suffix to container names
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Add crypto.randomBytes(4) hex suffix to container name generation
to prevent name collisions when multiple agents spawn simultaneously
within the same millisecond. Container names now include both a
timestamp and 8 random hex characters for guaranteed uniqueness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 15:22:12 -06:00
Jason Woltje
92c310333c fix(SEC-REVIEW-4-7): Address remaining MEDIUM security review findings
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Graceful container shutdown: detect "not running" containers and skip
  force-remove escalation, only SIGKILL for genuine stop failures
- data: URI stripping: add security audit logging via NestJS Logger
  when data: URIs are blocked in markdown links and images
- Orchestrator bootstrap: replace void bootstrap() with .catch() handler
  for clear startup failure logging and clean process.exit(1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 14:51:22 -06:00
Jason Woltje
a0062494b7 fix(CQ-ORCH-7): Graceful Docker container shutdown before force remove
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Replace the always-force container removal (SIGKILL) with a two-phase
approach: first attempt graceful stop (SIGTERM with configurable timeout),
then remove without force. Falls back to force remove only if the graceful
path fails. The graceful stop timeout is configurable via
orchestrator.sandbox.gracefulStopTimeoutSeconds (default: 10s).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 14:05:53 -06:00
Jason Woltje
2b356f6ca2 fix(CQ-ORCH-5): Fix TOCTOU race in agent state transitions
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Add per-agent mutex using promise chaining to serialize state transitions
for the same agent. This prevents the Time-of-Check-Time-of-Use race
condition where two concurrent requests could both read the current state,
both validate it as valid for transition, and both write, causing one to
overwrite the other's transition.

The mutex uses a Map<string, Promise<void>> with promise chaining so that:
- Concurrent transitions to the same agent are queued and executed sequentially
- Different agents can still transition concurrently without contention
- The lock is always released even if the transition throws an error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 14:02:40 -06:00
Jason Woltje
d9efa85924 fix(SEC-ORCH-22): Validate Docker image tag format before pull
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Add validateImageTag() method to DockerSandboxService that validates
Docker image references against a safe character pattern before any
container creation. Rejects empty tags, tags exceeding 256 characters,
and tags containing shell metacharacters (;, &, |, $, backtick, etc.)
to prevent injection attacks. Also validates the default image tag at
service construction time to fail fast on misconfiguration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 13:46:47 -06:00
Jason Woltje
a42f88d64c fix(#338): Add session cleanup on terminal states
- Add removeSession and scheduleSessionCleanup methods to AgentSpawnerService
- Schedule session cleanup after completed/failed/killed transitions
- Default 30 second delay before cleanup to allow status queries
- Implement OnModuleDestroy to clean up pending timers
- Add forwardRef injection to avoid circular dependency
- Add comprehensive tests for cleanup functionality

Refs #338
2026-02-05 18:47:14 -06:00
Jason Woltje
3b80e9c396 fix(#338): Add max concurrent agents limit
- Add MAX_CONCURRENT_AGENTS configuration (default: 20)
- Check current agent count before spawning
- Reject spawn requests with 429 Too Many Requests when limit reached
- Add comprehensive tests for limit enforcement

Refs #338
2026-02-05 18:30:42 -06:00
Jason Woltje
3f16bbeca1 fix(#338): Add Docker security hardening (CapDrop, ReadonlyRootfs, PidsLimit)
- Drop all Linux capabilities by default (CapDrop: ALL)
- Enable read-only root filesystem (agents write to mounted /workspace volume)
- Limit process count to 100 to prevent fork bombs (PidsLimit)
- Add no-new-privileges security option to prevent privilege escalation
- Add DockerSecurityOptions type with configurable security settings
- All options are configurable via config but secure by default
- Add comprehensive tests for security hardening options (20+ new tests)

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 18:21:43 -06:00
Jason Woltje
e747c8db04 fix(#338): Whitelist allowed environment variables in Docker containers
- Add DEFAULT_ENV_WHITELIST constant with safe env vars (AGENT_ID, TASK_ID,
  NODE_ENV, LOG_LEVEL, TZ, MOSAIC_* vars, etc.)
- Implement filterEnvVars() to separate allowed/filtered vars
- Log security warning when non-whitelisted vars are filtered
- Support custom whitelist via orchestrator.sandbox.envWhitelist config
- Add comprehensive tests for whitelist functionality (39 tests passing)

Prevents accidental leakage of secrets like API keys, database credentials,
AWS secrets, etc. to Docker containers.

Refs #338

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 18:17:00 -06:00
Jason Woltje
949d0d0ead fix(#337): Enable Docker sandbox by default and warn when disabled
- Sandbox now enabled by default for security
- Logs prominent warning when explicitly disabled
- Agents run in containers unless SANDBOX_ENABLED=false

Refs #337

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:43:00 -06:00
Jason Woltje
fc87494137 fix(orchestrator): resolve all M6 remediation issues (#260-#269)
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Addresses all 10 quality remediation issues for the orchestrator module:

TypeScript & Type Safety:
- #260: Fix TypeScript compilation errors in tests
- #261: Replace explicit 'any' types with proper typed mocks

Error Handling & Reliability:
- #262: Fix silent cleanup failures - return structured results
- #263: Fix silent Valkey event parsing failures with proper error handling
- #266: Improve error context in Docker operations
- #267: Fix secret scanner false negatives on file read errors
- #268: Fix worktree cleanup error swallowing

Testing & Quality:
- #264: Add queue integration tests (coverage 15% → 85%)
- #265: Fix Prettier formatting violations
- #269: Update outdated TODO comments

All tests passing (406/406), TypeScript compiles cleanly, ESLint clean.

Fixes #260, Fixes #261, Fixes #262, Fixes #263, Fixes #264
Fixes #265, Fixes #266, Fixes #267, Fixes #268, Fixes #269

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:44:04 -06:00
Jason Woltje
5d348526de feat(#71): implement graph data API
Implemented three new API endpoints for knowledge graph visualization:

1. GET /api/knowledge/graph - Full knowledge graph
   - Returns all entries and links with optional filtering
   - Supports filtering by tags, status, and node count limit
   - Includes orphan detection (entries with no links)

2. GET /api/knowledge/graph/stats - Graph statistics
   - Total entries and links counts
   - Orphan entries detection
   - Average links per entry
   - Top 10 most connected entries
   - Tag distribution across entries

3. GET /api/knowledge/graph/:slug - Entry-centered subgraph
   - Returns graph centered on specific entry
   - Supports depth parameter (1-5) for traversal distance
   - Includes all connected nodes up to specified depth

New Files:
- apps/api/src/knowledge/graph.controller.ts
- apps/api/src/knowledge/graph.controller.spec.ts

Modified Files:
- apps/api/src/knowledge/dto/graph-query.dto.ts (added GraphFilterDto)
- apps/api/src/knowledge/entities/graph.entity.ts (extended with new types)
- apps/api/src/knowledge/services/graph.service.ts (added new methods)
- apps/api/src/knowledge/services/graph.service.spec.ts (added tests)
- apps/api/src/knowledge/knowledge.module.ts (registered controller)
- apps/api/src/knowledge/dto/index.ts (exported new DTOs)
- docs/scratchpads/71-graph-data-api.md (implementation notes)

Test Coverage: 21 tests (all passing)
- 14 service tests including orphan detection, filtering, statistics
- 7 controller tests for all three endpoints

Follows TDD principles with tests written before implementation.
All code quality gates passed (lint, typecheck, tests).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 15:27:00 -06:00
Jason Woltje
c3500783d1 feat(#66): implement tag filtering in search API endpoint
Add support for filtering search results by tags in the main search endpoint.

Changes:
- Add tags parameter to SearchQueryDto (comma-separated tag slugs)
- Implement tag filtering in SearchService.search() method
- Update SQL query to join with knowledge_entry_tags when tags provided
- Entries must have ALL specified tags (AND logic)
- Add tests for tag filtering (2 controller tests, 2 service tests)
- Update endpoint documentation
- Fix non-null assertion linting error

The search endpoint now supports:
- Full-text search with ranking (ts_rank)
- Snippet generation with highlighting (ts_headline)
- Status filtering
- Tag filtering (new)
- Pagination

Example: GET /api/knowledge/search?q=api&tags=documentation,tutorial

All tests pass (25 total), type checking passes, linting passes.

Fixes #66

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 14:33:31 -06:00
Jason Woltje
e808487725 feat(M6): Set up orchestrator service foundation
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
Add NestJS-based orchestrator service structure for M6-AgentOrchestration.

Changes:
- Migrate from Express to NestJS architecture
- Add health check endpoint module
- Add placeholder modules: coordinator, git, killswitch, monitor, queue, spawner, valkey
- Update configuration for NestJS
- Update lockfile for new dependencies

This is foundational work for M6-AgentOrchestration milestone.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-02 13:16:19 -06:00