Add explicit @Inject("DOCKER_CLIENT") token to the Docker constructor
parameter in DockerSandboxService. The @Optional() decorator alone was
not suppressing the NestJS resolution error for the external dockerode
class, causing the orchestrator container to crash on startup.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
API:
- Add AuthModule import to JobEventsModule
- Add AuthModule import to JobStepsModule
- Fixes: AuthGuard dependency resolution in job modules
Orchestrator:
- Add @Optional() decorator to docker parameter in DockerSandboxService
- Fixes: NestJS trying to inject Docker class as dependency
All modules using AuthGuard must import AuthModule.
Docker parameter is optional for testing, needs @Optional() decorator.
Add crypto.randomBytes(4) hex suffix to container name generation
to prevent name collisions when multiple agents spawn simultaneously
within the same millisecond. Container names now include both a
timestamp and 8 random hex characters for guaranteed uniqueness.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Graceful container shutdown: detect "not running" containers and skip
force-remove escalation, only SIGKILL for genuine stop failures
- data: URI stripping: add security audit logging via NestJS Logger
when data: URIs are blocked in markdown links and images
- Orchestrator bootstrap: replace void bootstrap() with .catch() handler
for clear startup failure logging and clean process.exit(1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the always-force container removal (SIGKILL) with a two-phase
approach: first attempt graceful stop (SIGTERM with configurable timeout),
then remove without force. Falls back to force remove only if the graceful
path fails. The graceful stop timeout is configurable via
orchestrator.sandbox.gracefulStopTimeoutSeconds (default: 10s).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add per-agent mutex using promise chaining to serialize state transitions
for the same agent. This prevents the Time-of-Check-Time-of-Use race
condition where two concurrent requests could both read the current state,
both validate it as valid for transition, and both write, causing one to
overwrite the other's transition.
The mutex uses a Map<string, Promise<void>> with promise chaining so that:
- Concurrent transitions to the same agent are queued and executed sequentially
- Different agents can still transition concurrently without contention
- The lock is always released even if the transition throws an error
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add validateImageTag() method to DockerSandboxService that validates
Docker image references against a safe character pattern before any
container creation. Rejects empty tags, tags exceeding 256 characters,
and tags containing shell metacharacters (;, &, |, $, backtick, etc.)
to prevent injection attacks. Also validates the default image tag at
service construction time to fail fast on misconfiguration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add removeSession and scheduleSessionCleanup methods to AgentSpawnerService
- Schedule session cleanup after completed/failed/killed transitions
- Default 30 second delay before cleanup to allow status queries
- Implement OnModuleDestroy to clean up pending timers
- Add forwardRef injection to avoid circular dependency
- Add comprehensive tests for cleanup functionality
Refs #338
- Add MAX_CONCURRENT_AGENTS configuration (default: 20)
- Check current agent count before spawning
- Reject spawn requests with 429 Too Many Requests when limit reached
- Add comprehensive tests for limit enforcement
Refs #338
- Drop all Linux capabilities by default (CapDrop: ALL)
- Enable read-only root filesystem (agents write to mounted /workspace volume)
- Limit process count to 100 to prevent fork bombs (PidsLimit)
- Add no-new-privileges security option to prevent privilege escalation
- Add DockerSecurityOptions type with configurable security settings
- All options are configurable via config but secure by default
- Add comprehensive tests for security hardening options (20+ new tests)
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add DEFAULT_ENV_WHITELIST constant with safe env vars (AGENT_ID, TASK_ID,
NODE_ENV, LOG_LEVEL, TZ, MOSAIC_* vars, etc.)
- Implement filterEnvVars() to separate allowed/filtered vars
- Log security warning when non-whitelisted vars are filtered
- Support custom whitelist via orchestrator.sandbox.envWhitelist config
- Add comprehensive tests for whitelist functionality (39 tests passing)
Prevents accidental leakage of secrets like API keys, database credentials,
AWS secrets, etc. to Docker containers.
Refs #338
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Sandbox now enabled by default for security
- Logs prominent warning when explicitly disabled
- Agents run in containers unless SANDBOX_ENABLED=false
Refs #337
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implemented three new API endpoints for knowledge graph visualization:
1. GET /api/knowledge/graph - Full knowledge graph
- Returns all entries and links with optional filtering
- Supports filtering by tags, status, and node count limit
- Includes orphan detection (entries with no links)
2. GET /api/knowledge/graph/stats - Graph statistics
- Total entries and links counts
- Orphan entries detection
- Average links per entry
- Top 10 most connected entries
- Tag distribution across entries
3. GET /api/knowledge/graph/:slug - Entry-centered subgraph
- Returns graph centered on specific entry
- Supports depth parameter (1-5) for traversal distance
- Includes all connected nodes up to specified depth
New Files:
- apps/api/src/knowledge/graph.controller.ts
- apps/api/src/knowledge/graph.controller.spec.ts
Modified Files:
- apps/api/src/knowledge/dto/graph-query.dto.ts (added GraphFilterDto)
- apps/api/src/knowledge/entities/graph.entity.ts (extended with new types)
- apps/api/src/knowledge/services/graph.service.ts (added new methods)
- apps/api/src/knowledge/services/graph.service.spec.ts (added tests)
- apps/api/src/knowledge/knowledge.module.ts (registered controller)
- apps/api/src/knowledge/dto/index.ts (exported new DTOs)
- docs/scratchpads/71-graph-data-api.md (implementation notes)
Test Coverage: 21 tests (all passing)
- 14 service tests including orphan detection, filtering, statistics
- 7 controller tests for all three endpoints
Follows TDD principles with tests written before implementation.
All code quality gates passed (lint, typecheck, tests).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add support for filtering search results by tags in the main search endpoint.
Changes:
- Add tags parameter to SearchQueryDto (comma-separated tag slugs)
- Implement tag filtering in SearchService.search() method
- Update SQL query to join with knowledge_entry_tags when tags provided
- Entries must have ALL specified tags (AND logic)
- Add tests for tag filtering (2 controller tests, 2 service tests)
- Update endpoint documentation
- Fix non-null assertion linting error
The search endpoint now supports:
- Full-text search with ranking (ts_rank)
- Snippet generation with highlighting (ts_headline)
- Status filtering
- Tag filtering (new)
- Pagination
Example: GET /api/knowledge/search?q=api&tags=documentation,tutorial
All tests pass (25 total), type checking passes, linting passes.
Fixes#66
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add NestJS-based orchestrator service structure for M6-AgentOrchestration.
Changes:
- Migrate from Express to NestJS architecture
- Add health check endpoint module
- Add placeholder modules: coordinator, git, killswitch, monitor, queue, spawner, valkey
- Update configuration for NestJS
- Update lockfile for new dependencies
This is foundational work for M6-AgentOrchestration milestone.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>