M6-AgentOrchestration Issue Audit
Date: 2026-02-02
Milestone: M6-AgentOrchestration (0.0.6)
Status: 6 open / 3 closed issues
Audit Purpose: Review existing issues against confirmed orchestrator-in-monorepo architecture
Executive Summary
Current State:
- M6 milestone has 9 issues (6 open, 3 closed)
- Issues are based on "ClawdBot integration" architecture
- New architecture: Orchestrator is apps/orchestrator/ in the monorepo (NOT ClawdBot)
Key Finding:
- CONFLICT: All M6 issues reference "ClawdBot" as external execution backend
- REALITY: Orchestrator is now an internal monorepo service at apps/orchestrator/
Recommendation:
- Keep existing M6 issues - they represent the control plane (Mosaic Stack's responsibility)
- Create 34 new issues - for the execution plane (apps/orchestrator/ implementation)
- Update issue descriptions - replace "ClawdBot" references with "Orchestrator service"
Architecture Comparison
Old Architecture (Current M6 Issues)
Mosaic Stack (Control Plane)
↓
ClawdBot Gateway (External service, separate repo)
↓
Worker Agents
New Architecture (Confirmed 2026-02-02)
Mosaic Stack Monorepo
├── apps/api/ (Control Plane - task CRUD, dispatch)
├── apps/coordinator/ (Quality gates, 50% rule)
├── apps/orchestrator/ (NEW - Execution plane)
│ ├── Agent spawning
│ ├── Task queue (Valkey/BullMQ)
│ ├── Git operations
│ ├── Health monitoring
│ └── Killswitch responder
└── apps/web/ (Dashboard, agent monitoring)
Key Difference: Orchestrator is IN the monorepo at apps/orchestrator/, not external "ClawdBot".
Existing M6 Issues Analysis
Epic
#95 [EPIC] Agent Orchestration - Persistent task management
- Status: Open
- Architecture: Based on ClawdBot integration
- Recommendation: UPDATE - Keep as overall epic, but update description:
  - Replace "ClawdBot" with "Orchestrator service (apps/orchestrator/)"
  - Update delegation model to reflect monorepo architecture
  - Reference ORCHESTRATOR-MONOREPO-SETUP.md instead of CLAWDBOT-INTEGRATION.md
- Action: Update issue description
Phase 1: Foundation (Control Plane)
#96 [ORCH-001] Agent Task Database Schema
- Status: Closed ✅
- Scope: Database schema for task orchestration
- Architecture Fit: ✅ KEEP AS-IS
- Reason: Control plane (Mosaic Stack) still needs task database
- Notes:
  - agent_tasks table - ✅ Still needed
  - agent_task_logs - ✅ Still needed
  - clawdbot_backends - ⚠️ Rename to orchestrator_instances (if multi-instance)
- Action: No changes needed (already closed)
#97 [ORCH-002] Task CRUD API
- Status: Closed ✅
- Scope: REST API for task management
- Architecture Fit: ✅ KEEP AS-IS
- Reason: Control plane API (Mosaic Stack) manages tasks
- Notes:
  - POST/GET/PATCH endpoints - ✅ Still needed
  - Dispatch handled in #99 - ✅ Correct
- Action: No changes needed (already closed)
Phase 2: Integration (Control Plane ↔ Execution Plane)
#98 [ORCH-003] Valkey Integration
- Status: Closed ✅
- Scope: Valkey for runtime state
- Architecture Fit: ✅ KEEP AS-IS
- Reason: Shared state between control plane and orchestrator
- Notes:
  - Task status caching - ✅ Control plane needs this
  - Pub/Sub for progress - ✅ Still needed
  - Backend health cache - ⚠️ Update to "Orchestrator health cache"
- Action: No changes needed (already closed)
#99 [ORCH-004] Task Dispatcher Service
- Status: Open
- Scope: Dispatch tasks to execution backend
- Architecture Fit: ⚠️ UPDATE REQUIRED
- Current Description: "Dispatcher service for delegating work to ClawdBot"
- Should Be: "Dispatcher service for delegating work to Orchestrator (apps/orchestrator/)"
- Changes Needed:
  - Replace "ClawdBot Gateway API client" with "Orchestrator API client"
  - Update endpoint references (ClawdBot → Orchestrator)
  - Internal service call, not external HTTP (unless orchestrator runs separately)
- Action: Update issue description, replace ClawdBot → Orchestrator
#102 [ORCH-007] Gateway Integration
- Status: Open
- Scope: Integration with execution backend
- Architecture Fit: ⚠️ UPDATE REQUIRED
- Current Description: "Core integration with ClawdBot Gateway API"
- Should Be: "Integration with Orchestrator service (apps/orchestrator/)"
- Changes Needed:
  - API endpoints: /orchestrator/agents/spawn, /orchestrator/agents/kill
  - Monorepo service-to-service communication (internal service call or internal HTTP, not external HTTP)
  - Session management handled by orchestrator
- Action: Update issue description, replace ClawdBot → Orchestrator
Phase 3: Failure Handling (Control Plane)
#100 [ORCH-005] ClawdBot Failure Handling
- Status: Open
- Scope: Handle failures reported by execution backend
- Architecture Fit: ⚠️ UPDATE REQUIRED
- Current Description: "Handle failures reported by ClawdBot"
- Should Be: "Handle failures reported by Orchestrator"
- Changes Needed:
  - Callback handler receives failures from orchestrator
  - Retry/escalation logic - ✅ Still valid
  - Orchestrator reports failures, control plane decides retry
- Action: Update issue description, replace ClawdBot → Orchestrator
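The split described for #100 (the orchestrator reports failures, the control plane decides whether to retry) can be illustrated with a small decision function. This is a minimal sketch under stated assumptions, not the project's implementation: the `FailureReport` shape, the `retryable` hint, the default of 3 attempts, and the name `decideOnFailure` are all hypothetical.

```typescript
// Hypothetical shape of a failure callback sent by the orchestrator (assumption).
interface FailureReport {
  taskId: string;
  attempt: number; // 1-based number of the attempt that just failed
  retryable: boolean; // orchestrator's hint; the control plane still decides
  reason: string;
}

type FailureDecision =
  | { action: "retry"; nextAttempt: number }
  | { action: "escalate"; reason: string };

// Control-plane policy (assumed): retry retryable failures until maxAttempts
// is reached, otherwise escalate.
function decideOnFailure(report: FailureReport, maxAttempts = 3): FailureDecision {
  if (report.retryable && report.attempt < maxAttempts) {
    return { action: "retry", nextAttempt: report.attempt + 1 };
  }
  return { action: "escalate", reason: report.reason };
}
```

Keeping the decision in a pure function on the control-plane side means the orchestrator only has to report what happened; the retry limit can change without touching execution-plane code.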
Phase 4: Observability (Control Plane UI)
#101 [ORCH-006] Task Progress UI
- Status: Open
- Scope: Dashboard for monitoring task execution
- Architecture Fit: ✅ KEEP - MINOR UPDATES
- Current Description: Dashboard with kill controls
- Should Be: Same, but backend is Orchestrator
- Changes Needed:
  - Backend health indicators - ⚠️ Update to "Orchestrator health"
  - Real-time progress from Orchestrator via Valkey pub/sub - ✅ Correct
- Action: Minor update to issue description (backend = Orchestrator)
Safety Critical
#114 [ORCH-008] Kill Authority Implementation
- Status: Open
- Scope: Control plane kill authority over execution backend
- Architecture Fit: ✅ KEEP - CRITICAL
- Current Description: "Mosaic Stack MUST retain the ability to terminate any ClawdBot operation"
- Should Be: "Mosaic Stack MUST retain the ability to terminate any Orchestrator operation"
- Changes Needed:
  - Endpoints: /api/orchestrator/tasks/:id/kill (not /api/clawdbot/...)
  - Kill signal to orchestrator service
  - Audit trail - ✅ Still valid
- Action: Update issue description, replace ClawdBot → Orchestrator
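As an illustration of the renamed kill endpoint and its audit trail, the sketch below builds a kill request for the path /api/orchestrator/tasks/:id/kill. Only the path comes from this audit; the POST method, the `KillAuditEntry` fields, and the function names are assumptions made for the example.

```typescript
// Hypothetical audit record kept by the control plane for every kill request.
interface KillAuditEntry {
  taskId: string;
  requestedBy: string;
  reason: string;
  requestedAt: string; // ISO 8601 timestamp
}

interface KillRequest {
  method: "POST"; // assumed HTTP method
  path: string;
  audit: KillAuditEntry;
}

// Builds the control-plane path named in this audit for a given task id.
function killPath(taskId: string): string {
  return `/api/orchestrator/tasks/${encodeURIComponent(taskId)}/kill`;
}

// Pairs the request with its audit entry so neither exists without the other.
function buildKillRequest(
  taskId: string,
  requestedBy: string,
  reason: string,
  now: Date = new Date(),
): KillRequest {
  return {
    method: "POST",
    path: killPath(taskId),
    audit: { taskId, requestedBy, reason, requestedAt: now.toISOString() },
  };
}
```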
New Orchestrator Issues (Execution Plane)
The existing M6 issues cover the control plane (Mosaic Stack). We need 34 new issues for the execution plane (apps/orchestrator/).
Source: ORCHESTRATOR-MONOREPO-SETUP.md Section 10.
Foundation (Days 1-2)
- [ORCH-101] Set up apps/orchestrator structure
  - Labels: task, setup, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Create directory structure, package.json, tsconfig.json
  - Dependencies: None
  - Conflicts: None (new code)
- [ORCH-102] Create Fastify server with health checks
  - Labels: feature, api, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Basic HTTP server with /health endpoint
  - Dependencies: #[ORCH-101]
  - Conflicts: None
- [ORCH-103] Docker Compose integration for orchestrator
  - Labels: task, infrastructure, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Add orchestrator service to docker-compose.yml
  - Dependencies: #[ORCH-101]
  - Conflicts: None
- [ORCH-104] Monorepo build pipeline for orchestrator
  - Labels: task, infrastructure, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Update turbo.json, ensure orchestrator builds correctly
  - Dependencies: #[ORCH-101]
  - Conflicts: None
Agent Spawning (Days 3-4)
- [ORCH-105] Implement agent spawner (Claude SDK)
  - Labels: feature, core, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Spawn Claude agents via Anthropic SDK
  - Dependencies: #[ORCH-102]
  - Conflicts: None
- [ORCH-106] Docker sandbox isolation
  - Labels: feature, security, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Isolate agents in Docker containers
  - Dependencies: #[ORCH-105]
  - Conflicts: None
- [ORCH-107] Valkey client and state management
  - Labels: feature, core, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Valkey client, state schema implementation
  - Dependencies: #98 (Valkey Integration), #[ORCH-102]
  - Conflicts: None (orchestrator's own Valkey client)
- [ORCH-108] BullMQ task queue
  - Labels: feature, core, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Task queue with priority, retry logic
  - Dependencies: #[ORCH-107]
  - Conflicts: None
- [ORCH-109] Agent lifecycle management
  - Labels: feature, core, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Manage agent states (spawning, running, completed, failed)
  - Dependencies: #[ORCH-105], #[ORCH-108]
  - Conflicts: None
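The four agent states named in ORCH-109 (spawning, running, completed, failed) lend themselves to an explicit transition table. The sketch below is one possible shape, not the planned implementation: the state names come from this audit, while the allowed transitions (with completed and failed treated as terminal) and the function names are assumptions.

```typescript
type AgentState = "spawning" | "running" | "completed" | "failed";

// Assumed transition table: an agent may fail while spawning or running,
// and completed/failed are terminal states.
const ALLOWED_TRANSITIONS: Record<AgentState, readonly AgentState[]> = {
  spawning: ["running", "failed"],
  running: ["completed", "failed"],
  completed: [],
  failed: [],
};

// Returns true when the move from one state to another is in the table.
function canTransition(from: AgentState, to: AgentState): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws so that invalid updates are never persisted.
function transition(from: AgentState, to: AgentState): AgentState {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid agent state transition: ${from} -> ${to}`);
  }
  return to;
}
```

Rejecting invalid transitions at a single choke point is what makes the killswitch and cleanup issues easier to reason about: an agent that has reached a terminal state cannot be moved back to running by a late message.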
Git Integration (Days 5-6)
- [ORCH-110] Git operations (clone, commit, push)
  - Labels: feature, git, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Implement git-operations.ts with simple-git
  - Dependencies: #[ORCH-105]
  - Conflicts: None
- [ORCH-111] Git worktree management
  - Labels: feature, git, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Create and manage git worktrees for isolation
  - Dependencies: #[ORCH-110]
  - Conflicts: None
- [ORCH-112] Conflict detection
  - Labels: feature, git, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Detect merge conflicts before pushing
  - Dependencies: #[ORCH-110]
  - Conflicts: None
Coordinator Integration (Days 7-8)
- [ORCH-113] Coordinator API client
  - Labels: feature, integration, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: HTTP client for coordinator callbacks
  - Dependencies: #[ORCH-102]
  - Related: Existing coordinator in apps/coordinator/
- [ORCH-114] Quality gate callbacks
  - Labels: feature, quality, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Call coordinator quality gates (pre-commit, post-commit)
  - Dependencies: #[ORCH-113]
  - Related: Coordinator implements gates
- [ORCH-115] Task dispatch from coordinator
  - Labels: feature, integration, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Coordinator dispatches tasks to orchestrator
  - Dependencies: #99 (Task Dispatcher), #[ORCH-113]
  - Conflicts: None (complements #99)
- [ORCH-116] 50% rule enforcement
  - Labels: feature, quality, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Mechanical gates + AI confirmation
  - Dependencies: #[ORCH-114]
  - Related: Coordinator enforces, orchestrator calls
Killswitch + Security (Days 9-10)
- [ORCH-117] Killswitch implementation
  - Labels: feature, security, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Kill single agent or all agents (emergency stop)
  - Dependencies: #[ORCH-109]
  - Related: #114 (Kill Authority in control plane)
- [ORCH-118] Resource cleanup
  - Labels: task, infrastructure, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Clean up Docker containers, git worktrees
  - Dependencies: #[ORCH-117]
  - Conflicts: None
- [ORCH-119] Docker security hardening
  - Labels: security, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Non-root user, minimal image, security scanning
  - Dependencies: #[ORCH-106]
  - Conflicts: None
- [ORCH-120] Secret scanning
  - Labels: security, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: git-secrets integration, pre-commit hooks
  - Dependencies: #[ORCH-110]
  - Conflicts: None
Quality Gates (Days 11-12)
- [ORCH-121] Mechanical quality gates
  - Labels: feature, quality, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: TypeScript, ESLint, tests, coverage
  - Dependencies: #[ORCH-114]
  - Related: Coordinator has gate implementations
- [ORCH-122] AI agent confirmation
  - Labels: feature, quality, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Independent AI agent reviews changes
  - Dependencies: #[ORCH-114]
  - Related: Coordinator calls AI reviewer
- [ORCH-123] YOLO mode (gate bypass)
  - Labels: feature, configuration, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: User-configurable approval gates
  - Dependencies: #[ORCH-114]
  - Conflicts: None
- [ORCH-124] Gate configuration per-task
  - Labels: feature, configuration, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Different quality gates for different tasks
  - Dependencies: #[ORCH-114]
  - Conflicts: None
Testing (Days 13-14)
- [ORCH-125] E2E test: Full agent lifecycle
  - Labels: test, e2e, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Spawn → Git → Quality → Complete
  - Dependencies: All above
  - Conflicts: None
- [ORCH-126] E2E test: Killswitch
  - Labels: test, e2e, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Kill single and all agents
  - Dependencies: #[ORCH-117]
  - Conflicts: None
- [ORCH-127] E2E test: Concurrent agents
  - Labels: test, e2e, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: 10 concurrent agents
  - Dependencies: #[ORCH-109]
  - Conflicts: None
- [ORCH-128] Performance testing
  - Labels: test, performance, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Load testing, resource monitoring
  - Dependencies: #[ORCH-125]
  - Conflicts: None
- [ORCH-129] Documentation
  - Labels: documentation, orchestrator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: API docs, architecture diagrams, runbooks
  - Dependencies: All above
  - Conflicts: None
Integration Issues (Existing Apps)
- [ORCH-130] apps/api: Add orchestrator client
  - Labels: feature, integration, api
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: HTTP client for orchestrator API
  - Dependencies: #[ORCH-102], #99 (uses this client)
  - Conflicts: None (extends #99)
- [ORCH-131] apps/coordinator: Add orchestrator dispatcher
  - Labels: feature, integration, coordinator
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Dispatch tasks to orchestrator after quality pre-check
  - Dependencies: #[ORCH-102], #99
  - Related: Coordinator already exists
- [ORCH-132] apps/web: Add agent dashboard
  - Labels: feature, ui, web
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Real-time agent status dashboard
  - Dependencies: #101 (extends this), #[ORCH-102]
  - Related: Extends #101
- [ORCH-133] docker-compose: Add orchestrator service
  - Labels: task, infrastructure
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Integrate orchestrator into docker-compose.yml
  - Dependencies: #[ORCH-103]
  - Conflicts: None
- [ORCH-134] Update root documentation
  - Labels: documentation
  - Milestone: M6-AgentOrchestration (0.0.6)
  - Description: Update README, ARCHITECTURE.md
  - Dependencies: #[ORCH-129]
  - Conflicts: None
Integration Matrix
Existing M6 Issues (Control Plane)
| Issue | Keep? | Update? | Reason |
|---|---|---|---|
| #95 (Epic) | ✅ | ⚠️ | Update ClawdBot → Orchestrator |
| #96 (Schema) | ✅ | ✅ | Already closed, no changes |
| #97 (CRUD API) | ✅ | ✅ | Already closed, no changes |
| #98 (Valkey) | ✅ | ✅ | Already closed, no changes |
| #99 (Dispatcher) | ✅ | ⚠️ | Update ClawdBot → Orchestrator |
| #100 (Failure Handling) | ✅ | ⚠️ | Update ClawdBot → Orchestrator |
| #101 (Progress UI) | ✅ | ⚠️ | Minor update (backend = Orchestrator) |
| #102 (Gateway Integration) | ✅ | ⚠️ | Update ClawdBot → Orchestrator |
| #114 (Kill Authority) | ✅ | ⚠️ | Update ClawdBot → Orchestrator |
New Orchestrator Issues (Execution Plane)
| Issue | Phase | Dependencies | Conflicts |
|---|---|---|---|
| ORCH-101 to ORCH-104 | Foundation | None | None |
| ORCH-105 to ORCH-109 | Spawning | Foundation | None |
| ORCH-110 to ORCH-112 | Git | Spawning | None |
| ORCH-113 to ORCH-116 | Coordinator | Git | None |
| ORCH-117 to ORCH-120 | Security | Coordinator | None |
| ORCH-121 to ORCH-124 | Quality | Security | None |
| ORCH-125 to ORCH-129 | Testing | All above | None |
| ORCH-130 to ORCH-134 | Integration | Testing | Extends existing |
No conflicts. New issues are additive (execution plane). Existing issues are control plane.
Recommended Actions
Immediate (Before Creating New Issues)
- Update Existing M6 Issues (6 issues to update)
  - #95: Update epic description (ClawdBot → Orchestrator service)
  - #99: Update dispatcher description
  - #100: Update failure handling description
  - #101: Minor update (backend = Orchestrator)
  - #102: Update gateway integration description
  - #114: Update kill authority description
  Script:
      # For each issue, use tea CLI:
      tea issues edit <issue-number> --description "<updated description>"
- Add Architecture Reference to Epic
  - Update #95 to reference:
    - ORCHESTRATOR-MONOREPO-SETUP.md
    - ARCHITECTURE-CLARIFICATION.md
  - Remove reference to CLAWDBOT-INTEGRATION.md (obsolete)
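The ClawdBot → Orchestrator substitutions listed for the six issues are mechanical enough to draft with a small helper before pasting the result into each issue. The following is a rough sketch only: the replacement rules are inferred from the "Changes Needed" notes in this audit, the name `rewriteDescription` is hypothetical, and the output still needs a manual read because some descriptions change in more than wording.

```typescript
// Ordered most-specific first so "ClawdBot Gateway" is handled before the
// generic "ClawdBot" rule. The rule set is an assumption based on this audit.
const REPLACEMENTS: ReadonlyArray<[RegExp, string]> = [
  [/ClawdBot Gateway/g, "Orchestrator"],
  [/CLAWDBOT-INTEGRATION\.md/g, "ORCHESTRATOR-MONOREPO-SETUP.md"],
  [/\/api\/clawdbot\//g, "/api/orchestrator/"],
  [/ClawdBot/g, "Orchestrator"],
];

// Applies every rule in order and returns the drafted description.
function rewriteDescription(description: string): string {
  return REPLACEMENTS.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    description,
  );
}
```

The drafted text would then be passed as the `--description` value of the tea command shown in the script above.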
After Updates
- Create 34 New Orchestrator Issues
  - Use template:
        # [ORCH-XXX] Title
        ## Description
        [What needs to be done]
        ## Acceptance Criteria
        - [ ] Criterion 1
        - [ ] Criterion 2
        ## Dependencies
        - Blocks: #X
        - Blocked by: #Y
        ## Technical Notes
        [Implementation details from ORCHESTRATOR-MONOREPO-SETUP.md]
- Create Label: orchestrator
      tea labels create orchestrator --color "#FF6B35" --description "Orchestrator service (apps/orchestrator/)"
- Link Issues
  - New orchestrator issues should reference control plane issues:
    - ORCH-130 extends #99 (API client for dispatcher)
    - ORCH-131 extends #99 (Coordinator dispatcher)
    - ORCH-132 extends #101 (Agent dashboard)
  - Use "Blocks:" and "Blocked by:" in issue descriptions
Issue Creation Priority
Phase 1: Foundation (Create First)
- ORCH-101 to ORCH-104 (no dependencies)
Phase 2: Core Features
- ORCH-105 to ORCH-109 (spawning)
- ORCH-110 to ORCH-112 (git)
- ORCH-113 to ORCH-116 (coordinator)
Phase 3: Security & Quality
- ORCH-117 to ORCH-120 (security)
- ORCH-121 to ORCH-124 (quality)
Phase 4: Testing & Integration
- ORCH-125 to ORCH-129 (testing)
- ORCH-130 to ORCH-134 (integration)
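The phase ordering above follows from the per-issue Dependencies fields, so a creation order can also be derived mechanically. The sketch below runs a depth-first topological sort over a subset of the edges listed in this audit; the function name `creationOrder` and the choice to omit the already-closed #98 are assumptions for the example.

```typescript
// Subset of the dependency edges from this audit: issue -> issues it depends on.
// The closed control-plane issue #98 is left out because it needs no creation.
const DEPENDS_ON: Record<string, string[]> = {
  "ORCH-101": [],
  "ORCH-102": ["ORCH-101"],
  "ORCH-105": ["ORCH-102"],
  "ORCH-107": ["ORCH-102"],
  "ORCH-108": ["ORCH-107"],
  "ORCH-109": ["ORCH-105", "ORCH-108"],
  "ORCH-117": ["ORCH-109"],
};

// Depth-first topological sort: every issue appears after all of its blockers.
function creationOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  const visit = (issue: string): void => {
    if (state[issue] === "done") return;
    if (state[issue] === "visiting") {
      throw new Error(`Dependency cycle detected at ${issue}`);
    }
    state[issue] = "visiting";
    // Issues missing from the map are treated as having no blockers.
    const blockers = deps[issue] || [];
    for (const blocker of blockers) visit(blocker);
    state[issue] = "done";
    order.push(issue);
  };

  // Sorting the keys keeps the output stable between runs.
  for (const issue of Object.keys(deps).sort()) visit(issue);
  return order;
}
```

Creating issues in this order means every "Blocked by:" reference in the template can point at an issue number that already exists.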
Summary
Existing M6 Issues: 9 total
- Keep: 9 (all control plane work)
- Update: 6 (replace ClawdBot → Orchestrator)
- Close: 0 (all still valid)
New Orchestrator Issues: 34 total
- Foundation: 4 issues
- Spawning: 5 issues
- Git: 3 issues
- Coordinator: 4 issues
- Security: 4 issues
- Quality: 4 issues
- Testing: 5 issues
- Integration: 5 issues
Total M6 Issues After Audit: 43 issues
- 9 control plane (existing, updated)
- 34 execution plane (new)
Conflicts: None (clean separation between control plane and execution plane)
Blockers: None
Questions for Jason:
- Approve update of existing 6 issues? (replace ClawdBot → Orchestrator)
- Approve creation of 34 new orchestrator issues?
- Create orchestrator label?
- Any additional issues needed?
Next Steps
- ✅ Review this audit
- ⏸️ Get Jason's approval
- ⏸️ Update existing 6 M6 issues
- ⏸️ Create orchestrator label
- ⏸️ Create 34 new orchestrator issues
- ⏸️ Link issues (dependencies, blocks)
- ⏸️ Update M6 milestone (43 total issues)
Ready to proceed?