docs: Add overlap analysis for non-AI coordinator patterns

Detailed comparison showing:
- Existing doc addresses L-015 (premature completion)
- New doc addresses context exhaustion (multi-issue orchestration)
- ~20% overlap (both use non-AI coordinator, mechanical gates)
- 80% complementary (different problems, different solutions)

Recommends merging into a comprehensive document (already done).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 14:47:59 -06:00
parent a2f06fe75b
commit 903109ea40
16 changed files with 1212 additions and 3 deletions

# Non-AI Coordinator Pattern - Overlap Analysis
**Date:** 2026-01-31
**Purpose:** Identify overlaps and differences between two complementary architecture documents
---
## Documents Compared
### Document A: Mosaic Stack Non-AI Coordinator Pattern
**Location:** `/home/jwoltje/src/mosaic-stack/docs/3-architecture/non-ai-coordinator-pattern.md`
**Length:** 903 lines
**Problem Space:** L-015 Agent Premature Completion
**Focus:** Single-agent quality enforcement
### Document B: Quality-Rails Orchestration Architecture
**Location:** `/home/jwoltje/src/jarvis-brain/docs/work/quality-rails-orchestration-architecture.md`
**Length:** ~600 lines
**Problem Space:** Context exhaustion in multi-issue orchestration
**Focus:** Multi-agent lifecycle management at scale
---
## Summary Table
| Aspect | Document A (Existing) | Document B (New) | Overlap? |
| -------------------------- | ------------------------------------------- | ---------------------------------------- | ------------------ |
| **Primary Problem** | Agents claim "done" prematurely | Agents pause at 95% context | Different |
| **Coordinator Type** | Non-AI (TypeScript/NestJS) | Non-AI (Python/Node.js) | ✅ Overlap |
| **Quality Gates** | BuildGate, LintGate, TestGate, CoverageGate | Mechanical gates (lint, typecheck, test) | ✅ Overlap |
| **Agent Scope** | Single agent per issue | Multi-agent orchestration | Different |
| **Context Management** | Not addressed | Core feature (80% compact, 95% rotate) | Different |
| **Model Assignment** | Not addressed | Agent profiles + difficulty matching | Different |
| **Issue Sizing** | Not addressed | 50% rule, epic decomposition | Different |
| **Implementation Status** | Full TypeScript code | Python pseudocode + PoC plan | Different |
| **Forced Continuation** | Yes (rejection loop) | No (preventive via context mgmt) | Different approach |
| **Non-negotiable Quality** | Yes | Yes | ✅ Overlap |
---
## Unique to Document A (Existing Mosaic Stack Pattern)
### 1. **Premature Completion Problem**
- **Problem:** Agents claim work is "done" when tests fail, files are missing, or requirements are incomplete
- **Root cause:** Agent interprets partial completion as success
- **Example:** Agent implements feature, tests fail, agent says "done" anyway
### 2. **Rejection Loop & Forced Continuation**
```typescript
// CompletionVerificationEngine
if (!allGatesPassed) {
  return this.forcedContinuationService.generateContinuationPrompt({
    failedGates,
    tone: "non-negotiable",
  });
}
```
**Key innovation:** When the agent claims "done" but gates fail, the coordinator injects a prompt forcing continuation:
```
COMPLETION REJECTED. The following quality gates have failed:
- Build Gate: Compilation errors detected
- Test Gate: 3/15 tests failing
You must continue working until ALL quality gates pass.
This is not optional. Do not claim completion until gates pass.
```
### 3. **State Machine for Completion Claims**
```
Agent Working → Claims Done → Run Gates → Pass/Reject
Reject → Force Continue → Agent Working
```
### 4. **TypeScript/NestJS Implementation**
- Full production-ready service code
- QualityOrchestrator service
- Gate interfaces and implementations
- Dependency injection architecture
### 5. **CompletionVerificationEngine**
- Intercepts agent completion claims
- Runs all gates synchronously
- Blocks "done" status until gates pass
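The interception logic can be sketched in Python (the production engine is TypeScript/NestJS; `verify_completion` and the exact prompt wording here are illustrative, only the gate names follow Doc A):

```python
# Illustrative sketch of the CompletionVerificationEngine flow.
# Gate names mirror Doc A; the runner itself is hypothetical.
from typing import Callable

def verify_completion(gates: dict[str, Callable[[], bool]]) -> tuple[bool, str]:
    """Run every gate; on any failure, build a forced-continuation prompt."""
    failed = [name for name, check in gates.items() if not check()]
    if not failed:
        return True, "Completion accepted: all quality gates passed."
    prompt = (
        "COMPLETION REJECTED. The following quality gates have failed:\n"
        + "\n".join(f"- {name}: FAILED" for name in failed)
        + "\nYou must continue working until ALL quality gates pass."
    )
    return False, prompt

# Example: the test gate fails, so the completion claim is rejected.
ok, message = verify_completion({
    "BuildGate": lambda: True,
    "TestGate": lambda: False,
})
```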
---
## Unique to Document B (New Quality-Rails Orchestration)
### 1. **Context Exhaustion Problem**
- **Problem:** AI orchestrators pause at 95% context usage, losing autonomy
- **Root cause:** Linear context growth without compaction
- **Example:** M4 session completed 11 issues, paused at 95%, required manual restart
### 2. **50% Rule for Issue Sizing**
```
Issue context estimate MUST NOT exceed 50% of target agent's context limit.
Example:
- Sonnet agent: 200K context limit
- Maximum issue estimate: 100K tokens
- Reasoning: Leaves 100K for system prompts, conversation, safety buffer
```
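The rule reduces to a one-line check; `fits_agent` is a hypothetical helper, not code from either document:

```python
# Hypothetical helper enforcing the 50% sizing rule described above.
def fits_agent(issue_estimate_tokens: int, context_limit_tokens: int) -> bool:
    """An issue may be assigned only if its estimate is at most half the limit."""
    return issue_estimate_tokens <= context_limit_tokens // 2

# Sonnet example from the text: 200K limit -> 100K maximum estimate.
assert fits_agent(100_000, 200_000)      # exactly at the cap: allowed
assert not fits_agent(120_000, 200_000)  # over the cap: must be decomposed
```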
### 3. **Agent Profiles & Model Assignment**
```python
AGENT_PROFILES = {
    'opus': {
        'context_limit': 200000,
        'cost_per_mtok': 15.00,
        'capabilities': ['high', 'medium', 'low']
    },
    'sonnet': {
        'context_limit': 200000,
        'cost_per_mtok': 3.00,
        'capabilities': ['medium', 'low']
    },
    'glm': {
        'context_limit': 128000,
        'cost_per_mtok': 0.00,  # Self-hosted
        'capabilities': ['medium', 'low']
    }
}
```
**Assignment logic:** Choose the cheapest capable agent based on:
- Estimated context usage
- Difficulty level
- Agent capabilities
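Under those criteria, the selection can be sketched as follows (`assign_agent` is hypothetical; the profiles are copied from the table above, and the 50% rule is applied as the fit test):

```python
# Sketch of "choose the cheapest capable agent"; assign_agent is illustrative.
AGENT_PROFILES = {
    'opus':   {'context_limit': 200_000, 'cost_per_mtok': 15.00,
               'capabilities': ['high', 'medium', 'low']},
    'sonnet': {'context_limit': 200_000, 'cost_per_mtok': 3.00,
               'capabilities': ['medium', 'low']},
    'glm':    {'context_limit': 128_000, 'cost_per_mtok': 0.00,  # self-hosted
               'capabilities': ['medium', 'low']},
}

def assign_agent(estimated_context: int, difficulty: str):
    """Return the cheapest agent that handles the difficulty and fits the 50% rule."""
    capable = [
        (profile['cost_per_mtok'], name)
        for name, profile in AGENT_PROFILES.items()
        if difficulty in profile['capabilities']
        and estimated_context <= profile['context_limit'] // 2
    ]
    return min(capable)[1] if capable else None
```

For example, a 50K-token medium issue goes to the self-hosted GLM, while an 80K-token one exceeds GLM's 64K cap and falls through to Sonnet.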
### 4. **Context Monitoring & Session Rotation**
```python
def monitor_agent_context(agent_id: str) -> ContextAction:
    usage = get_context_usage(agent_id)
    if usage > 0.95:
        return ContextAction.ROTATE_SESSION  # Start fresh agent
    elif usage > 0.80:
        return ContextAction.COMPACT         # Summarize completed work
    else:
        return ContextAction.CONTINUE        # Keep working
```
### 5. **Context Estimation Formula**
```python
def estimate_context(issue: Issue) -> int:
    base = (
        issue.files_to_modify * 7000 +             # Average file size
        issue.implementation_complexity * 20000 +  # Code writing
        issue.test_requirements * 10000 +          # Test writing
        issue.documentation * 3000                 # Docs
    )
    buffer = base * 1.3  # 30% safety margin
    return int(buffer)
```
### 6. **Epic Decomposition Workflow**
```
User creates Epic → Coordinator analyzes scope → Decomposes into sub-issues
Each issue ≤ 50% agent context limit
Assigns metadata: estimated_context, difficulty
```
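Assuming sub-issues are split evenly, the decomposition constraint can be sketched like this (`split_epic` is hypothetical, not from either document):

```python
# Hypothetical decomposition check: split an epic whose estimate exceeds
# 50% of the agent's context limit into evenly sized sub-issues.
import math

def split_epic(epic_estimate: int, context_limit: int) -> list[int]:
    """Return per-issue estimates, each within half the agent's context limit."""
    cap = context_limit // 2
    n = max(1, math.ceil(epic_estimate / cap))
    return [math.ceil(epic_estimate / n)] * n

# A 250K-token epic on a 200K-limit agent becomes three ~84K issues.
```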
### 7. **Multi-Model Support**
- Supports Opus, Sonnet, Haiku, GLM, MiniMax, Cogito
- Cost optimization through model selection
- Self-hosted model preference when capable
### 8. **Proactive Context Management**
- Prevents context exhaustion BEFORE it happens
- No manual intervention needed
- Maintains autonomy through entire queue
---
## Overlaps (Both Documents)
### 1. **Non-AI Coordinator Pattern** ✅
Both use deterministic code (not AI) as the orchestrator:
- **Doc A:** TypeScript/NestJS service
- **Doc B:** Python/Node.js coordinator
- **Rationale:** Avoid AI orchestrator context limits and inconsistency
### 2. **Mechanical Quality Gates** ✅
Both enforce quality through automated checks:
**Doc A gates:**
- BuildGate (compilation)
- LintGate (code style)
- TestGate (unit/integration tests)
- CoverageGate (test coverage threshold)
**Doc B gates:**
- lint (code quality)
- typecheck (type safety)
- test (functionality)
- coverage (same as Doc A)
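A mechanical gate runner common to both designs might look like the sketch below; the command strings are placeholders, not either project's actual scripts:

```python
# Illustrative mechanical gate runner; the commands are placeholder examples.
import subprocess

GATES = {
    'lint':      ['npm', 'run', 'lint'],
    'typecheck': ['npm', 'run', 'typecheck'],
    'test':      ['npm', 'test'],
}

def run_gates(gates=GATES) -> dict[str, bool]:
    """Run each gate command; a gate passes iff its exit code is 0."""
    return {
        name: subprocess.run(cmd, capture_output=True).returncode == 0
        for name, cmd in gates.items()
    }
```

Because pass/fail is just an exit code, the coordinator needs no AI judgment to decide whether quality was met.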
### 3. **Programmatic Enforcement** ✅
Both prevent agent from bypassing quality:
- **Doc A:** Rejection loop blocks completion until gates pass
- **Doc B:** Coordinator enforces gates before allowing next issue
- **Shared principle:** Quality is a requirement, not a suggestion
### 4. **Non-Negotiable Quality Standards** ✅
Both use firm language about quality requirements:
- **Doc A:** "This is not optional. Do not claim completion until gates pass."
- **Doc B:** "Quality gates are mechanical blockers, not suggestions."
### 5. **State Management** ✅
Both track work state programmatically:
- **Doc A:** Agent state machine (working → claimed done → verified → actual done)
- **Doc B:** Issue state in tracking system (pending → in-progress → gate-check → completed)
### 6. **Validation Before Progression** ✅
Both prevent moving forward with broken code:
- **Doc A:** Cannot claim "done" until gates pass
- **Doc B:** Cannot start next issue until current issue passes gates
---
## Complementary Nature
These documents solve **different problems in the same architectural pattern**:
### Document A (Existing): Quality Enforcement
**Problem:** "How do we prevent an agent from claiming work is done when it's not?"
**Solution:** Rejection loop with forced continuation
**Scope:** Single agent working on single issue
**Lifecycle stage:** Task completion verification
### Document B (New): Orchestration at Scale
**Problem:** "How do we manage multiple agents working through dozens of issues without context exhaustion?"
**Solution:** Proactive context management + intelligent agent assignment
**Scope:** Multi-agent orchestration across entire milestone
**Lifecycle stage:** Agent selection, session management, queue progression
### Together They Form:
```
┌─────────────────────────────────────────────┐
│ Non-AI Coordinator (Document B)             │
│ - Monitors context usage across all agents  │
│ - Assigns issues based on context estimates │
│ - Rotates agents at 95% context             │
│ - Enforces 50% rule during issue creation   │
└───────────────────────┬─────────────────────┘
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
     Agent 1         Agent 2         Agent 3
    Issue #42       Issue #57       Issue #89
        │               │               │
        └───────────────┴───────────────┘
                        │
                        ▼
    ┌───────────────────────────────────────┐
    │ Quality Orchestrator (Document A)     │
    │ - Intercepts completion claims        │
    │ - Runs quality gates                  │
    │ - Forces continuation if gates fail   │
    │ - Only allows "done" when gates pass  │
    └───────────────────────────────────────┘
```
**Document B (new)** manages the **agent lifecycle and orchestration**.
**Document A (existing)** manages the **quality enforcement per agent**.
---
## Integration Recommendations
### Option 1: Merge into Single Document (Recommended)
**Reason:** They're parts of the same system
**Structure:**
```markdown
# Non-AI Coordinator Pattern Architecture
## Part 1: Multi-Agent Orchestration (from Doc B)
- Context management
- Agent assignment
- Session rotation
- 50% rule
- Epic decomposition
## Part 2: Quality Enforcement (from Doc A)
- Premature completion problem
- Quality gates
- Rejection loop
- Forced continuation
- CompletionVerificationEngine
## Part 3: Implementation
- TypeScript/NestJS orchestrator (from Doc A)
- Python coordinator enhancements (from Doc B)
- Integration points
```
### Option 2: Keep Separate, Create Integration Doc
**Reason:** Different audiences (orchestration vs quality enforcement)
**Documents:**
1. `orchestration-architecture.md` (Doc B) - For understanding multi-agent coordination
2. `quality-enforcement-architecture.md` (Doc A) - For understanding quality gates
3. `non-ai-coordinator-integration.md` (NEW) - How they work together
### Option 3: Hierarchical Documentation
**Reason:** Layers of abstraction
```
non-ai-coordinator-pattern.md (Overview)
├── orchestration-layer.md (Doc B content)
└── quality-layer.md (Doc A content)
```
---
## Action Items
Based on overlap analysis, recommend:
1. **Merge the documents** into a comprehensive architecture guide
- Use Doc A's problem statement for quality enforcement
- Use Doc B's problem statement for context exhaustion
- Show how both problems require non-AI coordinator
- Integrate TypeScript implementation with context monitoring
2. **Update Mosaic Stack issue #140**
- Current: "Document Non-AI Coordinator Pattern Architecture"
- Expand scope: Include both quality enforcement AND orchestration
- Reference both problem spaces (L-015 + context exhaustion)
3. **Create unified PoC plan**
- Phase 1: Context monitoring (from Doc B)
- Phase 2: Agent assignment logic (from Doc B)
- Phase 3: Quality gate integration (from Doc A)
- Phase 4: Forced continuation (from Doc A)
4. **Preserve unique innovations from each**
- Doc A: Rejection loop, forced continuation prompts
- Doc B: 50% rule, agent profiles, context estimation formula
---
## Conclusion
**These documents are highly complementary, not duplicative.**
- **~20% overlap:** Both use non-AI coordinator, mechanical gates, non-negotiable quality
- **80% unique value:** Doc A solves premature completion, Doc B solves context exhaustion
**Best path forward:** Merge into single comprehensive architecture document that addresses both problems within the unified non-AI coordinator pattern.
The pattern is:
1. Non-AI coordinator assigns issues based on context estimates (Doc B)
2. Agent works on issue
3. Quality gates enforce completion standards (Doc A)
4. Context monitoring prevents exhaustion (Doc B)
5. Forced continuation prevents premature "done" (Doc A)
6. Next issue assigned when ready (Doc B)
Together they create a **robust, autonomous, quality-enforcing orchestration system** that scales beyond single-agent, single-issue scenarios.
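The six-step pattern above can be sketched as a single loop; all names and thresholds here are illustrative, not taken from either document's actual implementation:

```python
# Minimal sketch of the combined loop: Doc B's coordinator drives the queue,
# Doc A's gate check blocks premature "done". All identifiers are hypothetical.
def run_queue(issues, context_limit=200_000):
    """Process issues in order; each must pass gates before the next begins."""
    completed, context_used = [], 0
    for issue in issues:                           # (1) Doc B assigns issues
        context_used += issue['estimated_context']
        if context_used > 0.95 * context_limit:    # (4) Doc B: rotate session
            context_used = issue['estimated_context']
        while not issue['gates_pass']():           # (3)+(5) Doc A: gates plus
            pass                                   #     forced continuation
        completed.append(issue['id'])              # (6) Doc B: next issue
    return completed

def flaky(results):
    """Gate check that returns the given results in sequence (test stub)."""
    it = iter(results)
    return lambda: next(it)

# Issue #42 fails gates once (forced to continue), then passes; issue #89
# triggers a session rotation because cumulative context would exceed 95%.
done = run_queue([
    {'id': 42, 'estimated_context': 90_000, 'gates_pass': flaky([False, True])},
    {'id': 57, 'estimated_context': 80_000, 'gates_pass': flaky([True])},
    {'id': 89, 'estimated_context': 50_000, 'gates_pass': flaky([True])},
])
```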
---
**Next Steps:**
1. User review of this analysis
2. Decision on integration approach (Option 1, 2, or 3)
3. Update Mosaic Stack documentation accordingly
4. Proceed with PoC implementation