# Non-AI Coordinator Pattern - Overlap Analysis

**Date:** 2026-01-31
**Purpose:** Identify overlaps and differences between two complementary architecture documents

---

## Documents Compared

### Document A: Mosaic Stack Non-AI Coordinator Pattern

**Location:** `/home/jwoltje/src/mosaic-stack/docs/3-architecture/non-ai-coordinator-pattern.md`
**Length:** 903 lines
**Problem Space:** L-015 Agent Premature Completion
**Focus:** Single-agent quality enforcement

### Document B: Quality-Rails Orchestration Architecture

**Location:** `/home/jwoltje/src/jarvis-brain/docs/work/quality-rails-orchestration-architecture.md`
**Length:** ~600 lines
**Problem Space:** Context exhaustion in multi-issue orchestration
**Focus:** Multi-agent lifecycle management at scale

---

## Summary Table

| Aspect                     | Document A (Existing)                       | Document B (New)                         | Overlap?           |
| -------------------------- | ------------------------------------------- | ---------------------------------------- | ------------------ |
| **Primary Problem**        | Agents claim "done" prematurely             | Agents pause at 95% context              | Different          |
| **Coordinator Type**       | Non-AI (TypeScript/NestJS)                  | Non-AI (Python/Node.js)                  | ✅ Overlap         |
| **Quality Gates**          | BuildGate, LintGate, TestGate, CoverageGate | Mechanical gates (lint, typecheck, test) | ✅ Overlap         |
| **Agent Scope**            | Single agent per issue                      | Multi-agent orchestration                | Different          |
| **Context Management**     | Not addressed                               | Core feature (80% compact, 95% rotate)   | Different          |
| **Model Assignment**       | Not addressed                               | Agent profiles + difficulty matching     | Different          |
| **Issue Sizing**           | Not addressed                               | 50% rule, epic decomposition             | Different          |
| **Implementation Status**  | Full TypeScript code                        | Python pseudocode + PoC plan             | Different          |
| **Forced Continuation**    | Yes (rejection loop)                        | No (preventive via context mgmt)         | Different approach |
| **Non-negotiable Quality** | Yes                                         | Yes                                      | ✅ Overlap         |

---

## Unique to Document A (Existing Mosaic Stack Pattern)

### 1. **Premature Completion Problem**

- **Problem:** Agents claim work is "done" when tests fail, files are missing, or requirements are incomplete
- **Root cause:** Agent interprets partial completion as success
- **Example:** Agent implements feature, tests fail, agent says "done" anyway

### 2. **Rejection Loop & Forced Continuation**

```typescript
// CompletionVerificationEngine
if (!allGatesPassed) {
  return this.forcedContinuationService.generateContinuationPrompt({
    failedGates,
    tone: "non-negotiable",
  });
}
```

**Key innovation:** When an agent claims "done" but gates fail, the coordinator injects a prompt that forces continuation:

```
COMPLETION REJECTED. The following quality gates have failed:
- Build Gate: Compilation errors detected
- Test Gate: 3/15 tests failing

You must continue working until ALL quality gates pass.
This is not optional. Do not claim completion until gates pass.
```
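
The rejection prompt above could be assembled mechanically from gate results. The following is a minimal Python sketch of that idea, not Doc A's implementation: `GateResult` and `build_continuation_prompt` are hypothetical names introduced here for illustration, and the wording is copied from the example prompt.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GateResult:
    """Outcome of one quality gate (hypothetical shape, not Doc A's type)."""
    name: str
    passed: bool
    detail: str = ""


def build_continuation_prompt(results: Sequence[GateResult]) -> Optional[str]:
    """Return a rejection prompt listing failed gates, or None if all passed."""
    failed = [r for r in results if not r.passed]
    if not failed:
        return None  # Nothing to reject; the completion claim can stand
    lines = ["COMPLETION REJECTED. The following quality gates have failed:"]
    lines += [f"- {r.name}: {r.detail}" for r in failed]
    lines += [
        "",
        "You must continue working until ALL quality gates pass.",
        "This is not optional. Do not claim completion until gates pass.",
    ]
    return "\n".join(lines)


results = [
    GateResult("Build Gate", False, "Compilation errors detected"),
    GateResult("Lint Gate", True),
    GateResult("Test Gate", False, "3/15 tests failing"),
]
prompt = build_continuation_prompt(results)
```

Because the prompt is generated from gate output rather than written by an AI, the same failures always produce the same message.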

### 3. **State Machine for Completion Claims**

```
Agent Working → Claims Done → Run Gates → Pass/Reject
                                   ↓
                              Reject → Force Continue → Agent Working
```
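
The diagram above can be read as a small transition function. This is a hedged Python sketch under stated assumptions: the `AgentState` names and the `next_state` helper are hypothetical, and the gate run is reduced to a boolean so that only the transitions are shown.

```python
from enum import Enum


class AgentState(Enum):
    WORKING = "working"
    CLAIMED_DONE = "claimed_done"
    VERIFIED_DONE = "verified_done"


def next_state(
    state: AgentState, *, claims_done: bool = False, gates_passed: bool = False
) -> AgentState:
    """Advance the completion state machine by one step."""
    if state is AgentState.WORKING:
        return AgentState.CLAIMED_DONE if claims_done else AgentState.WORKING
    if state is AgentState.CLAIMED_DONE:
        # Gates run at this point; a failure forces the agent back to work.
        return AgentState.VERIFIED_DONE if gates_passed else AgentState.WORKING
    return state  # VERIFIED_DONE is terminal


state = AgentState.WORKING
state = next_state(state, claims_done=True)    # Agent claims done
state = next_state(state, gates_passed=False)  # Gates fail: forced continuation
```

The only path to `VERIFIED_DONE` goes through a passing gate run, which is the property the rejection loop relies on.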

### 4. **TypeScript/NestJS Implementation**

- Full production-ready service code
- QualityOrchestrator service
- Gate interfaces and implementations
- Dependency injection architecture

### 5. **CompletionVerificationEngine**

- Intercepts agent completion claims
- Runs all gates synchronously
- Blocks "done" status until gates pass

---

## Unique to Document B (New Quality-Rails Orchestration)

### 1. **Context Exhaustion Problem**

- **Problem:** AI orchestrators pause at 95% context usage, losing autonomy
- **Root cause:** Linear context growth without compaction
- **Example:** M4 session completed 11 issues, paused at 95%, required manual restart

### 2. **50% Rule for Issue Sizing**

```
Issue context estimate MUST NOT exceed 50% of target agent's context limit.

Example:
- Sonnet agent: 200K context limit
- Maximum issue estimate: 100K tokens
- Reasoning: Leaves 100K for system prompts, conversation, safety buffer
```
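
As a quick check of the arithmetic, the rule can be expressed in a few lines of Python. The function names here are hypothetical and chosen only for this illustration; the 200K and 128K limits are the ones listed in this analysis.

```python
def max_issue_estimate(context_limit: int) -> int:
    """Largest issue estimate allowed under the 50% rule."""
    return context_limit // 2


def satisfies_sizing_rule(estimated_tokens: int, context_limit: int) -> bool:
    """True when the estimate is at most half of the agent's context limit."""
    return estimated_tokens <= max_issue_estimate(context_limit)


sonnet_cap = max_issue_estimate(200_000)  # 100_000 tokens
glm_cap = max_issue_estimate(128_000)     # 64_000 tokens
```

An issue estimated at 100K tokens therefore fits a 200K-context agent exactly, while anything larger would need to be decomposed first.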

### 3. **Agent Profiles & Model Assignment**

```python
AGENT_PROFILES = {
    'opus': {
        'context_limit': 200000,
        'cost_per_mtok': 15.00,
        'capabilities': ['high', 'medium', 'low']
    },
    'sonnet': {
        'context_limit': 200000,
        'cost_per_mtok': 3.00,
        'capabilities': ['medium', 'low']
    },
    'glm': {
        'context_limit': 128000,
        'cost_per_mtok': 0.00,  # Self-hosted
        'capabilities': ['medium', 'low']
    }
}
```

**Assignment logic:** Choose cheapest capable agent based on:

- Estimated context usage
- Difficulty level
- Agent capabilities
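
One way the selection could work is sketched below. It is an illustration under assumptions rather than Doc B's actual logic: `assign_agent` is a hypothetical helper, and applying the 50% rule as the context check is an assumption made here so that the three criteria have a concrete form.

```python
from typing import Dict, Optional

# Profiles copied from the snippet above.
AGENT_PROFILES: Dict[str, dict] = {
    "opus": {"context_limit": 200000, "cost_per_mtok": 15.00,
             "capabilities": ["high", "medium", "low"]},
    "sonnet": {"context_limit": 200000, "cost_per_mtok": 3.00,
               "capabilities": ["medium", "low"]},
    "glm": {"context_limit": 128000, "cost_per_mtok": 0.00,  # Self-hosted
            "capabilities": ["medium", "low"]},
}


def assign_agent(
    estimated_context: int,
    difficulty: str,
    profiles: Dict[str, dict] = AGENT_PROFILES,
) -> Optional[str]:
    """Return the cheapest agent that can handle the issue, or None."""
    candidates = [
        (profile["cost_per_mtok"], name)
        for name, profile in profiles.items()
        if difficulty in profile["capabilities"]
        # Assumption: the issue must fit in half the agent's context (50% rule).
        and estimated_context <= profile["context_limit"] // 2
    ]
    # min() compares cost first; equal costs fall back to name order.
    return min(candidates)[1] if candidates else None


small_medium = assign_agent(50_000, "medium")  # fits GLM, which is cheapest
large_medium = assign_agent(80_000, "medium")  # too big for GLM's 64K cap
hard_issue = assign_agent(50_000, "high")      # only Opus lists "high"
oversized = assign_agent(150_000, "low")       # no agent fits; decompose
```

Returning `None` instead of picking the largest model keeps oversized issues visible, so they can be routed back to decomposition.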

### 4. **Context Monitoring & Session Rotation**

```python
def monitor_agent_context(agent_id: str) -> ContextAction:
    """Decide what to do with an agent based on the fraction of context used."""
    usage = get_context_usage(agent_id)  # Fraction of the context limit, 0.0-1.0

    if usage >= 0.95:
        return ContextAction.ROTATE_SESSION  # Start fresh agent
    if usage >= 0.80:
        return ContextAction.COMPACT  # Summarize completed work
    return ContextAction.CONTINUE  # Keep working
```

### 5. **Context Estimation Formula**

```python
def estimate_context(issue: Issue) -> int:
    """Estimate the tokens an issue will consume, including a 30% safety margin."""
    base = (
        issue.files_to_modify * 7000 +  # Average tokens per file
        issue.implementation_complexity * 20000 +  # Code writing
        issue.test_requirements * 10000 +  # Test writing
        issue.documentation * 3000  # Docs
    )

    padded = base * 1.3  # Base estimate plus 30% safety margin
    return int(padded)
```
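
A worked example may help show how the formula interacts with the 50% rule. In this sketch the `Issue` dataclass and the input values are assumptions made for illustration; the analysis does not state the scale of `implementation_complexity`, `test_requirements`, or `documentation`, so they are treated here as small integer scores.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    files_to_modify: int
    implementation_complexity: int  # Assumed to be a small integer score
    test_requirements: int          # Assumed to be a small integer score
    documentation: int              # Assumed to be a small integer score


def estimate_context(issue: Issue) -> int:
    """Same weights as the formula above, with a 30% safety margin."""
    base = (
        issue.files_to_modify * 7000
        + issue.implementation_complexity * 20000
        + issue.test_requirements * 10000
        + issue.documentation * 3000
    )
    return int(base * 1.3)


# 3*7000 + 2*20000 + 2*10000 + 1*3000 = 84000 base, roughly 109K with margin
large = estimate_context(Issue(3, 2, 2, 1))
# 1*7000 + 1*20000 + 1*10000 + 1*3000 = 40000 base, roughly 52K with margin
small = estimate_context(Issue(1, 1, 1, 1))

SONNET_CAP = 200_000 // 2  # 50% rule for a 200K-context agent
large_fits = large <= SONNET_CAP
small_fits = small <= SONNET_CAP
```

Under these assumed inputs the larger issue exceeds the 100K cap and would need to be split, while the smaller one fits with room to spare.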

### 6. **Epic Decomposition Workflow**

```
User creates Epic → Coordinator analyzes scope → Decomposes into sub-issues
                                                        ↓
                                        Each issue ≤ 50% agent context limit
                                                        ↓
                                        Assigns metadata: estimated_context, difficulty
```

### 7. **Multi-Model Support**

- Supports Opus, Sonnet, Haiku, GLM, MiniMax, Cogito
- Cost optimization through model selection
- Self-hosted model preference when capable

### 8. **Proactive Context Management**

- Prevents context exhaustion BEFORE it happens
- No manual intervention needed
- Maintains autonomy through entire queue

---

## Overlaps (Both Documents)

### 1. **Non-AI Coordinator Pattern** ✅

Both use deterministic code (not AI) as the orchestrator:

- **Doc A:** TypeScript/NestJS service
- **Doc B:** Python/Node.js coordinator
- **Rationale:** Avoid AI orchestrator context limits and inconsistency

### 2. **Mechanical Quality Gates** ✅

Both enforce quality through automated checks:

**Doc A gates:**

- BuildGate (compilation)
- LintGate (code style)
- TestGate (unit/integration tests)
- CoverageGate (test coverage threshold)

**Doc B gates:**

- lint (code quality)
- typecheck (type safety)
- test (functionality)
- coverage (same as Doc A)
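
Both gate lists reduce to the same mechanism: run a check, read a pass/fail result, and collect the failures. The sketch below shows that mechanism in Python under stated assumptions. `command_gate` and `run_gates` are hypothetical helpers, and the commands are stand-ins that only exit with a fixed status; a real setup would substitute the project's own lint, typecheck, and test commands.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Sequence

Gate = Callable[[], bool]


def command_gate(command: Sequence[str]) -> Gate:
    """Build a gate that passes only when the command exits with status 0."""
    def check() -> bool:
        completed = subprocess.run(list(command), capture_output=True)
        return completed.returncode == 0
    return check


def run_gates(gates: Dict[str, Gate]) -> List[str]:
    """Run every gate and return the names of the ones that failed."""
    failed: List[str] = []
    for name, check in gates.items():
        try:
            ok = check()
        except Exception:
            ok = False  # A gate that crashes counts as a failure
        if not ok:
            failed.append(name)
    return failed


# Stand-in commands: each just exits with a fixed status code.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

failed = run_gates({
    "lint": command_gate(passing),
    "typecheck": command_gate(passing),
    "test": command_gate(failing),
})
```

Running all gates instead of stopping at the first failure gives the agent the complete list to fix in one pass.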

### 3. **Programmatic Enforcement** ✅

Both prevent agents from bypassing quality gates:

- **Doc A:** Rejection loop blocks completion until gates pass
- **Doc B:** Coordinator enforces gates before allowing next issue
- **Shared principle:** Quality is a requirement, not a suggestion

### 4. **Non-Negotiable Quality Standards** ✅

Both use firm language about quality requirements:

- **Doc A:** "This is not optional. Do not claim completion until gates pass."
- **Doc B:** "Quality gates are mechanical blockers, not suggestions."

### 5. **State Management** ✅

Both track work state programmatically:

- **Doc A:** Agent state machine (working → claimed done → verified → actual done)
- **Doc B:** Issue state in tracking system (pending → in-progress → gate-check → completed)

### 6. **Validation Before Progression** ✅

Both prevent moving forward with broken code:

- **Doc A:** Cannot claim "done" until gates pass
- **Doc B:** Cannot start next issue until current issue passes gates

---

## Complementary Nature

These documents solve **different problems within the same architectural pattern**:

### Document A (Existing): Quality Enforcement

**Problem:** "How do we prevent an agent from claiming work is done when it's not?"
**Solution:** Rejection loop with forced continuation
**Scope:** Single agent working on single issue
**Lifecycle stage:** Task completion verification

### Document B (New): Orchestration at Scale

**Problem:** "How do we manage multiple agents working through dozens of issues without context exhaustion?"
**Solution:** Proactive context management + intelligent agent assignment
**Scope:** Multi-agent orchestration across entire milestone
**Lifecycle stage:** Agent selection, session management, queue progression

### Together They Form:

```
┌─────────────────────────────────────────────────────────┐
│         Non-AI Coordinator (Document B)                 │
│  - Monitors context usage across all agents             │
│  - Assigns issues based on context estimates            │
│  - Rotates agents at 95% context                        │
│  - Enforces 50% rule during issue creation              │
└─────────────────────────┬───────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
   Agent 1           Agent 2           Agent 3
   Issue #42         Issue #57         Issue #89
        │                 │                 │
        └─────────────────┴─────────────────┘
                          │
                          ▼
        ┌─────────────────────────────────────────────────┐
        │   Quality Orchestrator (Document A)             │
        │   - Intercepts completion claims                │
        │   - Runs quality gates                          │
        │   - Forces continuation if gates fail           │
        │   - Only allows "done" when gates pass          │
        └─────────────────────────────────────────────────┘
```

**Document B (new)** manages the **agent lifecycle and orchestration**.
**Document A (existing)** manages the **quality enforcement per agent**.

---

## Integration Recommendations

### Option 1: Merge into Single Document (Recommended)

**Reason:** They're parts of the same system

**Structure:**

```markdown
# Non-AI Coordinator Pattern Architecture

## Part 1: Multi-Agent Orchestration (from Doc B)

- Context management
- Agent assignment
- Session rotation
- 50% rule
- Epic decomposition

## Part 2: Quality Enforcement (from Doc A)

- Premature completion problem
- Quality gates
- Rejection loop
- Forced continuation
- CompletionVerificationEngine

## Part 3: Implementation

- TypeScript/NestJS orchestrator (from Doc A)
- Python coordinator enhancements (from Doc B)
- Integration points
```

### Option 2: Keep Separate, Create Integration Doc

**Reason:** Different audiences (orchestration vs quality enforcement)

**Documents:**

1. `orchestration-architecture.md` (Doc B) - For understanding multi-agent coordination
2. `quality-enforcement-architecture.md` (Doc A) - For understanding quality gates
3. `non-ai-coordinator-integration.md` (NEW) - How they work together

### Option 3: Hierarchical Documentation

**Reason:** Layers of abstraction

```
non-ai-coordinator-pattern.md (Overview)
├── orchestration-layer.md (Doc B content)
└── quality-layer.md (Doc A content)
```

---

## Action Items

Based on the overlap analysis, the recommended actions are:

1. **Merge the documents** into a comprehensive architecture guide
   - Use Doc A's problem statement for quality enforcement
   - Use Doc B's problem statement for context exhaustion
   - Show how both problems require non-AI coordinator
   - Integrate TypeScript implementation with context monitoring

2. **Update Mosaic Stack issue #140**
   - Current: "Document Non-AI Coordinator Pattern Architecture"
   - Expand scope: Include both quality enforcement AND orchestration
   - Reference both problem spaces (L-015 + context exhaustion)

3. **Create unified PoC plan**
   - Phase 1: Context monitoring (from Doc B)
   - Phase 2: Agent assignment logic (from Doc B)
   - Phase 3: Quality gate integration (from Doc A)
   - Phase 4: Forced continuation (from Doc A)

4. **Preserve unique innovations from each**
   - Doc A: Rejection loop, forced continuation prompts
   - Doc B: 50% rule, agent profiles, context estimation formula

---

## Conclusion

**These documents are highly complementary, not duplicative.**

- **~20% overlap:** Both use a non-AI coordinator, mechanical gates, and non-negotiable quality standards
- **~80% unique value:** Doc A solves premature completion; Doc B solves context exhaustion

**Best path forward:** Merge into a single comprehensive architecture document that addresses both problems within the unified non-AI coordinator pattern.

The pattern is:

1. Non-AI coordinator assigns issues based on context estimates (Doc B)
2. Agent works on issue
3. Quality gates enforce completion standards (Doc A)
4. Context monitoring prevents exhaustion (Doc B)
5. Forced continuation prevents premature "done" (Doc A)
6. Next issue assigned when ready (Doc B)
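
The six steps can be sketched as one deterministic loop. This is a simplified Python illustration, not the planned implementation: `run_queue`, `QueueReport`, the fixed per-attempt context cost, the `max_attempts` cap, and the escalation list are all assumptions introduced here. Compaction at 80% is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueueReport:
    completed: List[str] = field(default_factory=list)
    escalated: List[str] = field(default_factory=list)
    rotations: int = 0


def run_queue(
    issues: List[str],
    gates_pass: Callable[[str, int], bool],  # (issue, attempt) -> gates passed?
    usage_per_attempt_pct: int = 30,         # Assumed context cost per attempt
    rotate_at_pct: int = 95,
    max_attempts: int = 3,
) -> QueueReport:
    report = QueueReport()
    usage_pct = 0
    for issue in issues:  # Steps 1 and 6: take the next issue
        for attempt in range(1, max_attempts + 1):
            # Step 4: rotate before the next attempt would cross the threshold.
            if usage_pct + usage_per_attempt_pct > rotate_at_pct:
                usage_pct = 0
                report.rotations += 1
            usage_pct += usage_per_attempt_pct  # Step 2: agent works
            if gates_pass(issue, attempt):      # Step 3: gates decide
                report.completed.append(issue)
                break
            # Step 5: gates failed, so the loop forces another attempt.
        else:
            report.escalated.append(issue)  # Assumed fallback after max_attempts
    return report


# Simulated outcomes: #42 passes at once, #57 on retry, #89 never passes.
passes_on = {"#42": 1, "#57": 2}
report = run_queue(
    ["#42", "#57", "#89"],
    gates_pass=lambda issue, attempt: passes_on.get(issue, 99) <= attempt,
)
```

In this simulated run two issues complete, one is escalated after three failed attempts, and one session rotation occurs along the way, all without an AI making the control decisions.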

Together they create a **robust, autonomous, quality-enforcing orchestration system** that scales beyond single-agent, single-issue scenarios.

---

**Next Steps:**

1. User review of this analysis
2. Decision on integration approach (Option 1, 2, or 3)
3. Update Mosaic Stack documentation accordingly
4. Proceed with PoC implementation