Non-AI Coordinator Pattern - Overlap Analysis

Date: 2026-01-31
Purpose: Identify overlaps and differences between two complementary architecture documents


Documents Compared

Document A: Mosaic Stack Non-AI Coordinator Pattern

Location: /home/jwoltje/src/mosaic-stack/docs/3-architecture/non-ai-coordinator-pattern.md
Length: 903 lines
Problem Space: L-015 Agent Premature Completion
Focus: Single-agent quality enforcement

Document B: Quality-Rails Orchestration Architecture

Location: /home/jwoltje/src/jarvis-brain/docs/work/quality-rails-orchestration-architecture.md
Length: ~600 lines
Problem Space: Context exhaustion in multi-issue orchestration
Focus: Multi-agent lifecycle management at scale


Summary Table

| Aspect | Document A (Existing) | Document B (New) | Overlap? |
|--------|-----------------------|------------------|----------|
| Primary Problem | Agents claim "done" prematurely | Agents pause at 95% context | Different |
| Coordinator Type | Non-AI (TypeScript/NestJS) | Non-AI (Python/Node.js) | Overlap |
| Quality Gates | BuildGate, LintGate, TestGate, CoverageGate | Mechanical gates (lint, typecheck, test) | Overlap |
| Agent Scope | Single agent per issue | Multi-agent orchestration | Different |
| Context Management | Not addressed | Core feature (80% compact, 95% rotate) | Different |
| Model Assignment | Not addressed | Agent profiles + difficulty matching | Different |
| Issue Sizing | Not addressed | 50% rule, epic decomposition | Different |
| Implementation Status | Full TypeScript code | Python pseudocode + PoC plan | Different |
| Forced Continuation | Yes (rejection loop) | No (preventive via context mgmt) | Different approach |
| Non-negotiable Quality | Yes | Yes | Overlap |

Unique to Document A (Existing Mosaic Stack Pattern)

1. Premature Completion Problem

  • Problem: Agents claim work is "done" when tests fail, files are missing, or requirements are incomplete
  • Root cause: Agent interprets partial completion as success
  • Example: Agent implements feature, tests fail, agent says "done" anyway

2. Rejection Loop & Forced Continuation

// Inside CompletionVerificationEngine, after all quality gates have run:
// allGatesPassed and failedGates come from the gate results.
if (!allGatesPassed) {
  return this.forcedContinuationService.generateContinuationPrompt({
    failedGates,
    tone: "non-negotiable",
  });
}

Key innovation: When agent claims "done" but gates fail, coordinator injects prompt forcing continuation:

COMPLETION REJECTED. The following quality gates have failed:
- Build Gate: Compilation errors detected
- Test Gate: 3/15 tests failing

You must continue working until ALL quality gates pass.
This is not optional. Do not claim completion until gates pass.

3. State Machine for Completion Claims

Agent Working → Claims Done → Run Gates → Pass/Reject
                                   ↓
                              Reject → Force Continue → Agent Working
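The state machine above can be sketched as deterministic coordinator code. This is a minimal illustration, not the document's TypeScript implementation; `run_gates` and `force_continue` are hypothetical callbacks standing in for the gate runner and the forced-continuation service.

```python
from enum import Enum, auto

class AgentState(Enum):
    WORKING = auto()       # agent is (or resumes) working
    DONE = auto()          # completion claim verified by gates

def handle_completion_claim(run_gates, force_continue) -> AgentState:
    """Verify a 'done' claim: all gates pass -> DONE, any failure -> WORKING.

    run_gates() returns {gate_name: passed}; force_continue(failed)
    injects the non-negotiable continuation prompt.
    """
    failed = [name for name, passed in run_gates().items() if not passed]
    if failed:
        force_continue(failed)   # rejection loop: agent must keep working
        return AgentState.WORKING
    return AgentState.DONE
```

The key property is that the transition back to Agent Working is taken by code, not by the agent's own judgment.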

4. TypeScript/NestJS Implementation

  • Full production-ready service code
  • QualityOrchestrator service
  • Gate interfaces and implementations
  • Dependency injection architecture

5. CompletionVerificationEngine

  • Intercepts agent completion claims
  • Runs all gates synchronously
  • Blocks "done" status until gates pass

Unique to Document B (New Quality-Rails Orchestration)

1. Context Exhaustion Problem

  • Problem: AI orchestrators pause at 95% context usage, losing autonomy
  • Root cause: Linear context growth without compaction
  • Example: M4 session completed 11 issues, paused at 95%, required manual restart

2. 50% Rule for Issue Sizing

Issue context estimate MUST NOT exceed 50% of target agent's context limit.

Example:
- Sonnet agent: 200K context limit
- Maximum issue estimate: 100K tokens
- Reasoning: Leaves 100K for system prompts, conversation, safety buffer
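The rule as stated reduces to a single mechanical check, sketched here with the Sonnet numbers from the example:

```python
def satisfies_50_percent_rule(estimated_tokens: int, context_limit: int) -> bool:
    """An issue fits an agent only if its estimate is at most half the
    agent's context limit, leaving the rest for prompts and conversation."""
    return estimated_tokens <= context_limit // 2

# Sonnet example from the text: 200K limit -> 100K maximum estimate
assert satisfies_50_percent_rule(100_000, 200_000)
assert not satisfies_50_percent_rule(120_000, 200_000)
```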

3. Agent Profiles & Model Assignment

AGENT_PROFILES = {
    'opus': {
        'context_limit': 200000,
        'cost_per_mtok': 15.00,
        'capabilities': ['high', 'medium', 'low']
    },
    'sonnet': {
        'context_limit': 200000,
        'cost_per_mtok': 3.00,
        'capabilities': ['medium', 'low']
    },
    'glm': {
        'context_limit': 128000,
        'cost_per_mtok': 0.00,  # Self-hosted
        'capabilities': ['medium', 'low']
    }
}

Assignment logic: Choose cheapest capable agent based on:

  • Estimated context usage
  • Difficulty level
  • Agent capabilities
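The assignment logic above can be sketched against the agent profiles. This is an illustrative implementation of "cheapest capable agent", combining capability matching with the 50% rule; the profile table is an abbreviated copy of the one shown earlier.

```python
AGENT_PROFILES = {  # abbreviated copy of the profiles above
    'opus':   {'context_limit': 200_000, 'cost_per_mtok': 15.00, 'capabilities': ['high', 'medium', 'low']},
    'sonnet': {'context_limit': 200_000, 'cost_per_mtok': 3.00,  'capabilities': ['medium', 'low']},
    'glm':    {'context_limit': 128_000, 'cost_per_mtok': 0.00,  'capabilities': ['medium', 'low']},
}

def assign_agent(estimated_context: int, difficulty: str) -> str:
    """Pick the cheapest agent whose capabilities cover the difficulty and
    whose context limit satisfies the 50% rule for the estimate."""
    candidates = [
        (profile['cost_per_mtok'], name)
        for name, profile in AGENT_PROFILES.items()
        if difficulty in profile['capabilities']
        and estimated_context <= profile['context_limit'] // 2
    ]
    if not candidates:
        raise ValueError("no capable agent for this issue; split it further")
    return min(candidates)[1]   # lowest cost wins
```

Note how the self-hosted GLM (zero cost) wins any medium/low issue that fits its smaller context window, which matches the cost-optimization goal.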

4. Context Monitoring & Session Rotation

def monitor_agent_context(agent_id: str) -> ContextAction:
    usage = get_context_usage(agent_id)

    if usage > 0.95:
        return ContextAction.ROTATE_SESSION  # Start fresh agent
    elif usage > 0.80:
        return ContextAction.COMPACT  # Summarize completed work
    else:
        return ContextAction.CONTINUE  # Keep working

5. Context Estimation Formula

def estimate_context(issue: Issue) -> int:
    base = (
        issue.files_to_modify * 7000 +  # Average file size
        issue.implementation_complexity * 20000 +  # Code writing
        issue.test_requirements * 10000 +  # Test writing
        issue.documentation * 3000  # Docs
    )

    buffer = base * 1.3  # 30% safety margin
    return int(buffer)

6. Epic Decomposition Workflow

User creates Epic → Coordinator analyzes scope → Decomposes into sub-issues
                                                        ↓
                                        Each issue ≤ 50% agent context limit
                                                        ↓
                                        Assigns metadata: estimated_context, difficulty
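The sizing gate at the end of this workflow can be enforced mechanically. A minimal sketch, assuming a hypothetical `SubIssue` record carrying the metadata the coordinator assigns:

```python
from dataclasses import dataclass

@dataclass
class SubIssue:
    title: str
    estimated_context: int   # tokens, from the estimation formula
    difficulty: str          # 'high' | 'medium' | 'low'

def validate_decomposition(sub_issues: list[SubIssue], context_limit: int) -> list[SubIssue]:
    """Apply the 50% rule at issue-creation time: any oversized sub-issue
    must be split further before it enters the queue."""
    oversized = [i.title for i in sub_issues if i.estimated_context > context_limit // 2]
    if oversized:
        raise ValueError(f"sub-issues exceed the 50% rule, split further: {oversized}")
    return sub_issues
```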

7. Multi-Model Support

  • Supports Opus, Sonnet, Haiku, GLM, MiniMax, Cogito
  • Cost optimization through model selection
  • Self-hosted model preference when capable

8. Proactive Context Management

  • Prevents context exhaustion BEFORE it happens
  • No manual intervention needed
  • Maintains autonomy through entire queue

Overlaps (Both Documents)

1. Non-AI Coordinator Pattern

Both use deterministic code (not AI) as the orchestrator:

  • Doc A: TypeScript/NestJS service
  • Doc B: Python/Node.js coordinator
  • Rationale: Avoid AI orchestrator context limits and inconsistency

2. Mechanical Quality Gates

Both enforce quality through automated checks:

Doc A gates:

  • BuildGate (compilation)
  • LintGate (code style)
  • TestGate (unit/integration tests)
  • CoverageGate (test coverage threshold)

Doc B gates:

  • lint (code quality)
  • typecheck (type safety)
  • test (functionality)
  • coverage (same as Doc A)

3. Programmatic Enforcement

Both prevent agent from bypassing quality:

  • Doc A: Rejection loop blocks completion until gates pass
  • Doc B: Coordinator enforces gates before allowing next issue
  • Shared principle: Quality is a requirement, not a suggestion
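The shared enforcement mechanism in both documents boils down to shell exit codes, not AI judgment. A minimal sketch, assuming each gate is a command-line tool:

```python
import subprocess

def run_gate(cmd: list[str], cwd: str = '.') -> bool:
    """A gate passes only when its command exits 0 -- a mechanical check."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True).returncode == 0

def failed_gates(results: dict[str, bool]) -> list[str]:
    """Names of the failed gates; progression is allowed only when empty."""
    return [name for name, passed in results.items() if not passed]
```

Either coordinator can then block on `failed_gates(...) == []` before accepting a completion claim (Doc A) or dispatching the next issue (Doc B).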

4. Non-Negotiable Quality Standards

Both use firm language about quality requirements:

  • Doc A: "This is not optional. Do not claim completion until gates pass."
  • Doc B: "Quality gates are mechanical blockers, not suggestions."

5. State Management

Both track work state programmatically:

  • Doc A: Agent state machine (working → claimed done → verified → actual done)
  • Doc B: Issue state in tracking system (pending → in-progress → gate-check → completed)

6. Validation Before Progression

Both prevent moving forward with broken code:

  • Doc A: Cannot claim "done" until gates pass
  • Doc B: Cannot start next issue until current issue passes gates

Complementary Nature

These documents solve different problems in the same architectural pattern:

Document A (Existing): Quality Enforcement

Problem: "How do we prevent an agent from claiming work is done when it's not?"
Solution: Rejection loop with forced continuation
Scope: Single agent working on a single issue
Lifecycle stage: Task completion verification

Document B (New): Orchestration at Scale

Problem: "How do we manage multiple agents working through dozens of issues without context exhaustion?"
Solution: Proactive context management + intelligent agent assignment
Scope: Multi-agent orchestration across an entire milestone
Lifecycle stage: Agent selection, session management, queue progression

Together They Form:

┌─────────────────────────────────────────────────────────┐
│         Non-AI Coordinator (Document B)                 │
│  - Monitors context usage across all agents             │
│  - Assigns issues based on context estimates            │
│  - Rotates agents at 95% context                        │
│  - Enforces 50% rule during issue creation              │
└─────────────────────────┬───────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
   Agent 1           Agent 2           Agent 3
   Issue #42         Issue #57         Issue #89
        │                 │                 │
        └─────────────────┴─────────────────┘
                          │
                          ▼
        ┌─────────────────────────────────────────────────┐
        │   Quality Orchestrator (Document A)             │
        │   - Intercepts completion claims                │
        │   - Runs quality gates                          │
        │   - Forces continuation if gates fail           │
        │   - Only allows "done" when gates pass          │
        └─────────────────────────────────────────────────┘

Document B (new) manages agent lifecycle and orchestration; Document A (existing) manages per-agent quality enforcement.


Integration Recommendations

Option 1: Merge into Single Comprehensive Document

Reason: They're parts of the same system

Structure:

# Non-AI Coordinator Pattern Architecture

## Part 1: Multi-Agent Orchestration (from Doc B)

- Context management
- Agent assignment
- Session rotation
- 50% rule
- Epic decomposition

## Part 2: Quality Enforcement (from Doc A)

- Premature completion problem
- Quality gates
- Rejection loop
- Forced continuation
- CompletionVerificationEngine

## Part 3: Implementation

- TypeScript/NestJS orchestrator (from Doc A)
- Python coordinator enhancements (from Doc B)
- Integration points

Option 2: Keep Separate, Create Integration Doc

Reason: Different audiences (orchestration vs quality enforcement)

Documents:

  1. orchestration-architecture.md (Doc B) - For understanding multi-agent coordination
  2. quality-enforcement-architecture.md (Doc A) - For understanding quality gates
  3. non-ai-coordinator-integration.md (NEW) - How they work together

Option 3: Hierarchical Documentation

Reason: Layers of abstraction

non-ai-coordinator-pattern.md (Overview)
├── orchestration-layer.md (Doc B content)
└── quality-layer.md (Doc A content)

Action Items

Based on overlap analysis, recommend:

  1. Merge the documents into comprehensive architecture guide

    • Use Doc A's problem statement for quality enforcement
    • Use Doc B's problem statement for context exhaustion
    • Show how both problems require non-AI coordinator
    • Integrate TypeScript implementation with context monitoring
  2. Update Mosaic Stack issue #140

    • Current: "Document Non-AI Coordinator Pattern Architecture"
    • Expand scope: Include both quality enforcement AND orchestration
    • Reference both problem spaces (L-015 + context exhaustion)
  3. Create unified PoC plan

    • Phase 1: Context monitoring (from Doc B)
    • Phase 2: Agent assignment logic (from Doc B)
    • Phase 3: Quality gate integration (from Doc A)
    • Phase 4: Forced continuation (from Doc A)
  4. Preserve unique innovations from each

    • Doc A: Rejection loop, forced continuation prompts
    • Doc B: 50% rule, agent profiles, context estimation formula

Conclusion

These documents are highly complementary, not duplicative.

  • ~20% overlap: Both use non-AI coordinator, mechanical gates, non-negotiable quality
  • 80% unique value: Doc A solves premature completion, Doc B solves context exhaustion

Best path forward: Merge into single comprehensive architecture document that addresses both problems within the unified non-AI coordinator pattern.

The pattern is:

  1. Non-AI coordinator assigns issues based on context estimates (Doc B)
  2. Agent works on issue
  3. Quality gates enforce completion standards (Doc A)
  4. Context monitoring prevents exhaustion (Doc B)
  5. Forced continuation prevents premature "done" (Doc A)
  6. Next issue assigned when ready (Doc B)
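The six steps above can be sketched as one coordinator loop. This is an illustrative skeleton, not either document's implementation; `assign`, `work`, `run_gates`, `force_continue`, and `monitor` are hypothetical callbacks for the components each document defines.

```python
def run_queue(issues, assign, work, run_gates, force_continue, monitor):
    """One pass over the issue queue, combining both documents' roles."""
    for issue in issues:
        agent = assign(issue)                       # step 1 (Doc B): context-aware assignment
        while True:
            work(agent, issue)                      # step 2: agent works the issue
            failed = [g for g, ok in run_gates().items() if not ok]
            if not failed:
                break                               # step 3 (Doc A): "done" only when gates pass
            force_continue(agent, failed)           # step 5 (Doc A): reject premature "done"
            agent = monitor(agent)                  # step 4 (Doc B): compact or rotate session
        # step 6 (Doc B): loop advances to the next issue only after gates pass
```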

Together they create a robust, autonomous, quality-enforcing orchestration system that scales beyond single-agent, single-issue scenarios.


Next Steps:

  1. User review of this analysis
  2. Decision on integration approach (Option 1, 2, or 3)
  3. Update Mosaic Stack documentation accordingly
  4. Proceed with PoC implementation