# Quality-Rails Orchestration Architecture

**Version**: 1.0
**Date**: 2026-01-31
**Status**: Proposed - Proof of Concept Required
**Authors**: Jason Woltje + Claude

---

## Executive Summary

A **non-AI coordinator** pattern for autonomous agent swarm orchestration with mechanical quality enforcement and intelligent context management.

**Key Innovation:** Separate coordination logic (deterministic code) from execution (AI agents), enabling infinite runtime, cost optimization, and guaranteed quality through mechanical gates.

**Core Principles:**

1. **Non-AI coordinator** - No context limit, runs forever
2. **Mechanical quality gates** - Lint, typecheck, test (not AI-judged)
3. **Context monitoring** - Track and manage AI agent capacity
4. **Model flexibility** - Assign the right model to each task
5. **50% rule** - Issues never exceed 50% of the agent's context limit

---

## Problem Statement

### Current State: AI-Orchestrated Agents

```
AI Orchestrator (Opus/Sonnet)
├── Has context limit (200K tokens)
├── Context grows linearly during multi-issue work
├── At 95% usage: Pauses for confirmation (loses autonomy)
├── Manual intervention required (defeats automation)
└── Cannot work through large issue queues unattended

Result: Autonomous orchestration fails at scale
```

**Observed behavior (M4 milestone):**

- 11 issues completed in 97 minutes
- Agent paused at 95% context usage
- Asked "Should I continue?" (lost autonomy)
- 10 issues remained incomplete (32% of the milestone)
- No compaction occurred
- Manual restart required

### Root Causes

1. **Context accumulation** - No automatic compaction
2. **AI risk aversion** - Conservative pause at high context usage
3. **Monolithic design** - Coordinator has the same limits as its workers
4. **No capacity planning** - Issues not sized for agent limits
5. **Model inflexibility** - One model for all tasks (wasteful)

---

## Solution: Non-AI Coordinator Architecture

### System Architecture

```
┌─────────────────────────────────────────────────────────┐
│ Non-AI Coordinator (Python/Node.js)                     │
├─────────────────────────────────────────────────────────┤
│ • No context limit (it's just code)                     │
│ • Reads issue queue                                     │
│ • Assigns agents based on context + difficulty          │
│ • Monitors agent context usage                          │
│ • Enforces mechanical quality gates                     │
│ • Triggers compaction at threshold                      │
│ • Rotates agents when exhausted                         │
│ • Infinite runtime capability                           │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│ AI Swarm Controller (OpenClaw Session)                  │
├─────────────────────────────────────────────────────────┤
│ • Coordinates subagent work                             │
│ • Context monitored externally                          │
│ • Receives compaction commands                          │
│ • Replaceable/rotatable                                 │
│ • Just an executor (not decision-maker)                 │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│ Subagents (OpenClaw Workers)                            │
├─────────────────────────────────────────────────────────┤
│ • Execute individual issues                             │
│ • Report to swarm controller                            │
│ • Quality-gated by coordinator                          │
│ • Model-specific (Opus, Sonnet, Haiku, etc.)            │
└─────────────────────────────────────────────────────────┘
```

### Separation of Concerns

| Concern              | Non-AI Coordinator                     | AI Swarm Controller | Subagents      |
| -------------------- | -------------------------------------- | ------------------- | -------------- |
| **Context limit**    | None (immortal)                        | 200K tokens         | 200K tokens    |
| **Lifespan**         | Entire milestone                       | Rotatable           | Per-issue      |
| **Decision-making**  | Model assignment, quality enforcement  | Work coordination   | Task execution |
| **Quality gates**    | Enforces mechanically                  | N/A                 | N/A            |
| **State management** | Persistent                             | Can be rotated      | Ephemeral      |
| **Cost**             | Minimal (code execution)               | Per-token           | Per-token      |

---

## The 50% Rule

### Issue Size Constraint

**Rule:** Each issue must consume no more than **50% of the assigned agent's context limit.**

**Rationale:**

```
Agent context limit: 200,000 tokens

Overhead consumption:
├── System prompts:    10-20K tokens
├── Project context:   20-30K tokens
├── Code reading:      20-40K tokens
├── Execution buffer:  10-20K tokens
└── Total overhead:    60-110K tokens (30-55%)

Available for issue: 90-140K tokens
Safe limit (50%): 100K tokens

This allows:
- Room for overhead
- Iteration and debugging
- Unexpected complexity
- No mid-task exhaustion
```

**Enforcement:**

- Issue creation MUST include a context estimate
- Coordinator validates the estimate before assignment
- If the estimate exceeds 50% of the target agent's limit: reject or decompose

### Epic Decomposition

**Large epics must be split:**

```
Epic: Authentication System
Estimated context: 300K tokens total
Target agent: Sonnet (200K limit)
Issue size limit: 100K tokens (50% rule)

Decomposition required:
├── Issue 1: Auth middleware    [20K ctx | Medium]
├── Issue 2: JWT implementation [25K ctx | Medium]
├── Issue 3: User sessions      [30K ctx | Medium]
├── Issue 4: Login endpoints    [25K ctx | Low]
├── Issue 5: RBAC permissions   [20K ctx | Medium]
└── Total: 120K ctx across 5 issues

Each issue < 100K ✅
Epic fits within multiple agent sessions ✅
```

---
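Because the rule is purely mechanical, it can be checked before any agent session is started. A minimal sketch of that check, using the auth-system numbers above (function names are illustrative, not part of any existing API):

```python
# Sketch of 50% rule enforcement; names are illustrative, not an existing API.
AGENT_CONTEXT_LIMIT = 200_000  # tokens (Sonnet/Opus-class agents)

def max_issue_size(context_limit: int = AGENT_CONTEXT_LIMIT) -> int:
    """An issue may consume at most 50% of the agent's context limit."""
    return context_limit // 2

def fits_50_percent_rule(estimated_context: int,
                         context_limit: int = AGENT_CONTEXT_LIMIT) -> bool:
    """True if the issue can be assigned; False means reject or decompose."""
    return estimated_context <= max_issue_size(context_limit)

# Every sub-issue from the auth-system decomposition fits:
assert all(fits_50_percent_rule(ctx)
           for ctx in [20_000, 25_000, 30_000, 25_000, 20_000])

# The undecomposed 300K epic would be rejected outright:
assert not fits_50_percent_rule(300_000)
```

The check is cheap enough to run at issue-creation time as well as at assignment time, so oversized issues never reach the queue.

---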
## Agent Profiles

### Model Capabilities Matrix

```json
{
  "agents": {
    "opus": {
      "model": "claude-opus-4-5",
      "context_limit": 200000,
      "difficulty_levels": ["high", "medium", "low"],
      "cost_per_1k_input": 0.015,
      "cost_per_1k_output": 0.075,
      "speed": "slow",
      "use_cases": [
        "Complex refactoring",
        "Architecture design",
        "Difficult debugging",
        "Novel algorithms"
      ]
    },
    "sonnet": {
      "model": "claude-sonnet-4-5",
      "context_limit": 200000,
      "difficulty_levels": ["medium", "low"],
      "cost_per_1k_input": 0.003,
      "cost_per_1k_output": 0.015,
      "speed": "medium",
      "use_cases": ["API endpoints", "Business logic", "Standard features", "Test writing"]
    },
    "haiku": {
      "model": "claude-haiku-4",
      "context_limit": 200000,
      "difficulty_levels": ["low"],
      "cost_per_1k_input": 0.00025,
      "cost_per_1k_output": 0.00125,
      "speed": "fast",
      "use_cases": ["CRUD operations", "Config changes", "Documentation", "Simple fixes"]
    },
    "glm": {
      "model": "glm-4-plus",
      "context_limit": 128000,
      "difficulty_levels": ["medium", "low"],
      "cost_per_1k_input": 0.001,
      "cost_per_1k_output": 0.001,
      "speed": "fast",
      "use_cases": ["Standard features (lower cost)", "International projects", "High-volume tasks"]
    },
    "minimax": {
      "model": "minimax-01",
      "context_limit": 128000,
      "difficulty_levels": ["low"],
      "cost_per_1k_input": 0.0005,
      "cost_per_1k_output": 0.0005,
      "speed": "fast",
      "use_cases": ["Simple tasks (very low cost)", "Bulk operations", "Non-critical work"]
    }
  }
}
```

### Difficulty Levels Defined

**Low Difficulty:**

- CRUD operations (create, read, update, delete)
- Configuration changes
- Documentation updates
- Simple bug fixes
- UI text changes
- Adding logging/comments

**Criteria:**

- Well-established patterns
- No complex logic
- Minimal dependencies
- Low risk of regressions

**Medium Difficulty:**

- API endpoint implementation
- Business logic features
- Database schema changes
- Integration with external services
- Standard refactoring
- Test suite additions

**Criteria:**

- Moderate complexity
- Some novel logic required
- Multiple file changes
- Moderate risk of side effects

**High Difficulty:**

- Architecture changes
- Complex algorithms
- Performance optimization
- Security-critical features
- Large-scale refactoring
- Novel problem-solving

**Criteria:**

- High complexity
- Requires deep understanding
- Cross-cutting concerns
- High risk of regressions

---

## Issue Metadata Schema

### Required Fields

```json
{
  "issue": {
    "id": 123,
    "title": "Add JWT authentication [25K | Medium]",
    "description": "Implement JWT token-based authentication...",
    "metadata": {
      "estimated_context": 25000,
      "difficulty": "medium",
      "epic": "auth-system",
      "dependencies": [122],
      "quality_gates": ["lint", "typecheck", "test", "security-scan"],
      "assignment": {
        "suggested_models": ["sonnet", "opus"],
        "assigned_model": null,
        "assigned_agent_id": null
      },
      "tracking": {
        "created_at": "2026-01-31T10:00:00Z",
        "started_at": null,
        "completed_at": null,
        "actual_context_used": null,
        "duration_minutes": null
      }
    }
  }
}
```

### Issue Title Format

**Template:** `[Feature name] [Context estimate | Difficulty]`

**Examples:**

```
✅ "Add JWT authentication [25K | Medium]"
✅ "Fix typo in README [2K | Low]"
✅ "Refactor auth system [80K | High]"
✅ "Implement rate limiting [30K | Medium]"
✅ "Add OpenAPI docs [15K | Low]"

❌ "Add authentication"      (missing metadata)
❌ "Refactor auth [High]"    (missing context estimate)
❌ "Fix bug [20K]"           (missing difficulty)
```

### Issue Body Template

```markdown
## Context Estimate

**Estimated tokens:** 25,000 (12.5% of 200K limit)

## Difficulty

**Level:** Medium
**Rationale:**

- Requires understanding JWT spec
- Integration with existing auth middleware
- Security considerations (token signing, validation)
- Test coverage for auth flows

## Suggested Models

- Primary: Sonnet (cost-effective for medium difficulty)
- Fallback: Opus (if complexity increases)

## Dependencies

- #122 (Auth middleware must be complete first)

## Quality Gates

- [x] Lint (ESLint + Prettier)
- [x] Typecheck (TypeScript strict mode)
- [x] Tests (Unit + Integration, 80%+ coverage)
- [x] Security scan (No hardcoded secrets, safe crypto)

## Task Description

[Detailed description of work to be done...]

## Acceptance Criteria

- [ ] JWT tokens generated on login
- [ ] Tokens validated on protected routes
- [ ] Token refresh mechanism implemented
- [ ] Tests cover happy path + edge cases
- [ ] Documentation updated

## Context Breakdown

| Activity                           | Estimated Tokens |
| ---------------------------------- | ---------------- |
| Read existing auth code            | 5,000            |
| Implement JWT library integration  | 8,000            |
| Write middleware logic             | 6,000            |
| Add tests                          | 4,000            |
| Update documentation               | 2,000            |
| **Total**                          | **25,000**       |
```

---

## Context Estimation Guidelines

### Estimation Formula

```
Estimated Context = (
    Files to read              × 5-10K per file
  + Implementation complexity  × 10-30K
  + Test writing               × 5-15K
  + Documentation              × 2-5K
  + Buffer for iteration       × 20-50%
)
```

### Examples

**Simple (Low Difficulty):**

```
Task: Fix typo in README.md

Files to read:  1 × 5K = 5K
Implementation: Minimal = 1K
Tests:          None = 0K
Docs:           None = 0K
Buffer:         20% = 1.2K

Total: ~7K tokens
Rounded estimate: 10K tokens (conservative)
```

**Medium (Medium Difficulty):**

```
Task: Add API endpoint for user profile

Files to read:  3 × 8K = 24K
Implementation: Standard endpoint = 15K
Tests:          Unit + integration = 10K
Docs:           API spec update = 3K
Buffer:         30% = 15.6K

Total: ~67.6K tokens
Rounded estimate: 70K tokens
```

**Complex (High Difficulty):**

```
Task: Refactor authentication system

Files to read:  8 × 10K = 80K
Implementation: Complex refactor = 30K
Tests:          Extensive = 15K
Docs:           Architecture guide = 5K
Buffer:         50% = 65K

Total: ~195K tokens

⚠️ Exceeds 50% rule (100K limit)!
Action: Split into 2-3 smaller issues
```

### Estimation Accuracy Tracking

**After each issue, measure variance:**

```python
variance = actual_context - estimated_context
variance_pct = (variance / estimated_context) * 100

# Log for calibration
if abs(variance_pct) > 20:
    print(f"⚠️ Estimate off by {variance_pct:.1f}%")
    print(f"Estimated: {estimated_context}")
    print(f"Actual: {actual_context}")
    print("Review estimation guidelines")
```

**Over time, refine the estimation formula based on historical data.**

---

## Coordinator Implementation

### Core Algorithm

```python
import json
from datetime import datetime


class QualityRailsCoordinator:
    """Non-AI coordinator for agent swarm orchestration."""

    def __init__(self, issue_queue, agent_profiles, quality_gates):
        self.issues = issue_queue
        self.agents = agent_profiles
        self.gates = quality_gates
        self.current_controller = None

    def run(self):
        """Main orchestration loop."""
        # Validate all issues
        self.validate_issues()

        # Sort by dependencies and priority
        self.issues = self.topological_sort(self.issues)

        # Start AI swarm controller
        self.start_swarm_controller()

        # Process queue
        for issue in self.issues:
            print(f"\n{'=' * 60}")
            print(f"Starting issue #{issue['id']}: {issue['title']}")
            print(f"{'=' * 60}\n")

            # Assign optimal agent
            agent = self.assign_agent(issue)

            # Monitor and execute
            self.execute_issue(issue, agent)

            # Log metrics
            self.log_metrics(issue, agent)

        print("\n✅ All issues complete. Queue empty.")

    def validate_issues(self):
        """Ensure all issues have required metadata."""
        for issue in self.issues:
            if not issue.get("estimated_context"):
                raise ValueError(f"Issue {issue['id']} missing context estimate")
            if not issue.get("difficulty"):
                raise ValueError(f"Issue {issue['id']} missing difficulty rating")

            # Validate 50% rule against the largest available agent
            max_context = max(
                agent["context_limit"] for agent in self.agents.values()
            )
            if issue["estimated_context"] > (max_context * 0.5):
                raise ValueError(
                    f"Issue {issue['id']} exceeds 50% rule: "
                    f"{issue['estimated_context']} > {max_context * 0.5}"
                )

    def assign_agent(self, issue):
        """Assign optimal agent based on context + difficulty."""
        context_est = issue["estimated_context"]
        difficulty = issue["difficulty"]

        # Filter models that can handle this issue
        candidates = []
        for model_name, profile in self.agents.items():
            # Check context capacity (50% rule)
            if context_est <= (profile["context_limit"] * 0.5):
                # Check difficulty match
                if difficulty in profile["difficulty_levels"]:
                    # Estimate input cost
                    cost = context_est * profile["cost_per_1k_input"] / 1000
                    candidates.append(
                        {"model": model_name, "profile": profile, "cost": cost}
                    )

        if not candidates:
            raise ValueError(
                f"No model can handle issue {issue['id']}: "
                f"{context_est} tokens, {difficulty} difficulty"
            )

        # Optimize for cost (prefer cheaper models when capable)
        candidates.sort(key=lambda x: x["cost"])
        selected = candidates[0]

        print(f"📋 Assigned {selected['model']} to issue {issue['id']}")
        print(f"   Context: {context_est} tokens")
        print(f"   Difficulty: {difficulty}")
        print(f"   Estimated cost: ${selected['cost']:.4f}")

        return selected

    def execute_issue(self, issue, agent):
        """Execute issue with assigned agent."""
        # Start agent session
        session = self.start_agent_session(agent["profile"])

        # Track context
        session_context = 0
        context_limit = agent["profile"]["context_limit"]

        # Execution loop
        iteration = 0
        while not issue.get("complete"):
            iteration += 1

            # Check context health
            if session_context > (context_limit * 0.80):
                print(f"⚠️ Context at 80% ({session_context}/{context_limit})")
                print("   Triggering compaction...")
                session_context = self.compact_session(session)
                print(f"   ✓ Compacted to {session_context} tokens")

            if session_context > (context_limit * 0.95):
                print("🔄 Context at 95% - rotating agent session")
                state = session.save_state()
                session.terminate()
                session = self.start_agent_session(agent["profile"])
                session.load_state(state)
                session_context = session.current_context()

            # Agent executes step
            print(f"   Iteration {iteration}...")
            result = session.execute_step(issue)

            # Update context tracking
            session_context += result["context_used"]

            # Check if agent claims completion
            if result.get("claims_complete"):
                print("   Agent claims completion. Running quality gates...")

                # Enforce quality gates
                gate_results = self.gates.validate(result)

                if gate_results["passed"]:
                    print("   ✅ All quality gates passed")
                    issue["complete"] = True
                    issue["actual_context_used"] = session_context
                else:
                    print("   ❌ Quality gates failed:")
                    for gate, errors in gate_results["failures"].items():
                        print(f"      {gate}: {errors}")
                    # Send feedback to agent
                    session.send_feedback(gate_results["failures"])

        # Clean up
        session.terminate()

    def start_swarm_controller(self):
        """Start AI swarm controller (OpenClaw session)."""
        # Initialize OpenClaw swarm controller.
        # It coordinates subagents but is managed by this coordinator.
        pass

    def start_agent_session(self, agent_profile):
        """Start an individual agent session; return a session handle."""
        pass

    def compact_session(self, session):
        """Trigger compaction in agent session."""
        summary = session.send_message(
            "Summarize all completed work concisely. "
            "Keep only essential context for continuation."
        )
        session.reset_history_with_summary(summary)
        return session.current_context()

    def topological_sort(self, issues):
        """Sort issues by dependencies."""
        # Implement dependency graph sorting.
        # Ensures dependencies complete before dependents.
        pass

    def log_metrics(self, issue, agent):
        """Log issue completion metrics."""
        metrics = {
            "issue_id": issue["id"],
            "title": issue["title"],
            "estimated_context": issue["estimated_context"],
            "actual_context": issue.get("actual_context_used"),
            "variance": (
                issue.get("actual_context_used", 0) - issue["estimated_context"]
            ),
            "model": agent["model"],
            "difficulty": issue["difficulty"],
            "timestamp": datetime.now().isoformat(),
        }

        # Append to metrics file
        with open("orchestrator-metrics.jsonl", "a") as f:
            f.write(json.dumps(metrics) + "\n")
```

### Quality Gates Implementation

```python
class QualityGates:
    """Mechanical quality enforcement."""

    def validate(self, result):
        """Run all quality gates."""
        gates = {
            "lint": self.run_lint,
            "typecheck": self.run_typecheck,
            "test": self.run_tests,
            "security": self.run_security_scan,
        }

        failures = {}
        for gate_name, gate_fn in gates.items():
            gate_result = gate_fn(result)
            if not gate_result["passed"]:
                failures[gate_name] = gate_result["errors"]

        return {"passed": len(failures) == 0, "failures": failures}

    def run_lint(self, result):
        """Run linting (ESLint, Prettier, etc.)."""
        # Execute: pnpm turbo run lint
        # Parse output; return pass/fail + errors
        pass

    def run_typecheck(self, result):
        """Run TypeScript type checking."""
        # Execute: pnpm turbo run typecheck
        # Parse output; return pass/fail + errors
        pass

    def run_tests(self, result):
        """Run test suite."""
        # Execute: pnpm turbo run test
        # Check coverage threshold; return pass/fail + errors
        pass

    def run_security_scan(self, result):
        """Run security checks."""
        # Execute: detect-secrets scan
        # Check for vulnerabilities; return pass/fail + errors
        pass
```

---

## Issue Creation Process

### Workflow

```
1. Epic Planning Agent
   ├── Receives epic description
   ├── Estimates total context required
   ├── Checks against agent limits
   └── Decomposes into issues if needed

2. Issue Creation
   ├── For each sub-issue:
   │   ├── Estimate context (formula + buffer)
   │   ├── Assign difficulty level
   │   ├── Validate 50% rule
   │   └── Create issue with metadata

3. Validation
   ├── Coordinator validates all issues
   ├── Checks for missing metadata
   └── Rejects oversized issues

4. Execution
   ├── Coordinator assigns agents
   ├── Monitors context usage
   ├── Enforces quality gates
   └── Logs metrics for calibration
```

### Epic Planning Agent Prompt

````markdown
You are an Epic Planning Agent. Your job is to decompose epics into
properly-sized issues for autonomous execution.

## Guidelines

1. **Estimate total context:**
   - Read all related code files
   - Estimate implementation complexity
   - Account for tests and documentation
   - Add 30% buffer for iteration

2. **Apply 50% rule:**
   - Target agent context limit: 200K tokens
   - Maximum issue size: 100K tokens
   - If epic exceeds 100K: Split into multiple issues

3. **Assign difficulty:**
   - Low: CRUD, config, docs, simple fixes
   - Medium: APIs, business logic, integrations
   - High: Architecture, complex algorithms, refactors

4. **Create issues with metadata:**

   ```json
   {
     "title": "[Feature] [Context | Difficulty]",
     "estimated_context": 25000,
     "difficulty": "medium",
     "epic": "epic-name",
     "dependencies": [],
     "quality_gates": ["lint", "typecheck", "test"]
   }
   ```

5. **Validate:**
   - Each issue < 100K tokens ✓
   - Dependencies are explicit ✓
   - Difficulty matches complexity ✓
   - Quality gates defined ✓

## Output Format

Create a JSON array of issues:

```json
[
  {
    "id": 1,
    "title": "Add auth middleware [20K | Medium]",
    "estimated_context": 20000,
    "difficulty": "medium",
    ...
  },
  ...
]
```
````

---

## Proof of Concept Plan

### PoC Goals

1. **Validate non-AI coordinator pattern** - Prove it can manage agent lifecycle
2. **Test context monitoring** - Verify we can track and react to context usage
3. **Validate quality gates** - Ensure mechanical enforcement works
4. **Test agent assignment** - Confirm model selection logic
5. **Measure metrics** - Collect data on estimate accuracy

### PoC Scope

**Small test project:**

- 5-10 simple issues
- Mix of difficulty levels
- Use Haiku + Sonnet (cheap)
- Real quality gates (lint, typecheck, test)

**What we'll build:**

```
poc/
├── coordinator.py       # Non-AI coordinator
├── agent_profiles.json  # Model capabilities
├── issues.json          # Test issue queue
├── quality_gates.py     # Mechanical gates
└── metrics.jsonl        # Results log
```

**Test cases:**

1. Low difficulty issue → Haiku (cheap, fast)
2. Medium difficulty issue → Sonnet (balanced)
3. Oversized issue → Should reject (50% rule)
4. Issue with failed quality gate → Agent retries
5. High context issue → Triggers compaction

### PoC Success Criteria

- [ ] Coordinator completes all issues without human intervention
- [ ] Quality gates enforce standards (at least 1 failure caught + fixed)
- [ ] Context monitoring works (log shows tracking)
- [ ] Agent assignment is optimal (cheapest capable model chosen)
- [ ] Metrics collected for all issues
- [ ] No agent exhaustion (50% rule enforced)

### PoC Timeline

**Week 1: Foundation**

- [ ] Build coordinator skeleton
- [ ] Implement agent profiles
- [ ] Create test issue queue
- [ ] Set up quality gates

**Week 2: Integration**

- [ ] Connect to Claude API
- [ ] Implement context monitoring
- [ ] Test agent lifecycle
- [ ] Validate quality gates

**Week 3: Testing**

- [ ] Run full PoC
- [ ] Collect metrics
- [ ] Analyze results
- [ ] Document findings

**Week 4: Refinement**

- [ ] Fix issues discovered
- [ ] Optimize assignment logic
- [ ] Update documentation
- [ ] Prepare for production

---

## Production Deployment (Post-PoC)

### Integration with Mosaic Stack

**Phase 1: Core Implementation**

- Implement coordinator in Mosaic Stack codebase
- Add agent profiles to configuration
- Integrate with existing OpenClaw infrastructure
- Add quality gates to CI/CD

**Phase 2: Issue Management**

- Update issue templates with metadata fields
- Train team on estimation guidelines
- Build issue validation tools
- Create epic planning workflows

**Phase 3: Monitoring**

- Add coordinator metrics dashboard
- Track estimate accuracy over time
- Monitor cost optimization
- Alert on failures

**Phase 4: Scale**

- Expand to all milestones
- Add more agent types (GLM, MiniMax)
- Optimize for multi-epic orchestration
- Build self-learning estimation

---

## Open Questions (To Resolve in PoC)

1. **Compaction effectiveness:** How much context does summarization actually free?
2. **Estimation accuracy:** How close are initial estimates to reality?
3. **Model selection:** Is cost-optimized assignment actually optimal, or should we prioritize speed/quality?
4. **Quality gate timing:** Should gates run after each commit, or only at issue completion?
5. **Session rotation overhead:** What's the cost of rotating agents vs compaction?
6. **Dependency handling:** How to ensure dependencies are truly complete before starting dependent issues?
---

## Success Metrics

### PoC Metrics

- **Autonomy:** % of issues completed without human intervention
- **Quality:** % of commits passing all quality gates on first try
- **Cost:** Total cost vs baseline (all-Opus)
- **Accuracy:** Context estimate variance (target: <20%)
- **Efficiency:** Issues per hour

### Production Metrics

- **Throughput:** Issues completed per day
- **Quality rate:** % passing all gates first try
- **Context efficiency:** Avg context used vs estimated
- **Cost savings:** % saved vs all-Opus baseline
- **Agent utilization:** % of time agents are productive (not waiting)

---

## Appendix: Agent Skill Definitions

### Agent Skills Schema

```json
{
  "skills": {
    "backend-api": {
      "description": "Build RESTful APIs and endpoints",
      "difficulty": "medium",
      "typical_context": "20-40K",
      "quality_gates": ["lint", "typecheck", "test", "api-spec"]
    },
    "frontend-ui": {
      "description": "Build UI components and pages",
      "difficulty": "medium",
      "typical_context": "15-35K",
      "quality_gates": ["lint", "typecheck", "test", "a11y"]
    },
    "database-schema": {
      "description": "Design and migrate database schemas",
      "difficulty": "high",
      "typical_context": "30-50K",
      "quality_gates": ["typecheck", "test", "migration-validate"]
    },
    "documentation": {
      "description": "Write technical documentation",
      "difficulty": "low",
      "typical_context": "5-15K",
      "quality_gates": ["spelling", "markdown-lint"]
    },
    "refactoring": {
      "description": "Refactor existing code",
      "difficulty": "high",
      "typical_context": "40-80K",
      "quality_gates": ["lint", "typecheck", "test", "no-behavior-change"]
    },
    "bug-fix": {
      "description": "Fix reported bugs",
      "difficulty": "low-medium",
      "typical_context": "10-30K",
      "quality_gates": ["lint", "typecheck", "test", "regression-test"]
    }
  }
}
```

**Usage:**

- Issues can reference skills: `"skills": ["backend-api", "database-schema"]`
- Coordinator uses skill metadata to inform estimates
- Helps with consistent difficulty assignment

---

## Document Status

**Version:** 1.0 - Proposed Architecture
**Next Steps:** Build Proof of Concept
**Approval Required:** After successful PoC

---

**End of Architecture Document**