# Quality-Rails Orchestration Architecture

**Version**: 1.0
**Date**: 2026-01-31
**Status**: Proposed - Proof of Concept Required
**Authors**: Jason Woltje + Claude

---

## Executive Summary

A **non-AI coordinator** pattern for autonomous agent swarm orchestration with mechanical quality enforcement and intelligent context management.

**Key Innovation:** Separate coordination logic (deterministic code) from execution (AI agents), enabling unbounded runtime, cost optimization, and quality guaranteed by mechanical gates.

**Core Principles:**

1. **Non-AI coordinator** - No context limit; runs indefinitely
2. **Mechanical quality gates** - Lint, typecheck, test (not AI-judged)
3. **Context monitoring** - Track and manage AI agent capacity
4. **Model flexibility** - Assign the right model to each task
5. **50% rule** - No issue may exceed 50% of its agent's context limit

---

## Problem Statement

### Current State: AI-Orchestrated Agents

```
AI Orchestrator (Opus/Sonnet)
├── Has a context limit (200K tokens)
├── Context grows roughly linearly during multi-issue work
├── At 95% usage: pauses for confirmation (loses autonomy)
├── Manual intervention required (defeats automation)
└── Cannot work through large issue queues unattended

Result: Autonomous orchestration fails at scale
```

**Observed behavior (M4 milestone):**

- 11 issues completed in 97 minutes
- Agent paused at 95% context usage
- Asked "Should I continue?" (lost autonomy)
- 10 of 21 issues remained incomplete (48% of the queue)
- No compaction occurred
- Manual restart required

### Root Causes

1. **Context accumulation** - No automatic compaction
2. **AI risk aversion** - Conservative pause at high context usage
3. **Monolithic design** - The coordinator has the same limits as its workers
4. **No capacity planning** - Issues are not sized to agent limits
5. **Model inflexibility** - One model for every task (wasteful)

---

## Solution: Non-AI Coordinator Architecture

### System Architecture

```
┌─────────────────────────────────────────────────────────┐
│ Non-AI Coordinator (Python/Node.js)                     │
├─────────────────────────────────────────────────────────┤
│ • No context limit (it's just code)                     │
│ • Reads issue queue                                     │
│ • Assigns agents based on context + difficulty          │
│ • Monitors agent context usage                          │
│ • Enforces mechanical quality gates                     │
│ • Triggers compaction at threshold                      │
│ • Rotates agents when exhausted                         │
│ • Unbounded runtime                                     │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│ AI Swarm Controller (OpenClaw Session)                  │
├─────────────────────────────────────────────────────────┤
│ • Coordinates subagent work                             │
│ • Context monitored externally                          │
│ • Receives compaction commands                          │
│ • Replaceable/rotatable                                 │
│ • Just an executor (not a decision-maker)               │
└─────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────┐
│ Subagents (OpenClaw Workers)                            │
├─────────────────────────────────────────────────────────┤
│ • Execute individual issues                             │
│ • Report to swarm controller                            │
│ • Quality-gated by coordinator                          │
│ • Model-specific (Opus, Sonnet, Haiku, etc.)            │
└─────────────────────────────────────────────────────────┘
```

### Separation of Concerns

| Concern              | Non-AI Coordinator                    | AI Swarm Controller | Subagents      |
| -------------------- | ------------------------------------- | ------------------- | -------------- |
| **Context limit**    | None (immortal)                       | 200K tokens         | 200K tokens    |
| **Lifespan**         | Entire milestone                      | Rotatable           | Per-issue      |
| **Decision-making**  | Model assignment, quality enforcement | Work coordination   | Task execution |
| **Quality gates**    | Enforces mechanically                 | N/A                 | N/A            |
| **State management** | Persistent                            | Can be rotated      | Ephemeral      |
| **Cost**             | Minimal (code execution)              | Per-token           | Per-token      |

---

## The 50% Rule

### Issue Size Constraint

**Rule:** Each issue must consume no more than **50% of the assigned agent's context limit**.

**Rationale:**

```
Agent context limit: 200,000 tokens

Overhead consumption:
├── System prompts:   10-20K tokens
├── Project context:  20-30K tokens
├── Code reading:     20-40K tokens
├── Execution buffer: 10-20K tokens
└── Total overhead:   60-110K tokens (30-55%)

Available for issue: 90-140K tokens
Safe limit (50%):    100K tokens

This allows:
- Room for overhead
- Iteration and debugging
- Unexpected complexity
- Completion without mid-task exhaustion
```

**Enforcement:**

- Issue creation MUST include a context estimate
- The coordinator validates the estimate before assignment
- If the estimate exceeds 50% of the target agent's limit: reject or decompose

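The enforcement check reduces to a single predicate. A minimal sketch, assuming a profile table shaped like the matrix in the Agent Profiles section (the two entries here are illustrative):

```python
# Illustrative subset of the agent profile matrix (assumption).
AGENT_PROFILES = {
    "sonnet": {"context_limit": 200_000},
    "glm": {"context_limit": 128_000},
}


def violates_fifty_percent_rule(estimated_context: int, agent: str) -> bool:
    """Return True if the issue exceeds 50% of the agent's context limit."""
    limit = AGENT_PROFILES[agent]["context_limit"]
    return estimated_context > limit * 0.5
```

The coordinator would call this at issue-validation time and either reject the issue or route it back for decomposition.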
### Epic Decomposition

**Large epics must be split:**

```
Epic: Authentication System
Estimated context: 120K tokens total
Target agent: Sonnet (200K limit)
Issue size limit: 100K tokens (50% rule)

Decomposition required:
├── Issue 1: Auth middleware     [20K ctx | Medium]
├── Issue 2: JWT implementation  [25K ctx | Medium]
├── Issue 3: User sessions       [30K ctx | Medium]
├── Issue 4: Login endpoints     [25K ctx | Low]
├── Issue 5: RBAC permissions    [20K ctx | Medium]
└── Total: 120K ctx across 5 issues

Each issue < 100K ✅
Epic fits within multiple agent sessions ✅
```

---

## Agent Profiles

### Model Capabilities Matrix

```json
{
  "agents": {
    "opus": {
      "model": "claude-opus-4-5",
      "context_limit": 200000,
      "difficulty_levels": ["high", "medium", "low"],
      "cost_per_1k_input": 0.015,
      "cost_per_1k_output": 0.075,
      "speed": "slow",
      "use_cases": [
        "Complex refactoring",
        "Architecture design",
        "Difficult debugging",
        "Novel algorithms"
      ]
    },
    "sonnet": {
      "model": "claude-sonnet-4-5",
      "context_limit": 200000,
      "difficulty_levels": ["medium", "low"],
      "cost_per_1k_input": 0.003,
      "cost_per_1k_output": 0.015,
      "speed": "medium",
      "use_cases": ["API endpoints", "Business logic", "Standard features", "Test writing"]
    },
    "haiku": {
      "model": "claude-haiku-4",
      "context_limit": 200000,
      "difficulty_levels": ["low"],
      "cost_per_1k_input": 0.00025,
      "cost_per_1k_output": 0.00125,
      "speed": "fast",
      "use_cases": ["CRUD operations", "Config changes", "Documentation", "Simple fixes"]
    },
    "glm": {
      "model": "glm-4-plus",
      "context_limit": 128000,
      "difficulty_levels": ["medium", "low"],
      "cost_per_1k_input": 0.001,
      "cost_per_1k_output": 0.001,
      "speed": "fast",
      "use_cases": ["Standard features (lower cost)", "International projects", "High-volume tasks"]
    },
    "minimax": {
      "model": "minimax-01",
      "context_limit": 128000,
      "difficulty_levels": ["low"],
      "cost_per_1k_input": 0.0005,
      "cost_per_1k_output": 0.0005,
      "speed": "fast",
      "use_cases": ["Simple tasks (very low cost)", "Bulk operations", "Non-critical work"]
    }
  }
}
```

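A minimal sketch of how the matrix can be queried: pick the lowest-cost model that satisfies both the 50% rule and the required difficulty level (the helper name is illustrative; the full selection logic lives in the coordinator's `assign_agent`):

```python
def cheapest_capable_model(profiles: dict, estimated_context: int,
                           difficulty: str) -> str:
    """Return the cheapest model satisfying the 50% rule and difficulty."""
    candidates = [
        (profile["cost_per_1k_input"], name)
        for name, profile in profiles.items()
        # 50% rule: issue must fit in half the model's context window
        if estimated_context <= profile["context_limit"] * 0.5
        # Model must be rated for the issue's difficulty
        and difficulty in profile["difficulty_levels"]
    ]
    if not candidates:
        raise ValueError("No capable model for this issue")
    return min(candidates)[1]
```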
### Difficulty Levels Defined

**Low Difficulty:**

- CRUD operations (create, read, update, delete)
- Configuration changes
- Documentation updates
- Simple bug fixes
- UI text changes
- Adding logging/comments

**Criteria:**

- Well-established patterns
- No complex logic
- Minimal dependencies
- Low risk of regressions

**Medium Difficulty:**

- API endpoint implementation
- Business logic features
- Database schema changes
- Integration with external services
- Standard refactoring
- Test suite additions

**Criteria:**

- Moderate complexity
- Some novel logic required
- Multiple file changes
- Moderate risk of side effects

**High Difficulty:**

- Architecture changes
- Complex algorithms
- Performance optimization
- Security-critical features
- Large-scale refactoring
- Novel problem-solving

**Criteria:**

- High complexity
- Requires deep understanding
- Cross-cutting concerns
- High risk of regressions

---

## Issue Metadata Schema

### Required Fields

```json
{
  "issue": {
    "id": 123,
    "title": "Add JWT authentication [25K | Medium]",
    "description": "Implement JWT token-based authentication...",

    "metadata": {
      "estimated_context": 25000,
      "difficulty": "medium",
      "epic": "auth-system",
      "dependencies": [122],
      "quality_gates": ["lint", "typecheck", "test", "security-scan"],

      "assignment": {
        "suggested_models": ["sonnet", "opus"],
        "assigned_model": null,
        "assigned_agent_id": null
      },

      "tracking": {
        "created_at": "2026-01-31T10:00:00Z",
        "started_at": null,
        "completed_at": null,
        "actual_context_used": null,
        "duration_minutes": null
      }
    }
  }
}
```

### Issue Title Format

**Template:** `[Feature name] [Context estimate | Difficulty]`

**Examples:**

```
✅ "Add JWT authentication [25K | Medium]"
✅ "Fix typo in README [2K | Low]"
✅ "Refactor auth system [80K | High]"
✅ "Implement rate limiting [30K | Medium]"
✅ "Add OpenAPI docs [15K | Low]"

❌ "Add authentication" (missing metadata)
❌ "Refactor auth [High]" (missing context estimate)
❌ "Fix bug [20K]" (missing difficulty)
```

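Because the format is rigid, the coordinator can validate titles mechanically. A sketch of a parser for this template (the regex and helper name are assumptions, not part of any existing tooling):

```python
import re

# Matches "Feature name [25K | Medium]" per the title template above.
TITLE_RE = re.compile(
    r"^(?P<name>.+?)\s*\[(?P<ctx>\d+)K\s*\|\s*(?P<difficulty>Low|Medium|High)\]$"
)


def parse_issue_title(title: str):
    """Extract the context estimate (tokens) and difficulty, or None if malformed."""
    m = TITLE_RE.match(title)
    if not m:
        return None
    return {
        "name": m.group("name"),
        "estimated_context": int(m.group("ctx")) * 1000,
        "difficulty": m.group("difficulty").lower(),
    }
```

Titles missing either field fail to parse, which is exactly the rejection behavior the examples above call for.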
### Issue Body Template

```markdown
## Context Estimate

**Estimated tokens:** 25,000 (12.5% of 200K limit)

## Difficulty

**Level:** Medium

**Rationale:**

- Requires understanding the JWT spec
- Integration with existing auth middleware
- Security considerations (token signing, validation)
- Test coverage for auth flows

## Suggested Models

- Primary: Sonnet (cost-effective for medium difficulty)
- Fallback: Opus (if complexity increases)

## Dependencies

- #122 (Auth middleware must be complete first)

## Quality Gates

- [x] Lint (ESLint + Prettier)
- [x] Typecheck (TypeScript strict mode)
- [x] Tests (Unit + Integration, 80%+ coverage)
- [x] Security scan (No hardcoded secrets, safe crypto)

## Task Description

[Detailed description of work to be done...]

## Acceptance Criteria

- [ ] JWT tokens generated on login
- [ ] Tokens validated on protected routes
- [ ] Token refresh mechanism implemented
- [ ] Tests cover happy path + edge cases
- [ ] Documentation updated

## Context Breakdown

| Activity                          | Estimated Tokens |
| --------------------------------- | ---------------- |
| Read existing auth code           | 5,000            |
| Implement JWT library integration | 8,000            |
| Write middleware logic            | 6,000            |
| Add tests                         | 4,000            |
| Update documentation              | 2,000            |
| **Total**                         | **25,000**       |
```

---

## Context Estimation Guidelines

### Estimation Formula

```
Base = (Files to read × 5-10K per file)
     + (Implementation complexity: 10-30K)
     + (Test writing: 5-15K)
     + (Documentation: 2-5K)

Estimated Context = Base × (1 + iteration buffer of 20-50%)
```

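The formula can be written as a small helper for consistent estimates. A sketch: the function signature and the default tokens-per-file value are illustrative choices, not a fixed standard:

```python
def estimate_context(files_to_read: int, implementation: int, tests: int,
                     docs: int, buffer_pct: float,
                     tokens_per_file: int = 8_000) -> int:
    """Apply the estimation formula: base cost plus an iteration buffer.

    Per-activity token counts are judgment calls within the documented
    ranges (files 5-10K each, implementation 10-30K, tests 5-15K, docs 2-5K).
    """
    base = files_to_read * tokens_per_file + implementation + tests + docs
    return round(base * (1 + buffer_pct))
```

Plugging in the medium example below (3 files at 8K, 15K implementation, 10K tests, 3K docs, 30% buffer) reproduces its 67.6K total.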
### Examples

**Simple (Low Difficulty):**

```
Task: Fix typo in README.md

Files to read: 1 × 5K = 5K
Implementation: Minimal = 1K
Tests: None = 0K
Docs: None = 0K
Buffer: 20% = 1.2K
Total: ~7.2K tokens

Rounded estimate: 10K tokens (conservative)
```

**Medium (Medium Difficulty):**

```
Task: Add API endpoint for user profile

Files to read: 3 × 8K = 24K
Implementation: Standard endpoint = 15K
Tests: Unit + integration = 10K
Docs: API spec update = 3K
Buffer: 30% = 15.6K
Total: ~67.6K tokens

Rounded estimate: 70K tokens
```

**Complex (High Difficulty):**

```
Task: Refactor authentication system

Files to read: 8 × 10K = 80K
Implementation: Complex refactor = 30K
Tests: Extensive = 15K
Docs: Architecture guide = 5K
Buffer: 50% = 65K
Total: ~195K tokens

⚠️ Exceeds the 50% rule (100K limit)!
Action: Split into 2-3 smaller issues
```

### Estimation Accuracy Tracking

**After each issue, measure variance:**

```python
variance = actual_context - estimated_context
variance_pct = (variance / estimated_context) * 100

# Log for calibration (flag both over- and underestimates)
if abs(variance_pct) > 20:
    print(f"⚠️ Estimate off by {variance_pct:.1f}%")
    print(f"Estimated: {estimated_context}")
    print(f"Actual: {actual_context}")
    print("Review estimation guidelines")
```

**Over time, refine the estimation formula based on this historical data.**

---

## Coordinator Implementation

### Core Algorithm

```python
import json
from datetime import datetime


class QualityRailsCoordinator:
    """Non-AI coordinator for agent swarm orchestration."""

    def __init__(self, issue_queue, agent_profiles, quality_gates):
        self.issues = issue_queue
        self.agents = agent_profiles
        self.gates = quality_gates
        self.current_controller = None

    def run(self):
        """Main orchestration loop."""

        # Validate all issues before starting any work
        self.validate_issues()

        # Sort by dependencies and priority
        self.issues = self.topological_sort(self.issues)

        # Start AI swarm controller
        self.start_swarm_controller()

        # Process queue
        for issue in self.issues:
            print(f"\n{'=' * 60}")
            print(f"Starting issue #{issue['id']}: {issue['title']}")
            print(f"{'=' * 60}\n")

            # Assign optimal agent
            agent = self.assign_agent(issue)

            # Monitor and execute
            self.execute_issue(issue, agent)

            # Log metrics
            self.log_metrics(issue, agent)

        print("\n✅ All issues complete. Queue empty.")

    def validate_issues(self):
        """Ensure all issues have required metadata."""
        max_context = max(
            agent["context_limit"] for agent in self.agents.values()
        )

        for issue in self.issues:
            if not issue.get("estimated_context"):
                raise ValueError(
                    f"Issue {issue['id']} missing context estimate"
                )

            if not issue.get("difficulty"):
                raise ValueError(
                    f"Issue {issue['id']} missing difficulty rating"
                )

            # Validate the 50% rule against the largest agent limit
            if issue["estimated_context"] > (max_context * 0.5):
                raise ValueError(
                    f"Issue {issue['id']} exceeds 50% rule: "
                    f"{issue['estimated_context']} > {max_context * 0.5}"
                )

    def assign_agent(self, issue):
        """Assign the optimal agent based on context + difficulty."""
        context_est = issue["estimated_context"]
        difficulty = issue["difficulty"]

        # Filter models that can handle this issue
        candidates = []

        for model_name, profile in self.agents.items():
            # Check context capacity (50% rule)
            if context_est <= (profile["context_limit"] * 0.5):
                # Check difficulty match
                if difficulty in profile["difficulty_levels"]:
                    # Estimate input cost
                    cost = (
                        context_est * profile["cost_per_1k_input"] / 1000
                    )

                    candidates.append({
                        "model": model_name,
                        "profile": profile,
                        "cost": cost
                    })

        if not candidates:
            raise ValueError(
                f"No model can handle issue {issue['id']}: "
                f"{context_est} tokens, {difficulty} difficulty"
            )

        # Optimize for cost (prefer cheaper models when capable)
        candidates.sort(key=lambda x: x["cost"])
        selected = candidates[0]

        print(f"📋 Assigned {selected['model']} to issue {issue['id']}")
        print(f"   Context: {context_est} tokens")
        print(f"   Difficulty: {difficulty}")
        print(f"   Estimated cost: ${selected['cost']:.4f}")

        return selected

    def execute_issue(self, issue, agent):
        """Execute issue with the assigned agent."""

        # Start agent session
        session = self.start_agent_session(agent["profile"])

        # Track context
        session_context = 0
        context_limit = agent["profile"]["context_limit"]

        # Execution loop
        iteration = 0
        while not issue.get("complete"):
            iteration += 1

            # Check context health: compact at 80%
            if session_context > (context_limit * 0.80):
                print(f"⚠️ Context at 80% ({session_context}/{context_limit})")
                print("   Triggering compaction...")
                session_context = self.compact_session(session)
                print(f"   ✓ Compacted to {session_context} tokens")

            # If compaction was not enough, rotate the session at 95%
            if session_context > (context_limit * 0.95):
                print("🔄 Context at 95% - rotating agent session")
                state = session.save_state()
                session.terminate()
                session = self.start_agent_session(agent["profile"])
                session.load_state(state)
                session_context = session.current_context()

            # Agent executes one step
            print(f"   Iteration {iteration}...")
            result = session.execute_step(issue)

            # Update context tracking
            session_context += result["context_used"]

            # Check whether the agent claims completion
            if result.get("claims_complete"):
                print("   Agent claims completion. Running quality gates...")

                # Enforce quality gates
                gate_results = self.gates.validate(result)

                if gate_results["passed"]:
                    print("   ✅ All quality gates passed")
                    issue["complete"] = True
                    issue["actual_context_used"] = session_context
                else:
                    print("   ❌ Quality gates failed:")
                    for gate, errors in gate_results["failures"].items():
                        print(f"      {gate}: {errors}")

                    # Send feedback to the agent and keep iterating
                    session.send_feedback(gate_results["failures"])

        # Clean up
        session.terminate()

    def start_swarm_controller(self):
        """Start the AI swarm controller (OpenClaw session)."""
        # Initialize the OpenClaw swarm controller.
        # It coordinates subagents but is managed by this coordinator.
        pass

    def start_agent_session(self, agent_profile):
        """Start an individual agent session."""
        # Start an agent with the specified model
        # and return a session handle.
        pass

    def compact_session(self, session):
        """Trigger compaction in an agent session."""
        summary = session.send_message(
            "Summarize all completed work concisely. "
            "Keep only essential context for continuation."
        )

        session.reset_history_with_summary(summary)

        return session.current_context()

    def topological_sort(self, issues):
        """Sort issues so every dependency completes before its dependents."""
        by_id = {issue["id"]: issue for issue in issues}
        ordered = []
        satisfied = set()
        remaining = list(issues)

        while remaining:
            progress = False
            for issue in list(remaining):
                deps = issue.get("dependencies", [])
                # Dependencies outside this queue are assumed already done
                if all(d in satisfied or d not in by_id for d in deps):
                    ordered.append(issue)
                    satisfied.add(issue["id"])
                    remaining.remove(issue)
                    progress = True
            if not progress:
                raise ValueError("Dependency cycle detected in issue queue")

        return ordered

    def log_metrics(self, issue, agent):
        """Log issue completion metrics."""
        metrics = {
            "issue_id": issue["id"],
            "title": issue["title"],
            "estimated_context": issue["estimated_context"],
            "actual_context": issue.get("actual_context_used"),
            "variance": (
                issue.get("actual_context_used", 0) -
                issue["estimated_context"]
            ),
            "model": agent["model"],
            "difficulty": issue["difficulty"],
            "timestamp": datetime.now().isoformat()
        }

        # Append to the metrics file
        with open("orchestrator-metrics.jsonl", "a") as f:
            f.write(json.dumps(metrics) + "\n")
```

### Quality Gates Implementation

```python
import subprocess


class QualityGates:
    """Mechanical quality enforcement."""

    def validate(self, result):
        """Run all quality gates."""

        gates = {
            "lint": self.run_lint,
            "typecheck": self.run_typecheck,
            "test": self.run_tests,
            "security": self.run_security_scan
        }

        failures = {}

        for gate_name, gate_fn in gates.items():
            gate_result = gate_fn(result)

            if not gate_result["passed"]:
                failures[gate_name] = gate_result["errors"]

        return {
            "passed": len(failures) == 0,
            "failures": failures
        }

    def _run(self, command):
        """Sketch: shell out to the gate command; non-zero exit is a failure.

        Adapt the commands to the repository's tooling.
        """
        proc = subprocess.run(command, capture_output=True, text=True)
        return {
            "passed": proc.returncode == 0,
            "errors": proc.stdout + proc.stderr if proc.returncode else ""
        }

    def run_lint(self, result):
        """Run linting (ESLint, Prettier, etc.)."""
        return self._run(["pnpm", "turbo", "run", "lint"])

    def run_typecheck(self, result):
        """Run TypeScript type checking."""
        return self._run(["pnpm", "turbo", "run", "typecheck"])

    def run_tests(self, result):
        """Run the test suite (coverage threshold enforced by the test runner)."""
        return self._run(["pnpm", "turbo", "run", "test"])

    def run_security_scan(self, result):
        """Run security checks (hardcoded secrets, unsafe patterns)."""
        return self._run(["detect-secrets", "scan"])
```

---

## Issue Creation Process

### Workflow

```
1. Epic Planning Agent
   ├── Receives epic description
   ├── Estimates total context required
   ├── Checks against agent limits
   └── Decomposes into issues if needed

2. Issue Creation
   └── For each sub-issue:
       ├── Estimate context (formula + buffer)
       ├── Assign difficulty level
       ├── Validate 50% rule
       └── Create issue with metadata

3. Validation
   ├── Coordinator validates all issues
   ├── Checks for missing metadata
   └── Rejects oversized issues

4. Execution
   ├── Coordinator assigns agents
   ├── Monitors context usage
   ├── Enforces quality gates
   └── Logs metrics for calibration
```

### Epic Planning Agent Prompt

````markdown
You are an Epic Planning Agent. Your job is to decompose epics into
properly-sized issues for autonomous execution.

## Guidelines

1. **Estimate total context:**
   - Read all related code files
   - Estimate implementation complexity
   - Account for tests and documentation
   - Add 30% buffer for iteration

2. **Apply 50% rule:**
   - Target agent context limit: 200K tokens
   - Maximum issue size: 100K tokens
   - If epic exceeds 100K: Split into multiple issues

3. **Assign difficulty:**
   - Low: CRUD, config, docs, simple fixes
   - Medium: APIs, business logic, integrations
   - High: Architecture, complex algorithms, refactors

4. **Create issues with metadata:**

   ```json
   {
     "title": "[Feature] [Context | Difficulty]",
     "estimated_context": 25000,
     "difficulty": "medium",
     "epic": "epic-name",
     "dependencies": [],
     "quality_gates": ["lint", "typecheck", "test"]
   }
   ```

5. **Validate:**
   - Each issue < 100K tokens ✓
   - Dependencies are explicit ✓
   - Difficulty matches complexity ✓
   - Quality gates defined ✓

## Output Format

Create a JSON array of issues:

```json
[
  {
    "id": 1,
    "title": "Add auth middleware [20K | Medium]",
    "estimated_context": 20000,
    "difficulty": "medium",
    ...
  },
  ...
]
```
````

---

## Proof of Concept Plan

### PoC Goals

1. **Validate the non-AI coordinator pattern** - Prove it can manage the agent lifecycle
2. **Test context monitoring** - Verify we can track and react to context usage
3. **Validate quality gates** - Ensure mechanical enforcement works
4. **Test agent assignment** - Confirm model selection logic
5. **Measure metrics** - Collect data on estimate accuracy

### PoC Scope

**Small test project:**

- 5-10 simple issues
- Mix of difficulty levels
- Use Haiku + Sonnet (cheap)
- Real quality gates (lint, typecheck, test)

**What we'll build:**

```
poc/
├── coordinator.py        # Non-AI coordinator
├── agent_profiles.json   # Model capabilities
├── issues.json           # Test issue queue
├── quality_gates.py      # Mechanical gates
└── metrics.jsonl         # Results log
```

**Test cases:**

1. Low difficulty issue → Haiku (cheap, fast)
2. Medium difficulty issue → Sonnet (balanced)
3. Oversized issue → Rejected (50% rule)
4. Issue with failed quality gate → Agent retries
5. High context issue → Triggers compaction

### PoC Success Criteria

- [ ] Coordinator completes all issues without human intervention
- [ ] Quality gates enforce standards (at least 1 failure caught + fixed)
- [ ] Context monitoring works (log shows tracking)
- [ ] Agent assignment is optimal (cheapest capable model chosen)
- [ ] Metrics collected for all issues
- [ ] No agent exhaustion (50% rule enforced)

### PoC Timeline

**Week 1: Foundation**

- [ ] Build coordinator skeleton
- [ ] Implement agent profiles
- [ ] Create test issue queue
- [ ] Set up quality gates

**Week 2: Integration**

- [ ] Connect to Claude API
- [ ] Implement context monitoring
- [ ] Test agent lifecycle
- [ ] Validate quality gates

**Week 3: Testing**

- [ ] Run full PoC
- [ ] Collect metrics
- [ ] Analyze results
- [ ] Document findings

**Week 4: Refinement**

- [ ] Fix issues discovered
- [ ] Optimize assignment logic
- [ ] Update documentation
- [ ] Prepare for production

---

## Production Deployment (Post-PoC)

### Integration with Mosaic Stack

**Phase 1: Core Implementation**

- Implement coordinator in the Mosaic Stack codebase
- Add agent profiles to configuration
- Integrate with existing OpenClaw infrastructure
- Add quality gates to CI/CD

**Phase 2: Issue Management**

- Update issue templates with metadata fields
- Train team on estimation guidelines
- Build issue validation tools
- Create epic planning workflows

**Phase 3: Monitoring**

- Add coordinator metrics dashboard
- Track estimate accuracy over time
- Monitor cost optimization
- Alert on failures

**Phase 4: Scale**

- Expand to all milestones
- Add more agent types (GLM, MiniMax)
- Optimize for multi-epic orchestration
- Build self-learning estimation

---

## Open Questions (To Resolve in PoC)

1. **Compaction effectiveness:** How much context does summarization actually free?
2. **Estimation accuracy:** How close are initial estimates to reality?
3. **Model selection:** Is cost-optimized assignment actually optimal, or should we prioritize speed/quality?
4. **Quality gate timing:** Should gates run after each commit, or only at issue completion?
5. **Session rotation overhead:** What is the cost of rotating agents vs compaction?
6. **Dependency handling:** How do we ensure dependencies are truly complete before starting dependent issues?

---

## Success Metrics

### PoC Metrics

- **Autonomy:** % of issues completed without human intervention
- **Quality:** % of commits passing all quality gates on the first try
- **Cost:** Total cost vs an all-Opus baseline
- **Accuracy:** Context estimate variance (target: <20%)
- **Efficiency:** Issues per hour

### Production Metrics

- **Throughput:** Issues completed per day
- **Quality rate:** % passing all gates on the first try
- **Context efficiency:** Average context used vs estimated
- **Cost savings:** % saved vs an all-Opus baseline
- **Agent utilization:** % of time agents are productive (not waiting)

---

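Several of these metrics fall out of the `orchestrator-metrics.jsonl` records the coordinator already writes. A sketch of computing the mean absolute estimate variance from those records (field names follow the coordinator's `log_metrics`; the helper name is illustrative):

```python
import json


def summarize_variance(metrics_lines):
    """Mean absolute estimate variance (%) across completed issues.

    Records without an actual_context value (incomplete issues) are skipped.
    """
    pcts = []
    for line in metrics_lines:
        record = json.loads(line)
        if record.get("actual_context"):
            pcts.append(
                abs(record["actual_context"] - record["estimated_context"])
                / record["estimated_context"] * 100
            )
    return sum(pcts) / len(pcts) if pcts else None
```

Run periodically, this gives the "<20% variance" target a concrete, trackable number.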
## Appendix: Agent Skill Definitions

### Agent Skills Schema

```json
{
  "skills": {
    "backend-api": {
      "description": "Build RESTful APIs and endpoints",
      "difficulty": "medium",
      "typical_context": "20-40K",
      "quality_gates": ["lint", "typecheck", "test", "api-spec"]
    },
    "frontend-ui": {
      "description": "Build UI components and pages",
      "difficulty": "medium",
      "typical_context": "15-35K",
      "quality_gates": ["lint", "typecheck", "test", "a11y"]
    },
    "database-schema": {
      "description": "Design and migrate database schemas",
      "difficulty": "high",
      "typical_context": "30-50K",
      "quality_gates": ["typecheck", "test", "migration-validate"]
    },
    "documentation": {
      "description": "Write technical documentation",
      "difficulty": "low",
      "typical_context": "5-15K",
      "quality_gates": ["spelling", "markdown-lint"]
    },
    "refactoring": {
      "description": "Refactor existing code",
      "difficulty": "high",
      "typical_context": "40-80K",
      "quality_gates": ["lint", "typecheck", "test", "no-behavior-change"]
    },
    "bug-fix": {
      "description": "Fix reported bugs",
      "difficulty": "low-medium",
      "typical_context": "10-30K",
      "quality_gates": ["lint", "typecheck", "test", "regression-test"]
    }
  }
}
```

**Usage:**

- Issues can reference skills: `"skills": ["backend-api", "database-schema"]`
- The coordinator uses skill metadata to inform estimates
- Helps with consistent difficulty assignment

---

## Document Status

**Version:** 1.0 - Proposed Architecture
**Next Steps:** Build Proof of Concept
**Approval Required:** After successful PoC

---

**End of Architecture Document**