docs: Add issue parser estimation strategy
Critical enhancement for real-world usage - the parser must handle:

- Unformatted issues (estimate from content)
- Incomplete metadata (best-guess + confidence score)
- Oversized issues (auto-decompose before queuing)

Three-level estimation:

1. Structured metadata → extract directly (95%+ confidence)
2. Content analysis → AI estimates from description (50-95%)
3. Minimal info → defaults + warn user (<50%)

50% rule enforcement:

- Detect issues > 50% of agent's context limit
- Auto-decompose into sub-issues using Opus
- Create sub-issues in Gitea with dependencies
- Label parent as EPIC

Confidence-based workflow:

- ≥60%: Queue automatically
- 30-59%: Queue with warning
- <30%: Don't queue, request more details

Makes coordinator truly autonomous - handles whatever users throw at it.

Refs #158 (COORD-002)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

New file: docs/3-architecture/issue-parser-estimation-strategy.md (753 lines)
# Issue Parser Estimation Strategy

**Status:** Proposed (Phase 0 Enhancement)
**Related Issues:** COORD-002 (Issue Parser Agent)
**Priority:** Critical (P0) - Required for real-world usage

---

## Problem Statement

Not all issues will follow the formatted metadata structure used in COORD-XXX issues. The issue parser must handle:

1. **Unformatted issues** - Just a title and description, no metadata
2. **Incomplete metadata** - Some fields present, others missing
3. **Oversized issues** - Exceed the 50% rule, need decomposition
4. **Varying formats** - Different teams use different templates

**The parser must make intelligent estimates when metadata is missing.**

---

## Estimation Strategy

### Level 1: Structured Metadata (Best Case)

**When the issue has formatted metadata:**

```markdown
## Context Estimate

- Files to modify: 3
- Implementation complexity: medium (20,000 tokens)
- **Total estimated: 46,800 tokens**
- **Recommended agent: glm**

## Difficulty

medium
```

**Action:**

- Extract directly from the markdown
- **Confidence: HIGH** (95%+)
- Use values as-is

---

### Level 2: Content Analysis (Common Case)

**When metadata is missing, analyze the issue content:**

#### 2.1 Analyze Title and Description

```python
import json

import anthropic


async def estimate_from_content(issue: dict) -> dict:
    """Estimate metadata from issue content using AI."""

    client = anthropic.AsyncAnthropic()  # async client, since we await the call

    response = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Analyze this issue and estimate resource requirements.

Title: {issue['title']}
Description:
{issue['body']}

Estimate:
1. **Files to modify**: How many files will likely be touched?
   - Count mentions of specific files, modules, components
   - Look for scope indicators (refactor, add feature, fix bug)

2. **Implementation complexity**:
   - Low: Simple CRUD, config changes, one-file fixes
   - Medium: Multi-file changes, business logic, API development
   - High: Architecture changes, complex refactoring, new systems

3. **Context estimate**:
   - Use formula: (files × 7000) + complexity + tests + docs
   - Low: ~20-40K tokens
   - Medium: ~40-80K tokens
   - High: ~80-150K tokens

4. **Difficulty**: low/medium/high

5. **Confidence**: 0-100% (based on clarity of issue description)

Return JSON:
{{
    "estimated_context": <integer>,
    "difficulty": "low" | "medium" | "high",
    "assigned_agent": "haiku" | "sonnet" | "glm" | "opus",
    "confidence": <integer 0-100>,
    "reasoning": "Brief explanation of estimates"
}}
"""
        }]
    )

    metadata = json.loads(response.content[0].text)

    # Record where the estimate came from
    metadata['source'] = 'content_analysis'

    return metadata
```

**Confidence factors:**

| Factor                  | High Confidence              | Low Confidence          |
| ----------------------- | ---------------------------- | ----------------------- |
| **Description length**  | >500 chars, detailed         | <100 chars, vague       |
| **Specific mentions**   | Files, modules, APIs named   | Generic "fix the thing" |
| **Acceptance criteria** | Clear checklist              | None provided           |
| **Technical details**   | Stack traces, logs, examples | "It's broken"           |
| **Scope clarity**       | Well-defined boundaries      | Open-ended              |

**Confidence scoring:**

```python
import re


def calculate_confidence(issue: dict, analysis: dict) -> int:
    """Calculate a confidence score from 0-100."""

    score = 50  # Start at neutral

    # Description length
    if len(issue['body']) > 500:
        score += 15
    elif len(issue['body']) < 100:
        score -= 20

    # Specific file/module mentions
    code_patterns = r'(`[^`]+`|\.ts|\.py|\.js|src/|components/)'
    mentions = len(re.findall(code_patterns, issue['body']))
    score += min(mentions * 5, 20)

    # Acceptance criteria
    if '- [ ]' in issue['body'] or '- [x]' in issue['body']:
        score += 10

    # Technical details (stack traces, logs, code blocks)
    if '```' in issue['body']:
        score += 10

    # Scope keywords
    scope_keywords = ['refactor', 'implement', 'add', 'fix', 'update']
    if any(kw in issue['title'].lower() for kw in scope_keywords):
        score += 5

    return max(0, min(100, score))
```

**Action:**

- Use AI to estimate from content
- **Confidence: MEDIUM** (50-95%)
- Comment the confidence on the issue

**Example comment:**

```markdown
🤖 Estimated metadata (confidence: 65%):

- Estimated context: 52,000 tokens
- Difficulty: medium
- Recommended agent: glm

📊 Reasoning:

- Mentions 3 components (UserService, AuthMiddleware, tests)
- Requires API changes (medium complexity)
- Has acceptance criteria (+confidence)
- Description is detailed (+confidence)

Note: These are estimates. Actual usage may vary.
```

---

### Level 3: Minimal Information (Worst Case)

**When the issue is very vague:**

```markdown
Title: Fix the login bug
Body: Login doesn't work
```

**Action:**

- Use conservative defaults
- **Confidence: LOW** (<50%)
- Warn the user, suggest more details

**Default estimates:**

```python
DEFAULT_ESTIMATES = {
    'estimated_context': 50000,   # Conservative default
    'difficulty': 'medium',
    'assigned_agent': 'sonnet',   # Safe middle-tier agent
    'confidence': 30,
    'reasoning': 'Minimal information provided, using defaults',
}
```

**Example comment:**

```markdown
⚠️ Low confidence estimate (30%):

- Estimated context: 50,000 tokens
- Difficulty: medium
- Recommended agent: sonnet

📝 Suggestion: For better estimates, please add:

- Which files/components are affected
- Expected scope (one file? multiple modules?)
- Acceptance criteria or definition of "done"
- Any relevant logs, stack traces, or examples
```

---

## Oversized Issue Detection & Decomposition

### 50% Rule Enforcement

**Before queuing, check whether the issue exceeds 50% of the target agent's limit:**

```python
from typing import List

# AGENT_PROFILES, logger, and gitea_client are module-level collaborators.


async def check_and_decompose(issue: dict, metadata: dict) -> List[dict]:
    """Check if the issue exceeds the 50% rule; if so, decompose it."""

    # Get the agent's context limit
    agent = metadata['assigned_agent']
    agent_limit = AGENT_PROFILES[agent]['context_limit']
    max_issue_size = agent_limit * 0.5

    # Check if oversized
    if metadata['estimated_context'] > max_issue_size:
        logger.warning(
            f"Issue #{issue['number']} exceeds 50% rule: "
            f"{metadata['estimated_context']} > {max_issue_size}"
        )

        # Decompose into sub-issues
        sub_issues = await decompose_epic(issue, metadata)

        # Comment on the parent issue
        await gitea_client.comment_on_issue(
            issue['number'],
            f"⚠️ Issue exceeds 50% rule ({metadata['estimated_context']:,} tokens)\n\n"
            f"Auto-decomposing into {len(sub_issues)} sub-issues...\n\n"
            f"This issue will be converted to an EPIC tracking the sub-issues."
        )

        # Label as epic
        await gitea_client.add_label(issue['number'], 'epic')

        return sub_issues

    # Single issue, good to go
    return [issue]
```
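These checks assume an `AGENT_PROFILES` table mapping each agent to its context limit. A minimal sketch of its shape - the limits below are illustrative assumptions, not authoritative model specs, and should come from configuration in practice:

```python
# Illustrative agent profiles; the limits are assumptions for this sketch.
AGENT_PROFILES = {
    'haiku':  {'context_limit': 200_000},
    'sonnet': {'context_limit': 200_000},
    'glm':    {'context_limit': 128_000},
    'opus':   {'context_limit': 200_000},
}


def max_issue_size(agent: str) -> float:
    """50% rule: an issue may consume at most half the agent's context."""
    return AGENT_PROFILES[agent]['context_limit'] * 0.5
```

With a 200K-token Opus profile, the 50% rule caps a single issue at 100K tokens, which is the limit the examples below use.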

### Automatic Epic Decomposition

**When an issue is oversized, use AI to break it down:**

```python
import json
from typing import List

import anthropic


async def decompose_epic(issue: dict, metadata: dict) -> List[dict]:
    """Decompose an oversized issue into sub-issues."""

    client = anthropic.AsyncAnthropic()

    # Get the max issue size for the target agent
    agent = metadata['assigned_agent']
    max_size = int(AGENT_PROFILES[agent]['context_limit'] * 0.5)

    response = await client.messages.create(
        model="claude-opus-4-5",  # Use Opus for decomposition
        max_tokens=4000,
        messages=[{
            "role": "user",
            "content": f"""This issue is too large ({metadata['estimated_context']:,} tokens)
and must be broken into smaller sub-issues.

**Original Issue:**
Title: {issue['title']}
Body: {issue['body']}

**Constraints:**
- Each sub-issue must be ≤ {max_size:,} tokens
- Sub-issues should be independently completable
- Maintain logical order (dependencies)
- Cover all aspects of the original issue

**Instructions:**
1. Identify logical breakdown points
2. Create 3-6 sub-issues
3. Estimate context for each
4. Define dependencies (what must come first)

Return JSON array:
[
    {{
        "title": "Sub-issue title",
        "description": "Detailed description",
        "estimated_context": <integer>,
        "difficulty": "low" | "medium" | "high",
        "depends_on": [<array of titles this depends on>]
    }},
    ...
]

Ensure NO sub-issue exceeds {max_size:,} tokens.
"""
        }]
    )

    sub_issues = json.loads(response.content[0].text)

    # Validate that every sub-issue fits the 50% rule
    for sub in sub_issues:
        if sub['estimated_context'] > max_size:
            raise ValueError(
                f"Sub-issue '{sub['title']}' still exceeds limit: "
                f"{sub['estimated_context']} > {max_size}"
            )

    # Create the sub-issues in Gitea
    created_issues = []
    issue_map = {}  # title -> issue number

    for sub in sub_issues:
        new_issue = await gitea_client.create_issue(
            title=f"[SUB] {sub['title']}",
            body=f"""**Parent Epic:** #{issue['number']} - {issue['title']}

## Objective
{sub['description']}

## Context Estimate
- **Total estimated: {sub['estimated_context']:,} tokens**
- Difficulty: {sub['difficulty']}

## Dependencies
{format_dependencies(sub['depends_on'], issue_map)}

## Notes
Auto-generated from epic decomposition.
""",
            labels=['sub-issue', f"p{issue.get('priority', 1)}"],
            milestone=issue.get('milestone')
        )

        created_issues.append(new_issue)
        issue_map[sub['title']] = new_issue['number']

    # Update the parent issue to reference the sub-issues
    sub_issue_list = '\n'.join(
        f"- #{i['number']} {i['title']}"
        for i in created_issues
    )

    await gitea_client.comment_on_issue(
        issue['number'],
        f"## Sub-Issues Created\n\n{sub_issue_list}\n\n"
        f"This issue is now an EPIC. Close it when all sub-issues are complete."
    )

    return created_issues
```
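`format_dependencies` is assumed above but never defined. A minimal sketch: it links dependencies already created in this batch by issue number, and falls back to the raw title for anything not yet created (since sub-issues are created in order, a forward dependency may not have a number yet):

```python
def format_dependencies(depends_on: list, issue_map: dict) -> str:
    """Render a sub-issue's dependencies as a markdown list."""
    if not depends_on:
        return "None - ready to start"
    lines = []
    for title in depends_on:
        number = issue_map.get(title)  # known only if already created
        lines.append(f"- Blocked by #{number}" if number else f"- Blocked by: {title}")
    return '\n'.join(lines)
```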

**Example decomposition:**

```markdown
Original Issue #200: "Refactor authentication system"
Estimated: 180,000 tokens (EXCEEDS 50% rule for Opus: 100K limit)

Auto-decomposed into:
├─ #201 [SUB] Extract auth middleware (45K tokens) → Ready
├─ #202 [SUB] Implement JWT service (38K tokens) → Blocked by #201
├─ #203 [SUB] Add token refresh logic (32K tokens) → Blocked by #202
├─ #204 [SUB] Update auth guards (28K tokens) → Blocked by #202
└─ #205 [SUB] Add integration tests (35K tokens) → Blocked by #201,#202,#203,#204

Total: 178K tokens across 5 sub-issues
Each sub-issue: ≤50K tokens ✅
```
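The dependency chains above are what drive scheduling: a sub-issue becomes dispatchable only once everything it is blocked by has closed. A small sketch of that readiness check (the issue numbers follow the tree above):

```python
def ready_issues(subs: list, completed: set) -> list:
    """Numbers of open sub-issues whose dependencies are all complete."""
    return [
        s['number'] for s in subs
        if s['number'] not in completed
        and all(dep in completed for dep in s['depends_on'])
    ]


# The decomposition tree from the example above
TREE = [
    {'number': 201, 'depends_on': []},
    {'number': 202, 'depends_on': [201]},
    {'number': 203, 'depends_on': [202]},
    {'number': 204, 'depends_on': [202]},
    {'number': 205, 'depends_on': [201, 202, 203, 204]},
]
```

Initially only #201 is ready; once #201 and #202 complete, #203 and #204 can run in parallel, and #205 unblocks last.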

---

## Confidence-Based Workflow

### High Confidence (95%+)

- **Source:** Structured metadata in the issue body
- **Action:** Use values directly, queue immediately
- **Comment:** "✅ Metadata detected, high confidence"

### Medium Confidence (50-95%)

- **Source:** Content analysis
- **Action:** Use estimates, queue with a note
- **Comment:** "📊 Estimated from content (confidence: X%)"

### Low Confidence (<50%)

- **Source:** Minimal info, using defaults
- **Action:** Use defaults, warn the user
- **Comment:** "⚠️ Low confidence - please add details"
- **Optional:** Wait for the user to update the issue before queuing

---

## Confidence Thresholds

```python
class ConfidenceThresholds:
    """Confidence-based behavior thresholds."""

    AUTO_QUEUE = 60       # ≥60% confidence: queue automatically
    WARN_USER = 30        # 30-59% confidence: queue, but warn the user
    WAIT_FOR_UPDATE = 30  # <30% confidence: don't queue, wait for an update
```

**Workflow:**

```python
async def handle_issue_assignment(issue: dict):
    """Handle an issue assigned to @mosaic."""

    # Parse metadata (structured or estimated)
    metadata = await parse_issue_metadata(issue)

    # Check confidence
    if metadata['confidence'] >= ConfidenceThresholds.AUTO_QUEUE:
        # High/medium confidence - queue it
        await queue_manager.enqueue(issue, metadata)

        await gitea_client.comment_on_issue(
            issue['number'],
            f"🤖 Added to coordinator queue\n\n"
            f"**Metadata** (confidence: {metadata['confidence']}%):\n"
            f"- Estimated context: {metadata['estimated_context']:,} tokens\n"
            f"- Difficulty: {metadata['difficulty']}\n"
            f"- Assigned agent: {metadata['assigned_agent']}\n\n"
            f"{metadata.get('reasoning', '')}"
        )

    elif metadata['confidence'] >= ConfidenceThresholds.WAIT_FOR_UPDATE:
        # Low confidence (30-59%) - queue but warn
        await queue_manager.enqueue(issue, metadata)

        await gitea_client.comment_on_issue(
            issue['number'],
            f"⚠️ Low confidence estimate ({metadata['confidence']}%)\n\n"
            f"Using best-guess estimates:\n"
            f"- Estimated context: {metadata['estimated_context']:,} tokens\n"
            f"- Difficulty: {metadata['difficulty']}\n"
            f"- Assigned agent: {metadata['assigned_agent']}\n\n"
            f"💡 For better estimates, please add:\n"
            f"- Which files/components are affected\n"
            f"- Expected scope\n"
            f"- Acceptance criteria\n\n"
            f"Queued anyway - work will proceed with these estimates."
        )

    else:
        # Very low confidence (<30%) - don't queue
        await gitea_client.comment_on_issue(
            issue['number'],
            f"❌ Cannot queue - insufficient information ({metadata['confidence']}%)\n\n"
            f"Please add more details:\n"
            f"- What files/components need changes?\n"
            f"- What is the expected scope?\n"
            f"- What are the acceptance criteria?\n\n"
            f"Re-assign to @mosaic when ready."
        )

        # Unassign from the coordinator
        await gitea_client.unassign_issue(issue['number'], 'mosaic')
```

---

## Edge Cases

### Case 1: Issue Updated After Queuing

**The user adds details after low-confidence queuing:**

```python
@app.post('/webhook/gitea')
async def handle_webhook(payload: dict):
    """Handle Gitea webhooks."""

    if payload['action'] == 'edited':
        issue = payload['issue']

        # Check if it's already in the queue
        if queue_manager.has_issue(issue['number']):
            # Re-parse with the updated content
            new_metadata = await parse_issue_metadata(issue)

            # Update the queue
            queue_manager.update_metadata(issue['number'], new_metadata)

            await gitea_client.comment_on_issue(
                issue['number'],
                f"🔄 Issue updated - re-estimated metadata:\n"
                f"- Estimated context: {new_metadata['estimated_context']:,} tokens\n"
                f"- Difficulty: {new_metadata['difficulty']}\n"
                f"- Confidence: {new_metadata['confidence']}%"
            )
```

### Case 2: Decomposition Creates More Oversized Issues

**If a decomposed sub-issue still exceeds the 50% rule:**

```python
# Recursive decomposition
async def decompose_epic(issue: dict, metadata: dict, depth: int = 0) -> List[dict]:
    """Decompose with a recursion limit."""

    if depth > 2:
        raise ValueError(
            f"Issue #{issue['number']} cannot be decomposed enough. "
            f"Manual intervention required."
        )

    agent = metadata.get('assigned_agent', 'sonnet')  # sub-issues fall back to a default agent
    max_size = int(AGENT_PROFILES[agent]['context_limit'] * 0.5)

    sub_issues = await ai_decompose(issue, metadata)

    # Check if any sub-issue is still too large
    oversized = [s for s in sub_issues if s['estimated_context'] > max_size]

    if oversized:
        # Recursively decompose the oversized sub-issues
        final_issues = []
        for sub in sub_issues:
            if sub['estimated_context'] > max_size:
                # Decompose further (the sub-issue doubles as its own metadata here)
                sub_sub_issues = await decompose_epic(sub, sub, depth + 1)
                final_issues.extend(sub_sub_issues)
            else:
                final_issues.append(sub)
        return final_issues

    return sub_issues
```

### Case 3: No Clear Decomposition

**If the AI can't find good breakdown points:**

```python
# Comment on the issue, unassign from the coordinator
await gitea_client.comment_on_issue(
    issue['number'],
    f"❌ Cannot auto-decompose this issue.\n\n"
    f"Estimated at {metadata['estimated_context']:,} tokens "
    f"(exceeds {max_size:,} limit), but no clear breakdown found.\n\n"
    f"**Manual action needed:**\n"
    f"1. Break this into smaller sub-issues manually\n"
    f"2. Assign sub-issues to @mosaic\n"
    f"3. This issue can become an EPIC tracking sub-issues\n\n"
    f"Unassigning from coordinator."
)

await gitea_client.unassign_issue(issue['number'], 'mosaic')
```

---

## Implementation Checklist

**Phase 0 (COORD-002) must include:**

- [ ] Structured metadata extraction (existing plan)
- [ ] Content analysis estimation (NEW)
- [ ] Confidence scoring (NEW)
- [ ] Best-guess defaults (NEW)
- [ ] 50% rule validation (NEW)
- [ ] Automatic epic decomposition (NEW)
- [ ] Recursive decomposition handling (NEW)
- [ ] Confidence-based workflow (NEW)
- [ ] Update handling for edited issues (NEW)

---

## Success Criteria

**The parser handles all issue types:**

- ✅ Formatted issues → High confidence, extract directly
- ✅ Unformatted issues → Medium confidence, estimate from content
- ✅ Vague issues → Low confidence, use defaults + warn
- ✅ Oversized issues → Auto-decompose, create sub-issues
- ✅ Updated issues → Re-parse, update the queue

**No manual intervention needed for:**

- Well-formatted issues
- Clear descriptions (even without metadata)
- Oversized issues (auto-decomposed)

**Manual intervention only for:**

- Very vague issues (<30% confidence)
- Issues that can't be decomposed
- Edge cases requiring human judgment

---

## Example Scenarios

### Scenario 1: Well-Formatted Issue

```markdown
Issue #300: [COORD-020] Implement user profile caching

## Context Estimate

- Files: 4
- Total: 52,000 tokens
- Agent: glm

## Difficulty

medium
```

**Result:**

- ✅ Extract directly
- Confidence: 95%
- Queue immediately

---

### Scenario 2: Clear But Unformatted Issue

```markdown
Issue #301: Add caching to user profile API

Need to cache user profiles to reduce database load.

Files affected:

- src/api/users/users.service.ts
- src/cache/cache.service.ts
- src/api/users/users.controller.ts
- tests/users.service.spec.ts

Acceptance criteria:

- [ ] Cache GET /users/:id requests
- [ ] 5 minute TTL
- [ ] Invalidate on update/delete
- [ ] Add tests
```

**Result:**

- 📊 Estimate from content
- Files: 4 → 28K base
- Clear scope → medium complexity (20K)
- Tests mentioned → 10K
- **Total: ~58K tokens**
- Confidence: 75%
- Queue with note
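That arithmetic follows the `(files × 7000) + complexity + tests` formula from Level 2. The constants below are assumptions chosen to reproduce the walkthrough (only the `medium` figure appears in the text; `low` and `high` are illustrative):

```python
# Illustrative cost model behind the Scenario 2 estimate
FILE_COST = 7_000                                            # tokens per touched file
COMPLEXITY_COST = {'low': 8_000, 'medium': 20_000, 'high': 50_000}
TEST_COST = 10_000                                           # flat cost when tests are in scope


def rough_estimate(files: int, complexity: str, has_tests: bool) -> int:
    """(files × 7000) + complexity + tests, per the Level 2 formula."""
    return files * FILE_COST + COMPLEXITY_COST[complexity] + (TEST_COST if has_tests else 0)
```

`rough_estimate(4, 'medium', True)` reproduces the ~58K figure above: 28K base + 20K complexity + 10K tests.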

---

### Scenario 3: Vague Issue

```markdown
Issue #302: Fix user thing

Users are complaining
```

**Result:**

- ⚠️ Minimal info
- Use defaults (50K, medium, sonnet)
- Confidence: 25%
- Comment: "Please add details"
- Don't queue (<30% threshold)
- Unassign from @mosaic

---

### Scenario 4: Oversized Issue

```markdown
Issue #303: Refactor entire authentication system

We need to modernize our auth:

- Replace session-based auth with JWT
- Add OAuth2 support
- Implement refresh tokens
- Add MFA
- Update all protected routes
- Migration for existing users
```

**Result:**

- 📊 Estimate: 180K tokens
- ⚠️ Exceeds 50% rule (>100K)
- Auto-decompose into sub-issues:
  - #304: Extract JWT service (35K)
  - #305: Add OAuth2 integration (40K)
  - #306: Implement refresh tokens (28K)
  - #307: Add MFA support (32K)
  - #308: Update route guards (22K)
  - #309: User migration script (18K)
- Label #303 as EPIC
- Queue sub-issues

---

## Conclusion

The issue parser must be **robust and intelligent** to handle real-world issues:

- ✅ Extract structured metadata when available
- ✅ Estimate from content when it's missing
- ✅ Use confidence scores to guide behavior
- ✅ Auto-decompose oversized issues
- ✅ Warn users on low confidence
- ✅ Handle edge cases gracefully

**This makes the coordinator truly autonomous** - it can handle whatever issues users throw at it.

---

**Document Version:** 1.0
**Created:** 2026-01-31
**Status:** Proposed - Update COORD-002
**Priority:** Critical (P0) - Required for real-world usage