# Issue Parser Estimation Strategy

**Status:** Proposed (Phase 0 Enhancement)
**Related Issues:** COORD-002 (Issue Parser Agent)
**Priority:** Critical (P0) - Required for real-world usage

---

## Problem Statement

Not all issues will follow the formatted metadata structure used in COORD-XXX issues. The issue parser must handle:

1. **Unformatted issues** - Just title and description, no metadata
2. **Incomplete metadata** - Some fields present, others missing
3. **Oversized issues** - Exceed the 50% rule, need decomposition
4. **Varying formats** - Different teams use different templates

**The parser must make intelligent estimates when metadata is missing.**

---

## Estimation Strategy

### Level 1: Structured Metadata (Best Case)

**When the issue has formatted metadata:**

```markdown
## Context Estimate

- Files to modify: 3
- Implementation complexity: medium (20000 tokens)
- **Total estimated: 46800 tokens**
- **Recommended agent: glm**

## Difficulty

medium
```

**Action:**

- Extract directly from markdown
- **Confidence: HIGH** (95%+)
- Use values as-is

---

### Level 2: Content Analysis (Common Case)

**When metadata is missing, analyze issue content:**

#### 2.1 Analyze Title and Description

```python
import json

import anthropic

async def estimate_from_content(issue: dict) -> dict:
    """Estimate metadata from issue content using AI."""

    client = anthropic.AsyncAnthropic()

    response = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Analyze this issue and estimate resource requirements.

Title: {issue['title']}
Description:
{issue['body']}

Estimate:
1. 
**Files to modify**: How many files will likely be touched?
   - Count mentions of specific files, modules, components
   - Look for scope indicators (refactor, add feature, fix bug)

2. **Implementation complexity**:
   - Low: Simple CRUD, config changes, one-file fixes
   - Medium: Multi-file changes, business logic, API development
   - High: Architecture changes, complex refactoring, new systems

3. **Context estimate**:
   - Use formula: (files × 7000) + complexity + tests + docs
   - Low: ~20-40K tokens
   - Medium: ~40-80K tokens
   - High: ~80-150K tokens

4. **Difficulty**: low/medium/high

5. **Confidence**: 0-100% (based on clarity of issue description)

Return JSON:
{{
  "estimated_context": <number>,
  "difficulty": "low" | "medium" | "high",
  "assigned_agent": "haiku" | "sonnet" | "glm" | "opus",
  "confidence": <0-100>,
  "reasoning": "Brief explanation of estimates"
}}
"""
        }]
    )

    metadata = json.loads(response.content[0].text)

    # Add metadata source
    metadata['source'] = 'content_analysis'

    return metadata
```

**Confidence factors:**

| Factor                  | High Confidence              | Low Confidence          |
| ----------------------- | ---------------------------- | ----------------------- |
| **Description length**  | >500 chars, detailed         | <100 chars, vague       |
| **Specific mentions**   | Files, modules, APIs named   | Generic "fix the thing" |
| **Acceptance criteria** | Clear checklist              | None provided           |
| **Technical details**   | Stack traces, logs, examples | "It's broken"           |
| **Scope clarity**       | Well-defined boundaries      | Open-ended              |

**Confidence scoring:**

````python
import re

def calculate_confidence(issue: dict, analysis: dict) -> int:
    """Calculate confidence score 0-100."""

    score = 50  # Start at neutral

    # Description length
    if len(issue['body']) > 500:
        score += 15
    elif len(issue['body']) < 100:
        score -= 20

    # Specific file/module mentions
    code_patterns = r'(`[^`]+`|\.ts|\.py|\.js|src/|components/)'
    mentions = len(re.findall(code_patterns, 
issue['body'])) + score += min(mentions * 5, 20) + + # Acceptance criteria + if '- [ ]' in issue['body'] or '- [x]' in issue['body']: + score += 10 + + # Technical details (stack traces, logs, code blocks) + if '```' in issue['body']: + score += 10 + + # Scope keywords + scope_keywords = ['refactor', 'implement', 'add', 'fix', 'update'] + if any(kw in issue['title'].lower() for kw in scope_keywords): + score += 5 + + return max(0, min(100, score)) +```` + +**Action:** + +- Use AI to estimate from content +- **Confidence: MEDIUM** (50-80%) +- Comment confidence on issue + +**Example comment:** + +```markdown +🤖 Estimated metadata (confidence: 65%): + +- Estimated context: 52,000 tokens +- Difficulty: medium +- Recommended agent: glm + +📊 Reasoning: + +- Mentions 3 components (UserService, AuthMiddleware, tests) +- Requires API changes (medium complexity) +- Has acceptance criteria (+confidence) +- Description is detailed (+confidence) + +Note: These are estimates. Actual usage may vary. +``` + +--- + +### Level 3: Minimal Information (Worst Case) + +**When issue is very vague:** + +```markdown +Title: Fix the login bug +Body: Login doesn't work +``` + +**Action:** + +- Use conservative defaults +- **Confidence: LOW** (<50%) +- Warn user, suggest more details + +**Default estimates:** + +```python +DEFAULT_ESTIMATES = { + 'estimated_context': 50000, # Conservative default + 'difficulty': 'medium', + 'assigned_agent': 'sonnet', # Safe middle-tier agent + 'confidence': 30, + 'reasoning': 'Minimal information provided, using defaults' +} +``` + +**Example comment:** + +```markdown +⚠️ Low confidence estimate (30%): + +- Estimated context: 50,000 tokens +- Difficulty: medium +- Recommended agent: sonnet + +📝 Suggestion: For better estimates, please add: + +- Which files/components are affected +- Expected scope (one file? multiple modules?) 
+- Acceptance criteria or definition of "done" +- Any relevant logs, stack traces, or examples +``` + +--- + +## Oversized Issue Detection & Decomposition + +### 50% Rule Enforcement + +**Before queuing, check if issue exceeds 50% of target agent's limit:** + +```python +async def check_and_decompose(issue: dict, metadata: dict) -> List[dict]: + """Check if issue exceeds 50% rule. If so, decompose.""" + + # Get agent limit + agent = metadata['assigned_agent'] + agent_limit = AGENT_PROFILES[agent]['context_limit'] + max_issue_size = agent_limit * 0.5 + + # Check if oversized + if metadata['estimated_context'] > max_issue_size: + logger.warning( + f"Issue #{issue['number']} exceeds 50% rule: " + f"{metadata['estimated_context']} > {max_issue_size}" + ) + + # Decompose into sub-issues + sub_issues = await decompose_epic(issue, metadata) + + # Comment on parent issue + await gitea_client.comment_on_issue( + issue['number'], + f"⚠️ Issue exceeds 50% rule ({metadata['estimated_context']:,} tokens)\n\n" + f"Auto-decomposing into {len(sub_issues)} sub-issues...\n\n" + f"This issue will be converted to an EPIC tracking the sub-issues." + ) + + # Label as epic + await gitea_client.add_label(issue['number'], 'epic') + + return sub_issues + + else: + # Single issue, good to go + return [issue] +``` + +### Automatic Epic Decomposition + +**When issue is oversized, use AI to break it down:** + +```python +async def decompose_epic(issue: dict, metadata: dict) -> List[dict]: + """Decompose oversized issue into sub-issues.""" + + client = anthropic.Anthropic() + + # Get max issue size for target agent + agent = metadata['assigned_agent'] + max_size = AGENT_PROFILES[agent]['context_limit'] * 0.5 + + response = await client.messages.create( + model="claude-opus-4-5", # Use Opus for decomposition + max_tokens=4000, + messages=[{ + "role": "user", + "content": f"""This issue is too large ({metadata['estimated_context']:,} tokens) +and must be broken into smaller sub-issues. 

**Original Issue:**
Title: {issue['title']}
Body: {issue['body']}

**Constraints:**
- Each sub-issue must be ≤ {max_size:,} tokens
- Sub-issues should be independently completable
- Maintain logical order (dependencies)
- Cover all aspects of original issue

**Instructions:**
1. Identify logical breakdown points
2. Create 3-6 sub-issues
3. Estimate context for each
4. Define dependencies (what must come first)

Return JSON array:
[
  {{
    "title": "Sub-issue title",
    "description": "Detailed description",
    "estimated_context": <number>,
    "difficulty": "low" | "medium" | "high",
    "depends_on": []
  }},
  ...
]

Ensure NO sub-issue exceeds {max_size:,} tokens.
"""
        }]
    )

    sub_issues = json.loads(response.content[0].text)

    # Validate all sub-issues fit 50% rule
    for sub in sub_issues:
        if sub['estimated_context'] > max_size:
            raise ValueError(
                f"Sub-issue '{sub['title']}' still exceeds limit: "
                f"{sub['estimated_context']} > {max_size}"
            )

    # Create sub-issues in Gitea
    created_issues = []
    issue_map = {}  # title -> issue number

    for sub in sub_issues:
        # Create issue
        new_issue = await gitea_client.create_issue(
            title=f"[SUB] {sub['title']}",
            body=f"""**Parent Epic:** #{issue['number']} - {issue['title']}

## Objective
{sub['description']}

## Context Estimate
- **Total estimated: {sub['estimated_context']:,} tokens**
- Difficulty: {sub['difficulty']}

## Dependencies
{format_dependencies(sub['depends_on'], issue_map)}

## Notes
Auto-generated from epic decomposition.
+""", + labels=['sub-issue', f"p{issue.get('priority', 1)}"], + milestone=issue.get('milestone') + ) + + created_issues.append(new_issue) + issue_map[sub['title']] = new_issue['number'] + + # Update parent issue to reference sub-issues + sub_issue_list = '\n'.join( + f"- #{i['number']} {i['title']}" + for i in created_issues + ) + + await gitea_client.comment_on_issue( + issue['number'], + f"## Sub-Issues Created\n\n{sub_issue_list}\n\n" + f"This issue is now an EPIC. Close this when all sub-issues complete." + ) + + return created_issues +``` + +**Example decomposition:** + +```markdown +Original Issue #200: "Refactor authentication system" +Estimated: 180,000 tokens (EXCEEDS 50% rule for Opus: 100K limit) + +Auto-decomposed into: +├─ #201 [SUB] Extract auth middleware (45K tokens) → Ready +├─ #202 [SUB] Implement JWT service (38K tokens) → Blocked by #201 +├─ #203 [SUB] Add token refresh logic (32K tokens) → Blocked by #202 +├─ #204 [SUB] Update auth guards (28K tokens) → Blocked by #202 +└─ #205 [SUB] Add integration tests (35K tokens) → Blocked by #201,#202,#203,#204 + +Total: 178K tokens across 5 sub-issues +Each sub-issue: ≤50K tokens ✅ +``` + +--- + +## Confidence-Based Workflow + +### High Confidence (95%+) + +- **Source:** Structured metadata in issue body +- **Action:** Use values directly, queue immediately +- **Comment:** "✅ Metadata detected, high confidence" + +### Medium Confidence (50-95%) + +- **Source:** Content analysis +- **Action:** Use estimates, queue with note +- **Comment:** "📊 Estimated from content (confidence: X%)" + +### Low Confidence (<50%) + +- **Source:** Minimal info, using defaults +- **Action:** Use defaults, warn user +- **Comment:** "⚠️ Low confidence - please add details" +- **Optional:** Wait for user to update issue before queuing + +--- + +## Confidence Thresholds + +```python +class ConfidenceThresholds: + """Confidence-based behavior thresholds.""" + + AUTO_QUEUE = 60 # ≥60% confidence: Queue automatically + WARN_USER = 
50   # <50% confidence: Warn user
    WAIT_FOR_UPDATE = 30  # <30% confidence: Don't queue, wait for update
```

**Workflow:**

```python
async def handle_issue_assignment(issue: dict):
    """Handle issue assigned to @mosaic."""

    # Parse metadata (structured or estimated)
    metadata = await parse_issue_metadata(issue)

    # Check confidence
    if metadata['confidence'] >= ConfidenceThresholds.AUTO_QUEUE:
        # High/medium confidence - queue it
        await queue_manager.enqueue(issue, metadata)

        await gitea_client.comment_on_issue(
            issue['number'],
            f"🤖 Added to coordinator queue\n\n"
            f"**Metadata** (confidence: {metadata['confidence']}%):\n"
            f"- Estimated context: {metadata['estimated_context']:,} tokens\n"
            f"- Difficulty: {metadata['difficulty']}\n"
            f"- Assigned agent: {metadata['assigned_agent']}\n\n"
            f"{metadata.get('reasoning', '')}"
        )

    elif metadata['confidence'] >= ConfidenceThresholds.WAIT_FOR_UPDATE:
        # Low confidence (30-59%) - queue but warn
        await queue_manager.enqueue(issue, metadata)

        await gitea_client.comment_on_issue(
            issue['number'],
            f"⚠️ Low confidence estimate ({metadata['confidence']}%)\n\n"
            f"Using best-guess estimates:\n"
            f"- Estimated context: {metadata['estimated_context']:,} tokens\n"
            f"- Difficulty: {metadata['difficulty']}\n"
            f"- Assigned agent: {metadata['assigned_agent']}\n\n"
            f"💡 For better estimates, please add:\n"
            f"- Which files/components are affected\n"
            f"- Expected scope\n"
            f"- Acceptance criteria\n\n"
            f"Queued anyway - work will proceed with these estimates."
        )

    else:
        # Very low confidence (<30%) - don't queue
        await gitea_client.comment_on_issue(
            issue['number'],
            f"❌ Cannot queue - insufficient information ({metadata['confidence']}%)\n\n"
            f"Please add more details:\n"
            f"- What files/components need changes?\n"
            f"- What is the expected scope?\n"
            f"- What are the acceptance criteria?\n\n"
            f"Re-assign to @mosaic when ready."
+ ) + + # Unassign from coordinator + await gitea_client.unassign_issue(issue['number'], 'mosaic') +``` + +--- + +## Edge Cases + +### Case 1: Issue Updated After Queuing + +**User adds details after low-confidence queuing:** + +```python +@app.post('/webhook/gitea') +async def handle_webhook(payload: dict): + """Handle Gitea webhooks.""" + + if payload['action'] == 'edited': + issue = payload['issue'] + + # Check if already in queue + if queue_manager.has_issue(issue['number']): + # Re-parse with updated content + new_metadata = await parse_issue_metadata(issue) + + # Update queue + queue_manager.update_metadata(issue['number'], new_metadata) + + await gitea_client.comment_on_issue( + issue['number'], + f"🔄 Issue updated - re-estimated metadata:\n" + f"- Estimated context: {new_metadata['estimated_context']:,} tokens\n" + f"- Difficulty: {new_metadata['difficulty']}\n" + f"- Confidence: {new_metadata['confidence']}%" + ) +``` + +### Case 2: Decomposition Creates More Oversized Issues + +**If decomposed sub-issue still exceeds 50% rule:** + +```python +# Recursive decomposition +async def decompose_epic(issue: dict, metadata: dict, depth: int = 0) -> List[dict]: + """Decompose with recursion limit.""" + + if depth > 2: + raise ValueError( + f"Issue #{issue['number']} cannot be decomposed enough. " + f"Manual intervention required." 
        )

    sub_issues = await ai_decompose(issue, metadata)

    # Re-derive the 50% ceiling for the target agent; recursive calls pass
    # sub-issue dicts that may not carry an assigned agent, so fall back to sonnet
    agent = metadata.get('assigned_agent', 'sonnet')
    max_size = AGENT_PROFILES[agent]['context_limit'] * 0.5

    # Check if any sub-issue is still too large
    oversized = [s for s in sub_issues if s['estimated_context'] > max_size]

    if oversized:
        # Recursively decompose oversized sub-issues
        # (each sub dict carries its own estimated_context, so it serves
        # as both issue and metadata on the recursive call)
        final_issues = []
        for sub in sub_issues:
            if sub['estimated_context'] > max_size:
                # Decompose further
                sub_sub_issues = await decompose_epic(sub, sub, depth + 1)
                final_issues.extend(sub_sub_issues)
            else:
                final_issues.append(sub)
        return final_issues

    return sub_issues
```

### Case 3: No Clear Decomposition

**If AI can't find good breakdown points:**

```python
# Comment on issue, unassign from coordinator
await gitea_client.comment_on_issue(
    issue['number'],
    f"❌ Cannot auto-decompose this issue.\n\n"
    f"Estimated at {metadata['estimated_context']:,} tokens "
    f"(exceeds {max_size:,} limit), but no clear breakdown found.\n\n"
    f"**Manual action needed:**\n"
    f"1. Break this into smaller sub-issues manually\n"
    f"2. Assign sub-issues to @mosaic\n"
    f"3. This issue can become an EPIC tracking sub-issues\n\n"
    f"Unassigning from coordinator."
+) + +await gitea_client.unassign_issue(issue['number'], 'mosaic') +``` + +--- + +## Implementation Checklist + +**Phase 0 (COORD-002) must include:** + +- [ ] Structured metadata extraction (existing plan) +- [ ] Content analysis estimation (NEW) +- [ ] Confidence scoring (NEW) +- [ ] Best-guess defaults (NEW) +- [ ] 50% rule validation (NEW) +- [ ] Automatic epic decomposition (NEW) +- [ ] Recursive decomposition handling (NEW) +- [ ] Confidence-based workflow (NEW) +- [ ] Update issue handling (NEW) + +--- + +## Success Criteria + +**Parser handles all issue types:** + +- ✅ Formatted issues → High confidence, extract directly +- ✅ Unformatted issues → Medium confidence, estimate from content +- ✅ Vague issues → Low confidence, use defaults + warn +- ✅ Oversized issues → Auto-decompose, create sub-issues +- ✅ Updated issues → Re-parse, update queue + +**No manual intervention needed for:** + +- Well-formatted issues +- Clear descriptions (even without metadata) +- Oversized issues (auto-decompose) + +**Manual intervention only for:** + +- Very vague issues (<30% confidence) +- Issues that can't be decomposed +- Edge cases requiring human judgment + +--- + +## Example Scenarios + +### Scenario 1: Well-Formatted Issue + +```markdown +Issue #300: [COORD-020] Implement user profile caching + +## Context Estimate + +- Files: 4 +- Total: 52,000 tokens +- Agent: glm + +## Difficulty + +medium +``` + +**Result:** + +- ✅ Extract directly +- Confidence: 95% +- Queue immediately + +--- + +### Scenario 2: Clear But Unformatted Issue + +```markdown +Issue #301: Add caching to user profile API + +Need to cache user profiles to reduce database load. 
+ +Files affected: + +- src/api/users/users.service.ts +- src/cache/cache.service.ts +- src/api/users/users.controller.ts +- tests/users.service.spec.ts + +Acceptance criteria: + +- [ ] Cache GET /users/:id requests +- [ ] 5 minute TTL +- [ ] Invalidate on update/delete +- [ ] Add tests +``` + +**Result:** + +- 📊 Estimate from content +- Files: 4 → 28K base +- Clear scope → Medium complexity (20K) +- Tests mentioned → 10K +- **Total: ~58K tokens** +- Confidence: 75% +- Queue with note + +--- + +### Scenario 3: Vague Issue + +```markdown +Issue #302: Fix user thing + +Users are complaining +``` + +**Result:** + +- ⚠️ Minimal info +- Use defaults (50K, medium, sonnet) +- Confidence: 25% +- Comment: "Please add details" +- Don't queue (<30% threshold) +- Unassign from @mosaic + +--- + +### Scenario 4: Oversized Issue + +```markdown +Issue #303: Refactor entire authentication system + +We need to modernize our auth: + +- Replace session-based auth with JWT +- Add OAuth2 support +- Implement refresh tokens +- Add MFA +- Update all protected routes +- Migration for existing users +``` + +**Result:** + +- 📊 Estimate: 180K tokens +- ⚠️ Exceeds 50% rule (>100K) +- Auto-decompose into sub-issues: + - #304: Extract JWT service (35K) + - #305: Add OAuth2 integration (40K) + - #306: Implement refresh tokens (28K) + - #307: Add MFA support (32K) + - #308: Update route guards (22K) + - #309: User migration script (18K) +- Label #303 as EPIC +- Queue sub-issues + +--- + +## Conclusion + +The issue parser must be **robust and intelligent** to handle real-world issues: + +- ✅ Extract structured metadata when available +- ✅ Estimate from content when missing +- ✅ Use confidence scores to guide behavior +- ✅ Auto-decompose oversized issues +- ✅ Warn users on low confidence +- ✅ Handle edge cases gracefully + +**This makes the coordinator truly autonomous** - it can handle whatever issues users throw at it. 
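
The fallback chain summarized above — structured metadata first, then content analysis, then conservative defaults — can be sketched as a single entry point. This is a minimal, synchronous illustration, not the COORD-002 implementation: the regexes, the `extract_structured_metadata` helper, and the Level 2 stub are assumptions for this sketch, and the real parser would call the AI estimator instead of returning defaults at Level 2.

```python
import re
from typing import Optional

# Level 3 fallback values (mirrors the DEFAULT_ESTIMATES block above)
DEFAULTS = {
    'estimated_context': 50000,
    'difficulty': 'medium',
    'assigned_agent': 'sonnet',
    'confidence': 30,
    'source': 'defaults',
}

def extract_structured_metadata(body: str) -> Optional[dict]:
    """Level 1: read 'Total estimated: N tokens' and the Difficulty section."""
    total = re.search(r'Total estimated:\s*([\d,]+)\s*tokens', body)
    difficulty = re.search(r'##\s*Difficulty\s+(low|medium|high)', body)
    if not (total and difficulty):
        return None
    return {
        'estimated_context': int(total.group(1).replace(',', '')),
        'difficulty': difficulty.group(1),
        'confidence': 95,
        'source': 'structured',
    }

def parse_issue_metadata(issue: dict) -> dict:
    """Fallback chain: structured -> content analysis -> defaults."""
    body = issue.get('body', '')

    # Level 1: structured metadata present
    structured = extract_structured_metadata(body)
    if structured:
        return structured

    # Level 2: enough content to analyze (the real parser would call the
    # AI estimator here; stubbed out in this sketch)
    if len(body) >= 100:
        return {**DEFAULTS, 'source': 'content_analysis'}

    # Level 3: too vague, use conservative defaults
    return dict(DEFAULTS)
```

Note that the confidence value returned at each level is what drives the queue/warn/wait behavior in the workflow section.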

---

**Document Version:** 1.0
**Created:** 2026-01-31
**Status:** Proposed - Update COORD-002
**Priority:** Critical (P0) - Required for real-world usage