stack/docs/3-architecture/issue-parser-estimation-strategy.md
Jason Woltje 403aba4cd3 docs: Add issue parser estimation strategy
Critical enhancement for real-world usage - parser must handle:
- Unformatted issues (estimate from content)
- Incomplete metadata (best-guess + confidence score)
- Oversized issues (auto-decompose before queuing)

Three-level estimation:
1. Structured metadata → extract directly (95%+ confidence)
2. Content analysis → AI estimates from description (50-95%)
3. Minimal info → defaults + warn user (<50%)

50% rule enforcement:
- Detect issues > 50% of agent's context limit
- Auto-decompose into sub-issues using Opus
- Create sub-issues in Gitea with dependencies
- Label parent as EPIC

Confidence-based workflow:
- ≥60%: Queue automatically
- 30-59%: Queue with warning
- <30%: Don't queue, request more details

Makes coordinator truly autonomous - handles whatever users throw at it.

Refs #158 (COORD-002)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 15:40:34 -06:00

# Issue Parser Estimation Strategy
**Status:** Proposed (Phase 0 Enhancement)
**Related Issues:** COORD-002 (Issue Parser Agent)
**Priority:** Critical (P0) - Required for real-world usage

---
## Problem Statement
Not all issues will follow the formatted metadata structure used in COORD-XXX issues. The issue parser must handle:
1. **Unformatted issues** - Just title and description, no metadata
2. **Incomplete metadata** - Some fields present, others missing
3. **Oversized issues** - Exceed 50% rule, need decomposition
4. **Varying formats** - Different teams use different templates
**The parser must make intelligent estimates when metadata is missing.**

---
## Estimation Strategy
### Level 1: Structured Metadata (Best Case)
**When issue has formatted metadata:**
```markdown
## Context Estimate
- Files to modify: 3
- Implementation complexity: medium (20000 tokens)
- **Total estimated: 46800 tokens**
- **Recommended agent: glm**
## Difficulty
medium
```
**Action:**
- Extract directly from markdown
- **Confidence: HIGH** (95%+)
- Use values as-is
---
### Level 2: Content Analysis (Common Case)
**When metadata is missing, analyze issue content:**
#### 2.1 Analyze Title and Description
```python
import json
import anthropic

async def estimate_from_content(issue: dict) -> dict:
    """Estimate metadata from issue content using AI."""
    client = anthropic.AsyncAnthropic()  # async client, so create() is awaitable
    response = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Analyze this issue and estimate resource requirements.

Title: {issue['title']}

Description:
{issue['body']}

Estimate:
1. **Files to modify**: How many files will likely be touched?
   - Count mentions of specific files, modules, components
   - Look for scope indicators (refactor, add feature, fix bug)
2. **Implementation complexity**:
   - Low: Simple CRUD, config changes, one-file fixes
   - Medium: Multi-file changes, business logic, API development
   - High: Architecture changes, complex refactoring, new systems
3. **Context estimate**:
   - Use formula: (files × 7000) + complexity + tests + docs
   - Low: ~20-40K tokens
   - Medium: ~40-80K tokens
   - High: ~80-150K tokens
4. **Difficulty**: low/medium/high
5. **Confidence**: 0-100% (based on clarity of issue description)

Return JSON:
{{
    "estimated_context": <integer>,
    "difficulty": "low" | "medium" | "high",
    "assigned_agent": "haiku" | "sonnet" | "glm" | "opus",
    "confidence": <integer 0-100>,
    "reasoning": "Brief explanation of estimates"
}}
"""
        }]
    )
    metadata = json.loads(response.content[0].text)
    metadata['source'] = 'content_analysis'  # record where estimates came from
    return metadata
```
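The prompt above asks the model to apply the formula `(files × 7000) + complexity + tests + docs`. A deterministic version is useful as a sanity check on the model's arithmetic. The medium-complexity budget (20K) comes from the Level 1 example; the low/high, tests, and docs budgets here are assumptions, not from this document:

```python
# Token budgets per formula term. 'medium' = 20K matches the Level 1 example;
# the other values are illustrative assumptions.
COMPLEXITY_TOKENS = {"low": 10_000, "medium": 20_000, "high": 40_000}
TESTS_TOKENS = 10_000
DOCS_TOKENS = 4_000

def formula_estimate(files: int, complexity: str,
                     has_tests: bool = True, has_docs: bool = False) -> int:
    """Apply (files × 7000) + complexity + tests + docs deterministically."""
    total = files * 7_000 + COMPLEXITY_TOKENS[complexity]
    if has_tests:
        total += TESTS_TOKENS
    if has_docs:
        total += DOCS_TOKENS
    return total
```

For Scenario 2 later in this document (4 files, medium complexity, tests), `formula_estimate(4, "medium")` gives 58,000 tokens, matching the ~58K shown there.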
**Confidence factors:**
| Factor | High Confidence | Low Confidence |
| ----------------------- | ---------------------------- | ----------------------- |
| **Description length** | >500 chars, detailed | <100 chars, vague |
| **Specific mentions** | Files, modules, APIs named | Generic "fix the thing" |
| **Acceptance criteria** | Clear checklist | None provided |
| **Technical details** | Stack traces, logs, examples | "It's broken" |
| **Scope clarity** | Well-defined boundaries | Open-ended |
**Confidence scoring:**
````python
import re

def calculate_confidence(issue: dict, analysis: dict) -> int:
    """Calculate confidence score 0-100."""
    score = 50  # Start at neutral

    # Description length
    if len(issue['body']) > 500:
        score += 15
    elif len(issue['body']) < 100:
        score -= 20

    # Specific file/module mentions
    code_patterns = r'(`[^`]+`|\.ts|\.py|\.js|src/|components/)'
    mentions = len(re.findall(code_patterns, issue['body']))
    score += min(mentions * 5, 20)

    # Acceptance criteria
    if '- [ ]' in issue['body'] or '- [x]' in issue['body']:
        score += 10

    # Technical details (stack traces, logs, code blocks)
    if '```' in issue['body']:
        score += 10

    # Scope keywords
    scope_keywords = ['refactor', 'implement', 'add', 'fix', 'update']
    if any(kw in issue['title'].lower() for kw in scope_keywords):
        score += 5

    return max(0, min(100, score))
````
**Action:**
- Use AI to estimate from content
- **Confidence: MEDIUM** (50-95%)
- Comment confidence on issue
**Example comment:**
```markdown
🤖 Estimated metadata (confidence: 65%):
- Estimated context: 52,000 tokens
- Difficulty: medium
- Recommended agent: glm
📊 Reasoning:
- Mentions 3 components (UserService, AuthMiddleware, tests)
- Requires API changes (medium complexity)
- Has acceptance criteria (+confidence)
- Description is detailed (+confidence)
Note: These are estimates. Actual usage may vary.
```
---
### Level 3: Minimal Information (Worst Case)
**When issue is very vague:**
```markdown
Title: Fix the login bug
Body: Login doesn't work
```
**Action:**
- Use conservative defaults
- **Confidence: LOW** (<50%)
- Warn user, suggest more details
**Default estimates:**
```python
DEFAULT_ESTIMATES = {
    'estimated_context': 50000,   # Conservative default
    'difficulty': 'medium',
    'assigned_agent': 'sonnet',   # Safe middle-tier agent
    'confidence': 30,
    'reasoning': 'Minimal information provided, using defaults'
}
```
**Example comment:**
```markdown
⚠️ Low confidence estimate (30%):
- Estimated context: 50,000 tokens
- Difficulty: medium
- Recommended agent: sonnet
📝 Suggestion: For better estimates, please add:
- Which files/components are affected
- Expected scope (one file? multiple modules?)
- Acceptance criteria or definition of "done"
- Any relevant logs, stack traces, or examples
```
---
## Oversized Issue Detection & Decomposition
### 50% Rule Enforcement
**Before queuing, check if issue exceeds 50% of target agent's limit:**
```python
from typing import List

async def check_and_decompose(issue: dict, metadata: dict) -> List[dict]:
    """Check if issue exceeds 50% rule. If so, decompose."""
    # Get agent limit
    agent = metadata['assigned_agent']
    agent_limit = AGENT_PROFILES[agent]['context_limit']
    max_issue_size = agent_limit * 0.5

    # Check if oversized
    if metadata['estimated_context'] > max_issue_size:
        logger.warning(
            f"Issue #{issue['number']} exceeds 50% rule: "
            f"{metadata['estimated_context']} > {max_issue_size}"
        )

        # Decompose into sub-issues
        sub_issues = await decompose_epic(issue, metadata)

        # Comment on parent issue
        await gitea_client.comment_on_issue(
            issue['number'],
            f"⚠️ Issue exceeds 50% rule ({metadata['estimated_context']:,} tokens)\n\n"
            f"Auto-decomposing into {len(sub_issues)} sub-issues...\n\n"
            f"This issue will be converted to an EPIC tracking the sub-issues."
        )

        # Label as epic
        await gitea_client.add_label(issue['number'], 'epic')
        return sub_issues
    else:
        # Single issue, good to go
        return [issue]
```
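`AGENT_PROFILES` is referenced but never defined in this document. A hypothetical shape: only the Opus-derived 100K cap (quoted in the decomposition example below, implying a 200K window) is grounded in the text; the other limits are placeholders:

```python
# Hypothetical agent profiles. Only the Opus limit (200K context, hence the
# 100K 50%-cap in the decomposition example) is grounded in this document;
# the other limits are placeholders.
AGENT_PROFILES = {
    "haiku":  {"context_limit": 200_000},
    "sonnet": {"context_limit": 200_000},
    "glm":    {"context_limit": 128_000},
    "opus":   {"context_limit": 200_000},
}

def max_issue_size(agent: str) -> float:
    """50% rule: an issue may consume at most half the agent's context window."""
    return AGENT_PROFILES[agent]["context_limit"] * 0.5
```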
### Automatic Epic Decomposition
**When issue is oversized, use AI to break it down:**
```python
async def decompose_epic(issue: dict, metadata: dict) -> List[dict]:
    """Decompose oversized issue into sub-issues."""
    client = anthropic.AsyncAnthropic()

    # Get max issue size for target agent
    agent = metadata['assigned_agent']
    max_size = AGENT_PROFILES[agent]['context_limit'] * 0.5

    response = await client.messages.create(
        model="claude-opus-4-5",  # Use Opus for decomposition
        max_tokens=4000,
        messages=[{
            "role": "user",
            "content": f"""This issue is too large ({metadata['estimated_context']:,} tokens)
and must be broken into smaller sub-issues.

**Original Issue:**
Title: {issue['title']}
Body: {issue['body']}

**Constraints:**
- Each sub-issue must be ≤ {max_size:,} tokens
- Sub-issues should be independently completable
- Maintain logical order (dependencies)
- Cover all aspects of original issue

**Instructions:**
1. Identify logical breakdown points
2. Create 3-6 sub-issues
3. Estimate context for each
4. Define dependencies (what must come first)

Return JSON array:
[
    {{
        "title": "Sub-issue title",
        "description": "Detailed description",
        "estimated_context": <integer>,
        "difficulty": "low" | "medium" | "high",
        "depends_on": [<array of titles this depends on>]
    }},
    ...
]

Ensure NO sub-issue exceeds {max_size:,} tokens.
"""
        }]
    )
    sub_issues = json.loads(response.content[0].text)

    # Validate all sub-issues fit 50% rule
    for sub in sub_issues:
        if sub['estimated_context'] > max_size:
            raise ValueError(
                f"Sub-issue '{sub['title']}' still exceeds limit: "
                f"{sub['estimated_context']} > {max_size}"
            )

    # Create sub-issues in Gitea
    created_issues = []
    issue_map = {}  # title -> issue number
    for sub in sub_issues:
        new_issue = await gitea_client.create_issue(
            title=f"[SUB] {sub['title']}",
            body=f"""**Parent Epic:** #{issue['number']} - {issue['title']}

## Objective
{sub['description']}

## Context Estimate
- **Total estimated: {sub['estimated_context']:,} tokens**
- Difficulty: {sub['difficulty']}

## Dependencies
{format_dependencies(sub['depends_on'], issue_map)}

## Notes
Auto-generated from epic decomposition.
""",
            labels=['sub-issue', f"p{issue.get('priority', 1)}"],
            milestone=issue.get('milestone')
        )
        created_issues.append(new_issue)
        issue_map[sub['title']] = new_issue['number']

    # Update parent issue to reference sub-issues
    sub_issue_list = '\n'.join(
        f"- #{i['number']} {i['title']}"
        for i in created_issues
    )
    await gitea_client.comment_on_issue(
        issue['number'],
        f"## Sub-Issues Created\n\n{sub_issue_list}\n\n"
        f"This issue is now an EPIC. Close this when all sub-issues complete."
    )

    return created_issues
```
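`format_dependencies` is called above but not defined in this document. One plausible implementation, given that `issue_map` only contains sub-issues created earlier in the loop:

```python
def format_dependencies(depends_on: list[str], issue_map: dict[str, int]) -> str:
    """Render a sub-issue's dependencies as a markdown list, linking issue
    numbers for dependencies that were already created in the loop."""
    if not depends_on:
        return "None - can start immediately."
    lines = []
    for title in depends_on:
        if title in issue_map:
            lines.append(f"- Blocked by #{issue_map[title]} ({title})")
        else:
            lines.append(f"- Blocked by: {title} (not yet created)")
    return "\n".join(lines)
```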
**Example decomposition:**
```markdown
Original Issue #200: "Refactor authentication system"
Estimated: 180,000 tokens (EXCEEDS 50% rule for Opus: 100K limit)
Auto-decomposed into:
├─ #201 [SUB] Extract auth middleware (45K tokens) → Ready
├─ #202 [SUB] Implement JWT service (38K tokens) → Blocked by #201
├─ #203 [SUB] Add token refresh logic (32K tokens) → Blocked by #202
├─ #204 [SUB] Update auth guards (28K tokens) → Blocked by #202
└─ #205 [SUB] Add integration tests (35K tokens) → Blocked by #201,#202,#203,#204
Total: 178K tokens across 5 sub-issues
Each sub-issue: ≤50K tokens ✅
```
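The blocked-by chains above imply a scheduling order. Since `depends_on` uses titles, ordering the sub-issues is a straightforward topological sort (a sketch; `schedule_order` is a hypothetical helper, not part of the proposed design):

```python
from graphlib import TopologicalSorter  # Python 3.9+

def schedule_order(sub_issues: list[dict]) -> list[str]:
    """Return sub-issue titles in an order that respects depends_on chains."""
    graph = {s["title"]: set(s.get("depends_on", [])) for s in sub_issues}
    return list(TopologicalSorter(graph).static_order())
```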
---
## Confidence-Based Workflow
### High Confidence (95%+)
- **Source:** Structured metadata in issue body
- **Action:** Use values directly, queue immediately
- **Comment:** "✅ Metadata detected, high confidence"
### Medium Confidence (50-95%)
- **Source:** Content analysis
- **Action:** Use estimates, queue with note
- **Comment:** "📊 Estimated from content (confidence: X%)"
### Low Confidence (<50%)
- **Source:** Minimal info, using defaults
- **Action:** Use defaults, warn user
- **Comment:** "⚠️ Low confidence - please add details"
- **Optional:** Wait for user to update issue before queuing
---
## Confidence Thresholds
```python
class ConfidenceThresholds:
    """Confidence-based behavior thresholds."""
    AUTO_QUEUE = 60       # ≥60% confidence: Queue automatically
    WARN_USER = 30        # 30-59% confidence: Queue with a warning
    WAIT_FOR_UPDATE = 30  # <30% confidence: Don't queue, wait for update
```
**Workflow:**
```python
async def handle_issue_assignment(issue: dict):
    """Handle issue assigned to @mosaic."""
    # Parse metadata (structured or estimated)
    metadata = await parse_issue_metadata(issue)

    # Check confidence
    if metadata['confidence'] >= ConfidenceThresholds.AUTO_QUEUE:
        # High/medium confidence - queue it
        await queue_manager.enqueue(issue, metadata)
        await gitea_client.comment_on_issue(
            issue['number'],
            f"🤖 Added to coordinator queue\n\n"
            f"**Metadata** (confidence: {metadata['confidence']}%):\n"
            f"- Estimated context: {metadata['estimated_context']:,} tokens\n"
            f"- Difficulty: {metadata['difficulty']}\n"
            f"- Assigned agent: {metadata['assigned_agent']}\n\n"
            f"{metadata.get('reasoning', '')}"
        )
    elif metadata['confidence'] >= ConfidenceThresholds.WARN_USER:
        # Low confidence - queue but warn
        await queue_manager.enqueue(issue, metadata)
        await gitea_client.comment_on_issue(
            issue['number'],
            f"⚠️ Low confidence estimate ({metadata['confidence']}%)\n\n"
            f"Using best-guess estimates:\n"
            f"- Estimated context: {metadata['estimated_context']:,} tokens\n"
            f"- Difficulty: {metadata['difficulty']}\n"
            f"- Assigned agent: {metadata['assigned_agent']}\n\n"
            f"💡 For better estimates, please add:\n"
            f"- Which files/components are affected\n"
            f"- Expected scope\n"
            f"- Acceptance criteria\n\n"
            f"Queued anyway - work will proceed with these estimates."
        )
    else:
        # Very low confidence - don't queue
        await gitea_client.comment_on_issue(
            issue['number'],
            f"❌ Cannot queue - insufficient information ({metadata['confidence']}%)\n\n"
            f"Please add more details:\n"
            f"- What files/components need changes?\n"
            f"- What is the expected scope?\n"
            f"- What are the acceptance criteria?\n\n"
            f"Re-assign to @mosaic when ready."
        )
        # Unassign from coordinator
        await gitea_client.unassign_issue(issue['number'], 'mosaic')
```
---
## Edge Cases
### Case 1: Issue Updated After Queuing
**User adds details after low-confidence queuing:**
```python
@app.post('/webhook/gitea')
async def handle_webhook(payload: dict):
    """Handle Gitea webhooks."""
    if payload['action'] == 'edited':
        issue = payload['issue']

        # Check if already in queue
        if queue_manager.has_issue(issue['number']):
            # Re-parse with updated content
            new_metadata = await parse_issue_metadata(issue)

            # Update queue
            queue_manager.update_metadata(issue['number'], new_metadata)
            await gitea_client.comment_on_issue(
                issue['number'],
                f"🔄 Issue updated - re-estimated metadata:\n"
                f"- Estimated context: {new_metadata['estimated_context']:,} tokens\n"
                f"- Difficulty: {new_metadata['difficulty']}\n"
                f"- Confidence: {new_metadata['confidence']}%"
            )
```
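The handler assumes a `queue_manager` exposing `has_issue` and `update_metadata`. A minimal in-memory sketch of that interface (synchronous here for brevity, whereas `enqueue` is awaited elsewhere in this document):

```python
class QueueManager:
    """Minimal in-memory sketch of the queue interface used above."""

    def __init__(self) -> None:
        self._queue: dict[int, dict] = {}  # issue number -> metadata

    def enqueue(self, issue: dict, metadata: dict) -> None:
        self._queue[issue["number"]] = metadata

    def has_issue(self, number: int) -> bool:
        return number in self._queue

    def update_metadata(self, number: int, metadata: dict) -> None:
        if number not in self._queue:
            raise KeyError(f"issue #{number} is not queued")
        self._queue[number] = metadata
```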
### Case 2: Decomposition Creates More Oversized Issues
**If decomposed sub-issue still exceeds 50% rule:**
```python
# Recursive decomposition
async def decompose_epic(issue: dict, metadata: dict, depth: int = 0) -> List[dict]:
    """Decompose with recursion limit."""
    if depth > 2:
        raise ValueError(
            f"Issue #{issue['number']} cannot be decomposed enough. "
            f"Manual intervention required."
        )

    # 50% cap for the target agent (implicit in the non-recursive version)
    max_size = AGENT_PROFILES[metadata['assigned_agent']]['context_limit'] * 0.5

    sub_issues = await ai_decompose(issue, metadata)

    # Check if any sub-issue is still too large
    oversized = [s for s in sub_issues if s['estimated_context'] > max_size]
    if oversized:
        # Recursively decompose oversized sub-issues
        final_issues = []
        for sub in sub_issues:
            if sub['estimated_context'] > max_size:
                # Decompose further; inherit the agent, use the sub's estimate
                sub_metadata = {**metadata, **sub}
                sub_sub_issues = await decompose_epic(sub, sub_metadata, depth + 1)
                final_issues.extend(sub_sub_issues)
            else:
                final_issues.append(sub)
        return final_issues
    return sub_issues
```
### Case 3: No Clear Decomposition
**If AI can't find good breakdown points:**
```python
# Comment on issue, unassign from coordinator
await gitea_client.comment_on_issue(
    issue['number'],
    f"❌ Cannot auto-decompose this issue.\n\n"
    f"Estimated at {metadata['estimated_context']:,} tokens "
    f"(exceeds {max_size:,} limit), but no clear breakdown found.\n\n"
    f"**Manual action needed:**\n"
    f"1. Break this into smaller sub-issues manually\n"
    f"2. Assign sub-issues to @mosaic\n"
    f"3. This issue can become an EPIC tracking sub-issues\n\n"
    f"Unassigning from coordinator."
)
await gitea_client.unassign_issue(issue['number'], 'mosaic')
```
---
## Implementation Checklist
**Phase 0 (COORD-002) must include:**
- [ ] Structured metadata extraction (existing plan)
- [ ] Content analysis estimation (NEW)
- [ ] Confidence scoring (NEW)
- [ ] Best-guess defaults (NEW)
- [ ] 50% rule validation (NEW)
- [ ] Automatic epic decomposition (NEW)
- [ ] Recursive decomposition handling (NEW)
- [ ] Confidence-based workflow (NEW)
- [ ] Update issue handling (NEW)
---
## Success Criteria
**Parser handles all issue types:**
- ✅ Formatted issues → High confidence, extract directly
- ✅ Unformatted issues → Medium confidence, estimate from content
- ✅ Vague issues → Low confidence, use defaults + warn
- ✅ Oversized issues → Auto-decompose, create sub-issues
- ✅ Updated issues → Re-parse, update queue
**No manual intervention needed for:**
- Well-formatted issues
- Clear descriptions (even without metadata)
- Oversized issues (auto-decompose)
**Manual intervention only for:**
- Very vague issues (<30% confidence)
- Issues that can't be decomposed
- Edge cases requiring human judgment
---
## Example Scenarios
### Scenario 1: Well-Formatted Issue
```markdown
Issue #300: [COORD-020] Implement user profile caching
## Context Estimate
- Files: 4
- Total: 52,000 tokens
- Agent: glm
## Difficulty
medium
```
**Result:**
- ✅ Extract directly
- Confidence: 95%
- Queue immediately
---
### Scenario 2: Clear But Unformatted Issue
```markdown
Issue #301: Add caching to user profile API
Need to cache user profiles to reduce database load.
Files affected:
- src/api/users/users.service.ts
- src/cache/cache.service.ts
- src/api/users/users.controller.ts
- tests/users.service.spec.ts
Acceptance criteria:
- [ ] Cache GET /users/:id requests
- [ ] 5 minute TTL
- [ ] Invalidate on update/delete
- [ ] Add tests
```
**Result:**
- 📊 Estimate from content
- Files: 4 → 28K base
- Clear scope → Medium complexity (20K)
- Tests mentioned → 10K
- **Total: ~58K tokens**
- Confidence: 75%
- Queue with note
---
### Scenario 3: Vague Issue
```markdown
Issue #302: Fix user thing
Users are complaining
```
**Result:**
- ⚠️ Minimal info
- Use defaults (50K, medium, sonnet)
- Confidence: 25%
- Comment: "Please add details"
- Don't queue (<30% threshold)
- Unassign from @mosaic
---
### Scenario 4: Oversized Issue
```markdown
Issue #303: Refactor entire authentication system
We need to modernize our auth:
- Replace session-based auth with JWT
- Add OAuth2 support
- Implement refresh tokens
- Add MFA
- Update all protected routes
- Migration for existing users
```
**Result:**
- 📊 Estimate: 180K tokens
- ⚠️ Exceeds 50% rule (>100K)
- Auto-decompose into sub-issues:
- #304: Extract JWT service (35K)
- #305: Add OAuth2 integration (40K)
- #306: Implement refresh tokens (28K)
- #307: Add MFA support (32K)
- #308: Update route guards (22K)
- #309: User migration script (18K)
- Label #303 as EPIC
- Queue sub-issues
---
## Conclusion
The issue parser must be **robust and intelligent** to handle real-world issues:
- ✅ Extract structured metadata when available
- ✅ Estimate from content when missing
- ✅ Use confidence scores to guide behavior
- ✅ Auto-decompose oversized issues
- ✅ Warn users on low confidence
- ✅ Handle edge cases gracefully
**This makes the coordinator truly autonomous** - it can handle whatever issues users throw at it.

---
**Document Version:** 1.0
**Created:** 2026-01-31
**Status:** Proposed - Update COORD-002
**Priority:** Critical (P0) - Required for real-world usage