Issue Parser Estimation Strategy

Status: Proposed (Phase 0 Enhancement)
Related Issues: COORD-002 (Issue Parser Agent)
Priority: Critical (P0) - Required for real-world usage


Problem Statement

Not all issues will follow the formatted metadata structure used in COORD-XXX issues. The issue parser must handle:

  1. Unformatted issues - Just title and description, no metadata
  2. Incomplete metadata - Some fields present, others missing
  3. Oversized issues - Exceed 50% rule, need decomposition
  4. Varying formats - Different teams use different templates

The parser must make intelligent estimates when metadata is missing.


Estimation Strategy

Level 1: Structured Metadata (Best Case)

When an issue has formatted metadata:

## Context Estimate

- Files to modify: 3
- Implementation complexity: medium (20000 tokens)
- **Total estimated: 46800 tokens**
- **Recommended agent: glm**

## Difficulty

medium

Action:

  • Extract directly from markdown (see the sketch below)
  • Confidence: HIGH (95%+)
  • Use values as-is
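
A minimal sketch of this direct extraction, assuming the metadata block follows the format shown above (the function name and exact field patterns are illustrative):

import re

def extract_structured_metadata(body: str) -> dict | None:
    """Pull estimates from a formatted '## Context Estimate' block, if present."""

    total = re.search(r'Total estimated:\s*([\d,]+)\s*tokens', body)
    difficulty = re.search(r'## Difficulty\s+(low|medium|high)', body)
    agent = re.search(r'Recommended agent:\s*(\w+)', body)

    if not (total and difficulty):
        return None  # No structured block - fall back to content analysis (Level 2)

    return {
        'estimated_context': int(total.group(1).replace(',', '')),
        'difficulty': difficulty.group(1),
        'assigned_agent': agent.group(1) if agent else None,
        'confidence': 95,
        'source': 'structured_metadata',
    }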

Level 2: Content Analysis (Common Case)

When metadata is missing, analyze issue content:

2.1 Analyze Title and Description

import json

import anthropic

async def estimate_from_content(issue: dict) -> dict:
    """Estimate metadata from issue content using AI."""

    # Async client so the API call below can be awaited
    client = anthropic.AsyncAnthropic()

    response = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Analyze this issue and estimate resource requirements.

Title: {issue['title']}
Description:
{issue['body']}

Estimate:
1. **Files to modify**: How many files will likely be touched?
   - Count mentions of specific files, modules, components
   - Look for scope indicators (refactor, add feature, fix bug)

2. **Implementation complexity**:
   - Low: Simple CRUD, config changes, one-file fixes
   - Medium: Multi-file changes, business logic, API development
   - High: Architecture changes, complex refactoring, new systems

3. **Context estimate**:
   - Use formula: (files × 7000) + complexity + tests + docs
   - Low: ~20-40K tokens
   - Medium: ~40-80K tokens
   - High: ~80-150K tokens

4. **Difficulty**: low/medium/high

5. **Confidence**: 0-100% (based on clarity of issue description)

Return JSON:
{{
  "estimated_context": <integer>,
  "difficulty": "low" | "medium" | "high",
  "assigned_agent": "haiku" | "sonnet" | "glm" | "opus",
  "confidence": <integer 0-100>,
  "reasoning": "Brief explanation of estimates"
}}
"""
        }]
    )

    metadata = json.loads(response.content[0].text)

    # Add metadata source
    metadata['source'] = 'content_analysis'

    return metadata

Confidence factors:

| Factor | High Confidence | Low Confidence |
| --- | --- | --- |
| Description length | >500 chars, detailed | <100 chars, vague |
| Specific mentions | Files, modules, APIs named | Generic "fix the thing" |
| Acceptance criteria | Clear checklist | None provided |
| Technical details | Stack traces, logs, examples | "It's broken" |
| Scope clarity | Well-defined boundaries | Open-ended |

Confidence scoring:

import re

def calculate_confidence(issue: dict, analysis: dict) -> int:
    """Calculate confidence score 0-100."""

    score = 50  # Start at neutral

    # Description length
    if len(issue['body']) > 500:
        score += 15
    elif len(issue['body']) < 100:
        score -= 20

    # Specific file/module mentions
    code_patterns = r'(`[^`]+`|\.ts|\.py|\.js|src/|components/)'
    mentions = len(re.findall(code_patterns, issue['body']))
    score += min(mentions * 5, 20)

    # Acceptance criteria
    if '- [ ]' in issue['body'] or '- [x]' in issue['body']:
        score += 10

    # Technical details (stack traces, logs, code blocks)
    if '```' in issue['body']:
        score += 10

    # Scope keywords
    scope_keywords = ['refactor', 'implement', 'add', 'fix', 'update']
    if any(kw in issue['title'].lower() for kw in scope_keywords):
        score += 5

    return max(0, min(100, score))
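
As a quick worked example, the vague issue from Level 3 below starts at 50, loses 20 for a body under 100 characters, and gains 5 for the "fix" keyword in the title:

vague_issue = {'title': 'Fix the login bug', 'body': "Login doesn't work"}

# 50 (base) - 20 (body < 100 chars) + 0 (no file mentions) + 5 ('fix' in title) = 35
print(calculate_confidence(vague_issue, analysis={}))  # -> 35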

Action:

  • Use AI to estimate from content
  • Confidence: MEDIUM (50-95%)
  • Comment confidence on issue

Example comment:

🤖 Estimated metadata (confidence: 65%):

- Estimated context: 52,000 tokens
- Difficulty: medium
- Recommended agent: glm

📊 Reasoning:

- Mentions 3 components (UserService, AuthMiddleware, tests)
- Requires API changes (medium complexity)
- Has acceptance criteria (+confidence)
- Description is detailed (+confidence)

Note: These are estimates. Actual usage may vary.

Level 3: Minimal Information (Worst Case)

When an issue is very vague:

Title: Fix the login bug
Body: Login doesn't work

Action:

  • Use conservative defaults
  • Confidence: LOW (<50%)
  • Warn user, suggest more details

Default estimates:

DEFAULT_ESTIMATES = {
    'estimated_context': 50000,      # Conservative default
    'difficulty': 'medium',
    'assigned_agent': 'sonnet',      # Safe middle-tier agent
    'confidence': 30,
    'reasoning': 'Minimal information provided, using defaults'
}

Example comment:

⚠️ Low confidence estimate (30%):

- Estimated context: 50,000 tokens
- Difficulty: medium
- Recommended agent: sonnet

📝 Suggestion: For better estimates, please add:

- Which files/components are affected
- Expected scope (one file? multiple modules?)
- Acceptance criteria or definition of "done"
- Any relevant logs, stack traces, or examples
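
Taken together, the three levels can be wired into a single entry point. The workflow code later in this document calls parse_issue_metadata(); a minimal sketch of that dispatcher, built from the helpers above. How the AI's self-reported confidence and the heuristic score are combined is an open design choice; this sketch simply takes the more pessimistic of the two:

async def parse_issue_metadata(issue: dict) -> dict:
    """Dispatch across the three estimation levels."""

    # Level 1: formatted metadata block in the issue body
    structured = extract_structured_metadata(issue['body'])
    if structured:
        return structured

    # Level 2: AI content analysis, cross-checked against the heuristic score
    estimate = await estimate_from_content(issue)
    estimate['confidence'] = min(
        estimate['confidence'],
        calculate_confidence(issue, estimate),
    )

    # Level 3: too little to go on - use conservative defaults, but keep the
    # low confidence score so the workflow can warn the user or refuse to queue
    if estimate['confidence'] < 50:
        return {**DEFAULT_ESTIMATES, 'confidence': estimate['confidence']}

    return estimate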

Oversized Issue Detection & Decomposition

50% Rule Enforcement

Before queuing, check whether the issue exceeds 50% of the target agent's limit:

from typing import List

async def check_and_decompose(issue: dict, metadata: dict) -> List[dict]:
    """Check if issue exceeds 50% rule. If so, decompose."""

    # Get agent limit
    agent = metadata['assigned_agent']
    agent_limit = AGENT_PROFILES[agent]['context_limit']
    max_issue_size = agent_limit * 0.5

    # Check if oversized
    if metadata['estimated_context'] > max_issue_size:
        logger.warning(
            f"Issue #{issue['number']} exceeds 50% rule: "
            f"{metadata['estimated_context']} > {max_issue_size}"
        )

        # Decompose into sub-issues
        sub_issues = await decompose_epic(issue, metadata)

        # Comment on parent issue
        await gitea_client.comment_on_issue(
            issue['number'],
            f"⚠️ Issue exceeds 50% rule ({metadata['estimated_context']:,} tokens)\n\n"
            f"Auto-decomposing into {len(sub_issues)} sub-issues...\n\n"
            f"This issue will be converted to an EPIC tracking the sub-issues."
        )

        # Label as epic
        await gitea_client.add_label(issue['number'], 'epic')

        return sub_issues

    else:
        # Single issue, good to go
        return [issue]
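
check_and_decompose() (and decompose_epic() below) assume an AGENT_PROFILES table mapping each agent tier to its context window. The exact limits are not specified in this document; the values below are illustrative placeholders only:

# Illustrative placeholders only - the real limits belong in coordinator config
AGENT_PROFILES = {
    'haiku':  {'context_limit': 200_000},
    'sonnet': {'context_limit': 200_000},
    'glm':    {'context_limit': 128_000},
    'opus':   {'context_limit': 200_000},
}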

Automatic Epic Decomposition

When an issue is oversized, use AI to break it down:

async def decompose_epic(issue: dict, metadata: dict) -> List[dict]:
    """Decompose oversized issue into sub-issues."""

    client = anthropic.AsyncAnthropic()  # async client, matching the awaited call below

    # Get max issue size for target agent
    agent = metadata['assigned_agent']
    max_size = AGENT_PROFILES[agent]['context_limit'] * 0.5

    response = await client.messages.create(
        model="claude-opus-4-5",  # Use Opus for decomposition
        max_tokens=4000,
        messages=[{
            "role": "user",
            "content": f"""This issue is too large ({metadata['estimated_context']:,} tokens)
and must be broken into smaller sub-issues.

**Original Issue:**
Title: {issue['title']}
Body: {issue['body']}

**Constraints:**
- Each sub-issue must be ≤ {max_size:,} tokens
- Sub-issues should be independently completable
- Maintain logical order (dependencies)
- Cover all aspects of original issue

**Instructions:**
1. Identify logical breakdown points
2. Create 3-6 sub-issues
3. Estimate context for each
4. Define dependencies (what must come first)

Return JSON array:
[
  {{
    "title": "Sub-issue title",
    "description": "Detailed description",
    "estimated_context": <integer>,
    "difficulty": "low" | "medium" | "high",
    "depends_on": [<array of titles this depends on>]
  }},
  ...
]

Ensure NO sub-issue exceeds {max_size:,} tokens.
"""
        }]
    )

    sub_issues = json.loads(response.content[0].text)

    # Validate all sub-issues fit 50% rule
    for sub in sub_issues:
        if sub['estimated_context'] > max_size:
            raise ValueError(
                f"Sub-issue '{sub['title']}' still exceeds limit: "
                f"{sub['estimated_context']} > {max_size}"
            )

    # Create sub-issues in Gitea
    created_issues = []
    issue_map = {}  # title -> issue number

    for sub in sub_issues:
        # Create issue
        new_issue = await gitea_client.create_issue(
            title=f"[SUB] {sub['title']}",
            body=f"""**Parent Epic:** #{issue['number']} - {issue['title']}

## Objective
{sub['description']}

## Context Estimate
- **Total estimated: {sub['estimated_context']:,} tokens**
- Difficulty: {sub['difficulty']}

## Dependencies
{format_dependencies(sub['depends_on'], issue_map)}

## Notes
Auto-generated from epic decomposition.
""",
            labels=['sub-issue', f"p{issue.get('priority', 1)}"],
            milestone=issue.get('milestone')
        )

        created_issues.append(new_issue)
        issue_map[sub['title']] = new_issue['number']

    # Update parent issue to reference sub-issues
    sub_issue_list = '\n'.join(
        f"- #{i['number']} {i['title']}"
        for i in created_issues
    )

    await gitea_client.comment_on_issue(
        issue['number'],
        f"## Sub-Issues Created\n\n{sub_issue_list}\n\n"
        f"This issue is now an EPIC. Close this when all sub-issues complete."
    )

    return created_issues
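
decompose_epic() uses a format_dependencies() helper that is not defined in this document. A minimal sketch of what it might do, linking dependencies to sub-issues that have already been created and falling back to titles for those that have not:

from typing import Dict, List

def format_dependencies(depends_on: List[str], issue_map: Dict[str, int]) -> str:
    """Render the dependency section for a sub-issue body."""

    if not depends_on:
        return 'None - can start immediately.'

    lines = []
    for title in depends_on:
        number = issue_map.get(title)
        # Sub-issues created earlier in the loop already have numbers;
        # later ones can only be referenced by title for now
        lines.append(f'- Blocked by #{number}' if number else f'- Blocked by: {title}')

    return '\n'.join(lines)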

Example decomposition:

Original Issue #200: "Refactor authentication system"
Estimated: 180,000 tokens (EXCEEDS 50% rule for Opus: 100K limit)

Auto-decomposed into:
├─ #201 [SUB] Extract auth middleware (45K tokens) → Ready
├─ #202 [SUB] Implement JWT service (38K tokens) → Blocked by #201
├─ #203 [SUB] Add token refresh logic (32K tokens) → Blocked by #202
├─ #204 [SUB] Update auth guards (28K tokens) → Blocked by #202
└─ #205 [SUB] Add integration tests (35K tokens) → Blocked by #201,#202,#203,#204

Total: 178K tokens across 5 sub-issues
Each sub-issue: ≤50K tokens ✅

Confidence-Based Workflow

High Confidence (95%+)

  • Source: Structured metadata in issue body
  • Action: Use values directly, queue immediately
  • Comment: "✅ Metadata detected, high confidence"

Medium Confidence (50-95%)

  • Source: Content analysis
  • Action: Use estimates, queue with note
  • Comment: "📊 Estimated from content (confidence: X%)"

Low Confidence (<50%)

  • Source: Minimal info, using defaults
  • Action: Use defaults, warn user
  • Comment: "⚠️ Low confidence - please add details"
  • Optional: Wait for user to update issue before queuing

Confidence Thresholds

class ConfidenceThresholds:
    """Confidence-based behavior thresholds."""

    AUTO_QUEUE = 60      # ≥60% confidence: Queue automatically
    WARN_USER = 50       # <50% confidence: Warn user
    WAIT_FOR_UPDATE = 30 # <30% confidence: Don't queue, wait for update

Workflow:

async def handle_issue_assignment(issue: dict):
    """Handle issue assigned to @mosaic."""

    # Parse metadata (structured or estimated)
    metadata = await parse_issue_metadata(issue)

    # Check confidence
    if metadata['confidence'] >= ConfidenceThresholds.AUTO_QUEUE:
        # High/medium confidence - queue it
        await queue_manager.enqueue(issue, metadata)

        await gitea_client.comment_on_issue(
            issue['number'],
            f"🤖 Added to coordinator queue\n\n"
            f"**Metadata** (confidence: {metadata['confidence']}%):\n"
            f"- Estimated context: {metadata['estimated_context']:,} tokens\n"
            f"- Difficulty: {metadata['difficulty']}\n"
            f"- Assigned agent: {metadata['assigned_agent']}\n\n"
            f"{metadata.get('reasoning', '')}"
        )

    elif metadata['confidence'] >= ConfidenceThresholds.WAIT_FOR_UPDATE:
        # Low confidence - queue but warn
        await queue_manager.enqueue(issue, metadata)

        await gitea_client.comment_on_issue(
            issue['number'],
            f"⚠️ Low confidence estimate ({metadata['confidence']}%)\n\n"
            f"Using best-guess estimates:\n"
            f"- Estimated context: {metadata['estimated_context']:,} tokens\n"
            f"- Difficulty: {metadata['difficulty']}\n"
            f"- Assigned agent: {metadata['assigned_agent']}\n\n"
            f"💡 For better estimates, please add:\n"
            f"- Which files/components are affected\n"
            f"- Expected scope\n"
            f"- Acceptance criteria\n\n"
            f"Queued anyway - work will proceed with these estimates."
        )

    else:
        # Very low confidence - don't queue
        await gitea_client.comment_on_issue(
            issue['number'],
            f"❌ Cannot queue - insufficient information ({metadata['confidence']}%)\n\n"
            f"Please add more details:\n"
            f"- What files/components need changes?\n"
            f"- What is the expected scope?\n"
            f"- What are the acceptance criteria?\n\n"
            f"Re-assign to @mosaic when ready."
        )

        # Unassign from coordinator
        await gitea_client.unassign_issue(issue['number'], 'mosaic')
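
handle_issue_assignment() above and the webhook handler in the next section rely on a queue_manager whose interface is not specified here. A minimal sketch of the assumed surface, just enough to make the calls in these snippets concrete:

class QueueManager:
    """Assumed interface - the real queue lives in the coordinator service."""

    def __init__(self) -> None:
        self._entries: dict[int, dict] = {}  # issue number -> metadata

    async def enqueue(self, issue: dict, metadata: dict) -> None:
        self._entries[issue['number']] = metadata

    def has_issue(self, issue_number: int) -> bool:
        return issue_number in self._entries

    def update_metadata(self, issue_number: int, metadata: dict) -> None:
        self._entries[issue_number] = metadata

queue_manager = QueueManager()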

Edge Cases

Case 1: Issue Updated After Queuing

User adds details after low-confidence queuing:

# Assuming the coordinator receives Gitea webhooks via FastAPI
from fastapi import FastAPI

app = FastAPI()

@app.post('/webhook/gitea')
async def handle_webhook(payload: dict):
    """Handle Gitea webhooks."""

    if payload['action'] == 'edited':
        issue = payload['issue']

        # Check if already in queue
        if queue_manager.has_issue(issue['number']):
            # Re-parse with updated content
            new_metadata = await parse_issue_metadata(issue)

            # Update queue
            queue_manager.update_metadata(issue['number'], new_metadata)

            await gitea_client.comment_on_issue(
                issue['number'],
                f"🔄 Issue updated - re-estimated metadata:\n"
                f"- Estimated context: {new_metadata['estimated_context']:,} tokens\n"
                f"- Difficulty: {new_metadata['difficulty']}\n"
                f"- Confidence: {new_metadata['confidence']}%"
            )

Case 2: Decomposition Creates More Oversized Issues

If a decomposed sub-issue still exceeds the 50% rule:

# Recursive decomposition
async def decompose_epic(issue: dict, metadata: dict, depth: int = 0) -> List[dict]:
    """Decompose with recursion limit."""

    if depth > 2:
        raise ValueError(
            f"Issue #{issue['number']} cannot be decomposed enough. "
            f"Manual intervention required."
        )

    # Same 50% cap as before, derived from the target agent's context limit
    agent = metadata['assigned_agent']
    max_size = AGENT_PROFILES[agent]['context_limit'] * 0.5

    # ai_decompose() is the AI breakdown step factored out of decompose_epic() above
    sub_issues = await ai_decompose(issue, metadata)

    # Check if any sub-issue is still too large
    oversized = [s for s in sub_issues if s['estimated_context'] > max_size]

    if oversized:
        # Recursively decompose oversized sub-issues
        final_issues = []
        for sub in sub_issues:
            if sub['estimated_context'] > max_size:
                # Decompose further, keeping the parent's target-agent metadata
                sub_sub_issues = await decompose_epic(sub, metadata, depth + 1)
                final_issues.extend(sub_sub_issues)
            else:
                final_issues.append(sub)
        return final_issues

    return sub_issues

Case 3: No Clear Decomposition

If the AI can't find good breakdown points:

# Comment on issue, unassign from coordinator
await gitea_client.comment_on_issue(
    issue['number'],
    f"❌ Cannot auto-decompose this issue.\n\n"
    f"Estimated at {metadata['estimated_context']:,} tokens "
    f"(exceeds {max_size:,} limit), but no clear breakdown found.\n\n"
    f"**Manual action needed:**\n"
    f"1. Break this into smaller sub-issues manually\n"
    f"2. Assign sub-issues to @mosaic\n"
    f"3. This issue can become an EPIC tracking sub-issues\n\n"
    f"Unassigning from coordinator."
)

await gitea_client.unassign_issue(issue['number'], 'mosaic')
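
The snippets throughout this document also call a gitea_client wrapper (comment_on_issue, add_label, create_issue, unassign_issue). Its implementation is out of scope here; a minimal sketch of the assumed interface, with method signatures inferred from the calls above (create_issue is expected to return the created issue, including its number and title):

from typing import List, Optional

class GiteaClient:
    """Assumed wrapper over the Gitea REST API; implementation not covered here."""

    async def comment_on_issue(self, issue_number: int, body: str) -> None: ...

    async def add_label(self, issue_number: int, label: str) -> None: ...

    async def create_issue(self, title: str, body: str,
                           labels: List[str],
                           milestone: Optional[int] = None) -> dict: ...

    async def unassign_issue(self, issue_number: int, username: str) -> None: ...

gitea_client = GiteaClient()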

Implementation Checklist

Phase 0 (COORD-002) must include:

  • Structured metadata extraction (existing plan)
  • Content analysis estimation (NEW)
  • Confidence scoring (NEW)
  • Best-guess defaults (NEW)
  • 50% rule validation (NEW)
  • Automatic epic decomposition (NEW)
  • Recursive decomposition handling (NEW)
  • Confidence-based workflow (NEW)
  • Update issue handling (NEW)

Success Criteria

Parser handles all issue types:

  • Formatted issues → High confidence, extract directly
  • Unformatted issues → Medium confidence, estimate from content
  • Vague issues → Low confidence, use defaults + warn
  • Oversized issues → Auto-decompose, create sub-issues
  • Updated issues → Re-parse, update queue

No manual intervention needed for:

  • Well-formatted issues
  • Clear descriptions (even without metadata)
  • Oversized issues (auto-decompose)

Manual intervention only for:

  • Very vague issues (<30% confidence)
  • Issues that can't be decomposed
  • Edge cases requiring human judgment

Example Scenarios

Scenario 1: Well-Formatted Issue

Issue #300: [COORD-020] Implement user profile caching

## Context Estimate

- Files: 4
- Total: 52,000 tokens
- Agent: glm

## Difficulty

medium

Result:

  • Extract directly
  • Confidence: 95%
  • Queue immediately

Scenario 2: Clear But Unformatted Issue

Issue #301: Add caching to user profile API

Need to cache user profiles to reduce database load.

Files affected:

- src/api/users/users.service.ts
- src/cache/cache.service.ts
- src/api/users/users.controller.ts
- tests/users.service.spec.ts

Acceptance criteria:

- [ ] Cache GET /users/:id requests
- [ ] 5 minute TTL
- [ ] Invalidate on update/delete
- [ ] Add tests

Result:

  • 📊 Estimate from content
  • Files: 4 → 28K base
  • Clear scope → Medium complexity (20K)
  • Tests mentioned → 10K
  • Total: ~58K tokens
  • Confidence: 75%
  • Queue with note

Scenario 3: Vague Issue

Issue #302: Fix user thing

Users are complaining

Result:

  • ⚠️ Minimal info
  • Use defaults (50K, medium, sonnet)
  • Confidence: 25%
  • Comment: "Please add details"
  • Don't queue (<30% threshold)
  • Unassign from @mosaic

Scenario 4: Oversized Issue

Issue #303: Refactor entire authentication system

We need to modernize our auth:

- Replace session-based auth with JWT
- Add OAuth2 support
- Implement refresh tokens
- Add MFA
- Update all protected routes
- Migration for existing users

Result:

  • 📊 Estimate: 180K tokens
  • ⚠️ Exceeds 50% rule (>100K)
  • Auto-decompose into sub-issues:
    • #304: Extract JWT service (35K)
    • #305: Add OAuth2 integration (40K)
    • #306: Implement refresh tokens (28K)
    • #307: Add MFA support (32K)
    • #308: Update route guards (22K)
    • #309: User migration script (18K)
  • Label #303 as EPIC
  • Queue sub-issues

Conclusion

The issue parser must be robust and intelligent to handle real-world issues:

  • Extract structured metadata when available
  • Estimate from content when missing
  • Use confidence scores to guide behavior
  • Auto-decompose oversized issues
  • Warn users on low confidence
  • Handle edge cases gracefully

This makes the coordinator truly autonomous - it can handle whatever issues users throw at it.


Document Version: 1.0
Created: 2026-01-31
Status: Proposed - Update COORD-002
Priority: Critical (P0) - Required for real-world usage