stack/scripts/coordinator/ESTIMATOR.md

# Context Estimator

Formula-based context estimation for predicting token usage before issue assignment.

## Overview

The context estimator predicts token requirements for issues based on:

- **Files to modify** - Number of files expected to change
- **Implementation complexity** - Complexity of the implementation
- **Test requirements** - Level of testing needed
- **Documentation** - Documentation requirements

It applies a 30% safety buffer to account for iteration, debugging, and unexpected complexity.

## Formula

```
base = (files × 7000) + complexity + tests + docs
total = base × 1.3  (30% safety buffer)
```

### Component Allocations

**Complexity Levels:**

- `LOW` = 10,000 tokens (simple, straightforward)
- `MEDIUM` = 20,000 tokens (moderate complexity, some edge cases)
- `HIGH` = 30,000 tokens (complex logic, many edge cases)

**Test Levels:**

- `LOW` = 5,000 tokens (basic unit tests)
- `MEDIUM` = 10,000 tokens (unit + integration tests)
- `HIGH` = 15,000 tokens (unit + integration + E2E tests)

**Documentation Levels:**

- `NONE` = 0 tokens (no documentation needed)
- `LIGHT` = 2,000 tokens (inline comments, basic docstrings)
- `MEDIUM` = 3,000 tokens (API docs, usage examples)
- `HEAVY` = 5,000 tokens (comprehensive docs, guides)

**Files Context:**

- Each file = 7,000 tokens (for reading and understanding)

**Safety Buffer:**

- 30% buffer (1.3x multiplier) for iteration and debugging

## Agent Recommendations

Based on total estimated tokens:

- **haiku** - < 30K tokens (fast, efficient for small tasks)
- **sonnet** - 30K-80K tokens (balanced for medium tasks)
- **opus** - > 80K tokens (powerful for complex tasks)

## Usage

### Quick Estimation (Convenience Function)

```python
from issue_estimator import estimate_issue

# Simple task
result = estimate_issue(
    files=1,
    complexity="low",
    tests="low",
    docs="none"
)

print(f"Estimated tokens: {result.total_estimate:,}")
print(f"Recommended agent: {result.recommended_agent}")
# Output:
# Estimated tokens: 28,600
# Recommended agent: haiku
```

### Detailed Estimation (Class-based)

```python
from issue_estimator import ContextEstimator, EstimationInput
from models import ComplexityLevel, TestLevel, DocLevel

estimator = ContextEstimator()

input_data = EstimationInput(
    files_to_modify=2,
    implementation_complexity=ComplexityLevel.MEDIUM,
    test_requirements=TestLevel.MEDIUM,
    documentation=DocLevel.LIGHT
)

result = estimator.estimate(input_data)

print(f"Files context: {result.files_context:,} tokens")
print(f"Implementation: {result.implementation_tokens:,} tokens")
print(f"Tests: {result.test_tokens:,} tokens")
print(f"Docs: {result.doc_tokens:,} tokens")
print(f"Base estimate: {result.base_estimate:,} tokens")
print(f"Safety buffer: {result.buffer_tokens:,} tokens")
print(f"Total estimate: {result.total_estimate:,} tokens")
print(f"Recommended agent: {result.recommended_agent}")

# Output:
# Files context: 14,000 tokens
# Implementation: 20,000 tokens
# Tests: 10,000 tokens
# Docs: 2,000 tokens
# Base estimate: 46,000 tokens
# Safety buffer: 13,800 tokens
# Total estimate: 59,800 tokens
# Recommended agent: sonnet
```

### Validation Against Actual Usage

```python
from issue_estimator import ContextEstimator, EstimationInput
from models import ComplexityLevel, TestLevel, DocLevel

estimator = ContextEstimator()

input_data = EstimationInput(
    files_to_modify=2,
    implementation_complexity=ComplexityLevel.MEDIUM,
    test_requirements=TestLevel.MEDIUM,
    documentation=DocLevel.LIGHT
)

# Validate against actual token usage
validation = estimator.validate_against_actual(
    input_data,
    issue_number=154,
    actual_tokens=58000
)

print(f"Issue: #{validation.issue_number}")
print(f"Estimated: {validation.estimated_tokens:,} tokens")
print(f"Actual: {validation.actual_tokens:,} tokens")
print(f"Error: {validation.percentage_error:.2%}")
print(f"Within tolerance (±20%): {validation.within_tolerance}")

# Output:
# Issue: #154
# Estimated: 59,800 tokens
# Actual: 58,000 tokens
# Error: 3.10%
# Within tolerance (±20%): True
```

### Serialization

Convert results to dictionaries for JSON serialization:

```python
from issue_estimator import estimate_issue

result = estimate_issue(files=2, complexity="medium")
result_dict = result.to_dict()

import json
print(json.dumps(result_dict, indent=2))

# Output:
# {
#   "files_context": 14000,
#   "implementation_tokens": 20000,
#   "test_tokens": 10000,
#   "doc_tokens": 2000,
#   "base_estimate": 46000,
#   "buffer_tokens": 13800,
#   "total_estimate": 59800,
#   "recommended_agent": "sonnet"
# }
```

## Examples

### Example 1: Quick Bug Fix

```python
result = estimate_issue(
    files=1,
    complexity="low",
    tests="low",
    docs="none"
)
# Total: 28,600 tokens → haiku
```

### Example 2: Feature Implementation

```python
result = estimate_issue(
    files=3,
    complexity="medium",
    tests="medium",
    docs="light"
)
# Total: 63,700 tokens → sonnet
```

### Example 3: Complex Integration

```python
result = estimate_issue(
    files=10,
    complexity="high",
    tests="high",
    docs="heavy"
)
# Total: 156,000 tokens → opus
```

### Example 4: Configuration Change

```python
result = estimate_issue(
    files=0,  # No code files, just config
    complexity="low",
    tests="low",
    docs="light"
)
# Total: 22,100 tokens → haiku
```

## Running Tests

```bash
# Install dependencies
python3 -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install pytest pytest-cov

# Run tests
pytest test_issue_estimator.py -v

# Run with coverage
pytest test_issue_estimator.py --cov=issue_estimator --cov=models --cov-report=term-missing

# Expected: 100% coverage (35 tests passing)
```

## Validation Results

The estimator has been validated against historical issues:

| Issue | Description         | Estimated | Formula Result | Accuracy                              |
| ----- | ------------------- | --------- | -------------- | ------------------------------------- |
| #156  | Create bot user     | 15,000    | 22,100         | Formula is more conservative (better) |
| #154  | Context estimator   | 46,800    | 59,800         | Accounts for iteration                |
| #141  | Integration testing | ~80,000   | 94,900         | Accounts for E2E complexity           |

The formula tends to be conservative (estimates higher than initial rough estimates), which is intentional to prevent underestimation.

## Integration with Coordinator

The estimator is used by the coordinator to:

1. **Pre-estimate issues** - Calculate token requirements before assignment
2. **Agent selection** - Recommend appropriate agent (haiku/sonnet/opus)
3. **Resource planning** - Allocate token budgets
4. **Accuracy tracking** - Validate estimates against actual usage

### Coordinator Integration Example

```python
# In coordinator code
from issue_estimator import estimate_issue

# Parse issue metadata
issue_data = parse_issue_description(issue_number)

# Estimate tokens
result = estimate_issue(
    files=issue_data.get("files_to_modify", 1),
    complexity=issue_data.get("complexity", "medium"),
    tests=issue_data.get("tests", "medium"),
    docs=issue_data.get("docs", "light")
)

# Assign to appropriate agent
assign_to_agent(
    issue_number=issue_number,
    agent=result.recommended_agent,
    token_budget=result.total_estimate
)
```

## Design Decisions

### Why 7,000 tokens per file?

Based on empirical analysis:

- Average file: 200-400 lines
- With context (imports, related code): ~500-800 lines
- At ~10 tokens per line: 5,000-8,000 tokens
- Using 7,000 as a conservative middle ground

### Why 30% safety buffer?

Accounts for:

- Iteration and refactoring (10-15%)
- Debugging and troubleshooting (5-10%)
- Unexpected edge cases (5-10%)
- Total: ~30%

### Why these complexity levels?

- **LOW (10K)** - Straightforward CRUD, simple logic
- **MEDIUM (20K)** - Business logic, state management, algorithms
- **HIGH (30K)** - Complex algorithms, distributed systems, optimization

### Why these test levels?

- **LOW (5K)** - Basic happy path tests
- **MEDIUM (10K)** - Happy + sad paths, edge cases
- **HIGH (15K)** - Comprehensive E2E, integration, performance

## API Reference

### Classes

#### `ContextEstimator`

Main estimator class.

**Methods:**

- `estimate(input_data: EstimationInput) -> EstimationResult` - Estimate tokens
- `validate_against_actual(input_data, issue_number, actual_tokens) -> ValidationResult` - Validate estimate

#### `EstimationInput`

Input parameters for estimation.

**Fields:**

- `files_to_modify: int` - Number of files to modify
- `implementation_complexity: ComplexityLevel` - Complexity level
- `test_requirements: TestLevel` - Test level
- `documentation: DocLevel` - Documentation level

#### `EstimationResult`

Result of estimation.

**Fields:**

- `files_context: int` - Tokens for file context
- `implementation_tokens: int` - Tokens for implementation
- `test_tokens: int` - Tokens for tests
- `doc_tokens: int` - Tokens for documentation
- `base_estimate: int` - Sum before buffer
- `buffer_tokens: int` - Safety buffer tokens
- `total_estimate: int` - Final estimate with buffer
- `recommended_agent: str` - Recommended agent (haiku/sonnet/opus)

**Methods:**

- `to_dict() -> dict` - Convert to dictionary

#### `ValidationResult`

Result of validation against actual usage.

**Fields:**

- `issue_number: int` - Issue number
- `estimated_tokens: int` - Estimated tokens
- `actual_tokens: int` - Actual tokens used
- `percentage_error: float` - Error percentage
- `within_tolerance: bool` - Whether within ±20%
- `notes: str` - Optional notes

**Methods:**

- `to_dict() -> dict` - Convert to dictionary

### Enums

#### `ComplexityLevel`

Implementation complexity levels.

- `LOW = 10000`
- `MEDIUM = 20000`
- `HIGH = 30000`

#### `TestLevel`

Test requirement levels.

- `LOW = 5000`
- `MEDIUM = 10000`
- `HIGH = 15000`

#### `DocLevel`

Documentation requirement levels.

- `NONE = 0`
- `LIGHT = 2000`
- `MEDIUM = 3000`
- `HEAVY = 5000`

### Functions

#### `estimate_issue(files, complexity, tests, docs)`

Convenience function for quick estimation.

**Parameters:**

- `files: int` - Number of files to modify
- `complexity: str` - "low", "medium", or "high"
- `tests: str` - "low", "medium", or "high"
- `docs: str` - "none", "light", "medium", or "heavy"

**Returns:**

- `EstimationResult` - Estimation result

## Future Enhancements

Potential improvements for future versions:

1. **Machine learning calibration** - Learn from actual usage
2. **Language-specific multipliers** - Adjust for Python vs TypeScript
3. **Historical accuracy tracking** - Track estimator accuracy over time
4. **Confidence intervals** - Provide ranges instead of point estimates
5. **Workspace-specific tuning** - Allow per-workspace calibration

## Related Documentation

- [Coordinator Architecture](../../docs/3-architecture/non-ai-coordinator-comprehensive.md)
- [Issue #154 - Context Estimator](https://git.mosaicstack.dev/mosaic/stack/issues/154)
- [Coordinator Scripts README](README.md)

## Support

For issues or questions about the context estimator:

1. Check examples in this document
2. Review test cases in `test_issue_estimator.py`
3. Open an issue in the repository