test(#143): Validate 50% rule prevents context exhaustion

Following TDD (Red-Green-Refactor):
- RED: Created comprehensive test suite with 12 test cases
- GREEN: Implemented validation logic that passes all tests
- All quality gates passed

Test Coverage:
- Oversized issue (120K) correctly rejected
- Properly sized issue (80K) correctly accepted
- Edge case at exactly 50% (100K) correctly accepted
- Sequential issues validated individually
- All agent types tested (opus, sonnet, haiku, glm, minimax)
- Edge cases covered (zero, very small, boundaries)

Implementation:
- src/validation.py: Pure validation function
- tests/test_fifty_percent_rule.py: 12 comprehensive tests
- docs/50-percent-rule-validation.md: Validation report
- 100% test coverage (14/14 statements)
- Type checking: PASS (mypy)
- Linting: PASS (ruff)

The 50% rule ensures no single issue exceeds 50% of the target
agent's context limit, preventing context exhaustion from oversized
assignments while allowing efficient capacity utilization.

Fixes #143

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:56:04 -06:00
parent 72321f5fcd
commit a1b911d836
4 changed files with 474 additions and 0 deletions


@@ -0,0 +1,146 @@
# 50% Rule Validation Report
## Overview
This document validates the effectiveness of the 50% rule in preventing agent context exhaustion.
**Date:** 2026-02-01
**Issue:** #143 [COORD-003]
**Status:** ✅ VALIDATED
## The 50% Rule
**Rule:** No single issue assignment may exceed 50% of the target agent's context limit.
**Rationale:** The rule:
- Leaves room for conversation history and tool use
- Keeps a buffer before hard context limits are hit
- Prevents single issues from monopolizing agent capacity
- Lets multiple issues be processed without any single one exhausting the context
## Agent Context Limits
| Agent | Total Limit | 50% Threshold | Use Case |
| ------- | ----------- | ------------- | --------------------- |
| opus | 200,000 | 100,000 | High complexity tasks |
| sonnet | 200,000 | 100,000 | Medium complexity |
| haiku | 200,000 | 100,000 | Low complexity |
| glm | 128,000 | 64,000 | Self-hosted medium |
| minimax | 128,000 | 64,000 | Self-hosted low |
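
As a minimal sketch, the table above can be expressed as a lookup from which each 50% threshold is derived. The names `AGENT_CONTEXT_LIMITS` and `fifty_percent_threshold` are illustrative assumptions, not necessarily the ones used in `src/validation.py`.

```python
# Context limits in tokens, copied from the table above.
# The dictionary name is hypothetical.
AGENT_CONTEXT_LIMITS = {
    "opus": 200_000,
    "sonnet": 200_000,
    "haiku": 200_000,
    "glm": 128_000,
    "minimax": 128_000,
}


def fifty_percent_threshold(agent: str) -> int:
    """Return the largest estimate (in tokens) the 50% rule allows for `agent`."""
    return AGENT_CONTEXT_LIMITS[agent] // 2


# Derive the "50% Threshold" column for every agent.
thresholds = {agent: fifty_percent_threshold(agent) for agent in AGENT_CONTEXT_LIMITS}
```

Deriving the threshold from the limit, rather than storing both, keeps the two columns from drifting apart if a limit changes.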
## Test Scenarios
### 1. Oversized Issue (REJECTED) ✅
**Scenario:** Issue with 120K token estimate assigned to sonnet (200K limit)
**Expected:** Rejected (60% exceeds 50% threshold)
**Result:** ✅ PASS
```
Issue context estimate (120000 tokens) exceeds 50% rule for sonnet agent.
Maximum allowed: 100000 tokens (50% of 200000 context limit).
```
### 2. Properly Sized Issue (ACCEPTED) ✅
**Scenario:** Issue with 80K token estimate assigned to sonnet
**Expected:** Accepted (40% is below 50% threshold)
**Result:** ✅ PASS - Issue accepted without warnings
### 3. Edge Case - Exactly 50% (ACCEPTED) ✅
**Scenario:** Issue with exactly 100K token estimate for sonnet
**Expected:** Accepted (exactly at threshold, not exceeding)
**Result:** ✅ PASS - Issue accepted at boundary condition
### 4. Sequential Issues Without Exhaustion ✅
**Scenario:** Three sequential 60K token issues for sonnet (30% each)
**Expected:** All accepted individually (50% rule checks individual issues, not cumulative)
**Result:** ✅ PASS - All three issues accepted
**Note:** Cumulative context tracking will be handled by runtime monitoring (COORD-002), not assignment validation.
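
The difference between per-issue and cumulative checks in scenario 4 can be illustrated with a short calculation. This is only a sketch with illustrative variable names; it assumes, for the sake of the arithmetic, that all three estimates accumulate in one session without compaction.

```python
SONNET_LIMIT = 200_000  # tokens, from the agent table
estimates = [60_000, 60_000, 60_000]  # three sequential issues

# The 50% rule looks at each issue on its own.
per_issue_ok = [estimate <= SONNET_LIMIT // 2 for estimate in estimates]

# A cumulative view is outside the rule's scope and is left to runtime monitoring.
cumulative = sum(estimates)
cumulative_share = f"{cumulative / SONNET_LIMIT:.0%}"

print(per_issue_ok)                  # every issue passes individually
print(cumulative, cumulative_share)  # combined load if nothing is compacted
```

Each issue passes at 30%, yet the combined 180,000 tokens would be 90% of the limit, which is why cumulative growth is delegated to runtime monitoring rather than assignment validation.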
## Implementation Details
**Module:** `src/validation.py`
**Function:** `validate_fifty_percent_rule(metadata: IssueMetadata) -> ValidationResult`
**Test Coverage:** 100% (14/14 statements)
**Test Count:** 12 comprehensive test cases
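
For orientation, a pure function with the signature above might look like the following sketch. The field names on `IssueMetadata` and `ValidationResult`, the `AGENT_CONTEXT_LIMITS` mapping, and the single-space join in the message are assumptions made for illustration; the actual module may differ.

```python
from dataclasses import dataclass

# Hypothetical lookup of context limits (tokens), mirroring the agent table.
AGENT_CONTEXT_LIMITS = {
    "opus": 200_000,
    "sonnet": 200_000,
    "haiku": 200_000,
    "glm": 128_000,
    "minimax": 128_000,
}


@dataclass(frozen=True)
class IssueMetadata:
    # Field names are assumptions, not taken from src/validation.py.
    estimated_context: int
    assigned_agent: str


@dataclass(frozen=True)
class ValidationResult:
    # Field names are assumptions, not taken from src/validation.py.
    valid: bool
    reason: str = ""


def validate_fifty_percent_rule(metadata: IssueMetadata) -> ValidationResult:
    """Reject an issue whose estimate exceeds half of the agent's context limit."""
    limit = AGENT_CONTEXT_LIMITS[metadata.assigned_agent]
    max_allowed = limit // 2
    if metadata.estimated_context > max_allowed:
        reason = (
            f"Issue context estimate ({metadata.estimated_context} tokens) "
            f"exceeds 50% rule for {metadata.assigned_agent} agent. "
            f"Maximum allowed: {max_allowed} tokens "
            f"(50% of {limit} context limit)."
        )
        return ValidationResult(valid=False, reason=reason)
    # At or below the threshold: accepted, no message needed.
    return ValidationResult(valid=True)
```

Because the function has no side effects and depends only on its argument and a constant table, each test scenario reduces to constructing an `IssueMetadata` and asserting on the returned result.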
## Edge Cases Validated
1. ✅ Zero context estimate (accepted)
2. ✅ Very small issues, under 1% of the context limit (accepted)
3. ✅ Exactly at 50% threshold (accepted)
4. ✅ Just over 50% threshold (rejected)
5. ✅ All agent types (opus, sonnet, haiku, glm, minimax)
6. ✅ Different context limits (200K vs 128K)
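
The boundary cases listed above can be sketched as a small table of inputs and expected outcomes. The helper `exceeds_half` and the specific sample values are hypothetical; the comparison uses integer arithmetic so that the exact-50% case does not depend on floating-point rounding.

```python
def exceeds_half(estimate: int, limit: int) -> bool:
    """True when `estimate` is strictly more than 50% of `limit` (integer math)."""
    return estimate * 2 > limit


# (estimate, limit, expected_rejected); sample values chosen for illustration.
cases = [
    (0, 200_000, False),        # zero estimate
    (1_999, 200_000, False),    # under 1% of the limit
    (100_000, 200_000, False),  # exactly at the threshold
    (100_001, 200_000, True),   # one token over
    (64_000, 128_000, False),   # exactly at the threshold, 128K agents
    (64_001, 128_000, True),    # one token over, 128K agents
]

results = [exceeds_half(estimate, limit) for estimate, limit, _ in cases]
expected = [rejected for _, _, rejected in cases]
```

The strict `>` is what makes the exactly-50% case an acceptance rather than a rejection.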
## Effectiveness Analysis
### Prevention Capability
The 50% rule successfully prevents:
- ❌ Single issues consuming > 50% of agent capacity
- ❌ Context exhaustion from oversized assignments
- ❌ Agent deadlock from insufficient working memory
### What It Allows
The rule permits:
- ✅ Multiple medium-sized issues to be processed
- ✅ Efficient use of agent capacity (up to 50% per issue)
- ✅ Buffer space for conversation history and tool outputs
- ✅ Clear, predictable validation at assignment time
### Limitations
The 50% rule does NOT prevent:
- Cumulative context growth over multiple issues (requires runtime monitoring)
- Context bloat from tool outputs or conversation (requires compaction)
- Issues that grow beyond estimate during execution (requires monitoring)
These are addressed by complementary systems:
- **Runtime monitoring** (#155) - Tracks actual context usage
- **Context compaction** - Triggered at 80% threshold
- **Session rotation** - Triggered at 95% threshold
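
How the complementary thresholds might fit together at runtime can be sketched as follows. The function name, the returned labels, and the choice of inclusive comparisons (`>=`) are assumptions; this report only states the 80% and 95% trigger points.

```python
def runtime_action(used_tokens: int, limit: int) -> str:
    """Pick a hypothetical runtime action from actual context usage."""
    # Integer comparisons avoid floating-point rounding at the boundaries.
    if used_tokens * 100 >= limit * 95:
        return "rotate"    # session rotation at the 95% threshold
    if used_tokens * 100 >= limit * 80:
        return "compact"   # context compaction at the 80% threshold
    return "continue"      # below both thresholds


actions = [runtime_action(used, 200_000) for used in (100_000, 160_000, 190_000)]
```

Checking the higher threshold first ensures a session at 95% or more is rotated rather than merely compacted.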
## Validation Metrics
| Metric | Target | Actual | Status |
| ----------------- | ------ | ------ | ------- |
| Test coverage | ≥85% | 100% | ✅ PASS |
| Test scenarios | 4 | 12 | ✅ PASS |
| Edge cases tested | - | 6 | ✅ PASS |
| Type safety | Pass | Pass | ✅ PASS |
| Linting | Pass | Pass | ✅ PASS |
## Recommendations
1. ✅ **Implemented:** Agent-specific limits (200K vs 128K)
2. ✅ **Implemented:** Clear rejection messages with context
3. ✅ **Implemented:** Validation at assignment time
4. 🔄 **Future:** Integrate with issue assignment workflow
5. 🔄 **Future:** Add telemetry for validation rejection rates
6. 🔄 **Future:** Consider dynamic threshold adjustment based on historical context growth
## Conclusion
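The 50% rule validation is **EFFECTIVE** at preventing oversized issue assignments and the context exhaustion they would cause. All four test scenarios pass, the six edge cases are handled correctly, and the implementation achieves 100% test coverage. Cumulative context growth remains the responsibility of runtime monitoring.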
**Status:** ✅ Ready for integration into coordinator workflow