test(#143): Validate 50% rule prevents context exhaustion
Following TDD (Red-Green-Refactor):
- RED: Created comprehensive test suite with 12 test cases
- GREEN: Implemented validation logic that passes all tests
- All quality gates passed

Test Coverage:
- Oversized issue (120K) correctly rejected
- Properly sized issue (80K) correctly accepted
- Edge case at exactly 50% (100K) correctly accepted
- Sequential issues validated individually
- All agent types tested (opus, sonnet, haiku, glm, minimax)
- Edge cases covered (zero, very small, boundaries)

Implementation:
- src/validation.py: Pure validation function
- tests/test_fifty_percent_rule.py: 12 comprehensive tests
- docs/50-percent-rule-validation.md: Validation report
- 100% test coverage (14/14 statements)
- Type checking: PASS (mypy)
- Linting: PASS (ruff)

The 50% rule ensures that no single issue exceeds 50% of the target agent's context limit, preventing context exhaustion while still allowing efficient use of capacity.

Fixes #143

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
apps/coordinator/docs/50-percent-rule-validation.md
# 50% Rule Validation Report

## Overview

This document validates the effectiveness of the 50% rule in preventing agent context exhaustion.

**Date:** 2026-02-01
**Issue:** #143 [COORD-003]
**Status:** ✅ VALIDATED
## The 50% Rule

**Rule:** No single issue assignment may exceed 50% of the target agent's context limit.

**Rationale:** Capping each assignment at half of the context window is intended to:

- Leave room for conversation history and tool use
- Keep a buffer before the hard context limit is reached
- Prevent any single issue from monopolizing agent capacity
- Allow multiple issues to be processed without exhausting the context
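A minimal sketch of the rule as a predicate, assuming estimates and limits are plain integer token counts. The name `fits_fifty_percent_rule` is hypothetical and is not taken from `src/validation.py`. Comparing `2 * estimate <= limit` keeps the check in integer arithmetic, so the exactly-50% boundary is accepted without any floating-point rounding.

```python
def fits_fifty_percent_rule(estimate_tokens: int, context_limit: int) -> bool:
    """Return True if an issue estimate is at most half of the agent's context limit.

    Hypothetical helper for illustration. The boundary is inclusive: an estimate
    of exactly 50% is accepted, and anything above it is rejected.
    """
    if estimate_tokens < 0 or context_limit <= 0:
        raise ValueError("estimate must be >= 0 and limit must be > 0")
    # Integer comparison avoids float rounding at the 50% boundary.
    return 2 * estimate_tokens <= context_limit


# Scenarios from this report, against a 200K-token limit.
print(fits_fifty_percent_rule(120_000, 200_000))  # False (60%)
print(fits_fifty_percent_rule(80_000, 200_000))   # True (40%)
print(fits_fifty_percent_rule(100_000, 200_000))  # True (exactly 50%)
```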
## Agent Context Limits

| Agent   | Context Limit (tokens) | 50% Threshold (tokens) | Use Case                       |
| ------- | ---------------------- | ---------------------- | ------------------------------ |
| opus    | 200,000                | 100,000                | High-complexity tasks          |
| sonnet  | 200,000                | 100,000                | Medium complexity              |
| haiku   | 200,000                | 100,000                | Low complexity                 |
| glm     | 128,000                | 64,000                 | Self-hosted, medium complexity |
| minimax | 128,000                | 64,000                 | Self-hosted, low complexity    |
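The table can be expressed as a lookup keyed by agent name, with each threshold derived from the limit rather than stored separately. This is a hypothetical sketch: `CONTEXT_LIMITS` and `fifty_percent_threshold` are illustrative names and may not match the coordinator's code.

```python
# Context limits (tokens) copied from the table above; names are illustrative.
CONTEXT_LIMITS: dict[str, int] = {
    "opus": 200_000,
    "sonnet": 200_000,
    "haiku": 200_000,
    "glm": 128_000,
    "minimax": 128_000,
}


def fifty_percent_threshold(agent: str) -> int:
    """Return the largest estimate (in tokens) the 50% rule accepts for an agent."""
    try:
        limit = CONTEXT_LIMITS[agent]
    except KeyError:
        raise ValueError(f"unknown agent: {agent!r}") from None
    # Floor division: for an odd limit, the accepted maximum rounds down.
    return limit // 2


for agent in CONTEXT_LIMITS:
    print(f"{agent:<8} {CONTEXT_LIMITS[agent]:>7,} {fifty_percent_threshold(agent):>7,}")
```

Deriving the threshold keeps the two columns from drifting apart if a limit changes.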
## Test Scenarios

### 1. Oversized Issue (REJECTED) ✅

**Scenario:** Issue with a 120K-token estimate assigned to sonnet (200K limit)

**Expected:** Rejected (120K is 60% of 200K, which exceeds the 50% threshold)

**Result:** ✅ PASS - Issue rejected with the following message:

```
Issue context estimate (120000 tokens) exceeds 50% rule for sonnet agent.
Maximum allowed: 100000 tokens (50% of 200000 context limit).
```
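The rejection message reports the estimate, the agent, the allowed maximum, and the full context limit. A sketch of a validator that produces this wording, assuming a simple result type; `AssignmentCheck` and `check_assignment` are hypothetical stand-ins, not the `IssueMetadata`/`ValidationResult` types used by `src/validation.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssignmentCheck:
    """Hypothetical result type: a validity flag plus an optional rejection message."""
    valid: bool
    message: Optional[str] = None


def check_assignment(estimate_tokens: int, agent: str, context_limit: int) -> AssignmentCheck:
    max_allowed = context_limit // 2
    if estimate_tokens <= max_allowed:
        return AssignmentCheck(valid=True)
    # Mirrors the wording of the rejection message shown in this report.
    message = (
        f"Issue context estimate ({estimate_tokens} tokens) exceeds 50% rule "
        f"for {agent} agent.\n"
        f"Maximum allowed: {max_allowed} tokens "
        f"(50% of {context_limit} context limit)."
    )
    return AssignmentCheck(valid=False, message=message)


result = check_assignment(120_000, "sonnet", 200_000)
print(result.valid)  # False
print(result.message)
```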
### 2. Properly Sized Issue (ACCEPTED) ✅

**Scenario:** Issue with an 80K-token estimate assigned to sonnet (200K limit)

**Expected:** Accepted (80K is 40% of 200K, below the 50% threshold)

**Result:** ✅ PASS - Issue accepted without warnings

### 3. Edge Case - Exactly 50% (ACCEPTED) ✅

**Scenario:** Issue with an estimate of exactly 100K tokens assigned to sonnet

**Expected:** Accepted (100K is exactly at the threshold; only estimates that exceed it are rejected)

**Result:** ✅ PASS - Issue accepted at the boundary

### 4. Sequential Issues Without Exhaustion ✅

**Scenario:** Three sequential 60K-token issues assigned to sonnet (30% each)

**Expected:** All accepted (the 50% rule checks each issue individually, not the cumulative total)

**Result:** ✅ PASS - All three issues accepted

**Note:** Cumulative context tracking will be handled by runtime monitoring (COORD-002), not by assignment validation.
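Scenario 4 shows why per-issue validation and cumulative tracking are separate concerns: each 60K issue passes the per-issue check, yet if all three were processed in the same session the running total would reach 180K, or 90% of sonnet's 200K limit. The sketch below only makes that arithmetic explicit; `trace_sequential` is a hypothetical helper and does not model how runtime monitoring measures real usage.

```python
def trace_sequential(issues: list[int], limit: int) -> list[tuple[bool, int]]:
    """For each issue, return (passes the per-issue 50% rule, cumulative tokens so far)."""
    running_total = 0
    trace: list[tuple[bool, int]] = []
    for estimate in issues:
        running_total += estimate  # what cumulative tracking would observe
        trace.append((2 * estimate <= limit, running_total))  # per-issue check only
    return trace


LIMIT = 200_000  # sonnet context limit (tokens)
for i, (accepted, total) in enumerate(trace_sequential([60_000] * 3, LIMIT), start=1):
    print(f"issue {i}: accepted={accepted}, cumulative={total / LIMIT:.0%}")
```

Every issue prints `accepted=True`, while the cumulative share climbs through 30%, 60%, and 90%.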
## Implementation Details

**Module:** `src/validation.py`
**Function:** `validate_fifty_percent_rule(metadata: IssueMetadata) -> ValidationResult`

**Test Coverage:** 100% (14/14 statements)
**Test Count:** 12 comprehensive test cases
## Edge Cases Validated

1. ✅ Zero context estimate (accepted)
2. ✅ Very small issues, under 1% of the limit (accepted)
3. ✅ Exactly at the 50% threshold (accepted)
4. ✅ Just over the 50% threshold (rejected)
5. ✅ All agent types (opus, sonnet, haiku, glm, minimax)
6. ✅ Different context limits (200K vs 128K)
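The boundary cases above can be expressed as a table-driven check run against every agent's limit. This sketch hard-codes the limits from the Agent Context Limits table and uses the inclusive integer comparison; it illustrates the listed cases and is not the contents of `tests/test_fifty_percent_rule.py`.

```python
LIMITS = {"opus": 200_000, "sonnet": 200_000, "haiku": 200_000,
          "glm": 128_000, "minimax": 128_000}


def accepts(estimate: int, limit: int) -> bool:
    # Inclusive 50% boundary, integer-only.
    return 2 * estimate <= limit


def edge_cases(limit: int) -> list[tuple[str, int, bool]]:
    """(label, estimate, expected acceptance) for one context limit."""
    half = limit // 2
    return [
        ("zero estimate", 0, True),
        ("very small (0.5%)", limit // 200, True),
        ("exactly 50%", half, True),
        ("just over 50%", half + 1, False),
    ]


failures = [
    (agent, label)
    for agent, limit in LIMITS.items()
    for label, estimate, expected in edge_cases(limit)
    if accepts(estimate, limit) != expected
]
print("all edge cases pass" if not failures else failures)
```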
## Effectiveness Analysis

### Prevention Capability

At assignment time, the 50% rule prevents:

- ❌ Single issues consuming more than 50% of agent capacity
- ❌ Context exhaustion caused by oversized assignments
- ❌ Agent deadlock from insufficient working memory
### What It Allows

The rule still allows:

- ✅ Multiple medium-sized issues to be processed
- ✅ Efficient use of agent capacity (up to 50% per issue)
- ✅ Buffer space to remain for conversation history and tool outputs
- ✅ Clear, predictable validation at assignment time
### Limitations

The 50% rule does NOT prevent:

- Cumulative context growth over multiple issues (requires runtime monitoring)
- Context bloat from tool outputs or conversation (requires compaction)
- Issues that grow beyond their estimate during execution (requires runtime monitoring)

These are addressed by complementary systems:

- **Runtime monitoring** (#155) - Tracks actual context usage
- **Context compaction** - Triggered at 80% context usage
- **Session rotation** - Triggered at 95% context usage
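Unlike the 50% rule, these systems act on measured usage rather than on estimates. A minimal sketch of how a monitor might map current usage to an action, assuming each action triggers once usage reaches (>=) the 80% and 95% thresholds named above; the `RuntimeAction` enum and `runtime_action` function are hypothetical and are not part of the coordinator described in this report.

```python
from enum import Enum


class RuntimeAction(Enum):
    CONTINUE = "continue"
    COMPACT = "compact"  # context compaction at 80% usage
    ROTATE = "rotate"    # session rotation at 95% usage


def runtime_action(used_tokens: int, context_limit: int) -> RuntimeAction:
    """Pick an action from current usage, checking the highest threshold first."""
    # Integer arithmetic: used / limit >= 95 / 100  <=>  100 * used >= 95 * limit.
    if 100 * used_tokens >= 95 * context_limit:
        return RuntimeAction.ROTATE
    if 100 * used_tokens >= 80 * context_limit:
        return RuntimeAction.COMPACT
    return RuntimeAction.CONTINUE


print(runtime_action(100_000, 200_000))  # RuntimeAction.CONTINUE
print(runtime_action(180_000, 200_000))  # RuntimeAction.COMPACT
print(runtime_action(192_000, 200_000))  # RuntimeAction.ROTATE
```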
## Validation Metrics

| Metric                 | Target      | Actual   | Status  |
| ---------------------- | ----------- | -------- | ------- |
| Test coverage          | ≥85%        | 100%     | ✅ PASS |
| Test scenarios / cases | 4 scenarios | 12 cases | ✅ PASS |
| Edge cases tested      | n/a         | 6        | ✅ PASS |
| Type safety (mypy)     | Pass        | Pass     | ✅ PASS |
| Linting (ruff)         | Pass        | Pass     | ✅ PASS |
## Recommendations

1. ✅ **Implemented:** Agent-specific limits (200K vs 128K)
2. ✅ **Implemented:** Clear rejection messages that state the estimate, the allowed maximum, and the context limit
3. ✅ **Implemented:** Validation at assignment time
4. 🔄 **Future:** Integrate with the issue assignment workflow
5. 🔄 **Future:** Add telemetry for validation rejection rates
6. 🔄 **Future:** Consider dynamic threshold adjustment based on historical context growth
## Conclusion

The 50% rule is **EFFECTIVE** at preventing oversized issue assignments and the context exhaustion they would cause. All test scenarios pass, edge cases are handled correctly, and the implementation achieves 100% test coverage.

**Status:** ✅ Ready for integration into the coordinator workflow