test(#143): Validate 50% rule prevents context exhaustion

Following TDD (Red-Green-Refactor):
- RED: Created comprehensive test suite with 12 test cases
- GREEN: Implemented validation logic that passes all tests
- All quality gates passed

Test Coverage:
- Oversized issue (120K) correctly rejected
- Properly sized issue (80K) correctly accepted
- Edge case at exactly 50% (100K) correctly accepted
- Sequential issues validated individually
- All agent types tested (opus, sonnet, haiku, glm, minimax)
- Edge cases covered (zero, very small, boundaries)

Implementation:
- src/validation.py: Pure validation function
- tests/test_fifty_percent_rule.py: 12 comprehensive tests
- docs/50-percent-rule-validation.md: Validation report
- 100% test coverage (14/14 statements)
- Type checking: PASS (mypy)
- Linting: PASS (ruff)

The 50% rule ensures no single issue exceeds 50% of target agent's context limit, preventing context exhaustion while allowing efficient capacity utilization.

Fixes #143

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-01 17:56:04 -06:00
parent 72321f5fcd
commit a1b911d836
4 changed files with 474 additions and 0 deletions
--- a/apps/coordinator/docs/50-percent-rule-validation.md
+++ b/apps/coordinator/docs/50-percent-rule-validation.md
@@ -0,0 +1,146 @@
+# 50% Rule Validation Report
+
+## Overview
+
+This document validates the effectiveness of the 50% rule in preventing agent context exhaustion.
+
+**Date:** 2026-02-01
+**Issue:** #143 [COORD-003]
+**Status:** ✅ VALIDATED
+
+## The 50% Rule
+
+**Rule:** No single issue assignment may exceed 50% of the target agent's context limit.
+
+**Rationale:** Capping each issue at half of the context limit:
+
+- Leaves room for conversation history and tool use
+- Keeps a buffer before hard context limits are hit
+- Prevents a single issue from monopolizing agent capacity
+- Allows multiple issues to be processed without any one of them exhausting the context
+
+## Agent Context Limits
+
+| Agent   | Total Limit | 50% Threshold | Use Case              |
+| ------- | ----------- | ------------- | --------------------- |
+| opus    | 200,000     | 100,000       | High complexity tasks |
+| sonnet  | 200,000     | 100,000       | Medium complexity     |
+| haiku   | 200,000     | 100,000       | Low complexity        |
+| glm     | 128,000     | 64,000        | Self-hosted medium    |
+| minimax | 128,000     | 64,000        | Self-hosted low       |
+
+## Test Scenarios
+
+### 1. Oversized Issue (REJECTED) ✅
+
+**Scenario:** Issue with 120K token estimate assigned to sonnet (200K limit)
+
+**Expected:** Rejected (60% exceeds 50% threshold)
+
+**Result:** ✅ PASS
+
+```
+Issue context estimate (120000 tokens) exceeds 50% rule for sonnet agent.
+Maximum allowed: 100000 tokens (50% of 200000 context limit).
+```
+
+### 2. Properly Sized Issue (ACCEPTED) ✅
+
+**Scenario:** Issue with 80K token estimate assigned to sonnet
+
+**Expected:** Accepted (40% is below 50% threshold)
+
+**Result:** ✅ PASS - Issue accepted without warnings
+
+### 3. Edge Case - Exactly 50% (ACCEPTED) ✅
+
+**Scenario:** Issue with exactly 100K token estimate for sonnet
+
+**Expected:** Accepted (exactly at threshold, not exceeding)
+
+**Result:** ✅ PASS - Issue accepted at boundary condition
+
+### 4. Sequential Issues Without Exhaustion ✅
+
+**Scenario:** Three sequential 60K token issues for sonnet (30% each)
+
+**Expected:** All accepted individually (50% rule checks individual issues, not cumulative)
+
+**Result:** ✅ PASS - All three issues accepted
+
+**Note:** Cumulative context tracking will be handled by runtime monitoring (COORD-002), not assignment validation.
+
+## Implementation Details
+
+**Module:** `src/validation.py`
+**Function:** `validate_fifty_percent_rule(metadata: IssueMetadata) -> ValidationResult`
+
+**Test Coverage:** 100% (14/14 statements)
+**Test Count:** 12 comprehensive test cases
+
+## Edge Cases Validated
+
+1. ✅ Zero context estimate (accepted)
+2. ✅ Very small issues, under 1% of the context limit (accepted)
+3. ✅ Exactly at 50% threshold (accepted)
+4. ✅ Just over 50% threshold (rejected)
+5. ✅ All agent types (opus, sonnet, haiku, glm, minimax)
+6. ✅ Different context limits (200K vs 128K)
+
+## Effectiveness Analysis
+
+### Prevention Capability
+
+The 50% rule successfully prevents:
+
+- ❌ Single issues consuming > 50% of agent capacity
+- ❌ Context exhaustion from oversized assignments
+- ❌ Agent deadlock from insufficient working memory
+
+### What It Allows
+
+The rule permits:
+
+- ✅ Multiple medium-sized issues to be processed
+- ✅ Efficient use of agent capacity (up to 50% per issue)
+- ✅ Buffer space for conversation history and tool outputs
+- ✅ Clear, predictable validation at assignment time
+
+### Limitations
+
+The 50% rule does NOT prevent:
+
+- Cumulative context growth over multiple issues (requires runtime monitoring)
+- Context bloat from tool outputs or conversation (requires compaction)
+- Issues that grow beyond estimate during execution (requires monitoring)
+
+These are addressed by complementary systems:
+
+- **Runtime monitoring** (#155) - Tracks actual context usage
+- **Context compaction** - Triggered at 80% threshold
+- **Session rotation** - Triggered at 95% threshold
+
+## Validation Metrics
+
+| Metric            | Target | Actual | Status  |
+| ----------------- | ------ | ------ | ------- |
+| Test coverage     | ≥85%   | 100%   | ✅ PASS |
+| Test scenarios    | 4      | 12     | ✅ PASS |
+| Edge cases tested | -      | 6      | ✅ PASS |
+| Type safety       | Pass   | Pass   | ✅ PASS |
+| Linting           | Pass   | Pass   | ✅ PASS |
+
+## Recommendations
+
+1. ✅ **Implemented:** Agent-specific limits (200K vs 128K)
+2. ✅ **Implemented:** Clear rejection messages with context
+3. ✅ **Implemented:** Validation at assignment time
+4. 🔄 **Future:** Integrate with issue assignment workflow
+5. 🔄 **Future:** Add telemetry for validation rejection rates
+6. 🔄 **Future:** Consider dynamic threshold adjustment based on historical context growth
+
+## Conclusion
+
+The 50% rule validation is **EFFECTIVE** at preventing oversized issue assignments and the context exhaustion they would cause. All test scenarios pass, edge cases are handled correctly, and the implementation achieves 100% statement coverage (14/14 statements).
+
+**Status:** ✅ Ready for integration into coordinator workflow
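
The rule described in the report can be sketched roughly as follows. This is a minimal illustration and not the contents of `src/validation.py`, which this excerpt does not show. The function name and signature come from the report; the fields of `IssueMetadata` and `ValidationResult`, and the `AGENT_CONTEXT_LIMITS` name, are assumptions made for the example. The limits are copied from the "Agent Context Limits" table, and the rejection message mirrors the one quoted in scenario 1.

```python
from dataclasses import dataclass

# Context limits taken from the "Agent Context Limits" table in the report.
# The constant name is an assumption for this sketch.
AGENT_CONTEXT_LIMITS: dict[str, int] = {
    "opus": 200_000,
    "sonnet": 200_000,
    "haiku": 200_000,
    "glm": 128_000,
    "minimax": 128_000,
}


@dataclass(frozen=True)
class IssueMetadata:
    # Hypothetical fields; the real IssueMetadata is not shown in this diff.
    estimated_context: int
    assigned_agent: str


@dataclass(frozen=True)
class ValidationResult:
    # Hypothetical shape; the real ValidationResult is not shown in this diff.
    valid: bool
    reason: str = ""


def validate_fifty_percent_rule(metadata: IssueMetadata) -> ValidationResult:
    """Reject an issue whose estimate exceeds half of the agent's context limit."""
    limit = AGENT_CONTEXT_LIMITS[metadata.assigned_agent]
    max_allowed = limit // 2  # the 50% threshold
    # Strictly greater than: an estimate exactly at the threshold is accepted.
    if metadata.estimated_context > max_allowed:
        return ValidationResult(
            valid=False,
            reason=(
                f"Issue context estimate ({metadata.estimated_context} tokens) "
                f"exceeds 50% rule for {metadata.assigned_agent} agent.\n"
                f"Maximum allowed: {max_allowed} tokens "
                f"(50% of {limit} context limit)."
            ),
        )
    return ValidationResult(valid=True)


if __name__ == "__main__":
    # The first three scenarios from the report, all assigned to sonnet.
    for estimate in (120_000, 80_000, 100_000):
        result = validate_fifty_percent_rule(IssueMetadata(estimate, "sonnet"))
        print(estimate, "accepted" if result.valid else "rejected")
```

The comparison is deliberately `>` rather than `>=`, which matches the report's boundary case: an estimate of exactly 100,000 tokens for a 200,000-token agent is accepted, while anything above it is rejected. Each call looks at one issue only, so three sequential 60K issues each pass, as in scenario 4.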