Jason Woltje a1b911d836 test(#143): Validate 50% rule prevents context exhaustion

Following TDD (Red-Green-Refactor):
- RED: Created comprehensive test suite with 12 test cases
- GREEN: Implemented validation logic that passes all tests
- All quality gates passed

Test Coverage:
- Oversized issue (120K) correctly rejected
- Properly sized issue (80K) correctly accepted
- Edge case at exactly 50% (100K) correctly accepted
- Sequential issues validated individually
- All agent types tested (opus, sonnet, haiku, glm, minimax)
- Edge cases covered (zero, very small, boundaries)

Implementation:
- src/validation.py: Pure validation function
- tests/test_fifty_percent_rule.py: 12 comprehensive tests
- docs/50-percent-rule-validation.md: Validation report
- 100% test coverage (14/14 statements)
- Type checking: PASS (mypy)
- Linting: PASS (ruff)

The 50% rule ensures no single issue exceeds 50% of target
agent's context limit, preventing context exhaustion while
allowing efficient capacity utilization.

Fixes #143

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-01 17:56:04 -06:00

50% Rule Validation Report

Overview

This document validates the effectiveness of the 50% rule in preventing agent context exhaustion.

Date: 2026-02-01
Issue: #143 [COORD-003]
Status: ✅ VALIDATED

The 50% Rule

Rule: No single issue assignment may exceed 50% of the target agent's context limit.

Rationale: Capping each assignment at half of the context limit:

Leaves room for conversation history and tool use
Keeps a buffer before the hard context limit is reached
Prevents a single issue from monopolizing agent capacity
Allows multiple issues to be processed without exhaustion

Agent Context Limits

Agent	Total Limit	50% Threshold	Use Case
opus	200,000	100,000	High complexity tasks
sonnet	200,000	100,000	Medium complexity
haiku	200,000	100,000	Low complexity
glm	128,000	64,000	Self-hosted medium
minimax	128,000	64,000	Self-hosted low
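
The threshold column follows directly from the limit column. As a minimal sketch, the table could be held as a mapping with the threshold derived from it; the names `AGENT_CONTEXT_LIMITS` and `fifty_percent_threshold` are illustrative assumptions and are not taken from src/validation.py.

```python
# Context limits in tokens, copied from the table above.
# The constant and function names are hypothetical.
AGENT_CONTEXT_LIMITS: dict[str, int] = {
    "opus": 200_000,
    "sonnet": 200_000,
    "haiku": 200_000,
    "glm": 128_000,
    "minimax": 128_000,
}


def fifty_percent_threshold(agent: str) -> int:
    """Return the largest estimate (in tokens) the 50% rule allows for an agent."""
    # Every limit in the table is even, so integer division is exact here.
    return AGENT_CONTEXT_LIMITS[agent] // 2


if __name__ == "__main__":
    for agent in AGENT_CONTEXT_LIMITS:
        print(f"{agent}: {fifty_percent_threshold(agent):,} tokens")
```

Deriving the threshold instead of storing it separately keeps the two columns from drifting apart if a limit changes.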

Test Scenarios

1. Oversized Issue (REJECTED) ✅

Scenario: Issue with 120K token estimate assigned to sonnet (200K limit)

Expected: Rejected (60% exceeds 50% threshold)

Result: ✅ PASS

Issue context estimate (120000 tokens) exceeds 50% rule for sonnet agent.
Maximum allowed: 100000 tokens (50% of 200000 context limit).

2. Properly Sized Issue (ACCEPTED) ✅

Scenario: Issue with 80K token estimate assigned to sonnet

Expected: Accepted (40% is below 50% threshold)

Result: ✅ PASS - Issue accepted without warnings

3. Edge Case - Exactly 50% (ACCEPTED) ✅

Scenario: Issue with exactly 100K token estimate for sonnet

Expected: Accepted (exactly at threshold, not exceeding)

Result: ✅ PASS - Issue accepted at boundary condition

4. Sequential Issues Without Exhaustion ✅

Scenario: Three sequential 60K token issues for sonnet (30% each)

Expected: All accepted individually (50% rule checks individual issues, not cumulative)

Result: ✅ PASS - All three issues accepted

Note: Cumulative context tracking will be handled by runtime monitoring (COORD-002), not assignment validation.
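
The arithmetic behind scenario 4 shows why the note matters. The sketch below assumes, as a worst case, that all three estimates accumulate in a single session with nothing compacted in between; that assumption is for illustration only and is not stated in the test suite.

```python
SONNET_LIMIT = 200_000          # from the agent table
THRESHOLD = SONNET_LIMIT // 2   # 50% rule: 100,000 tokens

estimates = [60_000, 60_000, 60_000]

# Assignment-time check: each issue is compared with the threshold on its own.
accepted = [estimate <= THRESHOLD for estimate in estimates]

# Worst-case illustration (an assumption): the estimates simply add up.
cumulative = sum(estimates)
cumulative_share = cumulative / SONNET_LIMIT

print(accepted)
print(f"{cumulative:,} tokens = {cumulative_share:.0%} of the limit")
```

Each issue passes at 30%, yet together they would reach 180,000 tokens, or 90% of the limit. That is the gap runtime monitoring is meant to cover.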

Implementation Details

Module: src/validation.py
Function: validate_fifty_percent_rule(metadata: IssueMetadata) -> ValidationResult

Test Coverage: 100% (14/14 statements)
Test Count: 12 comprehensive test cases
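
For orientation, a pure function with this signature might look like the sketch below. Only the function name, the two type names, and the rejection message wording come from this report. The field names (`assigned_agent`, `estimated_context`, `valid`, `reason`) and the limits mapping are assumptions, and the real module may differ.

```python
from dataclasses import dataclass

# Limits in tokens, from the agent table. The constant name is hypothetical.
AGENT_CONTEXT_LIMITS = {
    "opus": 200_000,
    "sonnet": 200_000,
    "haiku": 200_000,
    "glm": 128_000,
    "minimax": 128_000,
}


@dataclass(frozen=True)
class IssueMetadata:
    # Field names are assumptions made for this sketch.
    assigned_agent: str
    estimated_context: int  # tokens


@dataclass(frozen=True)
class ValidationResult:
    # Field names are assumptions made for this sketch.
    valid: bool
    reason: str = ""


def validate_fifty_percent_rule(metadata: IssueMetadata) -> ValidationResult:
    """Reject an issue whose estimate exceeds half of the agent's context limit."""
    # An unknown agent raises KeyError here; the report does not say how the
    # real implementation treats that case.
    limit = AGENT_CONTEXT_LIMITS[metadata.assigned_agent]
    max_allowed = limit // 2

    # Strictly greater than: an estimate exactly at the threshold is accepted.
    if metadata.estimated_context > max_allowed:
        return ValidationResult(
            valid=False,
            reason=(
                f"Issue context estimate ({metadata.estimated_context} tokens) "
                f"exceeds 50% rule for {metadata.assigned_agent} agent. "
                f"Maximum allowed: {max_allowed} tokens "
                f"(50% of {limit} context limit)."
            ),
        )
    return ValidationResult(valid=True)


if __name__ == "__main__":
    print(validate_fifty_percent_rule(IssueMetadata("sonnet", 120_000)).reason)
```

Because the function reads only its argument and a constant table, it has no side effects and can be tested without mocks.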

Edge Cases Validated

✅ Zero context estimate (accepted)
✅ Very small issues < 1% (accepted)
✅ Exactly at 50% threshold (accepted)
✅ Just over 50% threshold (rejected)
✅ All agent types (opus, sonnet, haiku, glm, minimax)
✅ Different context limits (200K vs 128K)
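
The boundary cases above can be written as a small table-driven check. This is a sketch and not the contents of tests/test_fifty_percent_rule.py: the helper `within_fifty_percent` and the specific values 1,000, 100,001, and 64,001 are chosen here for illustration.

```python
# Limits in tokens for one agent from each limit class in the agent table.
LIMITS = {"sonnet": 200_000, "glm": 128_000}


def within_fifty_percent(estimate: int, limit: int) -> bool:
    """True if the estimate is at or below half of the limit."""
    # Integer comparison avoids any floating-point rounding at the boundary.
    return estimate * 2 <= limit


# (agent, estimate in tokens, expected acceptance)
CASES = [
    ("sonnet", 0, True),          # zero context estimate
    ("sonnet", 1_000, True),      # 0.5% of the limit, under 1%
    ("sonnet", 100_000, True),    # exactly at the 50% threshold
    ("sonnet", 100_001, False),   # one token over the threshold
    ("glm", 64_000, True),        # exactly at the threshold for a 128K agent
    ("glm", 64_001, False),       # one token over for a 128K agent
]

for agent, estimate, expected in CASES:
    assert within_fifty_percent(estimate, LIMITS[agent]) is expected, (agent, estimate)
```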

Effectiveness Analysis

Prevention Capability

The 50% rule successfully prevents:

❌ Single issues consuming > 50% of agent capacity
❌ Context exhaustion from oversized assignments
❌ Agent deadlock from insufficient working memory

What It Allows

The rule permits:

✅ Multiple medium-sized issues to be processed
✅ Efficient use of agent capacity (up to 50% per issue)
✅ Buffer space for conversation history and tool outputs
✅ Clear, predictable validation at assignment time

Limitations

The 50% rule does NOT prevent:

Cumulative context growth over multiple issues (requires runtime monitoring)
Context bloat from tool outputs or conversation (requires compaction)
Issues that grow beyond estimate during execution (requires monitoring)

These are addressed by complementary systems:

Runtime monitoring (#155) - Tracks actual context usage
Context compaction - Triggered at 80% threshold
Session rotation - Triggered at 95% threshold
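
To show how the two runtime thresholds relate to each other, here is a hypothetical sketch. The 80% and 95% figures come from the list above; the `Action` enum, the function name, and the choice to treat each threshold as inclusive are assumptions and do not describe the actual monitoring system.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    COMPACT = "compact"
    ROTATE = "rotate"


COMPACTION_PERCENT = 80  # context compaction threshold
ROTATION_PERCENT = 95    # session rotation threshold


def runtime_action(used_tokens: int, context_limit: int) -> Action:
    """Pick the runtime response for the current context usage."""
    # Integer arithmetic keeps the comparisons exact at the boundaries.
    # Treating the thresholds as inclusive (>=) is an assumption.
    if used_tokens * 100 >= ROTATION_PERCENT * context_limit:
        return Action.ROTATE
    if used_tokens * 100 >= COMPACTION_PERCENT * context_limit:
        return Action.COMPACT
    return Action.NONE


if __name__ == "__main__":
    for used in (100_000, 170_000, 195_000):
        print(f"{used:,} / 200,000 -> {runtime_action(used, 200_000).value}")
```

The rotation check runs first so that usage above 95% is never answered with compaction alone.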

Validation Metrics

Metric	Target	Actual	Status
Test coverage	≥85%	100%	✅ PASS
Test scenarios	4	12	✅ PASS
Edge cases tested	-	6	✅ PASS
Type safety	Pass	Pass	✅ PASS
Linting	Pass	Pass	✅ PASS

Recommendations

✅ Implemented: Agent-specific limits (200K vs 128K)
✅ Implemented: Clear rejection messages with context
✅ Implemented: Validation at assignment time
🔄 Future: Integrate with issue assignment workflow
🔄 Future: Add telemetry for validation rejection rates
🔄 Future: Consider dynamic threshold adjustment based on historical context growth

Conclusion

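The 50% rule validation is EFFECTIVE at preventing oversized issue assignments and the context exhaustion they would cause. All test scenarios pass, edge cases are handled correctly, and the implementation achieves 100% test coverage. Cumulative context growth remains the responsibility of the complementary systems listed under Limitations.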

Status: ✅ Ready for integration into coordinator workflow