Integration Testing: Non-AI Coordinator E2E Validation #141

Closed
opened 2026-01-30 23:44:06 +00:00 by jason.woltje · 0 comments
Owner

Create end-to-end tests validating that the non-AI coordinator enforces quality gates.

Objective: Prove the coordinator prevents premature completion and enforces quality standards.

Test Scenarios:

  1. Agent claims done with gate failures → Rejected, forced to continue
  2. Agent claims done with all gates passing → Accepted
  3. Agent exhausts token budget with gates failing → User notified
  4. Agent loops rejections (3x) → Escalated to user
  5. Custom gates execute correctly
  6. Workspace-specific gate configs respected
  7. Multi-agent coordination with shared gates
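The decision logic implied by scenarios 1-4 could be sketched roughly as follows. This is a minimal illustration, not the coordinator's actual API: the names (`TaskState`, `decide`), the rejection cap of 3, and the token-budget field are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaskState:
    """Hypothetical per-task state the coordinator would track."""
    rejections: int = 0
    tokens_used: int = 0
    token_budget: int = 100_000

def decide(claims_done: bool, gates_pass: bool, state: TaskState,
           max_rejections: int = 3) -> str:
    """Return the coordinator's verdict for one agent turn."""
    if not claims_done:
        return "continue"
    if gates_pass:
        return "accept"                      # scenario 2
    if state.tokens_used >= state.token_budget:
        return "notify_user"                 # scenario 3
    state.rejections += 1
    if state.rejections >= max_rejections:
        return "escalate"                    # scenario 4
    return "reject"                          # scenario 1
```

Each E2E scenario then reduces to asserting the expected verdict for a given gate outcome and task state.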

Test Cases:

  • Build gate: Intentional compilation errors, verify rejection
  • Lint gate: Violations above threshold, verify rejection
  • Test gate: Failing tests, verify rejection
  • Coverage gate: Below threshold, verify rejection
  • Premature done: Agent says done early, verify forced continuation
  • Token budget: Done with 50% budget unused + gates failing, verify rejection
  • Success path: All gates pass, verify acceptance
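A gate runner for the test cases above could take the shape below. The input dict's keys and default thresholds are assumptions for illustration; the real gates would wrap the actual build, lint, test, and coverage tools.

```python
def run_gates(results: dict) -> list[str]:
    """Return the names of failed gates, given (assumed) raw tool outputs."""
    failures = []
    if results["build_errors"] > 0:                              # build gate
        failures.append("build")
    if results["lint_violations"] > results.get("lint_threshold", 0):
        failures.append("lint")                                  # lint gate
    if results["tests_failed"] > 0:                              # test gate
        failures.append("test")
    if results["coverage_pct"] < results.get("coverage_min", 80.0):
        failures.append("coverage")                              # coverage gate
    return failures
```

An empty return value corresponds to the success path; any non-empty list should trigger rejection.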

E2E Test Flow:

  1. Create workspace with strict gate config
  2. Start agent task (e.g., implement feature with tests)
  3. Agent does partial work, claims done
  4. Orchestrator runs gates → failures detected
  5. Orchestrator rejects, injects continuation prompt
  6. Agent continues, fixes issues
  7. Agent claims done again
  8. Orchestrator runs gates → all pass
  9. Orchestrator accepts completion
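The nine-step flow above amounts to a reject/accept loop, which a test harness could drive with a stubbed agent. Everything here is a sketch: `agent_turns` stands in for a scripted agent, and `run_gates` for whatever gate runner the orchestrator actually uses.

```python
def run_e2e(agent_turns, run_gates, max_turns: int = 10) -> list[str]:
    """Drive the E2E flow with a scripted agent (hypothetical harness).

    agent_turns: iterable of (claims_done, workspace) pairs, one per turn.
    run_gates:   callable(workspace) -> list of failed gate names.
    Returns a transcript of orchestrator actions for assertion.
    """
    transcript = []
    for claims_done, workspace in agent_turns:
        if not claims_done:
            transcript.append("work")            # agent still working
            continue
        failures = run_gates(workspace)
        if failures:
            # Reject and (in the real system) inject a continuation prompt.
            transcript.append("reject:" + ",".join(failures))
        else:
            transcript.append("accept")
            return transcript
    transcript.append("timeout")
    return transcript
```

The test then asserts the transcript matches the expected sequence, e.g. partial work, one rejection naming the failed gates, then acceptance.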

Validation Metrics:

  • Rejection count per scenario
  • Time to completion vs. without orchestrator
  • Gate execution time
  • False positive rate (legitimate work rejected)
  • False negative rate (bad work accepted)
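The false positive and false negative rates could be computed from labeled test runs like this; the `(verdict, ground_truth_good)` pair encoding is an assumption of this sketch.

```python
def error_rates(decisions: list[tuple[str, bool]]) -> tuple[float, float]:
    """decisions: (verdict, ground_truth_good) pairs from labeled test runs.

    False positive = legitimate work rejected.
    False negative = bad work accepted.
    Returns (fp_rate, fn_rate).
    """
    fp = sum(1 for v, good in decisions if v == "reject" and good)
    fn = sum(1 for v, good in decisions if v == "accept" and not good)
    n_good = sum(1 for _, good in decisions if good) or 1
    n_bad = sum(1 for _, good in decisions if not good) or 1
    return fp / n_good, fn / n_bad
```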

Related: L-015, #134 (orchestrator), #135-139 (components)

Acceptance Criteria:

  • All test scenarios pass
  • Full E2E flow (creation through acceptance) validated
  • Rejection/acceptance logic proven correct
  • No false positives/negatives in test runs
  • Performance acceptable (gate overhead <10%)
  • Integration with existing agent system working
  • Tests run in CI pipeline
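The "gate overhead &lt;10%" criterion needs a measurement; one minimal way, assuming wall-clock time is the metric of interest (the real harness may measure differently), is:

```python
import time

def gate_overhead(task_fn, gates_fn) -> float:
    """Return gate wall-time as a fraction of the total run (sketch only)."""
    t0 = time.perf_counter()
    task_fn()                     # the agent task itself
    t1 = time.perf_counter()
    gates_fn()                    # the quality gate checks
    t2 = time.perf_counter()
    total = t2 - t0
    return (t2 - t1) / total if total else 0.0
```

The CI assertion would then be `gate_overhead(task, gates) < 0.10` over representative tasks.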
jason.woltje added the testing, p0 labels 2026-01-30 23:44:06 +00:00
jason.woltje added this to the M4-LLM (0.0.4) milestone 2026-01-30 23:45:34 +00:00

Reference: mosaic/stack#141