Integration Testing: Non-AI Coordinator E2E Validation #141

Closed
opened 2026-01-30 23:44:06 +00:00 by jason.woltje · 0 comments
Owner

Create end-to-end tests validating that the non-AI coordinator enforces quality gates.

Objective: Prove the coordinator prevents premature completion and enforces quality standards.

Test Scenarios:

  1. Agent claims done with gate failures → Rejected, forced to continue
  2. Agent claims done with all gates passing → Accepted
  3. Agent exhausts token budget with gates failing → User notified
  4. Agent loops rejections (3x) → Escalated to user
  5. Custom gates execute correctly
  6. Workspace-specific gate configs respected
  7. Multi-agent coordination with shared gates
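The decision logic implied by scenarios 1-4 could be sketched roughly as follows. This is a minimal illustration, not the coordinator's actual API: the names (`TaskState`, `decide`), the rejection cap of 3, and the token-budget field are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaskState:
    """Hypothetical per-task state the coordinator would track."""
    rejections: int = 0
    tokens_used: int = 0
    token_budget: int = 100_000

def decide(claims_done: bool, gates_pass: bool, state: TaskState,
           max_rejections: int = 3) -> str:
    """Return the coordinator's verdict for one agent turn."""
    if not claims_done:
        return "continue"
    if gates_pass:
        return "accept"                      # scenario 2
    if state.tokens_used >= state.token_budget:
        return "notify_user"                 # scenario 3
    state.rejections += 1
    if state.rejections >= max_rejections:
        return "escalate"                    # scenario 4
    return "reject"                          # scenario 1
```

Each E2E scenario then reduces to asserting the expected verdict for a given gate outcome and task state.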

Test Cases:

  • Build gate: Intentional compilation errors, verify rejection
  • Lint gate: Violations above threshold, verify rejection
  • Test gate: Failing tests, verify rejection
  • Coverage gate: Below threshold, verify rejection
  • Premature done: Agent says done early, verify forced continuation
  • Token budget: Done with 50% budget unused + gates failing, verify rejection
  • Success path: All gates pass, verify acceptance
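A gate runner for the test cases above could take the shape below. The input dict's keys and default thresholds are assumptions for illustration; the real gates would wrap the actual build, lint, test, and coverage tools.

```python
def run_gates(results: dict) -> list[str]:
    """Return the names of failed gates, given (assumed) raw tool outputs."""
    failures = []
    if results["build_errors"] > 0:                              # build gate
        failures.append("build")
    if results["lint_violations"] > results.get("lint_threshold", 0):
        failures.append("lint")                                  # lint gate
    if results["tests_failed"] > 0:                              # test gate
        failures.append("test")
    if results["coverage_pct"] < results.get("coverage_min", 80.0):
        failures.append("coverage")                              # coverage gate
    return failures
```

An empty return value corresponds to the success path; any non-empty list should trigger rejection.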

E2E Test Flow:

  1. Create workspace with strict gate config
  2. Start agent task (e.g., implement feature with tests)
  3. Agent does partial work, claims done
  4. Orchestrator runs gates → failures detected
  5. Orchestrator rejects, injects continuation prompt
  6. Agent continues, fixes issues
  7. Agent claims done again
  8. Orchestrator runs gates → all pass
  9. Orchestrator accepts completion
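The nine-step flow above amounts to a reject/accept loop, which a test harness could drive with a stubbed agent. Everything here is a sketch: `agent_turns` stands in for a scripted agent, and `run_gates` for whatever gate runner the orchestrator actually uses.

```python
def run_e2e(agent_turns, run_gates, max_turns: int = 10) -> list[str]:
    """Drive the E2E flow with a scripted agent (hypothetical harness).

    agent_turns: iterable of (claims_done, workspace) pairs, one per turn.
    run_gates:   callable(workspace) -> list of failed gate names.
    Returns a transcript of orchestrator actions for assertion.
    """
    transcript = []
    for claims_done, workspace in agent_turns:
        if not claims_done:
            transcript.append("work")            # agent still working
            continue
        failures = run_gates(workspace)
        if failures:
            # Reject and (in the real system) inject a continuation prompt.
            transcript.append("reject:" + ",".join(failures))
        else:
            transcript.append("accept")
            return transcript
    transcript.append("timeout")
    return transcript
```

The test then asserts the transcript matches the expected sequence, e.g. partial work, one rejection naming the failed gates, then acceptance.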

Validation Metrics:

  • Rejection count per scenario
  • Time to completion vs. without orchestrator
  • Gate execution time
  • False positive rate (legitimate work rejected)
  • False negative rate (bad work accepted)
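The false positive and false negative rates could be computed from labeled test runs like this; the `(verdict, ground_truth_good)` pair encoding is an assumption of this sketch.

```python
def error_rates(decisions: list[tuple[str, bool]]) -> tuple[float, float]:
    """decisions: (verdict, ground_truth_good) pairs from labeled test runs.

    False positive = legitimate work rejected.
    False negative = bad work accepted.
    Returns (fp_rate, fn_rate).
    """
    fp = sum(1 for v, good in decisions if v == "reject" and good)
    fn = sum(1 for v, good in decisions if v == "accept" and not good)
    n_good = sum(1 for _, good in decisions if good) or 1
    n_bad = sum(1 for _, good in decisions if not good) or 1
    return fp / n_good, fn / n_bad
```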

Related: L-015, #134 (orchestrator), #135-139 (components)

Acceptance Criteria:

  • All test scenarios pass
  • Full E2E flow (creation through acceptance) validated
  • Rejection/acceptance logic proven correct
  • No false positives/negatives in test runs
  • Performance acceptable (gate overhead <10%)
  • Integration with existing agent system working
  • Tests run in CI pipeline
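The "gate overhead &lt;10%" criterion needs a measurement; one minimal way, assuming wall-clock time is the metric of interest (the real harness may measure differently), is:

```python
import time

def gate_overhead(task_fn, gates_fn) -> float:
    """Return gate wall-time as a fraction of the total run (sketch only)."""
    t0 = time.perf_counter()
    task_fn()                     # the agent task itself
    t1 = time.perf_counter()
    gates_fn()                    # the quality gate checks
    t2 = time.perf_counter()
    total = t2 - t0
    return (t2 - t1) / total if total else 0.0
```

The CI assertion would then be `gate_overhead(task, gates) < 0.10` over representative tasks.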
jason.woltje added the testing, p0 labels 2026-01-30 23:44:06 +00:00
jason.woltje added this to the M4-LLM (0.0.4) milestone 2026-01-30 23:45:34 +00:00

Reference: mosaic/stack#141