# Issue ORCH-122: AI Agent Confirmation

## Objective

Implement independent AI agent reviews for quality confirmation. This is the coordinator-side implementation that spawns an independent AI reviewer agent and returns confidence scores.

## Analysis

### Current State

After analyzing the codebase, I found that:

1. **ORCH-114** (Quality Gate Callbacks) - ✅ COMPLETE
   - Orchestrator has `QualityGatesService` that calls coordinator
   - Pre-commit and post-commit checks implemented
   - Properly handles coordinator responses

2. **ORCH-116** (50% Rule Enforcement) - ✅ COMPLETE
   - Orchestrator properly handles AI review responses
   - Tests cover all AI confirmation scenarios
   - `hasAIConfirmation()` helper method added
   - 36 comprehensive test cases, including 9 for the 50% rule

3. **ORCH-122** (AI Agent Confirmation) - **COORDINATOR-SIDE IMPLEMENTATION NEEDED**
   - Technical notes state: "AI reviewer is INDEPENDENT of worker agent (no self-review)"
   - Technical notes state: "Coordinator calls AI reviewer"
   - This is a **coordinator** responsibility, not an orchestrator one

### Architecture Decision

Based on the issue description and technical notes:

```
┌─────────────┐          ┌──────────────┐          ┌──────────────┐
│ Orchestrator│  calls   │ Coordinator  │  spawns  │ AI Reviewer  │
│             ├─────────>│ (Python)     ├─────────>│ Agent        │
│             │          │              │          │ (Independent)│
└─────────────┘          └──────────────┘          └──────────────┘
                                │
                                │ runs mechanical gates
                                │ + AI review
                                v
                         QualityCheckResponse {
                           approved: bool,
                           gate: string,
                           details: {
                             aiReview: {
                               confidence: float,
                               approved: bool,
                               findings: string[]
                             }
                           }
                         }
```

**Key Points**:

1. Orchestrator already handles AI review responses (ORCH-116 complete)
2. Coordinator needs to implement AI reviewer spawning
3. Coordinator is written in **Python** (FastAPI)
4. AI reviewer is an **independent Claude agent** (not self-review)

## Coordinator Implementation Status

### What Exists

The coordinator has:

- `apps/coordinator/src/quality_orchestrator.py` - Runs mechanical gates in parallel
- `apps/coordinator/src/gates/` - Build, lint, test, coverage gates
- Quality gate interface (GateResult model)
- FastAPI application with health endpoint

### What's Missing for ORCH-122

The coordinator **DOES NOT** currently have:

1. ❌ AI reviewer agent spawning logic
2. ❌ Independent AI agent integration
3. ❌ `aiReview` field in QualityCheckResponse
4. ❌ `/api/quality/check` endpoint (orchestrator expects this)
5. ❌ Confidence score calculation
6. ❌ 50% rule detection

## Implementation Requirements

Based on ORCH-122 acceptance criteria and related issues:

### Acceptance Criteria from M6-NEW-ISSUES-TEMPLATES.md

- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes
- [ ] Check for: logic errors, security issues, best practices
- [ ] Return confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9

### Technical Requirements

**Coordinator must implement:**

1. **Quality Check Endpoint** (`/api/quality/check`)
   - Accepts: `QualityCheckRequest` (taskId, agentId, files, diffSummary)
   - Returns: `QualityCheckResponse` (approved, gate, message, details)

2. **AI Reviewer Spawner**
   - Spawn independent Claude agent
   - Pass it the diff/files to review
   - Parse AI agent's review findings
   - Calculate confidence score

3. **50% Rule Detector**
   - Estimate AI-generated code percentage
   - Reject if > 50% AI-generated
   - Include findings in response

4. **Response Builder**
   - Combine mechanical gate results
   - Add aiReview field with:
     - confidence (0.0 - 1.0)
     - approved (bool)
     - aiGeneratedPercent (int)
     - findings (list[str])

### Integration Flow

```python
# Coordinator endpoint handler (sketch)
@app.post("/api/quality/check")
async def check_quality(request: QualityCheckRequest):
    # 1. Run mechanical gates
    mechanical_results = await quality_orchestrator.verify_completion()
    if not mechanical_results.all_passed:
        # Short-circuit: don't run AI review if mechanical gates fail
        return QualityCheckResponse(
            approved=False,
            gate="pre-commit",
            message="Mechanical gates failed",
            details={"gates": mechanical_results},  # mechanical gate details
        )

    # 2. Spawn independent AI reviewer
    ai_reviewer = AIReviewerService()
    ai_result = await ai_reviewer.review(
        files=request.files,
        diff=request.diffSummary,
    )

    # 3. Check 50% rule
    if ai_result.aiGeneratedPercent > 50:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="50% rule violated",
            details={
                "aiReview": {
                    "confidence": ai_result.confidence,
                    "approved": False,
                    "aiGeneratedPercent": ai_result.aiGeneratedPercent,
                    "findings": ["Detected >50% AI-generated code"],
                }
            },
        )

    # 4. Check AI confidence threshold
    if ai_result.confidence < 0.9:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="AI review confidence below threshold",
            details={"aiReview": {...}},  # full aiReview payload, as in step 3
        )

    # 5. All gates passed
    return QualityCheckResponse(
        approved=True,
        gate="post-commit",
        message="All checks passed including AI review",
        details={"aiReview": {...}},  # full aiReview payload, as in step 3
    )
```

## Orchestrator Integration - Already Complete

The orchestrator side is **ALREADY COMPLETE** thanks to ORCH-114 and ORCH-116:

### What Orchestrator Already Does

1. ✅ Calls `POST /api/quality/check` via CoordinatorClientService
2. ✅ Handles QualityCheckResponse with aiReview field
3. ✅ Blocks commit/push if rejected
4. ✅ Returns detailed failure reasons
5. ✅ Tests cover all AI confirmation scenarios
6.
   ✅ Helper method to check AI confirmation presence

### Proof: Existing Tests

From `quality-gates.service.spec.ts`:

- ✅ AI confirmation passes (confidence >= 0.9)
- ✅ AI confidence below threshold (< 0.9)
- ✅ 50% rule violated (>50% AI-generated)
- ✅ Mechanical pass but AI fails
- ✅ AI review with security findings
- ✅ Exactly 50% AI-generated
- ✅ AI review unavailable fallback
- ✅ Preserve all AI review metadata

All these tests pass because they mock the coordinator's response. The orchestrator is ready to consume the real AI review data.

## Conclusion

### ORCH-122 Status: Coordinator Implementation Needed

This issue requires implementation in the **coordinator** (apps/coordinator), not the orchestrator (apps/orchestrator).

**What needs to be done:**

1. Create `apps/coordinator/src/ai_reviewer.py`
   - Spawn independent Claude agent
   - Pass diff/files to agent
   - Parse agent's review
   - Return AIReviewResult

2. Create `apps/coordinator/src/api.py` (or update existing)
   - Add `/api/quality/check` endpoint
   - Call quality_orchestrator for mechanical gates
   - Call ai_reviewer for AI confirmation
   - Combine results into QualityCheckResponse

3. Update `apps/coordinator/src/models.py`
   - Add QualityCheckRequest model
   - Add QualityCheckResponse model
   - Add AIReviewResult model

4. Write tests for AI reviewer
   - Mock Claude API calls
   - Test confidence calculation
   - Test 50% rule detection

### Orchestrator Status: Complete ✅

The orchestrator is ready. It will work automatically once the coordinator implements the `/api/quality/check` endpoint with AI review support.

**No orchestrator changes needed for ORCH-122.**

## Next Steps

Since this is a coordinator implementation:

1. The coordinator is a separate FastAPI service
2. It needs Python development (not TypeScript)
3. It needs integration with the Anthropic Claude API
4. It's outside the scope of orchestrator work

**Recommendation**: Create a new issue or update ORCH-122 to clearly indicate this is coordinator-side work, or mark this issue as blocked pending coordinator implementation.

## Related Issues

- ORCH-114: Quality gate callbacks (complete - orchestrator side) ✅
- ORCH-116: 50% rule enforcement (complete - orchestrator side) ✅
- ORCH-122: AI agent confirmation (pending - coordinator side) ⏳
- ORCH-121: Mechanical quality gates (coordinator implementation needed)

## Acceptance Criteria - Analysis

For the **orchestrator** side (apps/orchestrator):

- [x] Handle AI review responses from coordinator
- [x] Parse aiReview field in QualityCheckResponse
- [x] Block operations when AI review fails
- [x] Return detailed AI findings to caller
- [x] Test coverage for all AI scenarios
- [x] Helper method to check AI confirmation presence

For the **coordinator** side (apps/coordinator):

- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes for logic errors, security, best practices
- [ ] Calculate confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9
- [ ] Detect AI-generated code percentage
- [ ] Enforce 50% rule
- [ ] Return aiReview in QualityCheckResponse
- [ ] Implement `/api/quality/check` endpoint

## Files Analyzed

### Orchestrator (TypeScript/NestJS)

- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.ts` ✅
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.spec.ts` ✅
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/coordinator-client.service.ts` ✅

### Coordinator (Python/FastAPI)

- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/main.py` ⏳ (no `/api/quality/check`)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/quality_orchestrator.py` ⏳ (no AI review)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/gates/` ⏳ (mechanical only)

## Notes

### Why This Makes Sense

The coordinator is responsible for quality checks because:

1. It's the control-plane service
2. It orchestrates all quality gates (mechanical + AI)
3. It has access to the codebase and diff
4. It can spawn independent agents without conflict
5. The orchestrator just needs to call it and handle results

### Independent AI Agent

Key requirement: "AI reviewer is INDEPENDENT of worker agent (no self-review)"

This means:

- Worker agent makes code changes
- Coordinator spawns a separate AI agent to review
- Reviewer agent has no context from the worker agent
- Prevents self-review bias
- Ensures objective code review

### Confidence Threshold

- Confidence score: 0.0 (no confidence) to 1.0 (full confidence)
- Approval threshold: >= 0.9 (90% confidence)
- Below threshold = rejected
- Reasons for low confidence: unclear logic, security risks, poor practices

### 50% Rule Details

- AI-generated code should be <= 50% of the PR
- Coordinator estimates the percentage using heuristics
- Could use: comment analysis, pattern detection, AI meta-detection
- If > 50%: reject with a clear message
- Encourages human review and understanding
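
As a rough illustration of how the confidence threshold and the 50% rule combine into a single approve/reject decision, here is a minimal Python sketch. The `AIReviewResult` shape, field names, and `decide()` helper are assumptions for illustration, not the actual coordinator models:

```python
from dataclasses import dataclass, field


@dataclass
class AIReviewResult:
    """Hypothetical shape of an AI review outcome (assumed fields)."""
    confidence: float          # 0.0 (no confidence) .. 1.0 (full confidence)
    ai_generated_percent: int  # heuristic estimate of AI-authored code
    findings: list[str] = field(default_factory=list)


CONFIDENCE_THRESHOLD = 0.9  # approve only at >= 90% confidence
AI_PERCENT_LIMIT = 50       # 50% rule: reject only if strictly more than half is AI-generated


def decide(result: AIReviewResult) -> tuple[bool, str]:
    """Apply the 50% rule first, then the confidence threshold."""
    if result.ai_generated_percent > AI_PERCENT_LIMIT:
        return False, "50% rule violated"
    if result.confidence < CONFIDENCE_THRESHOLD:
        return False, "AI review confidence below threshold"
    return True, "approved"
```

Note that exactly 50% passes (the limit is strictly `> 50`), which lines up with the "Exactly 50% AI-generated" test case already covered on the orchestrator side.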