stack/docs/scratchpads/orch-122-ai-review.md

# Issue ORCH-122: AI Agent Confirmation
## Objective
Implement independent AI agent reviews for quality confirmation. This is the coordinator-side implementation that spawns an independent AI reviewer agent and returns confidence scores.
## Analysis
### Current State
After analyzing the codebase, I found that:
1. **ORCH-114** (Quality Gate Callbacks) - ✅ COMPLETE
- Orchestrator has `QualityGatesService` that calls coordinator
- Pre-commit and post-commit checks implemented
- Properly handles coordinator responses
2. **ORCH-116** (50% Rule Enforcement) - ✅ COMPLETE
- Orchestrator properly handles AI review responses
- Tests cover all AI confirmation scenarios
- `hasAIConfirmation()` helper method added
- 36 comprehensive test cases including 9 for 50% rule
3. **ORCH-122** (AI Agent Confirmation) - **COORDINATOR-SIDE IMPLEMENTATION NEEDED**
- Technical notes state: "AI reviewer is INDEPENDENT of worker agent (no self-review)"
- Technical notes state: "Coordinator calls AI reviewer"
- This is a **coordinator** responsibility, not orchestrator
### Architecture Decision
Based on the issue description and technical notes:
```
┌──────────────┐           ┌──────────────┐           ┌──────────────┐
│ Orchestrator │   calls   │ Coordinator  │  spawns   │ AI Reviewer  │
│              ├──────────>│   (Python)   ├──────────>│    Agent     │
│              │           │              │           │ (Independent)│
└──────────────┘           └──────────────┘           └──────────────┘
                                  │ runs mechanical gates
                                  │ + AI review
                                  v
                        QualityCheckResponse
                        {
                          approved: bool,
                          gate: string,
                          details: {
                            aiReview: {
                              confidence: float,
                              approved: bool,
                              findings: string[]
                            }
                          }
                        }
```
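The response shape above can be sketched as plain Python dataclasses (field names are taken from the diagram; the real coordinator would presumably use Pydantic models so FastAPI can validate and serialize them automatically):

```python
from dataclasses import dataclass, field


@dataclass
class AIReview:
    confidence: float  # 0.0 (no confidence) to 1.0 (full confidence)
    approved: bool
    findings: list[str] = field(default_factory=list)


@dataclass
class QualityCheckResponse:
    approved: bool
    gate: str      # e.g. "pre-commit" or "post-commit"
    details: dict  # carries the optional "aiReview" payload


# Example: a passing post-commit response with an AI review attached
review = AIReview(confidence=0.95, approved=True)
response = QualityCheckResponse(
    approved=True,
    gate="post-commit",
    details={"aiReview": review},
)
```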
**Key Points**:
1. Orchestrator already handles AI review responses (ORCH-116 complete)
2. Coordinator needs to implement AI reviewer spawning
3. Coordinator is written in **Python** (FastAPI)
4. AI reviewer is an **independent Claude agent** (not self-review)
## Coordinator Implementation Status
### What Exists
The coordinator has:
- `apps/coordinator/src/quality_orchestrator.py` - Runs mechanical gates in parallel
- `apps/coordinator/src/gates/` - Build, lint, test, coverage gates
- Quality gate interface (GateResult model)
- FastAPI application with health endpoint
### What's Missing for ORCH-122
The coordinator **DOES NOT** currently have:
1. ❌ AI reviewer agent spawning logic
2. ❌ Independent AI agent integration
3. ❌ `aiReview` field in QualityCheckResponse
4. ❌ `/api/quality/check` endpoint (orchestrator expects this)
5. ❌ Confidence score calculation
6. ❌ 50% rule detection
## Implementation Requirements
Based on ORCH-122 acceptance criteria and related issues:
### Acceptance Criteria from M6-NEW-ISSUES-TEMPLATES.md
- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes
- [ ] Check for: logic errors, security issues, best practices
- [ ] Return confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9
### Technical Requirements
**Coordinator must implement:**
1. **Quality Check Endpoint** (`/api/quality/check`)
- Accepts: `QualityCheckRequest` (taskId, agentId, files, diffSummary)
- Returns: `QualityCheckResponse` (approved, gate, message, details)
2. **AI Reviewer Spawner**
- Spawn independent Claude agent
- Pass it the diff/files to review
- Parse AI agent's review findings
- Calculate confidence score
3. **50% Rule Detector**
- Estimate AI-generated code percentage
- Reject if > 50% AI-generated
- Include findings in response
4. **Response Builder**
- Combine mechanical gate results
- Add aiReview field with:
- confidence (0.0 - 1.0)
- approved (bool)
- aiGeneratedPercent (int)
- findings (list[str])
### Integration Flow
```python
# Coordinator endpoint handler
@app.post("/api/quality/check")
async def check_quality(request: QualityCheckRequest):
    # 1. Run mechanical gates
    mechanical_results = await quality_orchestrator.verify_completion()
    if not mechanical_results.all_passed:
        # Short-circuit: don't run AI review if mechanical gates fail
        return QualityCheckResponse(
            approved=False,
            gate="pre-commit",
            message="Mechanical gates failed",
            details={...mechanical_results...},
        )

    # 2. Spawn independent AI reviewer
    ai_reviewer = AIReviewerService()
    ai_result = await ai_reviewer.review(
        files=request.files,
        diff=request.diffSummary,
    )

    # 3. Check 50% rule
    if ai_result.aiGeneratedPercent > 50:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="50% rule violated",
            details={
                "aiReview": {
                    "confidence": ai_result.confidence,
                    "approved": False,
                    "aiGeneratedPercent": ai_result.aiGeneratedPercent,
                    "findings": ["Detected >50% AI-generated code"],
                }
            },
        )

    # 4. Check AI confidence threshold
    if ai_result.confidence < 0.9:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="AI review confidence below threshold",
            details={"aiReview": {...}},
        )

    # 5. All gates passed
    return QualityCheckResponse(
        approved=True,
        gate="post-commit",
        message="All checks passed including AI review",
        details={"aiReview": {...}},
    )
```
## Orchestrator Integration - Already Complete
The orchestrator side is **ALREADY COMPLETE** thanks to ORCH-114 and ORCH-116:
### What Orchestrator Already Does
1. ✅ Calls `POST /api/quality/check` via CoordinatorClientService
2. ✅ Handles QualityCheckResponse with aiReview field
3. ✅ Blocks commit/push if rejected
4. ✅ Returns detailed failure reasons
5. ✅ Tests cover all AI confirmation scenarios
6. ✅ Helper method to check AI confirmation presence
### Proof: Existing Tests
From `quality-gates.service.spec.ts`:
- ✅ AI confirmation passes (confidence >= 0.9)
- ✅ AI confidence below threshold (< 0.9)
- ✅ 50% rule violated (>50% AI-generated)
- ✅ Mechanical pass but AI fails
- ✅ AI review with security findings
- ✅ Exactly 50% AI-generated
- ✅ AI review unavailable fallback
- ✅ Preserve all AI review metadata
All these tests pass because they mock the coordinator's response. The orchestrator is ready to consume the real AI review data.
## Conclusion
### ORCH-122 Status: Coordinator Implementation Needed
This issue requires implementation in the **coordinator** (apps/coordinator), not the orchestrator (apps/orchestrator).
**What needs to be done:**
1. Create `apps/coordinator/src/ai_reviewer.py`
- Spawn independent Claude agent
- Pass diff/files to agent
- Parse agent's review
- Return AIReviewResult
2. Create `apps/coordinator/src/api.py` (or update existing)
- Add `/api/quality/check` endpoint
- Call quality_orchestrator for mechanical gates
- Call ai_reviewer for AI confirmation
- Combine results into QualityCheckResponse
3. Update `apps/coordinator/src/models.py`
- Add QualityCheckRequest model
- Add QualityCheckResponse model
- Add AIReviewResult model
4. Write tests for AI reviewer
- Mock Claude API calls
- Test confidence calculation
- Test 50% rule detection
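A minimal sketch of what `ai_reviewer.py` might look like, assuming the reviewer agent is prompted to reply with a JSON verdict. The actual Claude API call is stubbed as an injected callable so the parsing and threshold logic can be tested without network access; `AIReviewResult` and the JSON keys are illustrative, not the real models:

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class AIReviewResult:
    confidence: float
    approved: bool
    aiGeneratedPercent: int
    findings: list[str]


class AIReviewerService:
    """Spawns an independent reviewer and parses its JSON verdict."""

    def __init__(self, run_agent: Callable[[str], str]):
        # run_agent would wrap the real Claude API call; injecting it
        # keeps this sketch testable with a canned response.
        self._run_agent = run_agent

    def review(self, diff: str) -> AIReviewResult:
        prompt = (
            "Review this diff for logic errors, security issues, and best "
            "practices. Reply with JSON: {confidence, aiGeneratedPercent, "
            "findings}.\n\n" + diff
        )
        data = json.loads(self._run_agent(prompt))
        confidence = float(data["confidence"])
        percent = int(data["aiGeneratedPercent"])
        return AIReviewResult(
            confidence=confidence,
            # Approval requires both the 0.9 threshold and the 50% rule
            approved=confidence >= 0.9 and percent <= 50,
            aiGeneratedPercent=percent,
            findings=list(data.get("findings", [])),
        )


# Usage with a canned agent response standing in for the real API call
fake_agent = lambda prompt: (
    '{"confidence": 0.95, "aiGeneratedPercent": 30, "findings": []}'
)
result = AIReviewerService(fake_agent).review("diff --git a/x b/x")
```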
### Orchestrator Status: Complete ✅
The orchestrator is ready. It will work automatically once the coordinator implements the `/api/quality/check` endpoint with AI review support.
**No orchestrator changes needed for ORCH-122.**
## Next Steps
Since this is a coordinator implementation:
1. The coordinator is a separate FastAPI service
2. It needs Python development (not TypeScript)
3. It needs integration with Anthropic Claude API
4. It's outside the scope of orchestrator work
**Recommendation**: Create a new issue or update ORCH-122 to clearly indicate this is coordinator-side work, or mark this issue as blocked pending coordinator implementation.
## Related Issues
- ORCH-114: Quality gate callbacks (complete - orchestrator side) ✅
- ORCH-116: 50% rule enforcement (complete - orchestrator side) ✅
- ORCH-122: AI agent confirmation (pending - coordinator side) ⏳
- ORCH-121: Mechanical quality gates (coordinator implementation needed)
## Acceptance Criteria - Analysis
For the **orchestrator** side (apps/orchestrator):
- [x] Handle AI review responses from coordinator
- [x] Parse aiReview field in QualityCheckResponse
- [x] Block operations when AI review fails
- [x] Return detailed AI findings to caller
- [x] Test coverage for all AI scenarios
- [x] Helper method to check AI confirmation presence
For the **coordinator** side (apps/coordinator):
- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes for logic errors, security, best practices
- [ ] Calculate confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9
- [ ] Detect AI-generated code percentage
- [ ] Enforce 50% rule
- [ ] Return aiReview in QualityCheckResponse
- [ ] Implement `/api/quality/check` endpoint
## Files Analyzed
### Orchestrator (TypeScript/NestJS)
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.ts`
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.spec.ts`
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/coordinator-client.service.ts`
### Coordinator (Python/FastAPI)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/main.py` ⏳ (no `/api/quality/check`)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/quality_orchestrator.py` ⏳ (no AI review)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/gates/` ⏳ (mechanical only)
## Notes
### Why This Makes Sense
The coordinator is responsible for quality checks because:
1. It's the control plane service
2. It orchestrates all quality gates (mechanical + AI)
3. It has access to the codebase and diff
4. It can spawn independent agents without conflict
5. The orchestrator just needs to call it and handle results
### Independent AI Agent
Key requirement: "AI reviewer is INDEPENDENT of worker agent (no self-review)"
This means:
- Worker agent makes code changes
- Coordinator spawns separate AI agent to review
- Reviewer agent has no context from worker agent
- Prevents self-review bias
- Ensures objective code review
### Confidence Threshold
- Confidence score: 0.0 (no confidence) to 1.0 (full confidence)
- Approval threshold: >= 0.9 (90% confidence)
- Below threshold = rejected
- Reasons for low confidence: unclear logic, security risks, poor practices
### 50% Rule Details
- AI-generated code should be <= 50% of PR
- Coordinator estimates percentage using heuristics
- Could use: comment analysis, pattern detection, AI meta-detection
- If > 50%: reject with clear message
- Encourages human review and understanding
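One possible heuristic (purely illustrative, since the estimation approach is not pinned down in the issue) is to tally lines attributed to AI assistance versus total changed lines, where attribution might come from Co-Authored-By trailers or pattern detection:

```python
def estimate_ai_percent(changed: dict[str, tuple[int, int]]) -> int:
    """Estimate the AI-generated percentage of a change set.

    `changed` maps filename -> (ai_attributed_lines, total_changed_lines).
    Returns a whole percentage; 0 for an empty change set.
    """
    ai = sum(a for a, _ in changed.values())
    total = sum(t for _, t in changed.values())
    return round(100 * ai / total) if total else 0


# Example: 120 of 200 changed lines attributed to AI
percent = estimate_ai_percent({
    "src/service.py": (100, 150),
    "src/models.py": (20, 50),
})
violates_50_rule = percent > 50  # 60% -> rejected with a clear message
```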