
Issue ORCH-122: AI Agent Confirmation

Objective

Implement an independent AI agent review for quality confirmation. This is coordinator-side work: the coordinator spawns an independent AI reviewer agent and returns a confidence score.

Analysis

Current State

After analyzing the codebase, I found that:

  1. ORCH-114 (Quality Gate Callbacks) - COMPLETE

    • Orchestrator has QualityGatesService that calls coordinator
    • Pre-commit and post-commit checks implemented
    • Properly handles coordinator responses
  2. ORCH-116 (50% Rule Enforcement) - COMPLETE

    • Orchestrator properly handles AI review responses
    • Tests cover all AI confirmation scenarios
    • hasAIConfirmation() helper method added
    • 36 comprehensive test cases including 9 for 50% rule
  3. ORCH-122 (AI Agent Confirmation) - COORDINATOR-SIDE IMPLEMENTATION NEEDED

    • Technical notes state: "AI reviewer is INDEPENDENT of worker agent (no self-review)"
    • Technical notes state: "Coordinator calls AI reviewer"
    • This is a coordinator responsibility, not orchestrator

Architecture Decision

Based on the issue description and technical notes:

┌─────────────┐         ┌──────────────┐         ┌──────────────┐
│ Orchestrator│  calls  │  Coordinator │  spawns │ AI Reviewer  │
│             ├────────>│              ├────────>│    Agent     │
│             │         │  (Python)    │         │ (Independent)│
└─────────────┘         └──────────────┘         └──────────────┘
                               │
                               │ runs mechanical gates
                               │ + AI review
                               │
                               v
                        QualityCheckResponse
                        {
                          approved: bool,
                          gate: string,
                          details: {
                            aiReview: {
                              confidence: float,
                              approved: bool,
                              findings: string[]
                            }
                          }
                        }
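
The response shape above could be sketched as plain dataclasses (stdlib dataclasses for brevity; the real coordinator would likely use Pydantic models, and these classes do not exist in the codebase yet):

```python
from dataclasses import dataclass, field

@dataclass
class AIReview:
    confidence: float  # 0.0 (no confidence) to 1.0 (full confidence)
    approved: bool
    findings: list[str] = field(default_factory=list)

@dataclass
class QualityCheckResponse:
    approved: bool
    gate: str          # e.g. "pre-commit" or "post-commit"
    details: dict = field(default_factory=dict)  # carries the aiReview block
```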

Key Points:

  1. Orchestrator already handles AI review responses (ORCH-116 complete)
  2. Coordinator needs to implement AI reviewer spawning
  3. Coordinator is written in Python (FastAPI)
  4. AI reviewer is an independent Claude agent (not self-review)

Coordinator Implementation Status

What Exists

The coordinator has:

  • apps/coordinator/src/quality_orchestrator.py - Runs mechanical gates in parallel
  • apps/coordinator/src/gates/ - Build, lint, test, coverage gates
  • Quality gate interface (GateResult model)
  • FastAPI application with health endpoint

What's Missing for ORCH-122

The coordinator DOES NOT currently have:

  1. AI reviewer agent spawning logic
  2. Independent AI agent integration
  3. aiReview field in QualityCheckResponse
  4. /api/quality/check endpoint (orchestrator expects this)
  5. Confidence score calculation
  6. 50% rule detection

Implementation Requirements

Based on ORCH-122 acceptance criteria and related issues:

Acceptance Criteria from M6-NEW-ISSUES-TEMPLATES.md

  • Spawn independent AI reviewer agent
  • Review code changes
  • Check for: logic errors, security issues, best practices
  • Return confidence score (0.0 - 1.0)
  • Approve if confidence >= 0.9

Technical Requirements

Coordinator must implement:

  1. Quality Check Endpoint (/api/quality/check)

    • Accepts: QualityCheckRequest (taskId, agentId, files, diffSummary)
    • Returns: QualityCheckResponse (approved, gate, message, details)
  2. AI Reviewer Spawner

    • Spawn independent Claude agent
    • Pass it the diff/files to review
    • Parse AI agent's review findings
    • Calculate confidence score
  3. 50% Rule Detector

    • Estimate AI-generated code percentage
    • Reject if > 50% AI-generated
    • Include findings in response
  4. Response Builder

    • Combine mechanical gate results
    • Add aiReview field with:
      • confidence (0.0 - 1.0)
      • approved (bool)
      • aiGeneratedPercent (int)
      • findings (list[str])
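
Items 1 and 4 above could be sketched as follows (field names come from the requirements list; the dataclass form and the helper name `build_ai_review` are illustrative assumptions, not existing code):

```python
from dataclasses import dataclass

@dataclass
class QualityCheckRequest:
    # Field names mirror item 1 above.
    taskId: str
    agentId: str
    files: list[str]
    diffSummary: str

def build_ai_review(confidence: float, ai_generated_percent: int,
                    findings: list[str]) -> dict:
    """Build the aiReview block for QualityCheckResponse.details (item 4)."""
    return {
        "confidence": confidence,
        # Approved only if confidence clears 0.9 AND the 50% rule holds.
        "approved": confidence >= 0.9 and ai_generated_percent <= 50,
        "aiGeneratedPercent": ai_generated_percent,
        "findings": findings,
    }
```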

Integration Flow

# Coordinator endpoint handler (sketch; the models and services used here are not implemented yet)
@app.post("/api/quality/check")
async def check_quality(request: QualityCheckRequest):
    # 1. Run mechanical gates
    mechanical_results = await quality_orchestrator.verify_completion()

    if not mechanical_results.all_passed:
        # Short-circuit: don't run AI review if mechanical gates fail
        return QualityCheckResponse(
            approved=False,
            gate="pre-commit",
            message="Mechanical gates failed",
            details={"mechanical": mechanical_results},
        )

    # 2. Spawn independent AI reviewer
    ai_reviewer = AIReviewerService()
    ai_result = await ai_reviewer.review(
        files=request.files,
        diff=request.diffSummary,
    )

    # aiReview payload shared by the responses below
    ai_review = {
        "confidence": ai_result.confidence,
        "approved": ai_result.confidence >= 0.9,
        "aiGeneratedPercent": ai_result.aiGeneratedPercent,
        "findings": ai_result.findings,
    }

    # 3. Check 50% rule
    if ai_result.aiGeneratedPercent > 50:
        ai_review["approved"] = False
        ai_review["findings"] = ["Detected >50% AI-generated code"]
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="50% rule violated",
            details={"aiReview": ai_review},
        )

    # 4. Check AI confidence threshold
    if ai_result.confidence < 0.9:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="AI review confidence below threshold",
            details={"aiReview": ai_review},
        )

    # 5. All gates passed
    return QualityCheckResponse(
        approved=True,
        gate="post-commit",
        message="All checks passed including AI review",
        details={"aiReview": ai_review},
    )

Orchestrator Integration - Already Complete

The orchestrator side is ALREADY COMPLETE thanks to ORCH-114 and ORCH-116:

What Orchestrator Already Does

  1. Calls POST /api/quality/check via CoordinatorClientService
  2. Handles QualityCheckResponse with aiReview field
  3. Blocks commit/push if rejected
  4. Returns detailed failure reasons
  5. Tests cover all AI confirmation scenarios
  6. Helper method to check AI confirmation presence

Proof: Existing Tests

From quality-gates.service.spec.ts:

  • AI confirmation passes (confidence >= 0.9)
  • AI confidence below threshold (< 0.9)
  • 50% rule violated (>50% AI-generated)
  • Mechanical pass but AI fails
  • AI review with security findings
  • Exactly 50% AI-generated
  • AI review unavailable fallback
  • Preserve all AI review metadata

All these tests pass because they mock the coordinator's response. The orchestrator is ready to consume the real AI review data.

Conclusion

ORCH-122 Status: Coordinator Implementation Needed

This issue requires implementation in the coordinator (apps/coordinator), not the orchestrator (apps/orchestrator).

What needs to be done:

  1. Create apps/coordinator/src/ai_reviewer.py

    • Spawn independent Claude agent
    • Pass diff/files to agent
    • Parse agent's review
    • Return AIReviewResult
  2. Create apps/coordinator/src/api.py (or update existing)

    • Add /api/quality/check endpoint
    • Call quality_orchestrator for mechanical gates
    • Call ai_reviewer for AI confirmation
    • Combine results into QualityCheckResponse
  3. Update apps/coordinator/src/models.py

    • Add QualityCheckRequest model
    • Add QualityCheckResponse model
    • Add AIReviewResult model
  4. Write tests for AI reviewer

    • Mock Claude API calls
    • Test confidence calculation
    • Test 50% rule detection
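
Step 1 (the `ai_reviewer.py` spawner) might look like the sketch below. The prompt wording, the JSON verdict shape, and the injected `client` callable are all assumptions; in production the client would wrap an Anthropic Messages API call to a freshly spawned, independent Claude agent:

```python
import json
from dataclasses import dataclass, field

@dataclass
class AIReviewResult:
    confidence: float
    approved: bool
    aiGeneratedPercent: int = 0
    findings: list[str] = field(default_factory=list)

def parse_review(raw: str, threshold: float = 0.9) -> AIReviewResult:
    # Parse the reviewer agent's JSON verdict (shape is an assumption).
    data = json.loads(raw)
    confidence = float(data.get("confidence", 0.0))
    return AIReviewResult(
        confidence=confidence,
        approved=confidence >= threshold,
        aiGeneratedPercent=int(data.get("aiGeneratedPercent", 0)),
        findings=list(data.get("findings", [])),
    )

class AIReviewerService:
    def __init__(self, client):
        # client: callable taking a prompt string and returning the agent's
        # raw reply; injected here so the sketch stays self-contained and
        # testable without an API key.
        self._client = client

    def review(self, files: list[str], diff: str) -> AIReviewResult:
        prompt = (
            "Review this diff for logic errors, security issues, and best "
            "practices. Reply with JSON containing confidence (0.0-1.0), "
            "aiGeneratedPercent, and findings.\n"
            "Files: " + ", ".join(files) + "\n\n" + diff
        )
        return parse_review(self._client(prompt))
```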

Orchestrator Status: Complete

The orchestrator is ready. It will work automatically once the coordinator implements the /api/quality/check endpoint with AI review support.

No orchestrator changes needed for ORCH-122.

Next Steps

Since this is a coordinator implementation:

  1. The coordinator is a separate FastAPI service
  2. It needs Python development (not TypeScript)
  3. It needs integration with Anthropic Claude API
  4. It's outside the scope of orchestrator work

Recommendation: Either update ORCH-122 (or file a new issue) to make clear this is coordinator-side work, or mark ORCH-122 as blocked pending the coordinator implementation.

  • ORCH-114: Quality gate callbacks (complete - orchestrator side)
  • ORCH-116: 50% rule enforcement (complete - orchestrator side)
  • ORCH-122: AI agent confirmation (pending - coordinator side)
  • ORCH-121: Mechanical quality gates (coordinator implementation needed)

Acceptance Criteria - Analysis

For the orchestrator side (apps/orchestrator):

  • Handle AI review responses from coordinator
  • Parse aiReview field in QualityCheckResponse
  • Block operations when AI review fails
  • Return detailed AI findings to caller
  • Test coverage for all AI scenarios
  • Helper method to check AI confirmation presence

For the coordinator side (apps/coordinator):

  • Spawn independent AI reviewer agent
  • Review code changes for logic errors, security, best practices
  • Calculate confidence score (0.0 - 1.0)
  • Approve if confidence >= 0.9
  • Detect AI-generated code percentage
  • Enforce 50% rule
  • Return aiReview in QualityCheckResponse
  • Implement /api/quality/check endpoint

Files Analyzed

Orchestrator (TypeScript/NestJS)

  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.ts
  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.spec.ts
  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/coordinator-client.service.ts

Coordinator (Python/FastAPI)

  • /home/localadmin/src/mosaic-stack/apps/coordinator/src/main.py (no /api/quality/check)
  • /home/localadmin/src/mosaic-stack/apps/coordinator/src/quality_orchestrator.py (no AI review)
  • /home/localadmin/src/mosaic-stack/apps/coordinator/src/gates/ (mechanical only)

Notes

Why This Makes Sense

The coordinator is responsible for quality checks because:

  1. It's the control plane service
  2. It orchestrates all quality gates (mechanical + AI)
  3. It has access to the codebase and diff
  4. It can spawn independent agents without conflict
  5. The orchestrator just needs to call it and handle results

Independent AI Agent

Key requirement: "AI reviewer is INDEPENDENT of worker agent (no self-review)"

This means:

  • Worker agent makes code changes
  • Coordinator spawns separate AI agent to review
  • Reviewer agent has no context from worker agent
  • Prevents self-review bias
  • Ensures objective code review

Confidence Threshold

  • Confidence score: 0.0 (no confidence) to 1.0 (full confidence)
  • Approval threshold: >= 0.9 (90% confidence)
  • Below threshold = rejected
  • Reasons for low confidence: unclear logic, security risks, poor practices

50% Rule Details

  • AI-generated code should be <= 50% of PR
  • Coordinator estimates percentage using heuristics
  • Could use: comment analysis, pattern detection, AI meta-detection
  • If > 50%: reject with clear message
  • Encourages human review and understanding
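
One of the candidate heuristics above (commit-trailer analysis) could be sketched like this; the input shape, the `ai_coauthored` flag, and the line-count weighting are illustrative assumptions, not an implemented detector:

```python
def estimate_ai_generated_percent(commits: list[dict]) -> int:
    """Estimate the AI-generated share of a change set.

    Each commit dict is assumed to carry 'added_lines' and an
    'ai_coauthored' flag (e.g. derived from a Co-Authored-By trailer
    on the commit). Real detection would combine several heuristics.
    """
    total = sum(c["added_lines"] for c in commits)
    if total == 0:
        return 0
    ai_lines = sum(c["added_lines"] for c in commits if c["ai_coauthored"])
    return round(100 * ai_lines / total)
```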