
Issue ORCH-122: AI Agent Confirmation

Objective

Implement an independent AI agent review for quality confirmation. This is coordinator-side work: the coordinator spawns an independent AI reviewer agent and returns a confidence score.

Analysis

Current State

After analyzing the codebase, I found that:

  1. ORCH-114 (Quality Gate Callbacks) - COMPLETE

    • Orchestrator has QualityGatesService that calls coordinator
    • Pre-commit and post-commit checks implemented
    • Properly handles coordinator responses
  2. ORCH-116 (50% Rule Enforcement) - COMPLETE

    • Orchestrator properly handles AI review responses
    • Tests cover all AI confirmation scenarios
    • hasAIConfirmation() helper method added
    • 36 comprehensive test cases including 9 for 50% rule
  3. ORCH-122 (AI Agent Confirmation) - COORDINATOR-SIDE IMPLEMENTATION NEEDED

    • Technical notes state: "AI reviewer is INDEPENDENT of worker agent (no self-review)"
    • Technical notes state: "Coordinator calls AI reviewer"
    • This is a coordinator responsibility, not orchestrator

Architecture Decision

Based on the issue description and technical notes:

┌─────────────┐         ┌──────────────┐         ┌──────────────┐
│ Orchestrator│  calls  │  Coordinator │  spawns │ AI Reviewer  │
│             ├────────>│              ├────────>│    Agent     │
│             │         │  (Python)    │         │ (Independent)│
└─────────────┘         └──────────────┘         └──────────────┘
                               │
                               │ runs mechanical gates
                               │ + AI review
                               │
                               v
                        QualityCheckResponse
                        {
                          approved: bool,
                          gate: string,
                          details: {
                            aiReview: {
                              confidence: float,
                              approved: bool,
                              findings: string[]
                            }
                          }
                        }
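
The response shape above could be sketched as plain dataclasses (stdlib dataclasses for brevity; the real coordinator would likely use Pydantic models, and these classes do not exist in the codebase yet):

```python
from dataclasses import dataclass, field

@dataclass
class AIReview:
    confidence: float  # 0.0 (no confidence) to 1.0 (full confidence)
    approved: bool
    findings: list[str] = field(default_factory=list)

@dataclass
class QualityCheckResponse:
    approved: bool
    gate: str          # e.g. "pre-commit" or "post-commit"
    details: dict = field(default_factory=dict)  # carries the aiReview block
```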

Key Points:

  1. Orchestrator already handles AI review responses (ORCH-116 complete)
  2. Coordinator needs to implement AI reviewer spawning
  3. Coordinator is written in Python (FastAPI)
  4. AI reviewer is an independent Claude agent (not self-review)

Coordinator Implementation Status

What Exists

The coordinator has:

  • apps/coordinator/src/quality_orchestrator.py - Runs mechanical gates in parallel
  • apps/coordinator/src/gates/ - Build, lint, test, coverage gates
  • Quality gate interface (GateResult model)
  • FastAPI application with health endpoint

What's Missing for ORCH-122

The coordinator DOES NOT currently have:

  1. AI reviewer agent spawning logic
  2. Independent AI agent integration
  3. aiReview field in QualityCheckResponse
  4. /api/quality/check endpoint (orchestrator expects this)
  5. Confidence score calculation
  6. 50% rule detection

Implementation Requirements

Based on ORCH-122 acceptance criteria and related issues:

Acceptance Criteria from M6-NEW-ISSUES-TEMPLATES.md

  • Spawn independent AI reviewer agent
  • Review code changes
  • Check for: logic errors, security issues, best practices
  • Return confidence score (0.0 - 1.0)
  • Approve if confidence >= 0.9

Technical Requirements

Coordinator must implement:

  1. Quality Check Endpoint (/api/quality/check)

    • Accepts: QualityCheckRequest (taskId, agentId, files, diffSummary)
    • Returns: QualityCheckResponse (approved, gate, message, details)
  2. AI Reviewer Spawner

    • Spawn independent Claude agent
    • Pass it the diff/files to review
    • Parse AI agent's review findings
    • Calculate confidence score
  3. 50% Rule Detector

    • Estimate AI-generated code percentage
    • Reject if > 50% AI-generated
    • Include findings in response
  4. Response Builder

    • Combine mechanical gate results
    • Add aiReview field with:
      • confidence (0.0 - 1.0)
      • approved (bool)
      • aiGeneratedPercent (int)
      • findings (list[str])
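
Items 1 and 4 above could be sketched as follows (field names come from the requirements list; the dataclass form and the helper name `build_ai_review` are illustrative assumptions, not existing code):

```python
from dataclasses import dataclass

@dataclass
class QualityCheckRequest:
    # Field names mirror item 1 above.
    taskId: str
    agentId: str
    files: list[str]
    diffSummary: str

def build_ai_review(confidence: float, ai_generated_percent: int,
                    findings: list[str]) -> dict:
    """Build the aiReview block for QualityCheckResponse.details (item 4)."""
    return {
        "confidence": confidence,
        # Approved only if confidence clears 0.9 AND the 50% rule holds.
        "approved": confidence >= 0.9 and ai_generated_percent <= 50,
        "aiGeneratedPercent": ai_generated_percent,
        "findings": findings,
    }
```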

Integration Flow

# Coordinator endpoint handler (sketch; the models and services used here are not implemented yet)
@app.post("/api/quality/check")
async def check_quality(request: QualityCheckRequest):
    # 1. Run mechanical gates
    mechanical_results = await quality_orchestrator.verify_completion()

    if not mechanical_results.all_passed:
        # Short-circuit: don't run AI review if mechanical gates fail
        return QualityCheckResponse(
            approved=False,
            gate="pre-commit",
            message="Mechanical gates failed",
            details={"mechanical": mechanical_results},
        )

    # 2. Spawn independent AI reviewer
    ai_reviewer = AIReviewerService()
    ai_result = await ai_reviewer.review(
        files=request.files,
        diff=request.diffSummary,
    )

    # aiReview payload shared by the responses below
    ai_review = {
        "confidence": ai_result.confidence,
        "approved": ai_result.confidence >= 0.9,
        "aiGeneratedPercent": ai_result.aiGeneratedPercent,
        "findings": ai_result.findings,
    }

    # 3. Check 50% rule
    if ai_result.aiGeneratedPercent > 50:
        ai_review["approved"] = False
        ai_review["findings"] = ["Detected >50% AI-generated code"]
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="50% rule violated",
            details={"aiReview": ai_review},
        )

    # 4. Check AI confidence threshold
    if ai_result.confidence < 0.9:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="AI review confidence below threshold",
            details={"aiReview": ai_review},
        )

    # 5. All gates passed
    return QualityCheckResponse(
        approved=True,
        gate="post-commit",
        message="All checks passed including AI review",
        details={"aiReview": ai_review},
    )

Orchestrator Integration - Already Complete

The orchestrator side is ALREADY COMPLETE thanks to ORCH-114 and ORCH-116:

What Orchestrator Already Does

  1. Calls POST /api/quality/check via CoordinatorClientService
  2. Handles QualityCheckResponse with aiReview field
  3. Blocks commit/push if rejected
  4. Returns detailed failure reasons
  5. Tests cover all AI confirmation scenarios
  6. Helper method to check AI confirmation presence

Proof: Existing Tests

From quality-gates.service.spec.ts:

  • AI confirmation passes (confidence >= 0.9)
  • AI confidence below threshold (< 0.9)
  • 50% rule violated (>50% AI-generated)
  • Mechanical pass but AI fails
  • AI review with security findings
  • Exactly 50% AI-generated
  • AI review unavailable fallback
  • Preserve all AI review metadata

All these tests pass because they mock the coordinator's response. The orchestrator is ready to consume the real AI review data.

Conclusion

ORCH-122 Status: Coordinator Implementation Needed

This issue requires implementation in the coordinator (apps/coordinator), not the orchestrator (apps/orchestrator).

What needs to be done:

  1. Create apps/coordinator/src/ai_reviewer.py

    • Spawn independent Claude agent
    • Pass diff/files to agent
    • Parse agent's review
    • Return AIReviewResult
  2. Create apps/coordinator/src/api.py (or update existing)

    • Add /api/quality/check endpoint
    • Call quality_orchestrator for mechanical gates
    • Call ai_reviewer for AI confirmation
    • Combine results into QualityCheckResponse
  3. Update apps/coordinator/src/models.py

    • Add QualityCheckRequest model
    • Add QualityCheckResponse model
    • Add AIReviewResult model
  4. Write tests for AI reviewer

    • Mock Claude API calls
    • Test confidence calculation
    • Test 50% rule detection
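
Step 1 (the `ai_reviewer.py` spawner) might look like the sketch below. The prompt wording, the JSON verdict shape, and the injected `client` callable are all assumptions; in production the client would wrap an Anthropic Messages API call to a freshly spawned, independent Claude agent:

```python
import json
from dataclasses import dataclass, field

@dataclass
class AIReviewResult:
    confidence: float
    approved: bool
    aiGeneratedPercent: int = 0
    findings: list[str] = field(default_factory=list)

def parse_review(raw: str, threshold: float = 0.9) -> AIReviewResult:
    # Parse the reviewer agent's JSON verdict (shape is an assumption).
    data = json.loads(raw)
    confidence = float(data.get("confidence", 0.0))
    return AIReviewResult(
        confidence=confidence,
        approved=confidence >= threshold,
        aiGeneratedPercent=int(data.get("aiGeneratedPercent", 0)),
        findings=list(data.get("findings", [])),
    )

class AIReviewerService:
    def __init__(self, client):
        # client: callable taking a prompt string and returning the agent's
        # raw reply; injected here so the sketch stays self-contained and
        # testable without an API key.
        self._client = client

    def review(self, files: list[str], diff: str) -> AIReviewResult:
        prompt = (
            "Review this diff for logic errors, security issues, and best "
            "practices. Reply with JSON containing confidence (0.0-1.0), "
            "aiGeneratedPercent, and findings.\n"
            "Files: " + ", ".join(files) + "\n\n" + diff
        )
        return parse_review(self._client(prompt))
```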

Orchestrator Status: Complete

The orchestrator is ready. It will work automatically once the coordinator implements the /api/quality/check endpoint with AI review support.

No orchestrator changes needed for ORCH-122.

Next Steps

Since this is a coordinator implementation:

  1. The coordinator is a separate FastAPI service
  2. It needs Python development (not TypeScript)
  3. It needs integration with Anthropic Claude API
  4. It's outside the scope of orchestrator work

Recommendation: Either update ORCH-122 (or file a new issue) to make clear this is coordinator-side work, or mark ORCH-122 as blocked pending the coordinator implementation.

  • ORCH-114: Quality gate callbacks (complete - orchestrator side)
  • ORCH-116: 50% rule enforcement (complete - orchestrator side)
  • ORCH-122: AI agent confirmation (pending - coordinator side)
  • ORCH-121: Mechanical quality gates (coordinator implementation needed)

Acceptance Criteria - Analysis

For the orchestrator side (apps/orchestrator):

  • Handle AI review responses from coordinator
  • Parse aiReview field in QualityCheckResponse
  • Block operations when AI review fails
  • Return detailed AI findings to caller
  • Test coverage for all AI scenarios
  • Helper method to check AI confirmation presence

For the coordinator side (apps/coordinator):

  • Spawn independent AI reviewer agent
  • Review code changes for logic errors, security, best practices
  • Calculate confidence score (0.0 - 1.0)
  • Approve if confidence >= 0.9
  • Detect AI-generated code percentage
  • Enforce 50% rule
  • Return aiReview in QualityCheckResponse
  • Implement /api/quality/check endpoint

Files Analyzed

Orchestrator (TypeScript/NestJS)

  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.ts
  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.spec.ts
  • /home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/coordinator-client.service.ts

Coordinator (Python/FastAPI)

  • /home/localadmin/src/mosaic-stack/apps/coordinator/src/main.py (no /api/quality/check)
  • /home/localadmin/src/mosaic-stack/apps/coordinator/src/quality_orchestrator.py (no AI review)
  • /home/localadmin/src/mosaic-stack/apps/coordinator/src/gates/ (mechanical only)

Notes

Why This Makes Sense

The coordinator is responsible for quality checks because:

  1. It's the control plane service
  2. It orchestrates all quality gates (mechanical + AI)
  3. It has access to the codebase and diff
  4. It can spawn independent agents without conflict
  5. The orchestrator just needs to call it and handle results

Independent AI Agent

Key requirement: "AI reviewer is INDEPENDENT of worker agent (no self-review)"

This means:

  • Worker agent makes code changes
  • Coordinator spawns separate AI agent to review
  • Reviewer agent has no context from worker agent
  • Prevents self-review bias
  • Ensures objective code review

Confidence Threshold

  • Confidence score: 0.0 (no confidence) to 1.0 (full confidence)
  • Approval threshold: >= 0.9 (90% confidence)
  • Below threshold = rejected
  • Reasons for low confidence: unclear logic, security risks, poor practices

50% Rule Details

  • AI-generated code should be <= 50% of PR
  • Coordinator estimates percentage using heuristics
  • Could use: comment analysis, pattern detection, AI meta-detection
  • If > 50%: reject with clear message
  • Encourages human review and understanding
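
One of the candidate heuristics above (commit-trailer analysis) could be sketched like this; the input shape, the `ai_coauthored` flag, and the line-count weighting are illustrative assumptions, not an implemented detector:

```python
def estimate_ai_generated_percent(commits: list[dict]) -> int:
    """Estimate the AI-generated share of a change set.

    Each commit dict is assumed to carry 'added_lines' and an
    'ai_coauthored' flag (e.g. derived from a Co-Authored-By trailer
    on the commit). Real detection would combine several heuristics.
    """
    total = sum(c["added_lines"] for c in commits)
    if total == 0:
        return 0
    ai_lines = sum(c["added_lines"] for c in commits if c["ai_coauthored"])
    return round(100 * ai_lines / total)
```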