stack/docs/scratchpads/orch-122-ai-review.md

# Issue ORCH-122: AI Agent Confirmation
## Objective
Implement independent AI agent reviews for quality confirmation. This is the coordinator-side implementation that spawns an independent AI reviewer agent and returns confidence scores.
## Analysis
### Current State
After analyzing the codebase, I found that:
1. **ORCH-114** (Quality Gate Callbacks) - ✅ COMPLETE
- Orchestrator has `QualityGatesService` that calls coordinator
- Pre-commit and post-commit checks implemented
- Properly handles coordinator responses
2. **ORCH-116** (50% Rule Enforcement) - ✅ COMPLETE
- Orchestrator properly handles AI review responses
- Tests cover all AI confirmation scenarios
- `hasAIConfirmation()` helper method added
- 36 comprehensive test cases including 9 for 50% rule
3. **ORCH-122** (AI Agent Confirmation) - **COORDINATOR-SIDE IMPLEMENTATION NEEDED**
- Technical notes state: "AI reviewer is INDEPENDENT of worker agent (no self-review)"
- Technical notes state: "Coordinator calls AI reviewer"
- This is a **coordinator** responsibility, not orchestrator
### Architecture Decision
Based on the issue description and technical notes:
```
┌──────────────┐           ┌──────────────┐           ┌──────────────┐
│ Orchestrator │   calls   │ Coordinator  │  spawns   │ AI Reviewer  │
│              ├──────────>│   (Python)   ├──────────>│    Agent     │
│              │           │              │           │ (Independent)│
└──────────────┘           └──────────────┘           └──────────────┘
                                  │ runs mechanical gates
                                  │ + AI review
                                  v
                        QualityCheckResponse
                        {
                          approved: bool,
                          gate: string,
                          details: {
                            aiReview: {
                              confidence: float,
                              approved: bool,
                              findings: string[]
                            }
                          }
                        }
```
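The response shape above can be sketched as plain Python dataclasses (field names are taken from the diagram; the real coordinator would presumably use Pydantic models so FastAPI can validate and serialize them automatically):

```python
from dataclasses import dataclass, field


@dataclass
class AIReview:
    confidence: float  # 0.0 (no confidence) to 1.0 (full confidence)
    approved: bool
    findings: list[str] = field(default_factory=list)


@dataclass
class QualityCheckResponse:
    approved: bool
    gate: str      # e.g. "pre-commit" or "post-commit"
    details: dict  # carries the optional "aiReview" payload


# Example: a passing post-commit response with an AI review attached
review = AIReview(confidence=0.95, approved=True)
response = QualityCheckResponse(
    approved=True,
    gate="post-commit",
    details={"aiReview": review},
)
```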
**Key Points**:
1. Orchestrator already handles AI review responses (ORCH-116 complete)
2. Coordinator needs to implement AI reviewer spawning
3. Coordinator is written in **Python** (FastAPI)
4. AI reviewer is an **independent Claude agent** (not self-review)
## Coordinator Implementation Status
### What Exists
The coordinator has:
- `apps/coordinator/src/quality_orchestrator.py` - Runs mechanical gates in parallel
- `apps/coordinator/src/gates/` - Build, lint, test, coverage gates
- Quality gate interface (GateResult model)
- FastAPI application with health endpoint
### What's Missing for ORCH-122
The coordinator **DOES NOT** currently have:
1. ❌ AI reviewer agent spawning logic
2. ❌ Independent AI agent integration
3. ❌ `aiReview` field in QualityCheckResponse
4. ❌ `/api/quality/check` endpoint (orchestrator expects this)
5. ❌ Confidence score calculation
6. ❌ 50% rule detection
## Implementation Requirements
Based on ORCH-122 acceptance criteria and related issues:
### Acceptance Criteria from M6-NEW-ISSUES-TEMPLATES.md
- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes
- [ ] Check for: logic errors, security issues, best practices
- [ ] Return confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9
### Technical Requirements
**Coordinator must implement:**
1. **Quality Check Endpoint** (`/api/quality/check`)
- Accepts: `QualityCheckRequest` (taskId, agentId, files, diffSummary)
- Returns: `QualityCheckResponse` (approved, gate, message, details)
2. **AI Reviewer Spawner**
- Spawn independent Claude agent
- Pass it the diff/files to review
- Parse AI agent's review findings
- Calculate confidence score
3. **50% Rule Detector**
- Estimate AI-generated code percentage
- Reject if > 50% AI-generated
- Include findings in response
4. **Response Builder**
- Combine mechanical gate results
- Add aiReview field with:
- confidence (0.0 - 1.0)
- approved (bool)
- aiGeneratedPercent (int)
- findings (list[str])
### Integration Flow
```python
# Coordinator endpoint handler
@app.post("/api/quality/check")
async def check_quality(request: QualityCheckRequest):
    # 1. Run mechanical gates
    mechanical_results = await quality_orchestrator.verify_completion()
    if not mechanical_results.all_passed:
        # Short-circuit: don't run AI review if mechanical gates fail
        return QualityCheckResponse(
            approved=False,
            gate="pre-commit",
            message="Mechanical gates failed",
            details={...mechanical_results...},
        )

    # 2. Spawn independent AI reviewer
    ai_reviewer = AIReviewerService()
    ai_result = await ai_reviewer.review(
        files=request.files,
        diff=request.diffSummary,
    )

    # 3. Check 50% rule
    if ai_result.aiGeneratedPercent > 50:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="50% rule violated",
            details={
                "aiReview": {
                    "confidence": ai_result.confidence,
                    "approved": False,
                    "aiGeneratedPercent": ai_result.aiGeneratedPercent,
                    "findings": ["Detected >50% AI-generated code"],
                }
            },
        )

    # 4. Check AI confidence threshold
    if ai_result.confidence < 0.9:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="AI review confidence below threshold",
            details={"aiReview": {...}},
        )

    # 5. All gates passed
    return QualityCheckResponse(
        approved=True,
        gate="post-commit",
        message="All checks passed including AI review",
        details={"aiReview": {...}},
    )
```
## Orchestrator Integration - Already Complete
The orchestrator side is **ALREADY COMPLETE** thanks to ORCH-114 and ORCH-116:
### What Orchestrator Already Does
1. ✅ Calls `POST /api/quality/check` via CoordinatorClientService
2. ✅ Handles QualityCheckResponse with aiReview field
3. ✅ Blocks commit/push if rejected
4. ✅ Returns detailed failure reasons
5. ✅ Tests cover all AI confirmation scenarios
6. ✅ Helper method to check AI confirmation presence
### Proof: Existing Tests
From `quality-gates.service.spec.ts`:
- ✅ AI confirmation passes (confidence >= 0.9)
- ✅ AI confidence below threshold (< 0.9)
- ✅ 50% rule violated (>50% AI-generated)
- ✅ Mechanical pass but AI fails
- ✅ AI review with security findings
- ✅ Exactly 50% AI-generated
- ✅ AI review unavailable fallback
- ✅ Preserve all AI review metadata
All these tests pass because they mock the coordinator's response. The orchestrator is ready to consume the real AI review data.
## Conclusion
### ORCH-122 Status: Coordinator Implementation Needed
This issue requires implementation in the **coordinator** (apps/coordinator), not the orchestrator (apps/orchestrator).
**What needs to be done:**
1. Create `apps/coordinator/src/ai_reviewer.py`
- Spawn independent Claude agent
- Pass diff/files to agent
- Parse agent's review
- Return AIReviewResult
2. Create `apps/coordinator/src/api.py` (or update existing)
- Add `/api/quality/check` endpoint
- Call quality_orchestrator for mechanical gates
- Call ai_reviewer for AI confirmation
- Combine results into QualityCheckResponse
3. Update `apps/coordinator/src/models.py`
- Add QualityCheckRequest model
- Add QualityCheckResponse model
- Add AIReviewResult model
4. Write tests for AI reviewer
- Mock Claude API calls
- Test confidence calculation
- Test 50% rule detection
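A minimal sketch of what `ai_reviewer.py` might look like, assuming the reviewer agent is prompted to reply with a JSON verdict. The actual Claude API call is stubbed as an injected callable so the parsing and threshold logic can be tested without network access; `AIReviewResult` and the JSON keys are illustrative, not the real models:

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class AIReviewResult:
    confidence: float
    approved: bool
    aiGeneratedPercent: int
    findings: list[str]


class AIReviewerService:
    """Spawns an independent reviewer and parses its JSON verdict."""

    def __init__(self, run_agent: Callable[[str], str]):
        # run_agent would wrap the real Claude API call; injecting it
        # keeps this sketch testable with a canned response.
        self._run_agent = run_agent

    def review(self, diff: str) -> AIReviewResult:
        prompt = (
            "Review this diff for logic errors, security issues, and best "
            "practices. Reply with JSON: {confidence, aiGeneratedPercent, "
            "findings}.\n\n" + diff
        )
        data = json.loads(self._run_agent(prompt))
        confidence = float(data["confidence"])
        percent = int(data["aiGeneratedPercent"])
        return AIReviewResult(
            confidence=confidence,
            # Approval requires both the 0.9 threshold and the 50% rule
            approved=confidence >= 0.9 and percent <= 50,
            aiGeneratedPercent=percent,
            findings=list(data.get("findings", [])),
        )


# Usage with a canned agent response standing in for the real API call
fake_agent = lambda prompt: (
    '{"confidence": 0.95, "aiGeneratedPercent": 30, "findings": []}'
)
result = AIReviewerService(fake_agent).review("diff --git a/x b/x")
```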
### Orchestrator Status: Complete ✅
The orchestrator is ready. It will work automatically once the coordinator implements the `/api/quality/check` endpoint with AI review support.
**No orchestrator changes needed for ORCH-122.**
## Next Steps
Since this is a coordinator implementation:
1. The coordinator is a separate FastAPI service
2. It needs Python development (not TypeScript)
3. It needs integration with Anthropic Claude API
4. It's outside the scope of orchestrator work
**Recommendation**: Create a new issue or update ORCH-122 to clearly indicate this is coordinator-side work, or mark this issue as blocked pending coordinator implementation.
## Related Issues
- ORCH-114: Quality gate callbacks (complete - orchestrator side) ✅
- ORCH-116: 50% rule enforcement (complete - orchestrator side) ✅
- ORCH-122: AI agent confirmation (pending - coordinator side) ⏳
- ORCH-121: Mechanical quality gates (coordinator implementation needed)
## Acceptance Criteria - Analysis
For the **orchestrator** side (apps/orchestrator):
- [x] Handle AI review responses from coordinator
- [x] Parse aiReview field in QualityCheckResponse
- [x] Block operations when AI review fails
- [x] Return detailed AI findings to caller
- [x] Test coverage for all AI scenarios
- [x] Helper method to check AI confirmation presence
For the **coordinator** side (apps/coordinator):
- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes for logic errors, security, best practices
- [ ] Calculate confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9
- [ ] Detect AI-generated code percentage
- [ ] Enforce 50% rule
- [ ] Return aiReview in QualityCheckResponse
- [ ] Implement `/api/quality/check` endpoint
## Files Analyzed
### Orchestrator (TypeScript/NestJS)
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.ts`
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.spec.ts`
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/coordinator-client.service.ts`
### Coordinator (Python/FastAPI)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/main.py` ⏳ (no `/api/quality/check`)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/quality_orchestrator.py` ⏳ (no AI review)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/gates/` ⏳ (mechanical only)
## Notes
### Why This Makes Sense
The coordinator is responsible for quality checks because:
1. It's the control plane service
2. It orchestrates all quality gates (mechanical + AI)
3. It has access to the codebase and diff
4. It can spawn independent agents without conflict
5. The orchestrator just needs to call it and handle results
### Independent AI Agent
Key requirement: "AI reviewer is INDEPENDENT of worker agent (no self-review)"
This means:
- Worker agent makes code changes
- Coordinator spawns separate AI agent to review
- Reviewer agent has no context from worker agent
- Prevents self-review bias
- Ensures objective code review
### Confidence Threshold
- Confidence score: 0.0 (no confidence) to 1.0 (full confidence)
- Approval threshold: >= 0.9 (90% confidence)
- Below threshold = rejected
- Reasons for low confidence: unclear logic, security risks, poor practices
### 50% Rule Details
- AI-generated code should be <= 50% of PR
- Coordinator estimates percentage using heuristics
- Could use: comment analysis, pattern detection, AI meta-detection
- If > 50%: reject with clear message
- Encourages human review and understanding
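One possible heuristic (purely illustrative, since the estimation approach is not pinned down in the issue) is to tally lines attributed to AI assistance versus total changed lines, where attribution might come from Co-Authored-By trailers or pattern detection:

```python
def estimate_ai_percent(changed: dict[str, tuple[int, int]]) -> int:
    """Estimate the AI-generated percentage of a change set.

    `changed` maps filename -> (ai_attributed_lines, total_changed_lines).
    Returns a whole percentage; 0 for an empty change set.
    """
    ai = sum(a for a, _ in changed.values())
    total = sum(t for _, t in changed.values())
    return round(100 * ai / total) if total else 0


# Example: 120 of 200 changed lines attributed to AI
percent = estimate_ai_percent({
    "src/service.py": (100, 150),
    "src/models.py": (20, 50),
})
violates_50_rule = percent > 50  # 60% -> rejected with a clear message
```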