Implements FED-010: Agent Spawn via Federation feature that enables spawning and managing Claude agents on remote federated Mosaic Stack instances via COMMAND message type.

Features:
- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:
- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService → CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:
- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:
- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:
- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
341 lines
12 KiB
Markdown
# Issue ORCH-122: AI Agent Confirmation

## Objective

Implement independent AI agent reviews for quality confirmation. This is the coordinator-side implementation that spawns an independent AI reviewer agent and returns confidence scores.

## Analysis

### Current State

After analyzing the codebase, I found that:

1. **ORCH-114** (Quality Gate Callbacks) - ✅ COMPLETE
   - Orchestrator has `QualityGatesService` that calls coordinator
   - Pre-commit and post-commit checks implemented
   - Properly handles coordinator responses

2. **ORCH-116** (50% Rule Enforcement) - ✅ COMPLETE
   - Orchestrator properly handles AI review responses
   - Tests cover all AI confirmation scenarios
   - `hasAIConfirmation()` helper method added
   - 36 comprehensive test cases including 9 for 50% rule

3. **ORCH-122** (AI Agent Confirmation) - **COORDINATOR-SIDE IMPLEMENTATION NEEDED**
   - Technical notes state: "AI reviewer is INDEPENDENT of worker agent (no self-review)"
   - Technical notes state: "Coordinator calls AI reviewer"
   - This is a **coordinator** responsibility, not orchestrator

### Architecture Decision

Based on the issue description and technical notes:

```
┌─────────────┐         ┌──────────────┐         ┌──────────────┐
│ Orchestrator│  calls  │ Coordinator  │ spawns  │ AI Reviewer  │
│             ├────────>│              ├────────>│ Agent        │
│             │         │ (Python)     │         │ (Independent)│
└─────────────┘         └──────────────┘         └──────────────┘
                               │
                               │ runs mechanical gates
                               │ + AI review
                               │
                               v
                        QualityCheckResponse
                        {
                          approved: bool,
                          gate: string,
                          details: {
                            aiReview: {
                              confidence: float,
                              approved: bool,
                              findings: string[]
                            }
                          }
                        }
```

**Key Points**:

1. Orchestrator already handles AI review responses (ORCH-116 complete)
2. Coordinator needs to implement AI reviewer spawning
3. Coordinator is written in **Python** (FastAPI)
4. AI reviewer is an **independent Claude agent** (not self-review)

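The `QualityCheckResponse` shape in the diagram can be sketched on the Python side roughly as follows. This is only an illustration: plain dataclasses stand in for whatever model classes the coordinator ends up using, and the class names `AIReview` and `QualityCheckResponse` here are assumptions modeled on the field names in the diagram.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class AIReview:
    """Mirrors the aiReview object from the diagram (illustrative only)."""
    confidence: float  # expected range: 0.0 - 1.0
    approved: bool
    findings: list[str] = field(default_factory=list)


@dataclass
class QualityCheckResponse:
    """Mirrors the top-level response object from the diagram."""
    approved: bool
    gate: str
    details: dict = field(default_factory=dict)


# Build a response the way the coordinator might after a successful review.
review = AIReview(confidence=0.95, approved=True)
response = QualityCheckResponse(
    approved=review.approved,
    gate="post-commit",
    details={"aiReview": asdict(review)},
)
payload = asdict(response)  # plain dict, ready to be serialized as JSON
```

Keeping `aiReview` nested under `details` matches what the orchestrator already parses, so the wire format would not need to change on the TypeScript side.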
## Coordinator Implementation Status

### What Exists

The coordinator has:

- `apps/coordinator/src/quality_orchestrator.py` - Runs mechanical gates in parallel
- `apps/coordinator/src/gates/` - Build, lint, test, coverage gates
- Quality gate interface (GateResult model)
- FastAPI application with health endpoint

### What's Missing for ORCH-122

The coordinator **DOES NOT** currently have:

1. ❌ AI reviewer agent spawning logic
2. ❌ Independent AI agent integration
3. ❌ `aiReview` field in QualityCheckResponse
4. ❌ `/api/quality/check` endpoint (orchestrator expects this)
5. ❌ Confidence score calculation
6. ❌ 50% rule detection

## Implementation Requirements

Based on ORCH-122 acceptance criteria and related issues:

### Acceptance Criteria from M6-NEW-ISSUES-TEMPLATES.md

- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes
- [ ] Check for: logic errors, security issues, best practices
- [ ] Return confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9

### Technical Requirements

**Coordinator must implement:**

1. **Quality Check Endpoint** (`/api/quality/check`)
   - Accepts: `QualityCheckRequest` (taskId, agentId, files, diffSummary)
   - Returns: `QualityCheckResponse` (approved, gate, message, details)

2. **AI Reviewer Spawner**
   - Spawn independent Claude agent
   - Pass it the diff/files to review
   - Parse AI agent's review findings
   - Calculate confidence score

3. **50% Rule Detector**
   - Estimate AI-generated code percentage
   - Reject if > 50% AI-generated
   - Include findings in response

4. **Response Builder**
   - Combine mechanical gate results
   - Add aiReview field with:
     - confidence (0.0 - 1.0)
     - approved (bool)
     - aiGeneratedPercent (int)
     - findings (list[str])

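Items 2 and 4 above both depend on turning the reviewer agent's output into structured fields. The following is a minimal sketch of that parsing step, assuming the reviewer is instructed to reply with a JSON object. The reply format, the `parse_review` helper, and this `AIReviewResult` dataclass are all hypothetical and not taken from the issue.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AIReviewResult:
    """Hypothetical result type; field names follow the aiReview spec above."""
    confidence: float
    aiGeneratedPercent: int
    findings: list[str] = field(default_factory=list)


def parse_review(raw_reply: str) -> AIReviewResult:
    """Parse the reviewer's JSON reply, failing closed on malformed input."""
    try:
        data = json.loads(raw_reply)
        confidence = float(data["confidence"])
        percent = int(data["aiGeneratedPercent"])
        findings = [str(item) for item in data.get("findings", [])]
    except (ValueError, KeyError, TypeError, AttributeError):
        # An unreadable review must never count as an approval.
        return AIReviewResult(0.0, 0, ["Reviewer reply could not be parsed"])

    # Clamp out-of-range values so downstream threshold checks stay meaningful.
    confidence = min(1.0, max(0.0, confidence))
    percent = min(100, max(0, percent))
    return AIReviewResult(confidence, percent, findings)
```

Returning a confidence of 0.0 on parse failure means the normal threshold check rejects the change without needing a separate error path.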
### Integration Flow

```python
# Coordinator endpoint handler.
# Assumes `app` (the FastAPI application), `quality_orchestrator`,
# `AIReviewerService`, and the request/response models already exist
# in the coordinator.
@app.post("/api/quality/check")
async def check_quality(request: QualityCheckRequest) -> QualityCheckResponse:
    # 1. Run mechanical gates
    mechanical_results = await quality_orchestrator.verify_completion()

    if not mechanical_results.all_passed:
        # Short-circuit: don't run AI review if mechanical fails
        return QualityCheckResponse(
            approved=False,
            gate="pre-commit",
            message="Mechanical gates failed",
            # Placeholder key; serialize the gate results as the models require
            details={"mechanical": mechanical_results},
        )

    # 2. Spawn independent AI reviewer
    ai_reviewer = AIReviewerService()
    ai_result = await ai_reviewer.review(
        files=request.files,
        diff=request.diffSummary,
    )

    # Builds the aiReview payload shared by the remaining branches.
    # `ai_result.findings` is assumed to exist on the reviewer's result.
    def ai_review(approved: bool, findings: list[str]) -> dict:
        return {
            "aiReview": {
                "confidence": ai_result.confidence,
                "approved": approved,
                "aiGeneratedPercent": ai_result.aiGeneratedPercent,
                "findings": findings,
            }
        }

    # 3. Check 50% rule (exactly 50% is allowed)
    if ai_result.aiGeneratedPercent > 50:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="50% rule violated",
            details=ai_review(False, ["Detected >50% AI-generated code"]),
        )

    # 4. Check AI confidence threshold
    if ai_result.confidence < 0.9:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="AI review confidence below threshold",
            details=ai_review(False, ai_result.findings),
        )

    # 5. All gates passed
    return QualityCheckResponse(
        approved=True,
        gate="post-commit",
        message="All checks passed including AI review",
        details=ai_review(True, ai_result.findings),
    )
```

## Orchestrator Integration - Already Complete

The orchestrator side is **ALREADY COMPLETE** thanks to ORCH-114 and ORCH-116:

### What Orchestrator Already Does

1. ✅ Calls `POST /api/quality/check` via CoordinatorClientService
2. ✅ Handles QualityCheckResponse with aiReview field
3. ✅ Blocks commit/push if rejected
4. ✅ Returns detailed failure reasons
5. ✅ Tests cover all AI confirmation scenarios
6. ✅ Helper method to check AI confirmation presence

### Proof: Existing Tests

From `quality-gates.service.spec.ts`:

- ✅ AI confirmation passes (confidence >= 0.9)
- ✅ AI confidence below threshold (< 0.9)
- ✅ 50% rule violated (>50% AI-generated)
- ✅ Mechanical pass but AI fails
- ✅ AI review with security findings
- ✅ Exactly 50% AI-generated
- ✅ AI review unavailable fallback
- ✅ Preserve all AI review metadata

All these tests pass because they mock the coordinator's response. The orchestrator is ready to consume the real AI review data.

## Conclusion

### ORCH-122 Status: Coordinator Implementation Needed

This issue requires implementation in the **coordinator** (apps/coordinator), not the orchestrator (apps/orchestrator).

**What needs to be done:**

1. Create `apps/coordinator/src/ai_reviewer.py`
   - Spawn independent Claude agent
   - Pass diff/files to agent
   - Parse agent's review
   - Return AIReviewResult

2. Create `apps/coordinator/src/api.py` (or update existing)
   - Add `/api/quality/check` endpoint
   - Call quality_orchestrator for mechanical gates
   - Call ai_reviewer for AI confirmation
   - Combine results into QualityCheckResponse

3. Update `apps/coordinator/src/models.py`
   - Add QualityCheckRequest model
   - Add QualityCheckResponse model
   - Add AIReviewResult model

4. Write tests for AI reviewer
   - Mock Claude API calls
   - Test confidence calculation
   - Test 50% rule detection

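Step 4 above could start from something like the sketch below, which replaces the Claude call with a mock so the tests run offline. Everything here is an assumption for illustration: the constructor-injected `client`, its `complete` coroutine, and the JSON reply format are hypothetical stand-ins, not the Anthropic SDK or the eventual `AIReviewerService` API.

```python
import asyncio
import json
from unittest.mock import AsyncMock


class AIReviewerService:
    """Hypothetical reviewer wrapper; `client.complete` is an assumed interface."""

    def __init__(self, client):
        self.client = client

    async def review(self, files: list[str], diff: str) -> dict:
        reply = await self.client.complete(f"Review {files}:\n{diff}")
        return json.loads(reply)


def run_review_with_mock(canned: dict) -> dict:
    """Run one review against a mocked client that returns `canned` as JSON."""
    client = AsyncMock()
    client.complete.return_value = json.dumps(canned)
    service = AIReviewerService(client)
    result = asyncio.run(service.review(files=["src/app.py"], diff="+ print('hi')"))
    client.complete.assert_awaited_once()  # exactly one model call per review
    return result


def test_low_confidence_is_reported():
    result = run_review_with_mock(
        {"confidence": 0.4, "aiGeneratedPercent": 10, "findings": ["unclear logic"]}
    )
    assert result["confidence"] < 0.9


def test_fifty_percent_rule_input():
    result = run_review_with_mock(
        {"confidence": 0.95, "aiGeneratedPercent": 80, "findings": []}
    )
    assert result["aiGeneratedPercent"] > 50
```

Injecting the client keeps the reviewer testable without network access, which mirrors how the orchestrator tests already mock the coordinator's response.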
### Orchestrator Status: Complete ✅

The orchestrator is ready. It will work automatically once the coordinator implements the `/api/quality/check` endpoint with AI review support.

**No orchestrator changes needed for ORCH-122.**

## Next Steps

Since this is a coordinator implementation:

1. The coordinator is a separate FastAPI service
2. It needs Python development (not TypeScript)
3. It needs integration with Anthropic Claude API
4. It's outside the scope of orchestrator work

**Recommendation**: Create a new issue or update ORCH-122 to clearly indicate this is coordinator-side work, or mark this issue as blocked pending coordinator implementation.

## Related Issues

- ORCH-114: Quality gate callbacks (complete - orchestrator side) ✅
- ORCH-116: 50% rule enforcement (complete - orchestrator side) ✅
- ORCH-122: AI agent confirmation (pending - coordinator side) ⏳
- ORCH-121: Mechanical quality gates (coordinator implementation needed)

## Acceptance Criteria - Analysis

For the **orchestrator** side (apps/orchestrator):

- [x] Handle AI review responses from coordinator
- [x] Parse aiReview field in QualityCheckResponse
- [x] Block operations when AI review fails
- [x] Return detailed AI findings to caller
- [x] Test coverage for all AI scenarios
- [x] Helper method to check AI confirmation presence

For the **coordinator** side (apps/coordinator):

- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes for logic errors, security, best practices
- [ ] Calculate confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9
- [ ] Detect AI-generated code percentage
- [ ] Enforce 50% rule
- [ ] Return aiReview in QualityCheckResponse
- [ ] Implement `/api/quality/check` endpoint

## Files Analyzed

### Orchestrator (TypeScript/NestJS)

- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.ts` ✅
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.spec.ts` ✅
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/coordinator-client.service.ts` ✅

### Coordinator (Python/FastAPI)

- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/main.py` ⏳ (no `/api/quality/check`)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/quality_orchestrator.py` ⏳ (no AI review)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/gates/` ⏳ (mechanical only)

## Notes

### Why This Makes Sense

The coordinator is responsible for quality checks because:

1. It's the control plane service
2. It orchestrates all quality gates (mechanical + AI)
3. It has access to the codebase and diff
4. It can spawn independent agents without conflict
5. The orchestrator just needs to call it and handle results

### Independent AI Agent

Key requirement: "AI reviewer is INDEPENDENT of worker agent (no self-review)"

This means:

- Worker agent makes code changes
- Coordinator spawns separate AI agent to review
- Reviewer agent has no context from worker agent
- Prevents self-review bias
- Ensures objective code review

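One way to enforce the "no context from worker agent" point is to build the reviewer's input from the change alone. The sketch below is illustrative: `build_review_prompt` is a hypothetical helper, and the prompt wording and requested JSON reply format are assumptions rather than anything specified in the issue.

```python
def build_review_prompt(files: list[str], diff_summary: str) -> str:
    """Build the reviewer's prompt from the change itself and nothing else.

    The signature deliberately has no parameter for the worker agent's ID,
    conversation, or reasoning, so none of it can leak into the review.
    """
    file_list = "\n".join(f"- {path}" for path in files)
    return (
        "You are an independent code reviewer.\n"
        "Check the change for logic errors, security issues, and best practices.\n"
        "Reply with a JSON object containing confidence (0.0 - 1.0), "
        "aiGeneratedPercent, and findings.\n\n"
        f"Files:\n{file_list}\n\n"
        f"Diff summary:\n{diff_summary}\n"
    )


prompt = build_review_prompt(["src/a.py", "src/b.py"], "Adds input validation")
```

Restricting the function signature, rather than filtering a larger context object afterwards, makes the independence requirement visible at the call site.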
### Confidence Threshold

- Confidence score: 0.0 (no confidence) to 1.0 (full confidence)
- Approval threshold: >= 0.9 (90% confidence)
- Below threshold = rejected
- Reasons for low confidence: unclear logic, security risks, poor practices

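The threshold rule above is small enough to sketch directly. The names `APPROVAL_THRESHOLD` and `is_approved` are illustrative, and rejecting out-of-range scores with an exception is an assumption about how invalid input should be handled.

```python
APPROVAL_THRESHOLD = 0.9  # approve at or above 90% confidence


def is_approved(confidence: float) -> bool:
    """Return True when the reviewer's confidence meets the approval threshold."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within 0.0 - 1.0, got {confidence}")
    return confidence >= APPROVAL_THRESHOLD
```

The comparison is inclusive, so a score of exactly 0.9 is approved, matching the `>= 0.9` criterion.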
### 50% Rule Details

- AI-generated code should be <= 50% of PR
- Coordinator estimates percentage using heuristics
- Could use: comment analysis, pattern detection, AI meta-detection
- If > 50%: reject with clear message
- Encourages human review and understanding
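The arithmetic behind the rule can be sketched as below. How lines get attributed to AI (the heuristics listed above) is left open, so this assumes some upstream step has already produced line counts. Both function names are hypothetical.

```python
def ai_generated_percent(ai_lines: int, total_lines: int) -> int:
    """Percentage of changed lines attributed to AI, rounded up.

    Rounding up keeps the check conservative: 50.5% is reported as 51,
    not 50, so it cannot slip under the limit.
    """
    if total_lines <= 0:
        return 0  # nothing changed, nothing to attribute
    return (100 * ai_lines + total_lines - 1) // total_lines  # integer ceiling


def violates_fifty_percent_rule(percent: int) -> bool:
    """Exactly 50% is allowed; anything above it is rejected."""
    return percent > 50
```

Integer arithmetic avoids floating-point edge cases right at the 50% boundary, which is the case the orchestrator tests single out.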