feat(#93): implement agent spawn via federation
Implements FED-010: Agent Spawn via Federation feature that enables spawning and managing Claude agents on remote federated Mosaic Stack instances via COMMAND message type.

Features:
- Federation agent command types (spawn, status, kill)
- FederationAgentService for handling agent operations
- Integration with orchestrator's agent spawner/lifecycle services
- API endpoints for spawning, querying status, and killing agents
- Full command routing through federation COMMAND infrastructure
- Comprehensive test coverage (12/12 tests passing)

Architecture:
- Hub → Spoke: Spawn agents on remote instances
- Command flow: FederationController → FederationAgentService → CommandService → Remote Orchestrator
- Response handling: Remote orchestrator returns agent status/results
- Security: Connection validation, signature verification

Files created:
- apps/api/src/federation/types/federation-agent.types.ts
- apps/api/src/federation/federation-agent.service.ts
- apps/api/src/federation/federation-agent.service.spec.ts

Files modified:
- apps/api/src/federation/command.service.ts (agent command routing)
- apps/api/src/federation/federation.controller.ts (agent endpoints)
- apps/api/src/federation/federation.module.ts (service registration)
- apps/orchestrator/src/api/agents/agents.controller.ts (status endpoint)
- apps/orchestrator/src/api/agents/agents.module.ts (lifecycle integration)

Testing:
- 12/12 tests passing for FederationAgentService
- All command service tests passing
- TypeScript compilation successful
- Linting passed

Refs #93

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
docs/scratchpads/orch-122-ai-review.md
@@ -0,0 +1,340 @@
# Issue ORCH-122: AI Agent Confirmation

## Objective

Implement independent AI agent reviews for quality confirmation. This is the coordinator-side implementation that spawns an independent AI reviewer agent and returns confidence scores.

## Analysis

### Current State

After analyzing the codebase, I found that:

1. **ORCH-114** (Quality Gate Callbacks) - ✅ COMPLETE
   - Orchestrator has `QualityGatesService` that calls coordinator
   - Pre-commit and post-commit checks implemented
   - Properly handles coordinator responses

2. **ORCH-116** (50% Rule Enforcement) - ✅ COMPLETE
   - Orchestrator properly handles AI review responses
   - Tests cover all AI confirmation scenarios
   - `hasAIConfirmation()` helper method added
   - 36 comprehensive test cases including 9 for 50% rule

3. **ORCH-122** (AI Agent Confirmation) - **COORDINATOR-SIDE IMPLEMENTATION NEEDED**
   - Technical notes state: "AI reviewer is INDEPENDENT of worker agent (no self-review)"
   - Technical notes state: "Coordinator calls AI reviewer"
   - This is a **coordinator** responsibility, not orchestrator

### Architecture Decision

Based on the issue description and technical notes:

```
┌─────────────┐         ┌──────────────┐         ┌──────────────┐
│ Orchestrator│  calls  │ Coordinator  │ spawns  │ AI Reviewer  │
│             ├────────>│              ├────────>│ Agent        │
│             │         │ (Python)     │         │ (Independent)│
└─────────────┘         └──────────────┘         └──────────────┘
                               │
                               │ runs mechanical gates
                               │ + AI review
                               │
                               v
                        QualityCheckResponse
                        {
                          approved: bool,
                          gate: string,
                          details: {
                            aiReview: {
                              confidence: float,
                              approved: bool,
                              findings: string[]
                            }
                          }
                        }
```

**Key Points**:

1. Orchestrator already handles AI review responses (ORCH-116 complete)
2. Coordinator needs to implement AI reviewer spawning
3. Coordinator is written in **Python** (FastAPI)
4. AI reviewer is an **independent Claude agent** (not self-review)

## Coordinator Implementation Status

### What Exists

The coordinator has:

- `apps/coordinator/src/quality_orchestrator.py` - Runs mechanical gates in parallel
- `apps/coordinator/src/gates/` - Build, lint, test, coverage gates
- Quality gate interface (GateResult model)
- FastAPI application with health endpoint

### What's Missing for ORCH-122

The coordinator **DOES NOT** currently have:

1. ❌ AI reviewer agent spawning logic
2. ❌ Independent AI agent integration
3. ❌ `aiReview` field in QualityCheckResponse
4. ❌ `/api/quality/check` endpoint (orchestrator expects this)
5. ❌ Confidence score calculation
6. ❌ 50% rule detection

## Implementation Requirements

Based on ORCH-122 acceptance criteria and related issues:

### Acceptance Criteria from M6-NEW-ISSUES-TEMPLATES.md

- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes
- [ ] Check for: logic errors, security issues, best practices
- [ ] Return confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9

### Technical Requirements

**Coordinator must implement:**

1. **Quality Check Endpoint** (`/api/quality/check`)
   - Accepts: `QualityCheckRequest` (taskId, agentId, files, diffSummary)
   - Returns: `QualityCheckResponse` (approved, gate, message, details)

2. **AI Reviewer Spawner**
   - Spawn independent Claude agent
   - Pass it the diff/files to review
   - Parse AI agent's review findings
   - Calculate confidence score

3. **50% Rule Detector**
   - Estimate AI-generated code percentage
   - Reject if > 50% AI-generated
   - Include findings in response

4. **Response Builder**
   - Combine mechanical gate results
   - Add aiReview field with:
     - confidence (0.0 - 1.0)
     - approved (bool)
     - aiGeneratedPercent (int)
     - findings (list[str])

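The response builder in item 4 could be sketched roughly as follows. This is a minimal illustration using stdlib dataclasses rather than the coordinator's real models; `build_response`, the `mechanical` dict of gate name to pass/fail, and the `"mechanical"` key in `details` are assumptions made for the example, not part of the existing codebase.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class AIReviewResult:
    """Fields mirror the aiReview payload listed above."""
    confidence: float        # 0.0 - 1.0
    approved: bool
    aiGeneratedPercent: int  # 0 - 100
    findings: list[str] = field(default_factory=list)


@dataclass
class QualityCheckResponse:
    approved: bool
    gate: str
    message: str
    details: dict[str, Any] = field(default_factory=dict)


def build_response(
    gate: str,
    message: str,
    mechanical: dict[str, bool],          # hypothetical: gate name -> passed
    ai_review: Optional[AIReviewResult],  # None when the AI review was skipped
) -> QualityCheckResponse:
    """Combine mechanical gate results with the optional AI review."""
    details: dict[str, Any] = {"mechanical": mechanical}  # hypothetical key
    approved = all(mechanical.values())
    if ai_review is not None:
        details["aiReview"] = asdict(ai_review)
        approved = approved and ai_review.approved
    return QualityCheckResponse(approved, gate, message, details)
```

Keeping the `aiReview` key absent (rather than null) when the review is skipped is one possible choice; the orchestrator's `hasAIConfirmation()` helper would need to agree with whichever convention the coordinator adopts.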
### Integration Flow

```python
# Coordinator endpoint handler (sketch). `app`, `quality_orchestrator`,
# `AIReviewerService`, `QualityCheckRequest`, and `QualityCheckResponse`
# are assumed to be defined elsewhere in the coordinator.


def ai_review_details(ai_result, approved, findings=None):
    """Build the `details` payload carrying the aiReview field."""
    return {
        "aiReview": {
            "confidence": ai_result.confidence,
            "approved": approved,
            "aiGeneratedPercent": ai_result.aiGeneratedPercent,
            "findings": ai_result.findings if findings is None else findings,
        }
    }


@app.post("/api/quality/check")
async def check_quality(request: QualityCheckRequest):
    # 1. Run mechanical gates
    mechanical_results = await quality_orchestrator.verify_completion()

    if not mechanical_results.all_passed:
        # Short-circuit: don't run AI review if mechanical fails
        return QualityCheckResponse(
            approved=False,
            gate="pre-commit",
            message="Mechanical gates failed",
            details={},  # TODO: attach mechanical_results (shape not settled here)
        )

    # 2. Spawn independent AI reviewer
    ai_reviewer = AIReviewerService()
    ai_result = await ai_reviewer.review(
        files=request.files,
        diff=request.diffSummary,
    )

    # 3. Check 50% rule (exactly 50% is allowed)
    if ai_result.aiGeneratedPercent > 50:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="50% rule violated",
            details=ai_review_details(
                ai_result,
                approved=False,
                findings=["Detected >50% AI-generated code"],
            ),
        )

    # 4. Check AI confidence threshold
    if ai_result.confidence < 0.9:
        return QualityCheckResponse(
            approved=False,
            gate="post-commit",
            message="AI review confidence below threshold",
            details=ai_review_details(ai_result, approved=False),
        )

    # 5. All gates passed
    return QualityCheckResponse(
        approved=True,
        gate="post-commit",
        message="All checks passed including AI review",
        details=ai_review_details(ai_result, approved=True),
    )
```

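Step 2 of the flow depends on turning the reviewer agent's reply into structured data. A minimal sketch of that parsing step, assuming the reviewer is prompted to answer with a single JSON object containing `confidence`, `aiGeneratedPercent`, and `findings` (the reply format, the `parse_review_reply` name, and the fail-closed behaviour are all assumptions for illustration):

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class AIReviewResult:
    confidence: float
    aiGeneratedPercent: int
    findings: list[str] = field(default_factory=list)


def parse_review_reply(raw: str) -> AIReviewResult:
    """Parse the reviewer's JSON reply; a malformed reply yields zero confidence."""
    try:
        data = json.loads(raw)
        confidence = float(data["confidence"])
        percent = int(data["aiGeneratedPercent"])
        findings = [str(item) for item in data.get("findings", [])]
    except (ValueError, KeyError, TypeError, AttributeError):
        # Fail closed: an unreadable review cannot reach the 0.9 threshold.
        return AIReviewResult(0.0, 0, ["Reviewer reply could not be parsed"])

    # Clamp out-of-range values instead of trusting the agent's output.
    confidence = min(1.0, max(0.0, confidence))
    percent = min(100, max(0, percent))
    return AIReviewResult(confidence, percent, findings)
```

Failing closed is only one option; if the coordinator should instead report the review as unavailable, the `except` branch is the place to signal that.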
## Orchestrator Integration - Already Complete

The orchestrator side is **ALREADY COMPLETE** thanks to ORCH-114 and ORCH-116:

### What Orchestrator Already Does

1. ✅ Calls `POST /api/quality/check` via CoordinatorClientService
2. ✅ Handles QualityCheckResponse with aiReview field
3. ✅ Blocks commit/push if rejected
4. ✅ Returns detailed failure reasons
5. ✅ Tests cover all AI confirmation scenarios
6. ✅ Helper method to check AI confirmation presence

### Proof: Existing Tests

From `quality-gates.service.spec.ts`:

- ✅ AI confirmation passes (confidence >= 0.9)
- ✅ AI confidence below threshold (< 0.9)
- ✅ 50% rule violated (>50% AI-generated)
- ✅ Mechanical pass but AI fails
- ✅ AI review with security findings
- ✅ Exactly 50% AI-generated
- ✅ AI review unavailable fallback
- ✅ Preserve all AI review metadata

All these tests pass because they mock the coordinator's response. The orchestrator is ready to consume the real AI review data.

## Conclusion

### ORCH-122 Status: Coordinator Implementation Needed

This issue requires implementation in the **coordinator** (apps/coordinator), not the orchestrator (apps/orchestrator).

**What needs to be done:**

1. Create `apps/coordinator/src/ai_reviewer.py`
   - Spawn independent Claude agent
   - Pass diff/files to agent
   - Parse agent's review
   - Return AIReviewResult

2. Create `apps/coordinator/src/api.py` (or update existing)
   - Add `/api/quality/check` endpoint
   - Call quality_orchestrator for mechanical gates
   - Call ai_reviewer for AI confirmation
   - Combine results into QualityCheckResponse

3. Update `apps/coordinator/src/models.py`
   - Add QualityCheckRequest model
   - Add QualityCheckResponse model
   - Add AIReviewResult model

4. Write tests for AI reviewer
   - Mock Claude API calls
   - Test confidence calculation
   - Test 50% rule detection

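For item 4, the reviewer can be mocked so that no Claude API call is made during tests. The sketch below uses only the standard library; `confirm` is a hypothetical coordinator helper and the dict-shaped review result is an assumption, so the real tests would target whatever `ai_reviewer.py` ends up exposing.

```python
import asyncio
from unittest.mock import AsyncMock, Mock

CONFIDENCE_THRESHOLD = 0.9


async def confirm(reviewer, files, diff):
    """Hypothetical helper: ask the reviewer, then apply the 0.9 threshold."""
    result = await reviewer.review(files=files, diff=diff)
    return result["confidence"] >= CONFIDENCE_THRESHOLD


def test_low_confidence_is_rejected():
    reviewer = Mock()
    # AsyncMock stands in for the coroutine that would call the Claude API.
    reviewer.review = AsyncMock(
        return_value={"confidence": 0.6, "findings": ["unclear logic"]}
    )
    assert asyncio.run(confirm(reviewer, ["a.py"], "diff")) is False
    reviewer.review.assert_awaited_once_with(files=["a.py"], diff="diff")


def test_threshold_is_inclusive():
    reviewer = Mock()
    reviewer.review = AsyncMock(return_value={"confidence": 0.9, "findings": []})
    assert asyncio.run(confirm(reviewer, ["a.py"], "diff")) is True
```

Both functions follow pytest naming conventions, so they would be collected as-is if dropped into a test module.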
### Orchestrator Status: Complete ✅

The orchestrator is ready. It will work automatically once the coordinator implements the `/api/quality/check` endpoint with AI review support.

**No orchestrator changes needed for ORCH-122.**

## Next Steps

Since this is a coordinator implementation:

1. The coordinator is a separate FastAPI service
2. It needs Python development (not TypeScript)
3. It needs integration with Anthropic Claude API
4. It's outside the scope of orchestrator work

**Recommendation**: Create a new issue or update ORCH-122 to clearly indicate this is coordinator-side work, or mark this issue as blocked pending coordinator implementation.

## Related Issues

- ORCH-114: Quality gate callbacks (complete - orchestrator side) ✅
- ORCH-116: 50% rule enforcement (complete - orchestrator side) ✅
- ORCH-122: AI agent confirmation (pending - coordinator side) ⏳
- ORCH-121: Mechanical quality gates (coordinator implementation needed)

## Acceptance Criteria - Analysis

For the **orchestrator** side (apps/orchestrator):

- [x] Handle AI review responses from coordinator
- [x] Parse aiReview field in QualityCheckResponse
- [x] Block operations when AI review fails
- [x] Return detailed AI findings to caller
- [x] Test coverage for all AI scenarios
- [x] Helper method to check AI confirmation presence

For the **coordinator** side (apps/coordinator):

- [ ] Spawn independent AI reviewer agent
- [ ] Review code changes for logic errors, security, best practices
- [ ] Calculate confidence score (0.0 - 1.0)
- [ ] Approve if confidence >= 0.9
- [ ] Detect AI-generated code percentage
- [ ] Enforce 50% rule
- [ ] Return aiReview in QualityCheckResponse
- [ ] Implement `/api/quality/check` endpoint

## Files Analyzed

### Orchestrator (TypeScript/NestJS)

- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.ts` ✅
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/quality-gates.service.spec.ts` ✅
- `/home/localadmin/src/mosaic-stack/apps/orchestrator/src/coordinator/coordinator-client.service.ts` ✅

### Coordinator (Python/FastAPI)

- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/main.py` ⏳ (no `/api/quality/check`)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/quality_orchestrator.py` ⏳ (no AI review)
- `/home/localadmin/src/mosaic-stack/apps/coordinator/src/gates/` ⏳ (mechanical only)

## Notes

### Why This Makes Sense

The coordinator is responsible for quality checks because:

1. It's the control plane service
2. It orchestrates all quality gates (mechanical + AI)
3. It has access to the codebase and diff
4. It can spawn independent agents without conflict
5. The orchestrator just needs to call it and handle results

### Independent AI Agent

Key requirement: "AI reviewer is INDEPENDENT of worker agent (no self-review)"

This means:

- Worker agent makes code changes
- Coordinator spawns separate AI agent to review
- Reviewer agent has no context from worker agent
- Prevents self-review bias
- Ensures objective code review

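One way to keep the reviewer free of worker context is to build its input from the request fields only. The sketch below is illustrative: `build_review_prompt` and the prompt wording are assumptions, and the only inputs are the `files` and `diffSummary` values already carried by `QualityCheckRequest`.

```python
from __future__ import annotations


def build_review_prompt(files: list[str], diff_summary: str) -> str:
    """Build the reviewer's only input: the changed files and the diff summary.

    Nothing from the worker agent's conversation, plan, or self-assessment
    is accepted as a parameter, so it cannot leak into the review.
    """
    file_list = "\n".join(f"- {path}" for path in files)
    return (
        "You are an independent code reviewer.\n"
        "Check the change for logic errors, security issues, and best practices.\n\n"
        f"Files changed:\n{file_list}\n\n"
        f"Diff summary:\n{diff_summary}\n"
    )
```

Restricting the function signature, rather than filtering a larger context object, makes the independence requirement visible at the call site.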
### Confidence Threshold

- Confidence score: 0.0 (no confidence) to 1.0 (full confidence)
- Approval threshold: >= 0.9 (90% confidence)
- Below threshold = rejected
- Reasons for low confidence: unclear logic, security risks, poor practices

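The threshold check itself is small enough to state directly. A minimal sketch, where the constant and function names are hypothetical and rejecting out-of-range scores with an exception is an assumption rather than a stated requirement:

```python
CONFIDENCE_THRESHOLD = 0.9  # approval threshold from the acceptance criteria


def is_confident_enough(confidence: float) -> bool:
    """Return True when the score meets the inclusive 0.9 threshold."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within [0.0, 1.0], got {confidence}")
    return confidence >= CONFIDENCE_THRESHOLD
```

The comparison is inclusive, so a score of exactly 0.9 is approved, matching the "Approve if confidence >= 0.9" criterion.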
### 50% Rule Details

- AI-generated code should be <= 50% of PR
- Coordinator estimates percentage using heuristics
- Could use: comment analysis, pattern detection, AI meta-detection
- If > 50%: reject with clear message
- Encourages human review and understanding

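Once some heuristic has attributed changed lines to AI generation, the rule reduces to a boundary check. The sketch below assumes the heuristic outputs a count of AI-attributed lines and a total of changed lines; both inputs and both function names are hypothetical, and how the attribution is done is left open.

```python
def ai_generated_percent(ai_lines: int, total_lines: int) -> int:
    """Integer percentage for the aiGeneratedPercent field (rounded down)."""
    if total_lines <= 0:
        return 0
    return (100 * ai_lines) // total_lines


def violates_fifty_percent_rule(ai_lines: int, total_lines: int) -> bool:
    """True only when strictly more than half of the changed lines are AI-attributed."""
    # Compare raw counts so integer rounding cannot hide a violation.
    return 2 * ai_lines > total_lines
```

Comparing counts instead of the rounded percentage matters at the boundary: 101 of 200 lines reports as 50% but still violates the rule, while exactly 50 of 100 does not.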