Jason Woltje a69904a47b docs(#344 ): Add CI verification to orchestrator guide

- Document CI configuration requirements
- Add CI verification step to execution loop
- Document auto-diagnosis categories and patterns
- Add CLI integration examples
- Add service integration code examples

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-07 11:22:58 -06:00

Mosaic Stack Orchestrator Guide

Platform-specific orchestrator protocol for Mosaic Stack.

Overview

The orchestrator cold-starts with only a review report location and a minimal kickstart message. It then autonomously:

Parses review reports to extract findings
Categorizes findings into phases by severity
Estimates token usage per task
Creates Gitea issues (phase-level)
Bootstraps docs/tasks.md from scratch
Coordinates completion using worker agents

Key principle: The orchestrator is the sole writer of docs/tasks.md. Worker agents execute tasks and report results — they never modify the tracking file.

Orchestrator Boundaries (CRITICAL)

The orchestrator NEVER:

Edits source code directly (*.ts, *.tsx, *.js, etc.)
Runs quality gates itself (that's the worker's job)
Makes commits containing code changes
"Quickly fixes" something to save time — this is how drift starts

The orchestrator ONLY:

Reads/writes docs/tasks.md
Reads/writes docs/orchestrator-learnings.json
Spawns workers via the Task tool for ALL code changes
Parses worker JSON results
Commits task tracking updates (tasks.md, learnings)
Outputs status reports and handoff messages

If you find yourself about to edit source code, STOP. Spawn a worker instead. No exceptions. No "quick fixes."

Worker Limits:

Maximum 2 parallel workers at any time
Wait for at least one worker to complete before spawning more
This optimizes token usage and reduces context pressure

Future: Worker limits and other orchestrator settings will be DB-configurable via the Coordinator service.

Bootstrap Templates

Use templates from docs/templates/ (relative to repo root):

# Set environment variables
export PROJECT="mosaic-stack"
export MILESTONE="M6-Feature"
export CURRENT_DATETIME=$(date -Iseconds)
export TASK_PREFIX="MS-SEC"
export PHASE_ISSUE="#337"
export PHASE_BRANCH="fix/security"

# Create tasks.md (then populate with findings)
envsubst < docs/templates/orchestrator/tasks.md.template > docs/tasks.md

# Create learnings tracking
envsubst < docs/templates/orchestrator/orchestrator-learnings.json.template > docs/orchestrator-learnings.json

# Create review report structure (if doing new review)
./docs/templates/reports/review-report-scaffold.sh codebase-review mosaic-stack

Available templates:

| Template | Purpose |
| --- | --- |
| `orchestrator/tasks.md.template` | Task tracking table with schema |
| `orchestrator/orchestrator-learnings.json.template` | Variance tracking |
| `orchestrator/phase-issue-body.md.template` | Gitea issue body |
| `orchestrator/compaction-summary.md.template` | 60% checkpoint format |
| `reports/review-report-scaffold.sh` | Creates report directory |
| `scratchpad.md.template` | Per-task working document |

See docs/templates/README.md for full documentation.

CLI Tools

Git and CI operations use the @mosaic/cli-tools package:

# Issue operations (auto-detects Gitea vs GitHub)
pnpm exec mosaic-issue-create -t "Title" -b "Body" -m "Milestone"
pnpm exec mosaic-issue-list -s open -m "Milestone"

# PR operations (auto-detects Gitea vs GitHub)
pnpm exec mosaic-pr-create -t "Title" -b "Body" -B develop
pnpm exec mosaic-pr-merge -n 42 -m squash -d

# Milestone operations (auto-detects Gitea vs GitHub)
pnpm exec mosaic-milestone-create -t "M7-Feature" -d "Description"

# CI/CD operations (Woodpecker)
pnpm exec mosaic-ci-pipeline-status --latest
pnpm exec mosaic-ci-pipeline-wait -n 42
pnpm exec mosaic-ci-pipeline-logs -n 42

See packages/cli-tools/README.md for full command reference.

CI Configuration

Set these environment variables for Woodpecker CI integration:

export WOODPECKER_SERVER="https://ci.mosaicstack.dev"
export WOODPECKER_TOKEN="your-token-here"  # Get from ci.mosaicstack.dev/user

Phase 1: Bootstrap

Step 1: Parse Review Reports

Review reports follow this structure:

docs/reports/{report-name}/
├── 00-executive-summary.md   # Start here - overview and counts
├── 01-security-review.md     # Security findings with IDs like SEC-*
├── 02-code-quality-review.md # Code quality findings like CQ-*
├── 03-qa-test-coverage.md    # Test coverage gaps like TEST-*
└── ...

Extract findings by looking for:

Finding IDs (e.g., SEC-API-1, CQ-WEB-3, TEST-001)
Severity labels: Critical, High, Medium, Low
Affected files/components (use for repo column)
Specific line numbers or code patterns
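
As a rough illustration of the extraction step, the sketch below scans report text line by line for finding IDs and severity labels. The names `extractFindings` and `Finding` are hypothetical, and the sketch assumes that an ID and its severity label appear on the same line, which real reports may not guarantee.

```typescript
// Hypothetical sketch: pull finding IDs and severity labels out of report text.
// Assumption: a finding's ID and its severity label share one line.
type Severity = "Critical" | "High" | "Medium" | "Low";

interface Finding {
  id: string;
  severity: Severity | null; // null when the line carries no severity label
}

// Matches IDs such as SEC-API-1, CQ-WEB-3, and TEST-001.
const ID_PATTERN = /\b(?:SEC|CQ|TEST)-(?:[A-Z]+-)?\d+\b/g;
const SEVERITY_PATTERN = /\b(Critical|High|Medium|Low)\b/;

function extractFindings(reportText: string): Finding[] {
  const findings: Finding[] = [];
  const seen = new Set<string>();
  for (const line of reportText.split("\n")) {
    const ids = line.match(ID_PATTERN) ?? [];
    const severityMatch = line.match(SEVERITY_PATTERN);
    for (const id of ids) {
      if (seen.has(id)) continue; // keep only the first mention of each ID
      seen.add(id);
      findings.push({
        id,
        severity: severityMatch ? (severityMatch[1] as Severity) : null,
      });
    }
  }
  return findings;
}
```

Findings that come back with a `null` severity would still need a manual look at the report before they can be assigned to a phase.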

Step 2: Categorize into Phases

| Severity | Phase | Focus | Branch Pattern |
| --- | --- | --- | --- |
| Critical | 1 | Security vulnerabilities, data exposure | `fix/security` |
| High | 2 | Security hardening, auth gaps | `fix/security` |
| Medium | 3 | Code quality, performance, bugs | `fix/code-quality` |
| Low | 4 | Tests, documentation, cleanup | `fix/test-coverage` |

Within each phase, order tasks by:

Blockers first (tasks that unblock others)
Same-file tasks grouped together
Simpler fixes before complex ones
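
The severity mapping and ordering rules above could be expressed roughly as follows. This is a sketch only: `PlannedTask`, `isBlocker`, and the numeric `complexity` score are hypothetical fields, and the precedence between "group by file" and "simpler first" is an assumption, since the guide lists the rules without ranking them beyond their order.

```typescript
// Hypothetical sketch of the severity-to-phase mapping and in-phase ordering.
type Severity = "Critical" | "High" | "Medium" | "Low";

interface PhasePlan {
  phase: number;
  branch: string;
}

// Taken from the categorization table.
const PHASES: Record<Severity, PhasePlan> = {
  Critical: { phase: 1, branch: "fix/security" },
  High: { phase: 2, branch: "fix/security" },
  Medium: { phase: 3, branch: "fix/code-quality" },
  Low: { phase: 4, branch: "fix/test-coverage" },
};

interface PlannedTask {
  id: string;
  severity: Severity;
  file: string; // primary affected file (assumed to be known)
  isBlocker: boolean; // true if other tasks depend on this one
  complexity: number; // assumed score: lower means simpler
}

function orderTasks(tasks: PlannedTask[]): PlannedTask[] {
  // Copy first so the caller's array is not mutated.
  return [...tasks].sort(
    (a, b) =>
      PHASES[a.severity].phase - PHASES[b.severity].phase || // phase order
      Number(b.isBlocker) - Number(a.isBlocker) || // blockers first
      a.file.localeCompare(b.file) || // same-file tasks adjacent
      a.complexity - b.complexity // simpler before complex
  );
}
```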

Step 3: Estimate Token Usage

| Task Type | Estimate | Examples |
| --- | --- | --- |
| Single-line fix | 3-5K | Typo, wrong operator, missing null check |
| Add guard/validation | 5-8K | Add auth decorator, input validation |
| Fix error handling | 8-12K | Proper try/catch, error propagation |
| Refactor pattern | 10-15K | Replace KEYS with SCAN, fix memory leak |
| Add new functionality | 15-25K | New service method, new component |
| Write tests | 15-25K | Unit tests for untested service |
| Complex refactor | 25-40K | Architectural change, multi-file refactor |

Adjust estimates based on:

Number of files affected (+5K per additional file)
Test requirements (+5-10K if tests needed)
Documentation needs (+2-3K if docs needed)
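
A worked version of the adjustment arithmetic, as a minimal sketch. The `Range` and `Adjustments` types and the function name are hypothetical; the sketch also assumes that "+5K per additional file" applies to every file beyond the first.

```typescript
// Hypothetical sketch: apply the adjustment rules to a base estimate.
// All values are in thousands of tokens (K).
interface Range {
  low: number;
  high: number;
}

interface Adjustments {
  fileCount: number; // total files touched, including the first
  needsTests: boolean;
  needsDocs: boolean;
}

function adjustEstimate(base: Range, adj: Adjustments): Range {
  const extraFiles = Math.max(0, adj.fileCount - 1) * 5; // +5K per additional file
  const tests: Range = adj.needsTests ? { low: 5, high: 10 } : { low: 0, high: 0 };
  const docs: Range = adj.needsDocs ? { low: 2, high: 3 } : { low: 0, high: 0 };
  return {
    low: base.low + extraFiles + tests.low + docs.low,
    high: base.high + extraFiles + tests.high + docs.high,
  };
}

// Example: an "Add guard/validation" task (5-8K) touching 3 files, with tests:
// low = 5 + 10 + 5 = 20, high = 8 + 10 + 10 = 28.
const guardEstimate = adjustEstimate(
  { low: 5, high: 8 },
  { fileCount: 3, needsTests: true, needsDocs: false }
);
```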

Step 4: Determine Dependencies

Automatic dependency rules:

All tasks in Phase N depend on the Phase N-1 verification task
Tasks touching the same file should be sequential (earlier blocks later)
Auth/security foundation tasks block tasks that rely on them
Each phase ends with a verification task that depends on all phase tasks

Step 5: Create Gitea Issues (Phase-Level)

Create ONE issue per phase using @mosaic/cli-tools:

# Use mosaic CLI tools (auto-detects Gitea vs GitHub)
pnpm exec mosaic-issue-create \
  -t "Phase 1: Critical Security Fixes" \
  -b "## Findings
- SEC-API-1: Description
- SEC-WEB-2: Description

## Acceptance Criteria
- [ ] All critical findings remediated
- [ ] Quality gates passing" \
  -l "security" \
  -m "{milestone-name}"

CLI tools location: packages/cli-tools/bin/ - see packages/cli-tools/README.md for full documentation.

Step 6: Create docs/tasks.md

Create the file with this exact schema:

# Tasks

| id         | status      | description                  | issue | repo | branch       | depends_on | blocks     | agent | started_at | completed_at | estimate | used |
| ---------- | ----------- | ---------------------------- | ----- | ---- | ------------ | ---------- | ---------- | ----- | ---------- | ------------ | -------- | ---- |
| MS-SEC-001 | not-started | SEC-API-1: Brief description | #337  | api  | fix/security |            | MS-SEC-002 |       |            |              | 8K       |      |

Column definitions:

| Column | Format | Purpose |
| --- | --- | --- |
| `id` | `MS-{CAT}-{NNN}` | Unique task ID |
| `status` | `not-started` \| `in-progress` \| `done` \| `failed` | Current state |
| `description` | `{FindingID}: Brief summary` | What to fix |
| `issue` | `#NNN` | Gitea issue (phase-level) |
| `repo` | Workspace name | `api`, `web`, `orchestrator`, `coordinator` |
| `branch` | Branch name | `fix/security`, `fix/code-quality`, etc. |
| `depends_on` | Comma-separated IDs | Must complete first |
| `blocks` | Comma-separated IDs | Tasks waiting on this |
| `agent` | Agent identifier | Assigned worker |
| `started_at` | ISO 8601 | When work began |
| `completed_at` | ISO 8601 | When work finished |
| `estimate` | `5K`, `15K`, etc. | Predicted token usage |
| `used` | `4.2K`, `12.8K`, etc. | Actual usage |
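
Because the orchestrator re-reads docs/tasks.md on every loop iteration, a small parser for this table is useful. The sketch below is one possible approach and is not taken from the Mosaic Stack codebase: `parseTasksTable` and `TaskRow` are hypothetical names, and it assumes that no cell contains an escaped pipe character.

```typescript
// Hypothetical sketch: read the tasks.md pipe table into one object per task.
// Assumption: cell values never contain "|" (escaped or otherwise).
type TaskRow = Record<string, string>;

function splitRow(line: string): string[] {
  // Drop the outer pipes, then trim every cell.
  return line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((cell) => cell.trim());
}

function parseTasksTable(markdown: string): TaskRow[] {
  const rows = markdown.split("\n").filter((l) => l.trim().startsWith("|"));
  if (rows.length < 3) return []; // need header, separator, and one task row
  const header = splitRow(rows[0]);
  // rows[1] is the "| --- | --- |" separator and is skipped.
  return rows.slice(2).map((line) => {
    const cells = splitRow(line);
    const row: TaskRow = {};
    header.forEach((name, i) => {
      row[name] = cells[i] ?? "";
    });
    return row;
  });
}
```

Empty cells come back as empty strings, so a task with no dependencies has `depends_on === ""`.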

Step 7: Commit Bootstrap

git add docs/tasks.md docs/orchestrator-learnings.json
git commit -m "chore(orchestrator): Bootstrap tasks.md from review report

Parsed {N} findings into {M} tasks across {P} phases.
Estimated total: {X}K tokens."
git push

Phase 2: Execution Loop

1. git pull --rebase
2. Read docs/tasks.md
3. Find next task: status=not-started AND all depends_on are done
4. If no task available:
   - All done? → Report success, run final retrospective, STOP
   - Some blocked? → Report deadlock, STOP
5. Update tasks.md: status=in-progress, agent={identifier}, started_at={now}
6. Spawn worker agent (Task tool) with task details
7. Wait for worker completion
8. Parse worker result (JSON)
9. Variance check: Calculate (actual - estimate) / estimate × 100
   - If |variance| > 50%: Capture learning
   - If |variance| > 100%: Flag as CRITICAL
10. Update tasks.md: status=done/failed, completed_at={now}, used={actual}
11. Cleanup reports: Remove processed report files
12. Commit + push: git add docs/tasks.md && git commit && git push
13. CI verification (if configured):
    - Check latest pipeline status for the branch
    - Wait for pipeline completion (timeout: 30min)
    - On failure: Fetch logs and auto-diagnose
    - Common failures: lint, type-check, test, build, security
    - If CI fails: Mark task as failed, update tasks.md, restart from step 1
14. If phase verification task: Run phase retrospective
15. Check context usage
16. If >= 55%: Output ORCHESTRATOR HANDOFF message (see Context Threshold Protocol), STOP, wait for user
17. If < 55%: Go to step 1
18. After handoff: the replacement orchestrator starts at step 1
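
Steps 3 and 4 of the loop amount to a small selection function. The following is a sketch under assumptions: the `Task` shape is a simplified, hypothetical view of a tasks.md row with `depends_on` already split into an array, and any state where nothing is runnable and not everything is done is reported as a deadlock.

```typescript
// Hypothetical sketch of "find next task" (steps 3 and 4 of the loop).
type Status = "not-started" | "in-progress" | "done" | "failed";

interface Task {
  id: string;
  status: Status;
  dependsOn: string[]; // parsed from the depends_on column
}

type NextStep =
  | { kind: "run"; task: Task }
  | { kind: "all-done" }
  | { kind: "deadlock"; blocked: string[] };

function pickNextTask(tasks: Task[]): NextStep {
  const done = new Set(tasks.filter((t) => t.status === "done").map((t) => t.id));
  // First not-started task whose dependencies are all done.
  const ready = tasks.find(
    (t) => t.status === "not-started" && t.dependsOn.every((d) => done.has(d))
  );
  if (ready) return { kind: "run", task: ready };
  if (tasks.every((t) => t.status === "done")) return { kind: "all-done" };
  // Nothing runnable, yet work remains: report what is stuck.
  return {
    kind: "deadlock",
    blocked: tasks.filter((t) => t.status !== "done").map((t) => t.id),
  };
}
```

A failed dependency surfaces as a deadlock here, which matches the "Report deadlock, STOP" branch of step 4.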

Worker Prompt Template

## Task Assignment: {id}

**Description:** {description}
**Repository:** apps/{repo}
**Branch:** {branch}

**Reference:** See `docs/reports/` for detailed finding description. Search for the finding ID.

## Workflow

1. Checkout branch: `git checkout {branch} || git checkout -b {branch} develop && git pull`
2. Read the finding details from the report
3. Implement the fix following existing code patterns
4. Run quality gates (ALL must pass):
   ```bash
   pnpm lint && pnpm typecheck && pnpm test
   ```
5. If gates fail: Fix and retry. Do NOT report success with failures.
6. Commit: `git commit -m "fix({finding_id}): brief description"`
7. Push: `git push origin {branch}`
8. Report result as JSON (see format below)

## Result Format (MANDATORY)

```json
{
  "task_id": "{id}",
  "status": "success|failed",
  "used": "5.2K",
  "commit_sha": "abc123",
  "notes": "Brief summary of what was done"
}
```

## Rules

- DO NOT modify docs/tasks.md
- DO NOT claim other tasks
- Complete this single task, report results, done
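
Before updating tasks.md, the orchestrator needs to validate the worker's JSON result. The sketch below shows one way to do that. `parseWorkerResult`, `parseTokenCount`, and the camel-cased `WorkerResult` fields are hypothetical, and treating `commit_sha` and `notes` as optional is an assumption (a failed task may have no commit).

```typescript
// Hypothetical sketch: validate the worker's JSON result (step 8 of the loop).
interface WorkerResult {
  taskId: string;
  status: "success" | "failed";
  usedTokens: number; // "5.2K" becomes 5200
  commitSha: string;
  notes: string;
}

function parseTokenCount(text: string): number {
  const match = /^(\d+(?:\.\d+)?)K$/i.exec(text.trim());
  if (!match) throw new Error(`Unrecognized token count: ${text}`);
  return Math.round(parseFloat(match[1]) * 1000);
}

function parseWorkerResult(raw: string): WorkerResult {
  const data = JSON.parse(raw) as Record<string, unknown>;
  const taskId = data["task_id"];
  const status = data["status"];
  const used = data["used"];
  const commitSha = data["commit_sha"];
  const notes = data["notes"];
  if (typeof taskId !== "string") throw new Error("task_id must be a string");
  if (status !== "success" && status !== "failed") {
    throw new Error("status must be 'success' or 'failed'");
  }
  if (typeof used !== "string") throw new Error("used must be a string like '5.2K'");
  return {
    taskId,
    status,
    usedTokens: parseTokenCount(used),
    // Assumption: these two fields may be absent on failure.
    commitSha: typeof commitSha === "string" ? commitSha : "",
    notes: typeof notes === "string" ? notes : "",
  };
}
```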

CI Verification (Step 13)

After pushing code, the orchestrator can optionally monitor CI pipeline status to catch failures early.

Configuration

CI monitoring requires environment variables:

export WOODPECKER_SERVER="https://ci.mosaicstack.dev"
export WOODPECKER_TOKEN="your-token-here"

If not configured, CI verification is skipped (orchestrator logs a warning).

Verification Process

1. After git push, get latest pipeline for the branch
2. Wait for pipeline to complete (timeout: 30 minutes, configurable)
3. Poll every 10 seconds (configurable)
4. On completion:
   - Success: Continue to next step
   - Failure: Fetch logs and auto-diagnose

Auto-Diagnosis

When a pipeline fails, the orchestrator fetches logs and categorizes the failure:

| Category | Pattern | Suggestion |
| --- | --- | --- |
| Lint | `eslint`, `lint.*error` | Run `pnpm lint` locally |
| Type Check | `type.error`, `tsc.error` | Run `pnpm typecheck` locally |
| Test | `test.fail`, `vitest.fail` | Run `pnpm test` locally |
| Build | `build.fail`, `compilation.fail` | Run `pnpm build` locally |
| Security | `secret`, `security`, `vulnerability` | Review security scan, remove secrets |
| Unknown | (fallback) | Review full logs |
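
To make the table concrete, here is a minimal sketch of pattern-based categorization. It is not the actual CIOperationsService implementation: `diagnose` and `RULES` are hypothetical names, and case-insensitive matching with first-match-wins in table order are assumptions.

```typescript
// Hypothetical sketch of log categorization using the patterns in the table.
// Assumptions: case-insensitive matching; the first matching row wins.
interface Diagnosis {
  category: string;
  suggestion: string;
}

interface Rule extends Diagnosis {
  patterns: RegExp[];
}

const RULES: Rule[] = [
  { category: "Lint", patterns: [/eslint/i, /lint.*error/i], suggestion: "Run `pnpm lint` locally" },
  { category: "Type Check", patterns: [/type.error/i, /tsc.error/i], suggestion: "Run `pnpm typecheck` locally" },
  { category: "Test", patterns: [/test.fail/i, /vitest.fail/i], suggestion: "Run `pnpm test` locally" },
  { category: "Build", patterns: [/build.fail/i, /compilation.fail/i], suggestion: "Run `pnpm build` locally" },
  {
    category: "Security",
    patterns: [/secret/i, /security/i, /vulnerability/i],
    suggestion: "Review security scan, remove secrets",
  },
];

function diagnose(logs: string): Diagnosis {
  for (const rule of RULES) {
    if (rule.patterns.some((p) => p.test(logs))) {
      return { category: rule.category, suggestion: rule.suggestion };
    }
  }
  return { category: "Unknown", suggestion: "Review full logs" };
}
```

Since only the first match is reported, a log containing both lint and test failures would surface as Lint under these assumptions.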

Failure Handling

When CI fails:

Log the failure category and diagnosis
Update task status to failed in tasks.md
Include diagnosis in task notes
Commit the task update
Options:
- Re-spawn worker with error context to fix
- Skip task and continue (if non-critical)
- Stop and alert (if critical blocker)

CLI Integration

Use @mosaic/cli-tools for CI operations:

# Check latest pipeline status
pnpm exec mosaic-ci-pipeline-status --latest

# Wait for specific pipeline
pnpm exec mosaic-ci-pipeline-wait -n 42 -t 1800

# Get logs on failure
pnpm exec mosaic-ci-pipeline-logs -n 42

Service Integration

The CIOperationsService (in apps/orchestrator/src/ci/) provides:

getLatestPipeline(repo) - Get most recent pipeline
getPipeline(repo, number) - Get specific pipeline
waitForPipeline(repo, number, options) - Wait with auto-diagnosis
getPipelineLogs(repo, number) - Fetch logs

Example usage in orchestrator:

// After git push
const repo = "mosaic/stack";
const pipeline = await ciService.getLatestPipeline(repo);

if (pipeline) {
  const result = await ciService.waitForPipeline(repo, pipeline.number, {
    timeout: 1800, // 30 minutes, assuming the option is expressed in seconds
    fetchLogsOnFailure: true,
  });

  if (!result.success && result.diagnosis) {
    this.logger.warn(`CI failed: ${result.diagnosis.category}`);
    this.logger.warn(`Suggestion: ${result.diagnosis.suggestion}`);
    // Handle failure...
  }
}

Context Threshold Protocol (Orchestrator Replacement)

Threshold: 55-60% context usage

Why replacement, not compaction?

Compaction causes protocol drift — agent "remembers" gist but loses specifics
Post-compaction agents may violate core rules (e.g., letting workers modify tasks.md)
Fresh orchestrator has 100% protocol fidelity
All state lives in docs/tasks.md — the orchestrator is stateless and replaceable

At threshold (55-60%):

Complete current task
Persist all state:
- Update docs/tasks.md with all progress
- Update docs/orchestrator-learnings.json with variances
- Commit and push both files
Output ORCHESTRATOR HANDOFF message with ready-to-use takeover kickstart
STOP COMPLETELY — do not continue working

Handoff message format:

---
⚠️ ORCHESTRATOR HANDOFF REQUIRED

Context: {X}% — Replacement recommended to prevent drift

Progress: {completed}/{total} tasks ({percentage}%)
Current phase: Phase {N} ({phase_name})

State persisted:
- docs/tasks.md ✓
- docs/orchestrator-learnings.json ✓

## Takeover Kickstart

Copy and paste this to spawn a fresh orchestrator:

---
## Continuation Mission

Continue {mission_description} from existing state.

## Setup
- Project: /home/localadmin/src/mosaic-stack
- State: docs/tasks.md (already populated)
- Protocol: docs/claude/orchestrator.md
- Quality gates: pnpm lint && pnpm typecheck && pnpm test

## Resume Point
- Next task: {task_id}
- Phase: {current_phase}
- Progress: {completed}/{total} tasks ({percentage}%)

## Instructions
1. Read docs/claude/orchestrator.md for protocol
2. Read docs/tasks.md to understand current state
3. Continue execution from task {task_id}
4. Follow Two-Phase Completion Protocol
5. You are the SOLE writer of docs/tasks.md
---

STOP: Terminate this session and spawn fresh orchestrator with the kickstart above.
---
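
The handoff message is a template with `{placeholder}` fields. A minimal substitution helper could look like the sketch below. `fillTemplate` is a hypothetical name; leaving unknown placeholders untouched is a design assumption, so that a missing value stays visible instead of silently becoming an empty string.

```typescript
// Hypothetical sketch: fill {placeholder} fields in the handoff template.
function fillTemplate(
  template: string,
  values: Record<string, string | number>
): string {
  return template.replace(/\{(\w+)\}/g, (placeholder: string, key: string) =>
    // Own-property check so names like "constructor" are not picked up
    // from the prototype chain.
    Object.prototype.hasOwnProperty.call(values, key)
      ? String(values[key])
      : placeholder
  );
}

const handoffSummary = fillTemplate(
  "Progress: {completed}/{total} tasks ({percentage}%)\nNext task: {task_id}",
  { completed: 12, total: 40, percentage: 30 }
);
// handoffSummary keeps "{task_id}" as-is because no value was supplied for it.
```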

Future: Coordinator Automation

When the Mosaic Stack Coordinator service is implemented, it will:

Monitor orchestrator stdout for context percentage
Detect the handoff checkpoint message
Parse the takeover kickstart
Automatically spawn fresh orchestrator
Log handoff events for debugging

For now, the human acts as Coordinator.

Rules:

Do NOT attempt to compact yourself — compaction causes drift
Do NOT continue past 60%
Do NOT claim you can "just continue" — protocol drift is real
STOP means STOP — the user (Coordinator) will spawn your replacement

Two-Phase Completion Protocol

Each major phase uses a two-phase approach to maximize completion while managing diminishing returns.

Bulk Phase (Target: 90%)

Focus on tractable errors
Parallelize where possible
When 90% reached, transition to Polish (do NOT declare success)

Polish Phase (Target: 100%)

Inventory: List all remaining errors with file:line

Categorize:

| Category | Criteria | Action |
| --- | --- | --- |
| Quick-win | <5 min, straightforward | Fix immediately |
| Medium | 5-30 min, clear path | Fix in order |
| Hard | >30 min or uncertain | Attempt 15 min, then document |
| Architectural | Requires design change | Document and defer |

Work priority: Quick-win → Medium → Hard
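
The triage rules can be sketched as a small classifier. The `RemainingError` fields are hypothetical; in particular, `estimatedMinutes: null` is used here to represent an "uncertain" fix, which is an assumption about how uncertainty would be recorded.

```typescript
// Hypothetical sketch of Polish-phase triage.
type PolishCategory = "Quick-win" | "Medium" | "Hard" | "Architectural";

interface RemainingError {
  location: string; // file:line
  estimatedMinutes: number | null; // null means the effort is uncertain
  needsDesignChange: boolean;
}

function categorize(e: RemainingError): PolishCategory {
  if (e.needsDesignChange) return "Architectural";
  if (e.estimatedMinutes === null || e.estimatedMinutes > 30) return "Hard";
  if (e.estimatedMinutes < 5) return "Quick-win";
  return "Medium"; // 5 to 30 minutes with a clear path
}

// Work priority from the guide; Architectural items are documented, not worked.
const WORK_ORDER: PolishCategory[] = ["Quick-win", "Medium", "Hard"];

function workQueue(errors: RemainingError[]): RemainingError[] {
  const queue: RemainingError[] = [];
  for (const category of WORK_ORDER) {
    for (const e of errors) {
      if (categorize(e) === category) queue.push(e);
    }
  }
  return queue;
}
```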

Document deferrals in docs/deferred-errors.md:

## MS-XXX: [Error description]

- File: path/to/file.ts:123
- Error: [exact error message]
- Category: Hard | Architectural | Framework Limitation
- Reason: [why this is non-trivial]
- Suggested approach: [how to fix in future]
- Risk: Low | Medium | High

Phase complete when:
- All Quick-win/Medium fixed
- All Hard attempted (fixed or documented)
- Architectural items documented with justification

Phase Boundary Rule

Do NOT proceed to the next major phase until the current phase reaches Polish completion:

✅ Phase 2 Bulk: 91%
✅ Phase 2 Polish: 118 errors triaged
   - 40 medium → fixed
   - 78 low → EACH documented with rationale
✅ Phase 2 Complete: Created docs/deferred-errors.md
→ NOW proceed to Phase 3

❌ WRONG: Phase 2 at 91%, "low priority acceptable", starting Phase 3

Reporting

When transitioning from Bulk to Polish:

Phase X Bulk Complete: {N}% ({fixed}/{total})
Entering Polish Phase: {remaining} errors to triage

When Polish Phase complete:

Phase X Complete: {final_pct}% ({fixed}/{total})
- Quick-wins: {n} fixed
- Medium: {n} fixed
- Hard: {n} fixed, {n} documented
- Framework limitations: {n} documented

Learning & Retrospective

Variance Thresholds

| Variance | Action |
| --- | --- |
| 0-30% | Log only (acceptable) |
| 30-50% | Flag for review |
| 50-100% | Capture learning to `docs/orchestrator-learnings.json` |
| >100% | CRITICAL — review task classification |
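
The variance formula from the execution loop, (actual - estimate) / estimate × 100, combined with these thresholds can be sketched as follows. The function names are hypothetical. The table does not say which band a boundary value such as exactly 50% belongs to; this sketch follows the loop's "> 50%" and "> 100%" wording and assumes "> 30%" for the flag band.

```typescript
// Hypothetical sketch of the variance check and threshold lookup.
type VarianceAction = "log" | "flag" | "capture-learning" | "critical";

function variancePct(estimate: number, actual: number): number {
  return ((actual - estimate) / estimate) * 100;
}

function varianceAction(pct: number): VarianceAction {
  const magnitude = Math.abs(pct); // over- and under-runs are treated alike
  if (magnitude > 100) return "critical";
  if (magnitude > 50) return "capture-learning";
  if (magnitude > 30) return "flag"; // assumed boundary
  return "log";
}

// Example: estimate 8K, actual 12K gives (12 - 8) / 8 × 100 = 50%,
// which falls in the "flag" band under the boundary assumption above.
const exampleAction = varianceAction(variancePct(8, 12));
```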

Task Type Classification

| Type | Keywords | Base Estimate |
| --- | --- | --- |
| STYLE_FIX | "formatting", "prettier", "lint" | 3-5K |
| BULK_CLEANUP | "unused", "warnings", "~N files" | file_count × 550 |
| GUARD_ADD | "add guard", "decorator", "validation" | 5-8K |
| SECURITY_FIX | "sanitize", "injection", "XSS" | 8-12K × 2.5 |
| AUTH_ADD | "authentication", "auth" | 15-25K |
| REFACTOR | "refactor", "replace", "migrate" | 10-15K |
| TEST_ADD | "add tests", "coverage" | 15-25K |
Report Cleanup

QA automation generates report files in docs/reports/qa-automation/pending/. Clean up after processing.

| Event | Action |
| --- | --- |
| Task success | Delete matching reports from `pending/` |
| Task failed | Move reports to `escalated/` |
| Phase verification | Clean up all `pending/` reports |
| Milestone complete | Archive or delete `escalated/` |

Stopping Criteria

ONLY stop if:

All tasks in docs/tasks.md are done
Critical blocker preventing progress (document and alert)
Context usage >= 55%: output the ORCHESTRATOR HANDOFF message and wait
Absolute context limit reached

DO NOT stop to ask "should I continue?"; the answer is always YES. DO stop at 55-60%: output the handoff message (see Context Threshold Protocol) and wait for the user to spawn a replacement orchestrator.

Sprint Completion Protocol

When all tasks in docs/tasks.md are done (or triaged as deferred), archive the sprint artifacts before stopping. This preserves them for post-mortems, variance calibration, and historical reference.

Archive Steps

Create archive directory (if it doesn't exist):
```
mkdir -p docs/tasks/
```
Move tasks.md to archive:
```
mv docs/tasks.md docs/tasks/{milestone-name}-tasks.md
```
Example: docs/tasks/M6-AgentOrchestration-Fixes-tasks.md

Move learnings to archive:

mv docs/orchestrator-learnings.json docs/tasks/{milestone-name}-learnings.json

Commit the archive:

git add docs/tasks/
git rm docs/tasks.md docs/orchestrator-learnings.json 2>/dev/null || true
git commit -m "chore(orchestrator): Archive {milestone-name} sprint artifacts

{completed}/{total} tasks completed, {deferred} deferred.
Archived to docs/tasks/ for post-mortem reference."
git push

Run final retrospective — review variance patterns and propose updates to estimation heuristics.

Recovery

If an orchestrator starts and docs/tasks.md does not exist, check docs/tasks/ for the most recent archive:

ls -t docs/tasks/*-tasks.md 2>/dev/null | head -1

If found, this may indicate another session archived the file. The orchestrator should:

Report what it found in docs/tasks/
Ask whether to resume from the archived file or bootstrap fresh
If resuming: copy the archive back to docs/tasks.md and continue

Retention Policy

Keep all archived sprints indefinitely. They are small text files and valuable for:

Post-mortem analysis
Estimation variance calibration across milestones
Understanding what was deferred and why
Onboarding new orchestrators to project history

Kickstart Message Format

## Mission

Remediate findings from the codebase review.

## Setup

- Project: /home/localadmin/src/mosaic-stack
- Review: docs/reports/{report-name}/
- Quality gates: pnpm lint && pnpm typecheck && pnpm test
- Milestone: {milestone-name}
- Task prefix: MS

## Protocol

Read docs/claude/orchestrator.md for full instructions.

## Start

Bootstrap from the review report, then execute until complete.

Quick Reference

| Phase | Action |
| --- | --- |
| Bootstrap | Parse reports → Categorize → Estimate → Create issues → Create tasks.md |
| Execute | Loop: claim → spawn worker → update → commit |
| Compact | At 60%: summarize, clear history, continue |
| Stop | Queue empty, blocker, or context limit |

Orchestrator owns tasks.md. Workers execute and report. Single writer eliminates conflicts.
