# Specialist Pipeline — Progressive Refinement Architecture

**Status:** DRAFT v4 — post architecture review

**Created:** 2026-03-24

**Last Updated:** 2026-03-24 20:40 CDT

---
## Vision

Replace "throw it at a Codex worker and hope" with a **railed pipeline** where each stage narrows scope, increases precision, and catches mistakes before they compound. Spend more time up-front declaring requirements; spend less time at the end fixing broken output.

**Core principles:**

- One agent, one specialty. No generalists pretending to be experts.
- Agents must be willing to **argue, debate, and push back** — not eagerly agree and move on.
- The pipeline is a set of **customizable rails** — agents stay on track, don't get sidetracked or derailed.
- Dynamic composition — only relevant specialists are called in per task.
- Hard gates between stages — mechanical checks + agent oversight for final decision.
- Minimal human oversight once the PRD is declared.

---
## The Pipeline

```
PRD.md (human declares requirements)
  │
  ▼
BRIEFS (PRD decomposed into discrete work units)
  │
  ▼
BOARD OF DIRECTORS (strategic go/no-go per brief)
  │  Static composition. CEO, CTO, CFO, COO.
  │  Output: Approved brief with business constraints, priority, budget
  │  Board does NOT select technical participants — that's the Brief Analyzer's job
  │  Gate: Board consensus required to proceed
  │  REJECTED → archive + notify human. NEEDS REVISION → back to Intake.
  │
  │  POST-RUN REVIEW: Board reviews memos from completed pipeline
  │  runs. Analyzes for conflicts, adjusts strategy, feeds learnings
  │  back into future briefs. The Board is not fire-and-forget.
  │
  ▼
BRIEF ANALYZER (technical composition)
  │  Sonnet agent analyzes approved brief + project context
  │  Selects which generalists/specialists participate in each planning stage
  │  Separates strategic decisions (Board) from technical composition
  │
  ▼
PLANNING 1 — Architecture (Domain Generalists)
  │  Dynamic composition based on brief requirements.
  │  Software Architect + relevant generalists only.
  │  Output: Architecture Decision Record (ADR)
  │  Agents MUST debate trade-offs. No rubber-stamping.
  │  Gate: ADR approved, all dissents resolved or recorded
  │
  ▼
PLANNING 2 — Implementation Design (Language/Domain Specialists)
  │  Dynamic composition — only languages/domains in the ADR.
  │  Output: Implementation spec per component
  │  Each specialist argues for their domain's best practices.
  │  Gate: All specs reviewed by Architecture, no conflicts
  │
  ▼
PLANNING 3 — Task Decomposition & Estimation
  │  Context Manager + Task Distributor
  │  Output: Task breakdown with dependency graph, estimates,
  │  context packets per worker, acceptance criteria
  │  Gate: Every task has one owner, one completion condition,
  │  estimated rounds, and explicit test criteria
  │
  ▼
CODING (Workers execute)
  │  Codex/Claude workers with specialist subagents loaded
  │  Each worker gets: context packet + implementation spec + acceptance criteria
  │  Workers stay in their lane — the rails prevent drift
  │  Gate: Code compiles, lints, passes unit tests
  │
  ▼
REVIEW (Specialist review)
  │  Code reviewer (evidence-driven, severity-ranked)
  │  Security auditor (attack paths, secrets, auth)
  │  Language specialist for the relevant language
  │  Gate: All findings addressed or explicitly accepted with rationale
  │
  ▼
REMEDIATE (if review finds issues)
  │  Worker fixes based on review findings
  │  Loops back to REVIEW
  │  Gate: Same as REVIEW — clean pass required
  │
  ▼
TEST (Integration + acceptance)
  │  QA Strategist validates against acceptance criteria from Planning 3
  │  Gate: All acceptance criteria pass, no regressions
  │
  ▼
DEPLOY
     Infrastructure Lead handles deployment
     Gate: Smoke tests pass in target environment
```

---
## Orchestration — Who Watches the Pipeline?

### The Orchestrator (Mosaic's role)

**Not me (Jarvis). Not any single agent. The Orchestrator is a dedicated, mechanical process with AI oversight.**

The Orchestrator is:

- **Primarily mechanical** — moves work through stages, enforces gates, tracks state
- **AI-assisted at decision points** — an agent reviews gate results and makes go/no-go calls
- **The thing Mosaic Stack productizes** — this IS the engine from the North Star vision

How it works:

1. **Stage Runner** (mechanical): Advances work through the pipeline. Checks gate conditions. Purely deterministic — "did all gate criteria pass? yes → advance. no → hold."
2. **Gate Reviewer** (AI agent): When a gate's mechanical checks pass, the Gate Reviewer does a final sanity check. "The code lints and tests pass, but does this actually solve the problem?" This is the lightweight oversight layer.
3. **Escalation** (to human): If the Gate Reviewer is uncertain, or if debate in a planning stage is unresolved after N rounds, escalate to Jason.
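
The Stage Runner's advance/hold logic can stay tiny. A sketch in TypeScript — all type, field, and stage names here are illustrative assumptions, not an existing schema:

```typescript
// Hypothetical sketch of the mechanical Stage Runner: deterministic
// gate evaluation, no LLM involved. Names are illustrative only.
type Stage =
  | "board" | "brief-analyzer" | "planning-1" | "planning-2"
  | "planning-3" | "coding" | "review" | "remediate" | "test" | "deploy";

interface GateCheck {
  name: string;
  passed: boolean;
}

type Verdict =
  | { action: "advance"; to: Stage }
  | { action: "hold"; failures: string[] }
  | { action: "escalate"; reason: string };

// Happy-path successors; remediate loops back to review by design.
const NEXT: Partial<Record<Stage, Stage>> = {
  board: "brief-analyzer",
  "brief-analyzer": "planning-1",
  "planning-1": "planning-2",
  "planning-2": "planning-3",
  "planning-3": "coding",
  coding: "review",
  review: "test",
  remediate: "review",
  test: "deploy",
};

// Purely mechanical: all checks pass -> advance, otherwise hold.
// The Gate Reviewer's semantic veto happens on top of this.
function runGate(stage: Stage, checks: GateCheck[]): Verdict {
  const failures = checks.filter((c) => !c.passed).map((c) => c.name);
  if (failures.length > 0) return { action: "hold", failures };
  const to = NEXT[stage];
  if (!to) return { action: "escalate", reason: `no successor for ${stage}` };
  return { action: "advance", to };
}
```

The point of the sketch: everything the Stage Runner does is a pure function of recorded state, so it can be tested and replayed without any model in the loop.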

### What Sends a Plan Back for More Debate?

Triggers for **rework/rejection**:

- **Gate failure** — mechanical checks don't pass → automatic rework
- **Gate Reviewer dissent** — AI reviewer flags a concern → sent back with the specific objection
- **Unresolved debate** — planning agents can't reach consensus after N rounds → escalate, or send back with the dissenting positions documented
- **Scope creep detection** — a stage's output significantly exceeds the brief's scope → flag and return
- **Dependency conflict** — Planning 3 finds the task breakdown has circular deps or impossible ordering → return to Planning 2
- **Review severity threshold** — Review finds CRITICAL-severity issues → auto-reject back to Coding, no discussion
### Human Touchpoints (minimal by design)

- **PRD.md** — Human writes this. This is where you spend the time.
- **Board escalation** — Only if the Board can't reach consensus on a brief.
- **Planning escalation** — Only if debate is unresolved after max rounds.
- **Deploy approval** — Optional. Could be fully automated for low-risk deploys.

Everything else runs autonomously on rails.

---

## Gate System

Every gate has **mechanical checks** (automated, deterministic) and an **agent review** (final judgment call).

| Transition | Mechanical Checks | Agent Review |
| --- | --- | --- |
| **Board → Planning 1** | Brief exists, has success criteria, has budget | Gate Reviewer: "Is this brief well-scoped enough to architect?" |
| **Planning 1 → Planning 2** | ADR exists, covers all components in brief | Gate Reviewer: "Does this architecture actually solve the problem?" |
| **Planning 2 → Planning 3** | Implementation spec per component, no unresolved conflicts | Gate Reviewer: "Are the specs consistent with each other and the ADR?" |
| **Planning 3 → Coding** | Task breakdown exists, all tasks have owner + criteria + estimate | Gate Reviewer: "Is this actually implementable as decomposed?" |
| **Coding → Review** | Compiles, lints, unit tests pass | Gate Reviewer: "Does the code match the implementation spec?" |
| **Review → Test** (or **→ Remediate**) | All review findings addressed | Gate Reviewer: "Are the fixes real, or did the worker just suppress warnings?" |
| **Test → Deploy** | All acceptance criteria pass, no regressions | Gate Reviewer: "Ready for production?" |
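
One way to make this table executable is to encode each gate as declarative data the Stage Runner iterates, with the agent question carried along as the seed for the Gate Reviewer's prompt. A sketch under assumed names — `GateDef`, `RunArtifacts`, and the field layout are hypothetical, not an existing schema:

```typescript
// Hypothetical encoding of one row of the gate table as data.
// Mechanical checks are predicates over run artifacts; the reviewer
// question is handed to the Gate Reviewer only after they all pass.
interface RunArtifacts {
  brief?: { successCriteria: string[]; budget?: string };
  adr?: { components: string[] };
}

interface GateDef {
  from: string;
  to: string;
  mechanical: Array<{ name: string; check: (a: RunArtifacts) => boolean }>;
  reviewerQuestion: string;
}

const boardToPlanning1: GateDef = {
  from: "board",
  to: "planning-1",
  mechanical: [
    { name: "brief-exists", check: (a) => a.brief !== undefined },
    {
      name: "has-success-criteria",
      check: (a) => (a.brief?.successCriteria.length ?? 0) > 0,
    },
    { name: "has-budget", check: (a) => a.brief?.budget !== undefined },
  ],
  reviewerQuestion: "Is this brief well-scoped enough to architect?",
};

// Returns the names of failed checks; an empty array means the gate's
// mechanical half passed and the Gate Reviewer takes over.
function mechanicalFailures(gate: GateDef, artifacts: RunArtifacts): string[] {
  return gate.mechanical.filter((m) => !m.check(artifacts)).map((m) => m.name);
}
```

Keeping gates as data means adding or tightening a gate is a config change, not an Orchestrator code change.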

---
## Dynamic Composition

### Board of Directors — STATIC

Always the same participants. These roles are strategic, not technical.

| Role | Model | Personality |
| --- | --- | --- |
| CEO | Opus | Visionary, asks "does this serve the mission?" |
| CTO | Opus | Technical realist, asks "can we actually build this?" |
| CFO | Sonnet | Cost-conscious, asks "what does this cost vs return?" — needs real analytical depth for budget/ROI, not a lightweight model |
| COO | Sonnet | Operational, asks "what's the timeline and resource impact?" |

### Planning Stages — DYNAMIC

**The Brief Analyzer selects participants based on the brief's requirements** (the Board stays strategic). Not every specialist is needed for every task.

Selection logic:

1. Parse the brief/ADR for **languages mentioned** → include those Language Specialists
2. Parse for **infrastructure concerns** → include Infra Lead, Docker/Swarm, CI/CD as needed
3. Parse for **data concerns** → include Data Architect, SQL Pro
4. Parse for **UI concerns** → include UX Strategist, Web Design, React/RN Specialist
5. **Always include:** Software Architect and Security Architect (Planning 1 — security is cross-cutting; implicit requirements are the norm), QA Strategist (Planning 3)
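
A deterministic fallback for that selection logic could look like the following sketch. The keyword patterns and role strings are illustrative only; in the pipeline this is the Brief Analyzer's job, and an agent can catch implicit requirements that keyword matching misses:

```typescript
// Hypothetical keyword-to-specialist mapping for dynamic composition.
// Patterns and participant names are illustrative, not a real registry.
const CONCERNS: Array<{ pattern: RegExp; participants: string[] }> = [
  { pattern: /typescript|\bts\b/i, participants: ["TypeScript Pro"] },
  { pattern: /\bgo(lang)?\b/i, participants: ["Go Pro"] },
  { pattern: /docker|swarm|deploy|scaling/i, participants: ["Infrastructure Lead", "Docker/Swarm"] },
  { pattern: /schema|migration|database|prisma|\bsql\b/i, participants: ["Data Architect", "SQL Pro"] },
  { pattern: /\bui\b|frontend|dashboard|react/i, participants: ["UX Strategist", "React Specialist"] },
];

function selectParticipants(briefText: string): string[] {
  // Always-on roles per the selection logic above; Security Architect
  // is unconditional (v4 decision — implicit auth is the norm).
  const selected = new Set(["Software Architect", "Security Architect", "QA Strategist"]);
  for (const { pattern, participants } of CONCERNS) {
    if (pattern.test(briefText)) participants.forEach((p) => selected.add(p));
  }
  return [...selected];
}
```

Using a `Set` makes overlapping concerns harmless — two patterns can both pull in the same specialist without duplicating them in the session.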

Example: A TypeScript NestJS API endpoint with Prisma:

- Planning 1: Software Architect, Security Architect, Data Architect
- Planning 2: TypeScript Pro, NestJS Expert, SQL Pro
- Planning 3: Task Distributor, Context Manager

Example: A React dashboard with no backend changes:

- Planning 1: Software Architect, Security Architect, UX Strategist
- Planning 2: React Specialist, Web Design, UX/UI Design
- Planning 3: Task Distributor, Context Manager

**Go Pro doesn't sit in on a TypeScript project. Solidity Pro doesn't weigh in on a dashboard.**

---

## Debate Culture

Agents in planning stages are **required** to:

1. **State their position with reasoning** — no "sounds good to me"
2. **Challenge other positions** — "I disagree because..."
3. **Identify risks the others haven't raised** — adversarial by design
4. **Formally dissent if not convinced** — dissents are recorded in the ADR/spec
5. **Not capitulate just to move forward** — the Orchestrator tracks rounds and will call time, but agents shouldn't fold under social pressure

**Round limits:** Min 3, max 30. Give the discussion room to work — premature consensus produces bad architecture. The Orchestrator tracks rounds and intervenes only when debate is genuinely circular (repeating the same arguments) rather than still productive.
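
The mechanical half of round tracking could be sketched like this. The repeated-position heuristic is an assumption for illustration; real circularity judgment is the Gate Reviewer's call, not string matching:

```typescript
// Hypothetical round bookkeeping for the debate protocol: enforce the
// min/max window mechanically, flag crude circularity for the Gate
// Reviewer to confirm. All names are illustrative.
interface Round {
  agent: string;
  position: string; // normalized one-line summary of the argument
}

function debateStatus(
  rounds: Round[],
  min = 3,
  max = 30,
): "continue" | "may-close" | "circular" | "escalate" {
  if (rounds.length >= max) return "escalate";
  // Crude heuristic: the same agent restating an identical position
  // counts as a repeat; two repeats after the minimum looks circular.
  const seen = new Set<string>();
  let repeats = 0;
  for (const r of rounds) {
    const key = `${r.agent}:${r.position.trim().toLowerCase()}`;
    if (seen.has(key)) repeats++;
    seen.add(key);
  }
  if (rounds.length >= min && repeats >= 2) return "circular";
  return rounds.length < min ? "continue" : "may-close";
}
```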

This is enforced via personality in the agent definitions:

- Architects are opinionated and will argue for clean boundaries
- Security Architect is paranoid by design — always looking for what can go wrong
- QA Strategist is skeptical — "prove it works, don't tell me it works"
- Language specialists are purists about their domain's best practices

**The goal:** By the time code is written, the hard decisions are already made and debated. The workers just execute a well-argued plan.

---

## Model Assignments

| Pipeline Stage | Model | Rationale |
| --- | --- | --- |
| Board of Directors | Opus (CEO/CTO) / Sonnet (CFO/COO) | Strategic deliberation needs depth across the board |
| Planning 1 (Architecture) | Opus | Complex trade-offs, needs deep reasoning |
| Planning 2 (Implementation) | Sonnet | Domain expertise, detailed specs |
| Planning 3 (Decomposition) | Sonnet | Structured output, dependency analysis |
| Coding | Codex | Primary workhorse, separate budget |
| Review | Sonnet (code) + Opus (security) | Code review = Sonnet, security = Opus for depth |
| Remediation | Codex | Same worker, fix the issues |
| Test | Haiku | Mechanical validation, low complexity |
| Deploy | Haiku | Scripted deployment, mechanical |
| Gate Reviewer | Sonnet | Judgment calls, moderate complexity |
| Orchestrator (mechanical) | None — deterministic code | State machine, not AI |

---

## Roster

### Board of Directors (static)

| Role | Scope |
| --- | --- |
| CEO | Vision, priorities, go/no-go |
| CTO | Technical direction, risk tolerance |
| CFO | Budget, cost/benefit |
| COO | Operations, timeline, resource allocation |

### Domain Generalists (dynamic — called per brief)

| Role | Scope | Selected When |
| --- | --- | --- |
| **Software Architect** | System design, component boundaries, data flow, API contracts | Always in Planning 1 |
| **Security Architect** | Threat modeling, auth patterns, secrets, OWASP | **Always** — security is cross-cutting; implicit requirements are the norm |
| **Infrastructure Lead** | Deployment, networking, monitoring, scaling, DR | Brief involves deploy, infra, scaling |
| **Data Architect** | Schema design, migrations, query strategy, caching | Brief involves DB, data models, migrations |
| **QA Strategist** | Test strategy, coverage, integration test design | Always in Planning 3 |
| **UX Strategist** | User flows, information architecture, accessibility | Brief involves UI/frontend |

### Language Specialists (dynamic — one language, one agent)

| Specialist | Selected When |
| --- | --- |
| **TypeScript Pro** | Project uses TypeScript |
| **JavaScript Pro** | Project uses vanilla JS / Node.js |
| **Go Pro** | Project uses Go |
| **Rust Pro** | Project uses Rust |
| **Solidity Pro** | Project involves smart contracts |
| **Python Pro** | Project uses Python |
| **SQL Pro** | Project involves database queries / Prisma |
| **LangChain/AI Pro** | Project involves AI/ML/agent frameworks |

### Domain Specialists (dynamic — cross-cutting expertise)

| Specialist | Selected When |
| --- | --- |
| **Web Design** | Frontend work involving HTML/CSS |
| **UX/UI Design** | Component design, design system work |
| **React Specialist** | Frontend uses React |
| **React Native Pro** | Mobile app work |
| **Blockchain/DeFi** | Chain interactions, DeFi protocols |
| **Docker/Swarm** | Containerization, deployment |
| **CI/CD** | Pipeline changes, deploy automation |
| **NestJS Expert** | Backend uses NestJS |

---

## Source Material — What to Pull From External Repos

### From VoltAgent/awesome-codex-subagents (`.toml` format)

| File | What We Take | What We Customize |
| --- | --- | --- |
| `09-meta-orchestration/context-manager.toml` | Context packaging for workers | Add our monorepo structure, Gitea CI, project conventions |
| `09-meta-orchestration/task-distributor.toml` | Dependency graphs, write-scope separation, output contracts | Add worktree rules, PR workflow, completion gates |
| `09-meta-orchestration/workflow-orchestrator.toml` | Stage design with explicit wait points and gates | Wire to our pipeline stages |
| `09-meta-orchestration/agent-organizer.toml` | Task decomposition by objective (not file list) | Add our agent registry, model hierarchy rules |
| `04-quality-security/reviewer.toml` | Evidence-driven review, severity ranking | Add NestJS import rules, Prisma gotchas, our recurring bugs |
| `04-quality-security/security-auditor.toml` | Attack path mapping, secrets handling review | Add our Docker Swarm patterns, credential loader conventions |

### From VoltAgent/awesome-openclaw-skills (ClawHub)

| Skill | What We Take | How We Use It |
| --- | --- | --- |
| `brainstorming-2` | Socratic pre-coding design workflow | Planning 1 — requirements refinement before architecture |
| `agent-estimation` | Task effort in tool-call rounds | Planning 3 — scope tasks before spawning workers |
| `agent-nestjs-skills` | 40 prioritized NestJS rules with code examples | NestJS specialist + backend workers |
| `agent-team-orchestration` | Structured handoff protocols, task state transitions | Reference for pipeline stage handoffs |
| `b3ehive` | Competitive implementation (3 agents, cross-evaluate) | Critical components: crypto strategies, auth flows |
| `agent-council` | Agent scaffolding automation | Automate specialist creation as we expand |
| `astrai-code-review` | Model routing by diff complexity | Review stage cost optimization |
| `bug-audit` | 6-phase Node.js audit methodology | Periodic codebase health checks |

### From VoltAgent/awesome-claude-code-subagents (`.md` format)

| File | What We Take | Notes |
| --- | --- | --- |
| Language specialist `.md` files | System prompts for TS, Go, Rust, Solidity, etc. | Strip generic content, inject project-specific knowledge |
| `09-meta-orchestration/agent-organizer.md` | Detailed organizer pattern | Reference — the Codex `.toml` is tighter |

---

## Gaps This Fills

| Gap | Current State | After Pipeline |
| --- | --- | --- |
| No pre-coding design | Brief → Codex starts coding immediately | 3 planning stages before anyone writes code |
| Agents get sidetracked/derailed | No rails, workers drift from task | Mechanical pipeline + context packets keep workers on track |
| No debate on approach | First idea wins | Agents required to argue, dissent, challenge |
| No task estimation | Eyeball everything | Tool-call-round estimation in Planning 3 |
| Code review is a checkbox | "Did it lint? Ship it." | Evidence-driven reviewer + specialist knowledge |
| Security review is hand-waved | Never actually done | Real attack path mapping, secrets review |
| Workers get bad context | Ad-hoc prompts, stale assumptions | Context Manager produces execution-ready packets |
| Task decomposition is sloppy | "Here's a task, go do it" | Dependency graphs, write-scope separation, output contracts |
| Wrong specialists involved | Everyone weighs in on everything | Dynamic composition — only relevant experts |
| No rework mechanism | Ship it or start over | Explicit remediation loop with review re-check |
| Too much human oversight | Jason babysits every stage | Mechanical gates + AI oversight, human only at PRD and escalation |

---

## Implementation Plan

### Phase 1 — Foundation (this week)

1. Pull and customize Codex subagents: `reviewer.toml`, `security-auditor.toml`, `context-manager.toml`, `task-distributor.toml`, `workflow-orchestrator.toml`
2. Inject our project-specific knowledge
3. Install to `~/.codex/agents/`
4. Define agent personality templates for debate culture (opinionated, adversarial, skeptical)

### Phase 2 — Specialist Definitions (next week)

1. Create language specialist definitions (TS, JS, Go, Rust, Solidity, Python, SQL, LangChain, C++)
2. Create domain specialist definitions (NestJS, React, Docker/Swarm, CI/CD, Web Design, UX/UI, Blockchain/DeFi, React Native)
3. Create generalist definitions (Software Architect, Security Architect, Infra Lead, Data Architect, QA Strategist, UX Strategist)
4. Format as Codex `.toml` + OpenClaw skills
5. Test each against a real past task

### Phase 3 — Pipeline Wiring (week after)

1. Build the Orchestrator (mechanical stage runner + gate checker)
2. Build the Gate Reviewer agent
3. Wire dynamic composition (brief → participant selection)
4. Wire the debate protocol (round tracking, dissent recording, escalation rules)
5. Wire Planning 1 → 2 → 3 handoff contracts
6. Wire the Review → Remediate → Review loop
7. Test end-to-end with a real feature request

### Phase 4 — Mosaic Integration (future)

1. The Orchestrator becomes a Mosaic Stack feature
2. Pipeline stages map to Mosaic task states
3. Gate results feed the Mission Control dashboard
4. This IS the engine — the dashboard is just the window

### Phase 5 — Advanced Patterns (future)

1. `b3ehive` competitive implementation for critical paths
2. `astrai-code-review` model routing for cost optimization
3. `agent-council` automated scaffolding for new specialists
4. Estimation feedback loop (compare estimates to actuals)
5. Pipeline analytics (which stages catch the most issues, where we bottleneck)

---

## Resolved Decisions

| # | Question | Decision | Rationale |
| --- | --- | --- | --- |
| 1 | **Gate Reviewer model** | Sonnet for all gates | Sufficient depth for judgment calls; Opus reserved for planning deliberation |
| 2 | **Debate rounds** | Min 3, max 30 per stage | Let discussions work; don't cut them short. Intervene on circular repetition, not round count. |
| 3 | **PRD format** | Use existing Mosaic PRD template | `~/.config/mosaic/templates/docs/PRD.md.template` + `~/.config/mosaic/skills-local/prd/SKILL.md` are already proven. Iterate from there. |
| 4 | **Small tasks** | Pipeline is for projects/features, not typo fixes | This is for getting a project or feature built smoothly. Single-file fixes go direct to a worker. Threshold: if it needs architecture decisions, it goes through the pipeline. |
| 5 | **Specialist memory** | Yes — specialists accumulate knowledge, with rails | Similar to the OpenClaw memory model. Specialists learn from past tasks ("last time X caused Y") but must stay on their specialty rails. Knowledge is domain-scoped, not freeform. |
| 6 | **Cost ceiling** | ~$500 per pipeline run (11+ stages) | Using subscriptions (Anthropic, OpenAI), so API costs are minimized or eliminated. The real budget is time/throughput, not dollars. |
| 7 | **Where this lives** | Standalone service, Pi under the hood | Must be standalone so it can migrate to Mosaic Stack later. Pi (mosaic bootstrap) provides the execution substrate and already runs BOD sessions. Dogfood → prove → productize. |
## PRD Template

The pipeline uses the existing Mosaic PRD infrastructure:

- **Template:** `~/.config/mosaic/templates/docs/PRD.md.template`
- **Skill:** `~/.config/mosaic/skills-local/prd/SKILL.md` (guided PRD generation with clarifying questions)
- **Guide:** `~/.config/mosaic/guides/PRD.md` (hard rules — a PRD must exist before coding begins)
### Required PRD Sections (from Mosaic guide)

1. Problem statement and objective
2. In-scope and out-of-scope
3. User/stakeholder requirements
4. Functional requirements
5. Non-functional requirements (security, performance, reliability, observability)
6. Acceptance criteria
7. Constraints and dependencies
8. Risks and open questions
9. Testing and verification expectations
10. Delivery/milestone intent

The PRD skill also generates user stories with specific acceptance criteria ("Button shows confirmation dialog before deleting", not "Works correctly").

**Key rule from Mosaic:** Implementation that diverges from the PRD without PRD updates is a blocker. Change control: update the PRD first → update the plan → then implement.
## Board Post-Run Review

The Board of Directors is NOT fire-and-forget. After a pipeline run completes (deploy or failure):

1. **Memos from each stage** are compiled into a run summary
2. **Board reviews** the summary for:
   - Conflicts between stage outputs
   - Scope drift from the original brief
   - Cost/timeline variance from estimates
   - Strategic alignment issues
3. **Board adjusts** strategy, priorities, or constraints for future briefs
4. **Learnings** feed back into specialist memory and Orchestrator heuristics

This closes the loop. The pipeline doesn't just ship code — it learns from every run.
## Architecture Review Fixes (v4, 2026-03-24)

Fixes applied based on the Sonnet architecture review:

| Finding | Fix Applied |
| --- | --- |
| Dead-end states (REJECTED, NEEDS REVISION, CI failure, worker confusion) | All paths explicitly defined in the Orchestrator + Board stage |
| Security Architect conditional (keyword matching misses implicit auth) | Security Architect now ALWAYS included in Planning 1 |
| Board making technical composition decisions | New Brief Analyzer agent handles technical composition after Board approval |
| Orchestrator claimed "purely mechanical" but needs semantic analysis | Split into State Machine (mechanical) + Gate Reviewer (AI). Circularity detection is the Gate Reviewer's job. |
| Test → Remediate had no loop limit | Shared 3-loop budget across Review + Test remediation |
| Open-ended debate (3-30 rounds) too loose; framing bias | Structured 3-phase debate: independent positions → responses → synthesis. Tighter round limits (17-53 calls vs 12-120+). |
| Review only gets the diff | Review now gets full module context + context packet, not just the diff |
| Cross-brief dependency not enforced at runtime | State Machine enforces dependency ordering + file-level locking |
| Gate Reviewer reading full transcripts (context problem) | Gate Reviewer reads structured summaries; requests the full transcript only on suspicion |
| No minimum specialist composition for Planning 2 | Guard added: at least 1 Language + 1 Domain specialist required |
## Remaining Open Questions

1. **Pi integration specifics:** How exactly does Pi serve as the execution substrate? Board sessions already work via `mosaic yolo pi`. Does the full pipeline run as a Pi orchestration, or does Pi just handle individual stage sessions?
2. **Specialist memory storage:** OpenBrain? Per-specialist markdown files? Scoped memory namespaces?
3. **Pipeline analytics:** What metrics do we track per run? Stage duration, rework count, gate failure rate, estimate accuracy?
4. **Parallel briefs:** Can multiple briefs from the same PRD run through the pipeline concurrently, or strictly serially?
5. **Escalation UX:** When the pipeline escalates to Jason, where does that notification go? Discord? TUI? Both?

---
## Connection to Mosaic North Star

This pipeline IS the Mosaic vision, just running on agent infrastructure instead of a proper platform:

- **PRD.md** → Mosaic's task queue API
- **Orchestrator** → Mosaic's agent lifecycle management
- **Gates** → Mosaic's review gates
- **Pipeline stages** → Mosaic's workflow engine
- **Dynamic composition** → Mosaic's agent selection

Everything we build here gets dogfooded, refined, and eventually productized as Mosaic Stack features. We're building the engine that Mosaic will sell.

### Standalone Architecture (decided)

The pipeline is built as a **standalone service** — not embedded in OpenClaw or tightly coupled to any single agent framework. This is deliberate:

1. **Pi (mosaic bootstrap) is the execution substrate** — already proven with BOD sessions
2. **The Orchestrator is a mechanical state machine** — it doesn't need an LLM, it needs a process manager
3. **Stage sessions are Pi/agent sessions** — each planning/review stage spawns a session with the right participants
4. **The migration path to Mosaic Stack is clean** — standalone service → Mosaic feature, not "rip it out of OpenClaw"

The pattern: dogfood on our projects → track what works → extract into Mosaic Stack as a first-class feature.

---

## References

- VoltAgent/awesome-codex-subagents: https://github.com/VoltAgent/awesome-codex-subagents
- VoltAgent/awesome-claude-code-subagents: https://github.com/VoltAgent/awesome-claude-code-subagents
- VoltAgent/awesome-openclaw-skills: https://github.com/VoltAgent/awesome-openclaw-skills
- Board implementation: `mosaic/board` branch (commit ad4304b)
- Mosaic North Star: `~/.openclaw/workspace/memory/mosaic-north-star.md`
- Existing agent registry: `~/.openclaw/workspace/agents/REGISTRY.yaml`
- Mosaic Queue PRD: `~/src/jarvis-brain/docs/planning/MOSAIC-QUEUE-PRD.md`

---
## Brief Classification System (skip-BOD support)

**Added:** 2026-03-26

Not every brief needs full Board of Directors review. The classification system lets briefs skip stages based on their nature.

### Classes

| Class | Pipeline | Use case |
| --- | --- | --- |
| `strategic` | BOD → BA → Planning 1 → 2 → 3 | New features, architecture, integrations, security, budget decisions |
| `technical` | BA → Planning 1 → 2 → 3 | Refactors, bugfixes, UI tweaks, style changes |
| `hotfix` | Planning 1 → 2 → 3 | Urgent patches — skip both BOD and BA |

### Classification priority (highest wins)

1. `--class` CLI flag on `forge run` or `forge resume`
2. YAML frontmatter `class:` field in the brief
3. Auto-classification via keyword analysis

### Auto-classification keywords

- **Strategic:** security, pricing, architecture, integration, budget, strategy, compliance, migration, partnership, launch
- **Technical:** bugfix, bug, refactor, ui, style, tweak, typo, lint, cleanup, rename, hotfix, patch, css, format
- **Default** (no keyword match): strategic (conservative — full pipeline)
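
Putting the priority order and keyword lists together, the classifier could be sketched as follows. Function and field names are illustrative; resolving a strategic/technical keyword collision in favor of `strategic` is an assumption here, matching the conservative default:

```typescript
// Hypothetical sketch of brief classification: CLI flag beats
// frontmatter beats keyword auto-classification. Keyword lists mirror
// the section above; matching is naive substring for illustration.
type BriefClass = "strategic" | "technical" | "hotfix";

const STRATEGIC = ["security", "pricing", "architecture", "integration", "budget",
  "strategy", "compliance", "migration", "partnership", "launch"];
const TECHNICAL = ["bugfix", "bug", "refactor", "ui", "style", "tweak", "typo",
  "lint", "cleanup", "rename", "hotfix", "patch", "css", "format"];

function autoClassify(text: string): BriefClass {
  const lower = text.toLowerCase();
  // Assumed precedence: strategic keywords win, and the no-match
  // default is strategic (conservative — full pipeline).
  if (STRATEGIC.some((k) => lower.includes(k))) return "strategic";
  if (TECHNICAL.some((k) => lower.includes(k))) return "technical";
  return "strategic";
}

function resolveClass(opts: {
  cliFlag?: BriefClass;     // --class on `forge run` / `forge resume`
  frontmatter?: BriefClass; // `class:` in the brief's YAML frontmatter
  briefText: string;
}): BriefClass {
  return opts.cliFlag ?? opts.frontmatter ?? autoClassify(opts.briefText);
}
```

Note that `hotfix` can only come from the flag or frontmatter in this sketch — auto-classification never skips planning stages on its own, which fits the conservative default.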

### Overrides

- `--force-board` — forces the BOD stage to run even for technical/hotfix briefs
- `--class` on `resume` — re-classifies a run mid-flight (stages already passed are not re-run)

### Backward compatibility

Existing briefs without a `class` field are auto-classified. The default (no matching keywords) is `strategic`, so all existing runs get the full pipeline unless keywords trigger `technical`.