# Specialist Pipeline — Progressive Refinement Architecture

**Status:** DRAFT v4 — post architecture review

**Created:** 2026-03-24

**Last Updated:** 2026-03-24 20:40 CDT

---
## Vision

Replace "throw it at a Codex worker and hope" with a **railed pipeline** where each stage narrows scope, increases precision, and catches mistakes before they compound. Spend more time up-front declaring requirements; spend less time at the end fixing broken output.

**Core principles:**

- One agent, one specialty. No generalists pretending to be experts.
- Agents must be willing to **argue, debate, and push back** — not eagerly agree and move on.
- The pipeline is a set of **customizable rails** — agents stay on track, don't get sidetracked or derailed.
- Dynamic composition — only relevant specialists are called in per task.
- Hard gates between stages — mechanical checks + agent oversight for final decision.
- Minimal human oversight once the PRD is declared.

---
## The Pipeline

```
PRD.md (human declares requirements)
  │
  ▼
BRIEFS (PRD decomposed into discrete work units)
  │
  ▼
BOARD OF DIRECTORS (strategic go/no-go per brief)
  │  Static composition. CEO, CTO, CFO, COO.
  │  Output: Approved brief with business constraints, priority, budget
  │  Board does NOT select technical participants — that's the Brief Analyzer's job
  │  Gate: Board consensus required to proceed
  │  REJECTED → archive + notify human. NEEDS REVISION → back to Intake.
  │
  │  POST-RUN REVIEW: Board reviews memos from completed pipeline
  │  runs. Analyzes for conflicts, adjusts strategy, feeds learnings
  │  back into future briefs. The Board is not fire-and-forget.
  │
  ▼
BRIEF ANALYZER (technical composition)
  │  Sonnet agent analyzes approved brief + project context
  │  Selects which generalists/specialists participate in each planning stage
  │  Separates strategic decisions (Board) from technical composition
  │
  ▼
PLANNING 1 — Architecture (Domain Generalists)
  │  Dynamic composition based on brief requirements.
  │  Software Architect + relevant generalists only.
  │  Output: Architecture Decision Record (ADR)
  │  Agents MUST debate trade-offs. No rubber-stamping.
  │  Gate: ADR approved, all dissents resolved or recorded
  │
  ▼
PLANNING 2 — Implementation Design (Language/Domain Specialists)
  │  Dynamic composition — only languages/domains in the ADR.
  │  Output: Implementation spec per component
  │  Each specialist argues for their domain's best practices.
  │  Gate: All specs reviewed by Architecture, no conflicts
  │
  ▼
PLANNING 3 — Task Decomposition & Estimation
  │  Context Manager + Task Distributor
  │  Output: Task breakdown with dependency graph, estimates,
  │  context packets per worker, acceptance criteria
  │  Gate: Every task has one owner, one completion condition,
  │  estimated rounds, and explicit test criteria
  │
  ▼
CODING (Workers execute)
  │  Codex/Claude workers with specialist subagents loaded
  │  Each worker gets: context packet + implementation spec + acceptance criteria
  │  Workers stay in their lane — the rails prevent drift
  │  Gate: Code compiles, lints, passes unit tests
  │
  ▼
REVIEW (Specialist review)
  │  Code reviewer (evidence-driven, severity-ranked)
  │  Security auditor (attack paths, secrets, auth)
  │  Language specialist for the relevant language
  │  Gate: All findings addressed or explicitly accepted with rationale
  │
  ▼
REMEDIATE (if review finds issues)
  │  Worker fixes based on review findings
  │  Loops back to REVIEW
  │  Gate: Same as REVIEW — clean pass required
  │
  ▼
TEST (Integration + acceptance)
  │  QA Strategist validates against acceptance criteria from Planning 3
  │  Gate: All acceptance criteria pass, no regressions
  │
  ▼
DEPLOY
     Infrastructure Lead handles deployment
     Gate: Smoke tests pass in target environment
```

---
## Orchestration — Who Watches the Pipeline?

### The Orchestrator (Mosaic's role)

**Not me (Jarvis). Not any single agent. The Orchestrator is a dedicated, mechanical process with AI oversight.**

The Orchestrator is:

- **Primarily mechanical** — moves work through stages, enforces gates, tracks state
- **AI-assisted at decision points** — an agent reviews gate results and makes go/no-go calls
- **The thing Mosaic Stack productizes** — this IS the engine from the North Star vision

How it works:

1. **Stage Runner** (mechanical): Advances work through the pipeline. Checks gate conditions. Purely deterministic — "did all gate criteria pass? yes → advance. no → hold."
2. **Gate Reviewer** (AI agent): When a gate's mechanical checks pass, the Gate Reviewer does a final sanity check. "The code lints and tests pass, but does this actually solve the problem?" This is the lightweight oversight layer.
3. **Escalation** (to human): If the Gate Reviewer is uncertain, or if debate in a planning stage is unresolved after N rounds, escalate to Jason.
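
The Stage Runner's advance/hold logic can stay tiny. A sketch in TypeScript — all type, field, and stage names here are illustrative assumptions, not an existing schema:

```typescript
// Hypothetical sketch of the mechanical Stage Runner: deterministic
// gate evaluation, no LLM involved. Names are illustrative only.
type Stage =
  | "board" | "brief-analyzer" | "planning-1" | "planning-2"
  | "planning-3" | "coding" | "review" | "remediate" | "test" | "deploy";

interface GateCheck {
  name: string;
  passed: boolean;
}

type Verdict =
  | { action: "advance"; to: Stage }
  | { action: "hold"; failures: string[] }
  | { action: "escalate"; reason: string };

// Happy-path successors; remediate loops back to review by design.
const NEXT: Partial<Record<Stage, Stage>> = {
  board: "brief-analyzer",
  "brief-analyzer": "planning-1",
  "planning-1": "planning-2",
  "planning-2": "planning-3",
  "planning-3": "coding",
  coding: "review",
  review: "test",
  remediate: "review",
  test: "deploy",
};

// Purely mechanical: all checks pass -> advance, otherwise hold.
// The Gate Reviewer's semantic veto happens on top of this.
function runGate(stage: Stage, checks: GateCheck[]): Verdict {
  const failures = checks.filter((c) => !c.passed).map((c) => c.name);
  if (failures.length > 0) return { action: "hold", failures };
  const to = NEXT[stage];
  if (!to) return { action: "escalate", reason: `no successor for ${stage}` };
  return { action: "advance", to };
}
```

The point of the sketch: everything the Stage Runner does is a pure function of recorded state, so it can be tested and replayed without any model in the loop.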

### What Sends a Plan Back for More Debate?

Triggers for **rework/rejection**:

- **Gate failure** — mechanical checks don't pass → automatic rework
- **Gate Reviewer dissent** — AI reviewer flags a concern → sent back with the specific objection
- **Unresolved debate** — planning agents can't reach consensus after N rounds → escalate, or send back with the dissenting positions documented
- **Scope creep detection** — a stage's output significantly exceeds the brief's scope → flag and return
- **Dependency conflict** — Planning 3 finds the task breakdown has circular deps or impossible ordering → return to Planning 2
- **Review severity threshold** — Review finds CRITICAL-severity issues → auto-reject back to Coding, no discussion
### Human Touchpoints (minimal by design)

- **PRD.md** — Human writes this. This is where you spend the time.
- **Board escalation** — Only if the Board can't reach consensus on a brief.
- **Planning escalation** — Only if debate is unresolved after max rounds.
- **Deploy approval** — Optional. Could be fully automated for low-risk deploys.

Everything else runs autonomously on rails.

---

## Gate System

Every gate has **mechanical checks** (automated, deterministic) and an **agent review** (final judgment call).

| Transition | Mechanical Checks | Agent Review |
| --- | --- | --- |
| **Board → Planning 1** | Brief exists, has success criteria, has budget | Gate Reviewer: "Is this brief well-scoped enough to architect?" |
| **Planning 1 → Planning 2** | ADR exists, covers all components in brief | Gate Reviewer: "Does this architecture actually solve the problem?" |
| **Planning 2 → Planning 3** | Implementation spec per component, no unresolved conflicts | Gate Reviewer: "Are the specs consistent with each other and the ADR?" |
| **Planning 3 → Coding** | Task breakdown exists, all tasks have owner + criteria + estimate | Gate Reviewer: "Is this actually implementable as decomposed?" |
| **Coding → Review** | Compiles, lints, unit tests pass | Gate Reviewer: "Does the code match the implementation spec?" |
| **Review → Test** (or **→ Remediate**) | All review findings addressed | Gate Reviewer: "Are the fixes real, or did the worker just suppress warnings?" |
| **Test → Deploy** | All acceptance criteria pass, no regressions | Gate Reviewer: "Ready for production?" |
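
One way to make this table executable is to encode each gate as declarative data the Stage Runner iterates, with the agent question carried along as the seed for the Gate Reviewer's prompt. A sketch under assumed names — `GateDef`, `RunArtifacts`, and the field layout are hypothetical, not an existing schema:

```typescript
// Hypothetical encoding of one row of the gate table as data.
// Mechanical checks are predicates over run artifacts; the reviewer
// question is handed to the Gate Reviewer only after they all pass.
interface RunArtifacts {
  brief?: { successCriteria: string[]; budget?: string };
  adr?: { components: string[] };
}

interface GateDef {
  from: string;
  to: string;
  mechanical: Array<{ name: string; check: (a: RunArtifacts) => boolean }>;
  reviewerQuestion: string;
}

const boardToPlanning1: GateDef = {
  from: "board",
  to: "planning-1",
  mechanical: [
    { name: "brief-exists", check: (a) => a.brief !== undefined },
    {
      name: "has-success-criteria",
      check: (a) => (a.brief?.successCriteria.length ?? 0) > 0,
    },
    { name: "has-budget", check: (a) => a.brief?.budget !== undefined },
  ],
  reviewerQuestion: "Is this brief well-scoped enough to architect?",
};

// Returns the names of failed checks; an empty array means the gate's
// mechanical half passed and the Gate Reviewer takes over.
function mechanicalFailures(gate: GateDef, artifacts: RunArtifacts): string[] {
  return gate.mechanical.filter((m) => !m.check(artifacts)).map((m) => m.name);
}
```

Keeping gates as data means adding or tightening a gate is a config change, not an Orchestrator code change.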

---
## Dynamic Composition

### Board of Directors — STATIC

Always the same participants. These roles are strategic, not technical.

| Role | Model | Personality |
| --- | --- | --- |
| CEO | Opus | Visionary, asks "does this serve the mission?" |
| CTO | Opus | Technical realist, asks "can we actually build this?" |
| CFO | Sonnet | Cost-conscious, asks "what does this cost vs return?" — needs real analytical depth for budget/ROI, not a lightweight model |
| COO | Sonnet | Operational, asks "what's the timeline and resource impact?" |

### Planning Stages — DYNAMIC

**The Brief Analyzer selects participants based on the brief's requirements** (the Board stays strategic). Not every specialist is needed for every task.

Selection logic:

1. Parse the brief/ADR for **languages mentioned** → include those Language Specialists
2. Parse for **infrastructure concerns** → include Infra Lead, Docker/Swarm, CI/CD as needed
3. Parse for **data concerns** → include Data Architect, SQL Pro
4. Parse for **UI concerns** → include UX Strategist, Web Design, React/RN Specialist
5. **Always include:** Software Architect and Security Architect (Planning 1 — security is cross-cutting; implicit requirements are the norm), QA Strategist (Planning 3)
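
A deterministic fallback for that selection logic could look like the following sketch. The keyword patterns and role strings are illustrative only; in the pipeline this is the Brief Analyzer's job, and an agent can catch implicit requirements that keyword matching misses:

```typescript
// Hypothetical keyword-to-specialist mapping for dynamic composition.
// Patterns and participant names are illustrative, not a real registry.
const CONCERNS: Array<{ pattern: RegExp; participants: string[] }> = [
  { pattern: /typescript|\bts\b/i, participants: ["TypeScript Pro"] },
  { pattern: /\bgo(lang)?\b/i, participants: ["Go Pro"] },
  { pattern: /docker|swarm|deploy|scaling/i, participants: ["Infrastructure Lead", "Docker/Swarm"] },
  { pattern: /schema|migration|database|prisma|\bsql\b/i, participants: ["Data Architect", "SQL Pro"] },
  { pattern: /\bui\b|frontend|dashboard|react/i, participants: ["UX Strategist", "React Specialist"] },
];

function selectParticipants(briefText: string): string[] {
  // Always-on roles per the selection logic above; Security Architect
  // is unconditional (v4 decision — implicit auth is the norm).
  const selected = new Set(["Software Architect", "Security Architect", "QA Strategist"]);
  for (const { pattern, participants } of CONCERNS) {
    if (pattern.test(briefText)) participants.forEach((p) => selected.add(p));
  }
  return [...selected];
}
```

Using a `Set` makes overlapping concerns harmless — two patterns can both pull in the same specialist without duplicating them in the session.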

Example: A TypeScript NestJS API endpoint with Prisma:

- Planning 1: Software Architect, Security Architect, Data Architect
- Planning 2: TypeScript Pro, NestJS Expert, SQL Pro
- Planning 3: Task Distributor, Context Manager

Example: A React dashboard with no backend changes:

- Planning 1: Software Architect, Security Architect, UX Strategist
- Planning 2: React Specialist, Web Design, UX/UI Design
- Planning 3: Task Distributor, Context Manager

**Go Pro doesn't sit in on a TypeScript project. Solidity Pro doesn't weigh in on a dashboard.**

---

## Debate Culture

Agents in planning stages are **required** to:

1. **State their position with reasoning** — no "sounds good to me"
2. **Challenge other positions** — "I disagree because..."
3. **Identify risks the others haven't raised** — adversarial by design
4. **Formally dissent if not convinced** — dissents are recorded in the ADR/spec
5. **Not capitulate just to move forward** — the Orchestrator tracks rounds and will call time, but agents shouldn't fold under social pressure

**Round limits:** Min 3, max 30. Give the discussion room to work — premature consensus produces bad architecture. The Orchestrator tracks rounds and intervenes only when debate is genuinely circular (repeating the same arguments) rather than still productive.
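
The mechanical half of round tracking could be sketched like this. The repeated-position heuristic is an assumption for illustration; real circularity judgment is the Gate Reviewer's call, not string matching:

```typescript
// Hypothetical round bookkeeping for the debate protocol: enforce the
// min/max window mechanically, flag crude circularity for the Gate
// Reviewer to confirm. All names are illustrative.
interface Round {
  agent: string;
  position: string; // normalized one-line summary of the argument
}

function debateStatus(
  rounds: Round[],
  min = 3,
  max = 30,
): "continue" | "may-close" | "circular" | "escalate" {
  if (rounds.length >= max) return "escalate";
  // Crude heuristic: the same agent restating an identical position
  // counts as a repeat; two repeats after the minimum looks circular.
  const seen = new Set<string>();
  let repeats = 0;
  for (const r of rounds) {
    const key = `${r.agent}:${r.position.trim().toLowerCase()}`;
    if (seen.has(key)) repeats++;
    seen.add(key);
  }
  if (rounds.length >= min && repeats >= 2) return "circular";
  return rounds.length < min ? "continue" : "may-close";
}
```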

This is enforced via personality in the agent definitions:

- Architects are opinionated and will argue for clean boundaries
- Security Architect is paranoid by design — always looking for what can go wrong
- QA Strategist is skeptical — "prove it works, don't tell me it works"
- Language specialists are purists about their domain's best practices

**The goal:** By the time code is written, the hard decisions are already made and debated. The workers just execute a well-argued plan.

---

## Model Assignments

| Pipeline Stage | Model | Rationale |
| --- | --- | --- |
| Board of Directors | Opus (CEO/CTO) / Sonnet (CFO/COO) | Strategic deliberation needs depth across the board |
| Planning 1 (Architecture) | Opus | Complex trade-offs, needs deep reasoning |
| Planning 2 (Implementation) | Sonnet | Domain expertise, detailed specs |
| Planning 3 (Decomposition) | Sonnet | Structured output, dependency analysis |
| Coding | Codex | Primary workhorse, separate budget |
| Review | Sonnet (code) + Opus (security) | Code review = Sonnet, security = Opus for depth |
| Remediation | Codex | Same worker, fix the issues |
| Test | Haiku | Mechanical validation, low complexity |
| Deploy | Haiku | Scripted deployment, mechanical |
| Gate Reviewer | Sonnet | Judgment calls, moderate complexity |
| Orchestrator (mechanical) | None — deterministic code | State machine, not AI |

---

## Roster

### Board of Directors (static)

| Role | Scope |
| --- | --- |
| CEO | Vision, priorities, go/no-go |
| CTO | Technical direction, risk tolerance |
| CFO | Budget, cost/benefit |
| COO | Operations, timeline, resource allocation |

### Domain Generalists (dynamic — called per brief)

| Role | Scope | Selected When |
| --- | --- | --- |
| **Software Architect** | System design, component boundaries, data flow, API contracts | Always in Planning 1 |
| **Security Architect** | Threat modeling, auth patterns, secrets, OWASP | **Always** — security is cross-cutting; implicit requirements are the norm |
| **Infrastructure Lead** | Deployment, networking, monitoring, scaling, DR | Brief involves deploy, infra, scaling |
| **Data Architect** | Schema design, migrations, query strategy, caching | Brief involves DB, data models, migrations |
| **QA Strategist** | Test strategy, coverage, integration test design | Always in Planning 3 |
| **UX Strategist** | User flows, information architecture, accessibility | Brief involves UI/frontend |

### Language Specialists (dynamic — one language, one agent)

| Specialist | Selected When |
| --- | --- |
| **TypeScript Pro** | Project uses TypeScript |
| **JavaScript Pro** | Project uses vanilla JS / Node.js |
| **Go Pro** | Project uses Go |
| **Rust Pro** | Project uses Rust |
| **Solidity Pro** | Project involves smart contracts |
| **Python Pro** | Project uses Python |
| **SQL Pro** | Project involves database queries / Prisma |
| **LangChain/AI Pro** | Project involves AI/ML/agent frameworks |

### Domain Specialists (dynamic — cross-cutting expertise)

| Specialist | Selected When |
| --- | --- |
| **Web Design** | Frontend work involving HTML/CSS |
| **UX/UI Design** | Component design, design system work |
| **React Specialist** | Frontend uses React |
| **React Native Pro** | Mobile app work |
| **Blockchain/DeFi** | Chain interactions, DeFi protocols |
| **Docker/Swarm** | Containerization, deployment |
| **CI/CD** | Pipeline changes, deploy automation |
| **NestJS Expert** | Backend uses NestJS |

---

## Source Material — What to Pull From External Repos

### From VoltAgent/awesome-codex-subagents (`.toml` format)

| File | What We Take | What We Customize |
| --- | --- | --- |
| `09-meta-orchestration/context-manager.toml` | Context packaging for workers | Add our monorepo structure, Gitea CI, project conventions |
| `09-meta-orchestration/task-distributor.toml` | Dependency graphs, write-scope separation, output contracts | Add worktree rules, PR workflow, completion gates |
| `09-meta-orchestration/workflow-orchestrator.toml` | Stage design with explicit wait points and gates | Wire to our pipeline stages |
| `09-meta-orchestration/agent-organizer.toml` | Task decomposition by objective (not file list) | Add our agent registry, model hierarchy rules |
| `04-quality-security/reviewer.toml` | Evidence-driven review, severity ranking | Add NestJS import rules, Prisma gotchas, our recurring bugs |
| `04-quality-security/security-auditor.toml` | Attack path mapping, secrets handling review | Add our Docker Swarm patterns, credential loader conventions |

### From VoltAgent/awesome-openclaw-skills (ClawHub)

| Skill | What We Take | How We Use It |
| --- | --- | --- |
| `brainstorming-2` | Socratic pre-coding design workflow | Planning 1 — requirements refinement before architecture |
| `agent-estimation` | Task effort in tool-call rounds | Planning 3 — scope tasks before spawning workers |
| `agent-nestjs-skills` | 40 prioritized NestJS rules with code examples | NestJS specialist + backend workers |
| `agent-team-orchestration` | Structured handoff protocols, task state transitions | Reference for pipeline stage handoffs |
| `b3ehive` | Competitive implementation (3 agents, cross-evaluate) | Critical components: crypto strategies, auth flows |
| `agent-council` | Agent scaffolding automation | Automate specialist creation as we expand |
| `astrai-code-review` | Model routing by diff complexity | Review stage cost optimization |
| `bug-audit` | 6-phase Node.js audit methodology | Periodic codebase health checks |

### From VoltAgent/awesome-claude-code-subagents (`.md` format)

| File | What We Take | Notes |
| --- | --- | --- |
| Language specialist `.md` files | System prompts for TS, Go, Rust, Solidity, etc. | Strip generic content, inject project-specific knowledge |
| `09-meta-orchestration/agent-organizer.md` | Detailed organizer pattern | Reference — the Codex `.toml` is tighter |

---

## Gaps This Fills

| Gap | Current State | After Pipeline |
| --- | --- | --- |
| No pre-coding design | Brief → Codex starts coding immediately | 3 planning stages before anyone writes code |
| Agents get sidetracked/derailed | No rails, workers drift from task | Mechanical pipeline + context packets keep workers on track |
| No debate on approach | First idea wins | Agents required to argue, dissent, challenge |
| No task estimation | Eyeball everything | Tool-call-round estimation in Planning 3 |
| Code review is a checkbox | "Did it lint? Ship it." | Evidence-driven reviewer + specialist knowledge |
| Security review is hand-waved | Never actually done | Real attack path mapping, secrets review |
| Workers get bad context | Ad-hoc prompts, stale assumptions | Context Manager produces execution-ready packets |
| Task decomposition is sloppy | "Here's a task, go do it" | Dependency graphs, write-scope separation, output contracts |
| Wrong specialists involved | Everyone weighs in on everything | Dynamic composition — only relevant experts |
| No rework mechanism | Ship it or start over | Explicit remediation loop with review re-check |
| Too much human oversight | Jason babysits every stage | Mechanical gates + AI oversight, human only at PRD and escalation |

---

## Implementation Plan

### Phase 1 — Foundation (this week)

1. Pull and customize Codex subagents: `reviewer.toml`, `security-auditor.toml`, `context-manager.toml`, `task-distributor.toml`, `workflow-orchestrator.toml`
2. Inject our project-specific knowledge
3. Install to `~/.codex/agents/`
4. Define agent personality templates for debate culture (opinionated, adversarial, skeptical)

### Phase 2 — Specialist Definitions (next week)

1. Create language specialist definitions (TS, JS, Go, Rust, Solidity, Python, SQL, LangChain, C++)
2. Create domain specialist definitions (NestJS, React, Docker/Swarm, CI/CD, Web Design, UX/UI, Blockchain/DeFi, React Native)
3. Create generalist definitions (Software Architect, Security Architect, Infra Lead, Data Architect, QA Strategist, UX Strategist)
4. Format as Codex `.toml` + OpenClaw skills
5. Test each against a real past task

### Phase 3 — Pipeline Wiring (week after)

1. Build the Orchestrator (mechanical stage runner + gate checker)
2. Build the Gate Reviewer agent
3. Wire dynamic composition (brief → participant selection)
4. Wire the debate protocol (round tracking, dissent recording, escalation rules)
5. Wire Planning 1 → 2 → 3 handoff contracts
6. Wire the Review → Remediate → Review loop
7. Test end-to-end with a real feature request

### Phase 4 — Mosaic Integration (future)

1. The Orchestrator becomes a Mosaic Stack feature
2. Pipeline stages map to Mosaic task states
3. Gate results feed the Mission Control dashboard
4. This IS the engine — the dashboard is just the window

### Phase 5 — Advanced Patterns (future)

1. `b3ehive` competitive implementation for critical paths
2. `astrai-code-review` model routing for cost optimization
3. `agent-council` automated scaffolding for new specialists
4. Estimation feedback loop (compare estimates to actuals)
5. Pipeline analytics (which stages catch the most issues, where we bottleneck)

---

## Resolved Decisions

| # | Question | Decision | Rationale |
| --- | --- | --- | --- |
| 1 | **Gate Reviewer model** | Sonnet for all gates | Sufficient depth for judgment calls; Opus reserved for planning deliberation |
| 2 | **Debate rounds** | Min 3, max 30 per stage | Let discussions work; don't cut them short. Intervene on circular repetition, not round count. |
| 3 | **PRD format** | Use existing Mosaic PRD template | `~/.config/mosaic/templates/docs/PRD.md.template` + `~/.config/mosaic/skills-local/prd/SKILL.md` are already proven. Iterate from there. |
| 4 | **Small tasks** | Pipeline is for projects/features, not typo fixes | This is for getting a project or feature built smoothly. Single-file fixes go direct to a worker. Threshold: if it needs architecture decisions, it goes through the pipeline. |
| 5 | **Specialist memory** | Yes — specialists accumulate knowledge, with rails | Similar to the OpenClaw memory model. Specialists learn from past tasks ("last time X caused Y") but must stay on their specialty rails. Knowledge is domain-scoped, not freeform. |
| 6 | **Cost ceiling** | ~$500 per pipeline run (11+ stages) | Using subscriptions (Anthropic, OpenAI), so API costs are minimized or eliminated. The real budget is time/throughput, not dollars. |
| 7 | **Where this lives** | Standalone service, Pi under the hood | Must be standalone so it can migrate to Mosaic Stack later. Pi (mosaic bootstrap) provides the execution substrate and already runs BOD sessions. Dogfood → prove → productize. |
## PRD Template

The pipeline uses the existing Mosaic PRD infrastructure:

- **Template:** `~/.config/mosaic/templates/docs/PRD.md.template`
- **Skill:** `~/.config/mosaic/skills-local/prd/SKILL.md` (guided PRD generation with clarifying questions)
- **Guide:** `~/.config/mosaic/guides/PRD.md` (hard rules — a PRD must exist before coding begins)
### Required PRD Sections (from Mosaic guide)

1. Problem statement and objective
2. In-scope and out-of-scope
3. User/stakeholder requirements
4. Functional requirements
5. Non-functional requirements (security, performance, reliability, observability)
6. Acceptance criteria
7. Constraints and dependencies
8. Risks and open questions
9. Testing and verification expectations
10. Delivery/milestone intent

The PRD skill also generates user stories with specific acceptance criteria ("Button shows confirmation dialog before deleting", not "Works correctly").

**Key rule from Mosaic:** Implementation that diverges from the PRD without PRD updates is a blocker. Change control: update the PRD first → update the plan → then implement.
## Board Post-Run Review

The Board of Directors is NOT fire-and-forget. After a pipeline run completes (deploy or failure):

1. **Memos from each stage** are compiled into a run summary
2. **Board reviews** the summary for:
   - Conflicts between stage outputs
   - Scope drift from the original brief
   - Cost/timeline variance from estimates
   - Strategic alignment issues
3. **Board adjusts** strategy, priorities, or constraints for future briefs
4. **Learnings** feed back into specialist memory and Orchestrator heuristics

This closes the loop. The pipeline doesn't just ship code — it learns from every run.
## Architecture Review Fixes (v4, 2026-03-24)

Fixes applied based on the Sonnet architecture review:

| Finding | Fix Applied |
| --- | --- |
| Dead-end states (REJECTED, NEEDS REVISION, CI failure, worker confusion) | All paths explicitly defined in the Orchestrator + Board stage |
| Security Architect conditional (keyword matching misses implicit auth) | Security Architect now ALWAYS included in Planning 1 |
| Board making technical composition decisions | New Brief Analyzer agent handles technical composition after Board approval |
| Orchestrator claimed "purely mechanical" but needs semantic analysis | Split into State Machine (mechanical) + Gate Reviewer (AI). Circularity detection is the Gate Reviewer's job. |
| Test → Remediate had no loop limit | Shared 3-loop budget across Review + Test remediation |
| Open-ended debate (3-30 rounds) too loose; framing bias | Structured 3-phase debate: independent positions → responses → synthesis. Tighter round limits (17-53 calls vs 12-120+). |
| Review only gets the diff | Review now gets full module context + context packet, not just the diff |
| Cross-brief dependency not enforced at runtime | State Machine enforces dependency ordering + file-level locking |
| Gate Reviewer reading full transcripts (context problem) | Gate Reviewer reads structured summaries; requests the full transcript only on suspicion |
| No minimum specialist composition for Planning 2 | Guard added: at least 1 Language + 1 Domain specialist required |
## Remaining Open Questions

1. **Pi integration specifics:** How exactly does Pi serve as the execution substrate? Board sessions already work via `mosaic yolo pi`. Does the full pipeline run as a Pi orchestration, or does Pi just handle individual stage sessions?
2. **Specialist memory storage:** OpenBrain? Per-specialist markdown files? Scoped memory namespaces?
3. **Pipeline analytics:** What metrics do we track per run? Stage duration, rework count, gate failure rate, estimate accuracy?
4. **Parallel briefs:** Can multiple briefs from the same PRD run through the pipeline concurrently, or strictly serially?
5. **Escalation UX:** When the pipeline escalates to Jason, where does that notification go? Discord? TUI? Both?

---
## Connection to Mosaic North Star

This pipeline IS the Mosaic vision, just running on agent infrastructure instead of a proper platform:

- **PRD.md** → Mosaic's task queue API
- **Orchestrator** → Mosaic's agent lifecycle management
- **Gates** → Mosaic's review gates
- **Pipeline stages** → Mosaic's workflow engine
- **Dynamic composition** → Mosaic's agent selection

Everything we build here gets dogfooded, refined, and eventually productized as Mosaic Stack features. We're building the engine that Mosaic will sell.

### Standalone Architecture (decided)

The pipeline is built as a **standalone service** — not embedded in OpenClaw or tightly coupled to any single agent framework. This is deliberate:

1. **Pi (mosaic bootstrap) is the execution substrate** — already proven with BOD sessions
2. **The Orchestrator is a mechanical state machine** — it doesn't need an LLM, it needs a process manager
3. **Stage sessions are Pi/agent sessions** — each planning/review stage spawns a session with the right participants
4. **The migration path to Mosaic Stack is clean** — standalone service → Mosaic feature, not "rip it out of OpenClaw"

The pattern: dogfood on our projects → track what works → extract into Mosaic Stack as a first-class feature.

---

## References

- VoltAgent/awesome-codex-subagents: https://github.com/VoltAgent/awesome-codex-subagents
- VoltAgent/awesome-claude-code-subagents: https://github.com/VoltAgent/awesome-claude-code-subagents
- VoltAgent/awesome-openclaw-skills: https://github.com/VoltAgent/awesome-openclaw-skills
- Board implementation: `mosaic/board` branch (commit ad4304b)
- Mosaic North Star: `~/.openclaw/workspace/memory/mosaic-north-star.md`
- Existing agent registry: `~/.openclaw/workspace/agents/REGISTRY.yaml`
- Mosaic Queue PRD: `~/src/jarvis-brain/docs/planning/MOSAIC-QUEUE-PRD.md`

---
## Brief Classification System (skip-BOD support)

**Added:** 2026-03-26

Not every brief needs full Board of Directors review. The classification system lets briefs skip stages based on their nature.

### Classes

| Class | Pipeline | Use case |
| --- | --- | --- |
| `strategic` | BOD → BA → Planning 1 → 2 → 3 | New features, architecture, integrations, security, budget decisions |
| `technical` | BA → Planning 1 → 2 → 3 | Refactors, bugfixes, UI tweaks, style changes |
| `hotfix` | Planning 1 → 2 → 3 | Urgent patches — skip both BOD and BA |

### Classification priority (highest wins)

1. `--class` CLI flag on `forge run` or `forge resume`
2. YAML frontmatter `class:` field in the brief
3. Auto-classification via keyword analysis

### Auto-classification keywords

- **Strategic:** security, pricing, architecture, integration, budget, strategy, compliance, migration, partnership, launch
- **Technical:** bugfix, bug, refactor, ui, style, tweak, typo, lint, cleanup, rename, hotfix, patch, css, format
- **Default** (no keyword match): strategic (conservative — full pipeline)
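
Putting the priority order and keyword lists together, the classifier could be sketched as follows. Function and field names are illustrative; resolving a strategic/technical keyword collision in favor of `strategic` is an assumption here, matching the conservative default:

```typescript
// Hypothetical sketch of brief classification: CLI flag beats
// frontmatter beats keyword auto-classification. Keyword lists mirror
// the section above; matching is naive substring for illustration.
type BriefClass = "strategic" | "technical" | "hotfix";

const STRATEGIC = ["security", "pricing", "architecture", "integration", "budget",
  "strategy", "compliance", "migration", "partnership", "launch"];
const TECHNICAL = ["bugfix", "bug", "refactor", "ui", "style", "tweak", "typo",
  "lint", "cleanup", "rename", "hotfix", "patch", "css", "format"];

function autoClassify(text: string): BriefClass {
  const lower = text.toLowerCase();
  // Assumed precedence: strategic keywords win, and the no-match
  // default is strategic (conservative — full pipeline).
  if (STRATEGIC.some((k) => lower.includes(k))) return "strategic";
  if (TECHNICAL.some((k) => lower.includes(k))) return "technical";
  return "strategic";
}

function resolveClass(opts: {
  cliFlag?: BriefClass;     // --class on `forge run` / `forge resume`
  frontmatter?: BriefClass; // `class:` in the brief's YAML frontmatter
  briefText: string;
}): BriefClass {
  return opts.cliFlag ?? opts.frontmatter ?? autoClassify(opts.briefText);
}
```

Note that `hotfix` can only come from the flag or frontmatter in this sketch — auto-classification never skips planning stages on its own, which fits the conservative default.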

### Overrides

- `--force-board` — forces the BOD stage to run even for technical/hotfix briefs
- `--class` on `resume` — re-classifies a run mid-flight (stages already passed are not re-run)

### Backward compatibility

Existing briefs without a `class` field are auto-classified. The default (no matching keywords) is `strategic`, so all existing runs get the full pipeline unless keywords trigger `technical`.