docs: add mission control and coordination resilience docs
200
docs/mission-control/PRD.md
Normal file
@@ -0,0 +1,200 @@
# PRD: Mosaic Mission Control Plane

## Metadata

- **Owner:** Jason Woltje
- **Date:** 2026-05-06
- **Status:** draft
- **Framework:** Mosaic PRDy + coord + Kanban
- **Target Repo:** `git.mosaicstack.dev/mosaic/mosaic-stack`
- **Primary Modules:** `packages/prdy`, `packages/coord`, `packages/queue`, `apps/gateway`, `packages/brain`, `packages/cli`

---

## Problem Statement

Mosaic already has the ingredients for durable agent work: PRD generation (`prdy`), mission coordination (`coord`), and task execution boards (`Kanban` / `TASKS.md`). Today those systems can still drift apart:

- A PRD can exist without a mission record.
- A mission can exist without a machine-readable execution board.
- Agents can short-cycle or compact repeatedly without a durable handoff.
- The next session may know the goal, but not the exact next step.

The result is brittle overnight autonomy: work continues only as long as a single session remains healthy.

This feature unifies those layers into one durable workflow so a mission can survive session rotation, compaction, and restarts with minimal state loss.

---

## Goals

1. Create one canonical pipeline from idea → PRD → mission → board → execution.
2. Let `prdy` generate a PRD that is immediately usable as a mission input.
3. Let `coord` own mission state, handoffs, and session rotation.
4. Let the board hold atomized tasks with dependencies and assignees.
5. Let agents read the mission and board to learn the next action without extra prompting.
6. Detect short-cycling and rotate sessions before quality degrades.
7. Preserve useful context across handoffs with a structured summary packet.
8. Give operators a single place to see mission status, task state, and the current session.

---

## Non-Goals

1. Replacing the Mosaic agent runtime or gateway architecture.
2. Rewriting `prdy` or `coord` from scratch.
3. Turning the board into a general project-management system.
4. Building a full Gantt/charting product.
5. Removing human review or approval gates.
6. Allowing agents to create arbitrary mission state without schema.

---

## User Stories

### US-001: Create a mission from a feature idea

**Description:** As an orchestrator, I want to turn a feature idea into a PRD and mission so that agents can work from a durable spec instead of a chat transcript.

**Acceptance Criteria:**

- [ ] `prdy` can emit a PRD with goals, non-goals, and requirements.
- [ ] The PRD is linked to a mission ID.
- [ ] The mission manifest references the PRD path.
- [ ] The mission is readable by downstream agent sessions.
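
As a rough illustration of the manifest these criteria describe, the sketch below models a mission record that points at the PRD and board by path and round-trips through JSON so a later session can read it. The class name, field names, example ID, and the `TASKS.md` location are assumptions made for illustration; they are not the actual `coord` schema, and the mission ID format is still an open question in this PRD.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class MissionManifest:
    """Hypothetical manifest shape; not the real coord schema."""
    mission_id: str                  # ID format is undecided; any string works here
    goal: str
    phase: str
    prd_path: str                    # pointer to the PRD, not a copy of it
    board_path: str                  # pointer to the board file
    active_session_id: Optional[str] = None


def dump_manifest(manifest: MissionManifest) -> str:
    """Serialize with sorted keys so git diffs stay stable."""
    return json.dumps(asdict(manifest), indent=2, sort_keys=True)


def load_manifest(text: str) -> MissionManifest:
    """Rebuild a manifest; unknown or missing keys raise TypeError."""
    return MissionManifest(**json.loads(text))


manifest = MissionManifest(
    mission_id="mission-0001",                      # illustrative value
    goal="Unify PRD, mission, and board into one durable workflow",
    phase="planning",
    prd_path="docs/mission-control/PRD.md",
    board_path="docs/mission-control/TASKS.md",     # assumed location
)
restored = load_manifest(dump_manifest(manifest))
```

Because the manifest only stores paths, the PRD and board remain the single copies of their own content, which matches the "point, don't duplicate" intent stated elsewhere in this document.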

### US-002: Atomize work into a board

**Description:** As an orchestrator, I want to split a PRD into board tasks so that work can be assigned to specialists.

**Acceptance Criteria:**

- [ ] Each user story can become one or more tasks.
- [ ] Tasks have assignees, dependencies, and estimates.
- [ ] Tasks are machine-readable and durable.
- [ ] The board can be regenerated from the PRD without ambiguity.
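
One possible reading of "regenerated from the PRD without ambiguity" is that derivation is a pure function of the PRD text. The sketch below assumes user stories are marked by headings of the form `### US-NNN: title`, as they are in this PRD, and emits one seed task per story. The `Task` fields and the `-T1` ID suffix are hypothetical, and a real derivation would likely split stories further.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Matches story headings such as "### US-002: Atomize work into a board".
STORY_HEADING = re.compile(r"^### (US-\d+): (.+)$", re.MULTILINE)


@dataclass
class Task:
    """Hypothetical board row; field names are illustrative."""
    task_id: str
    story_id: str                    # traceable link back to the PRD story
    title: str
    status: str = "todo"
    assignee: str = ""
    depends_on: List[str] = field(default_factory=list)
    estimate: str = ""


def derive_tasks(prd_markdown: str) -> List[Task]:
    """Create one seed task per user story, in document order.

    The output depends only on the PRD text, so running it twice on the
    same PRD yields the same board.
    """
    return [
        Task(task_id=f"{story_id}-T1", story_id=story_id, title=title.strip())
        for story_id, title in STORY_HEADING.findall(prd_markdown)
    ]


sample_prd = """\
### US-001: Create a mission from a feature idea
### US-002: Atomize work into a board
"""
tasks = derive_tasks(sample_prd)
```

Keeping `story_id` on every task also gives the traceable PRD-to-task link that the functional requirements ask for.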

### US-003: Rotate sessions without losing the mission

**Description:** As a coordinator, I want to restart or rotate a session when it short-cycles so that the mission continues with minimal loss.

**Acceptance Criteria:**

- [ ] The coordinator detects compaction pressure or repeated loops.
- [ ] The coordinator writes a handoff summary before rotation.
- [ ] A new session can resume from the handoff packet.
- [ ] The mission state remains intact across the rotation.
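
Short-cycle detection could be sketched as a weighted score over the signals this PRD names (compactions, tool loops, approval prompts, turns without progress). The weights and the threshold below are placeholders chosen for illustration only; the initial threshold is listed as an open question, and none of these names come from `coord`.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    """Counters a coordinator might track per session (hypothetical names)."""
    compactions: int = 0
    repeated_tool_loops: int = 0
    repeated_approval_prompts: int = 0
    turns_without_progress: int = 0


# Placeholder weights: compaction is treated as the strongest signal.
WEIGHTS = {
    "compactions": 3,
    "repeated_tool_loops": 2,
    "repeated_approval_prompts": 1,
    "turns_without_progress": 1,
}
ROTATION_THRESHOLD = 6  # placeholder value, not a recommendation


def short_cycle_score(signals: SessionSignals) -> int:
    """Weighted sum of all churn signals."""
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def should_rotate(signals: SessionSignals, threshold: int = ROTATION_THRESHOLD) -> bool:
    """Rotate only when the score exceeds the threshold."""
    return short_cycle_score(signals) > threshold


healthy = SessionSignals(compactions=1, turns_without_progress=2)   # score 5
churning = SessionSignals(compactions=2, repeated_tool_loops=1)     # score 8
```

In this sketch a coordinator would write the handoff summary first and rotate only after `should_rotate` returns `True`, so the packet always exists before the old session ends.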

### US-004: Let workers read the next step automatically

**Description:** As a worker agent, I want to read the mission and board at startup so I can do the next useful thing without waiting for a human prompt.

**Acceptance Criteria:**

- [ ] Startup loads the active mission manifest.
- [ ] Startup loads the current board/task row.
- [ ] Startup exposes the next action clearly in the prompt.
- [ ] The agent can continue after compaction using the same mission context.
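
A minimal sketch of the startup step, assuming the manifest and board have already been parsed into plain dictionaries. The key names (`id`, `status`, `depends_on`, `mission_id`) and the status values are assumptions for illustration. The selection rule here resumes an in-progress task first and otherwise picks the first task whose dependencies are all done; that ordering is a design assumption, not something the PRD specifies.

```python
from typing import Dict, List, Optional


def next_ready_task(board: List[Dict]) -> Optional[Dict]:
    """Pick the task a fresh session should work on, or None if nothing is ready."""
    # Resume interrupted work before starting anything new.
    for task in board:
        if task["status"] == "in_progress":
            return task
    done = {task["id"] for task in board if task["status"] == "done"}
    for task in board:
        if task["status"] == "todo" and all(dep in done for dep in task["depends_on"]):
            return task
    return None


def startup_context(manifest: Dict, board: List[Dict]) -> str:
    """Render the lines a worker prompt would start with."""
    task = next_ready_task(board)
    if task is None:
        next_action = "none (all tasks are done or blocked)"
    else:
        next_action = f'{task["id"]}: {task["title"]}'
    return "\n".join([
        f'Mission: {manifest["mission_id"]}',
        f'Goal: {manifest["goal"]}',
        f"Next action: {next_action}",
    ])


example_manifest = {"mission_id": "mission-0001", "goal": "Ship mission control"}
example_board = [
    {"id": "T1", "title": "Define manifest", "status": "done", "depends_on": []},
    {"id": "T2", "title": "Derive board", "status": "todo", "depends_on": ["T1"]},
    {"id": "T3", "title": "Add watchdog", "status": "todo", "depends_on": ["T2"]},
]
prompt_header = startup_context(example_manifest, example_board)
```

Because the header is rebuilt from the manifest and board on every start, a session that resumes after compaction sees the same mission context as the one it replaced.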

### US-005: Observe mission health from one place

**Description:** As an operator, I want a single view of mission health so that I can see progress, blocked tasks, and session churn.

**Acceptance Criteria:**

- [ ] Mission state shows current phase and progress.
- [ ] Board state shows task status by assignee.
- [ ] Short-cycle/rotation events are visible.
- [ ] Handoffs are inspectable.

---

## Functional Requirements

FR-1. The system must represent a mission as a durable object with an ID, goal, current phase, PRD path, board path, and active session ID.

FR-2. The system must represent a PRD as a markdown document with goals, user stories, functional requirements, non-goals, technical considerations, and success metrics.

FR-3. The system must represent execution work as a board of atomized tasks with status, assignee, dependency, and estimate fields.

FR-4. The coordinator must be able to derive a task board from a PRD.

FR-5. The coordinator must be able to write a handoff packet that includes goal, current state, completed work, blocked work, next steps, and constraints.

FR-6. The coordinator must detect short-cycling signals such as repeated compactions, repeated tool loops, repeated approval prompts, or no progress across several turns.

FR-7. The coordinator must rotate the session when the short-cycle threshold is exceeded.

FR-8. The coordinator must preserve mission continuity across session rotation.

FR-9. The worker session must read the mission state and board state at startup.

FR-10. The worker session must be able to resume from the last handoff summary without the operator rewriting the goal manually.

FR-11. The operator must be able to inspect the mission state, PRD, board, and latest handoff from one place.

FR-12. The mission system must keep a traceable link between PRD requirements and board tasks.

FR-13. The system must not allow a task to become active without a valid mission context.

FR-14. The system must keep durable history for rotation and handoff events.
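
The handoff packet in FR-5 could be modelled as below. This is a sketch under assumptions: the class and method names are invented for illustration, the six fields mirror the list in FR-5, and the Markdown view trims next steps to three, following the "top 3 immediate actions" note in the technical considerations. Whether handoffs end up as text, JSON, or both is still an open question, so both renderings are shown.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class HandoffPacket:
    """Hypothetical packet shape covering the six fields named in FR-5."""
    goal: str
    current_state: str
    completed: List[str] = field(default_factory=list)
    blocked: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Compact JSON (no extra whitespace) to keep token cost low."""
        return json.dumps(asdict(self), separators=(",", ":"))

    def to_markdown(self) -> str:
        """Human-readable view; only the first three next steps are shown."""
        lines = [f"**Goal:** {self.goal}", f"**State:** {self.current_state}"]
        sections = (
            ("Completed", self.completed),
            ("Blocked", self.blocked),
            ("Next steps", self.next_steps[:3]),
            ("Constraints", self.constraints),
        )
        for title, items in sections:
            lines.append(f"**{title}:**")
            lines.extend(f"- {item}" for item in items or ["(none)"])
        return "\n".join(lines)


packet = HandoffPacket(
    goal="Ship mission control",
    current_state="Board derived; watchdog not started",
    completed=["Manifest schema drafted"],
    next_steps=["step 1", "step 2", "step 3", "step 4"],
    constraints=["Do not bypass review gates"],
)
restored = HandoffPacket(**json.loads(packet.to_json()))
```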

---

## Board Discussion: Features and Needs

This is the feature discussion board that should drive the mission design.

| Card | Need | Why it matters | Proposed decision |
| --- | --- | --- | --- |
| Canonical mission record | One source of truth for goal/state | Prevents drift between chat, docs, and queue | Make mission manifest the durable root object |
| PRD → board derivation | Break feature ideas into executable work | Lets the plan be assigned and tracked | Keep PRD as the spec, generate board tasks from user stories |
| Session watchdog | Detect churn/short-cycling | Keeps overnight runs productive | Add short-cycle scoring and forced rotation |
| Structured handoff | Preserve context across session changes | Minimizes restart loss | Use a compact JSON/MD handoff packet |
| Worker auto-read | Let agents resume without human re-prompting | Reduces operator overhead | Load mission + board on session start |
| Status surface | Show progress and blockers clearly | Operators need confidence | Expose mission state via CLI and dashboard |
| Review gate | Keep quality high on autonomous work | Prevents silent regressions | Require review tasks before close |
| Recoverability | Resume after failure or restart | Mission should outlive a process | Persist session and handoff history |

---

## Design Considerations

1. The PRD should stay human-readable markdown, because the board and mission references need to be reviewable in git.
2. The board should be machine-readable enough for automation but still readable by humans.
3. The mission manifest should point to the PRD and board, not duplicate them.
4. Handoff packets should be compact and structured so they can be injected into a new session with minimal token cost.
5. The coordinator should prefer rotation over forced context growth once the session is near the compaction threshold.
6. Existing Mosaic commands should be extended, not replaced, wherever possible.
7. The same mission should be resumable across CLI, gateway, and remote channels.

---

## Technical Considerations

- Likely storage split:
  - PRD/board/manifest in git-backed docs
  - mission/session state in the Mosaic data layer
  - runtime health in queue/session state
- Worktrees and long-lived agent working directories should live under `/src/<repo>-worktrees` rather than `/tmp` so they sit on the larger persistent drive and survive longer-running missions.
- The coordinator needs a stable session identity, even if the active session changes.
- Task dependencies must be enforced so workers do not start early.
- The handoff packet should include the top 3 immediate actions and the strongest constraints.
- Rotation triggers should be configurable per profile or per mission.
- The initial version can be file-first, with dashboard sync added later.
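
Two of the points above lend themselves to a small sketch: layered rotation triggers and the worktree location. The trigger names, default values, and the precedence order (mission overrides profile, which overrides defaults) are assumptions for illustration; the PRD only says triggers should be configurable per profile or per mission. The path helper simply applies the `/src/<repo>-worktrees` convention stated above.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Placeholder defaults; real trigger names and values are undecided.
DEFAULT_TRIGGERS: Dict[str, int] = {"max_compactions": 3, "max_stalled_turns": 5}


def resolve_triggers(
    profile: Optional[Dict[str, int]] = None,
    mission: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Merge trigger settings; later layers win (defaults < profile < mission)."""
    merged = dict(DEFAULT_TRIGGERS)
    for layer in (profile, mission):
        merged.update(layer or {})
    return merged


def worktree_root(repo: str) -> PurePosixPath:
    """Return the persistent worktree directory for a repo, outside /tmp."""
    return PurePosixPath("/src") / f"{repo}-worktrees"


triggers = resolve_triggers(
    profile={"max_compactions": 2},
    mission={"max_stalled_turns": 8},
)
root = worktree_root("mosaic-stack")
```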

---

## Success Metrics

- A mission can rotate sessions without losing the active goal.
- A new session can resume from the latest handoff within a single turn.
- Board tasks remain aligned to PRD user stories.
- Short-cycling sessions are replaced before repeated compaction harms quality.
- Operators can find mission state without spelunking across multiple chat logs.

---

## Open Questions

1. What should the canonical mission ID format be?
2. Should the board live only in git, or also in the database?
3. Should rotation be automatic by default, or opt-in per mission?
4. What should the short-cycle threshold be initially?
5. Should handoffs be pure text, structured JSON, or both?
6. Which CLI command should be the primary mission entrypoint: `mosaic mission`, `mosaic coord`, or `mosaic prdy`?