diff --git a/docs/fleet/PRD.md b/docs/fleet/PRD.md index e560e65..2095a50 100644 --- a/docs/fleet/PRD.md +++ b/docs/fleet/PRD.md @@ -32,14 +32,14 @@ observability and no safe way to watch a session. ## Functional requirements -| ID | Requirement | -|---|---| +| ID | Requirement | +| ---- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | FR-1 | `mosaic fleet ps [--json]` prints one row per roster agent joining: name · tenant · host · runtime · systemd(active/enabled) · pane(alive/dead) · pid · idle · **last-heartbeat age** · **drift** flag (roster runtime ≠ actual pane command) · **boot-enable** warning (active but `UnitFileState=disabled`). | -| FR-2 | **Heartbeat protocol v1** (see below); `dogfood-agent.py` implements the responder. `fleet ps` issues probes (or reads last-seen) and reports health per FR-1. | -| FR-3 | `mosaic agent watch ` opens a **read-only** view of the pane (grouped session or `tmux attach -r`) that cannot send keystrokes and does not shrink the agent's window. | -| FR-4 | `mosaic agent attach ` remains the **explicit** interactive-takeover path (separate verb, documented as the only one that can type). | -| FR-5 | `mosaic agent send --verify` confirms the message was **accepted** (not left as an unsubmitted draft) and returns non-zero if delivery cannot be verified. | -| FR-6 | All structured output (`--json`) includes `tenant_id` and `host` fields. | +| FR-2 | **Heartbeat protocol v1** (see below); `dogfood-agent.py` implements the responder. `fleet ps` issues probes (or reads last-seen) and reports health per FR-1. 
| +| FR-3 | `mosaic agent watch ` opens a **read-only** view of the pane (grouped session or `tmux attach -r`) that cannot send keystrokes and does not shrink the agent's window. | +| FR-4 | `mosaic agent attach ` remains the **explicit** interactive-takeover path (separate verb, documented as the only one that can type). | +| FR-5 | `mosaic agent send --verify` confirms the message was **accepted** (not left as an unsubmitted draft) and returns non-zero if delivery cannot be verified. | +| FR-6 | All structured output (`--json`) includes `tenant_id` and `host` fields. | ## Heartbeat protocol v1 diff --git a/docs/fleet/north-star.md b/docs/fleet/north-star.md index d1cef27..22b6857 100644 --- a/docs/fleet/north-star.md +++ b/docs/fleet/north-star.md @@ -22,11 +22,11 @@ workstream makes it an official, observable, multi-tenant Mosaic Stack capabilit The Fleet has a **dual role**, and that is the point: - **As product** — a multi-tenant agent-fleet capability of Mosaic Stack (this workstream). -- **As means of production** — the orchestrator/worker fleet that *actually builds the - entire MVP* (federation W1, webUI, TUI, CLI, and the Fleet itself). +- **As means of production** — the orchestrator/worker fleet that _actually builds the + entire MVP_ (federation W1, webUI, TUI, CLI, and the Fleet itself). We are **building the system that builds the system.** Every other MVP workstream is -delivered *by* the fleet, so fleet observability and control are not merely product +delivered _by_ the fleet, so fleet observability and control are not merely product features — they are the **operational floor of the whole delivery effort**. If we cannot see and steer the agents, we cannot trust what they ship. This is why Phase 2 (observability) leads: it is the instrument panel for the factory, dogfooded on the live @@ -41,35 +41,35 @@ those gates. 

 The Fleet inherits — does not re-invent — the MVP's hard requirements:

-| MVP req | What it means for the Fleet |
-|---|---|
-| MVP-X1 three-surface parity | fleet observability/control reachable via **CLI + TUI + webUI** (CLI first; webUI is required for parity, not optional) |
-| MVP-X2 multi-tenant isolation | one tenant = one **Linux uid** (own `systemd --user`, socket, `~/.config/mosaic`); no cross-tenant leakage |
-| MVP-X3 auth (BetterAuth/SSO) | operator→fleet and cross-host views are auth-gated through the platform's existing auth |
-| MVP-X4 quality gates | `pnpm typecheck`/`lint`/`format:check` green before any push |
-| MVP-X5 federated topology | cross-host fleet visibility rides the **federation** boundary (W1), not a bespoke broker |
-| MVP-X6 OTEL tracing | heartbeats, sends, and lifecycle events emit spans; `traceparent` crosses the federation boundary |
-| MVP-X7 trunk merge | branch from `main`, squash-merge via PR, never push to `main` |
+| MVP req | What it means for the Fleet |
+| ----------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
+| MVP-X1 three-surface parity | fleet observability/control reachable via **CLI + TUI + webUI** (CLI first; webUI is required for parity, not optional) |
+| MVP-X2 multi-tenant isolation | one tenant = one **Linux uid** (own `systemd --user`, socket, `~/.config/mosaic`); no cross-tenant leakage |
+| MVP-X3 auth (BetterAuth/SSO) | operator→fleet and cross-host views are auth-gated through the platform's existing auth |
+| MVP-X4 quality gates | `pnpm typecheck`/`lint`/`format:check` green before any push |
+| MVP-X5 federated topology | cross-host fleet visibility rides the **federation** boundary (W1), not a bespoke broker |
+| MVP-X6 OTEL tracing | heartbeats, sends, and lifecycle events emit spans; `traceparent` crosses the federation boundary |
+| MVP-X7 trunk merge | branch from `main`, squash-merge via PR, never push to `main` |

 ## The stack — where every concern lives

 One **definition** is the source of truth; the **session** is how it runs.

-| Layer | Owner | Phase-2 reality | Destination |
-|---|---|---|---|
-| **Definition + identity + auth** | gateway / `mosaic-as` (scoped tokens, #541) | `roster.yaml` (tenant-tagged) | one definition; `mosaic agent --new` materializes it |
-| **Tenancy boundary** | **Linux uid per tenant** (linger, own `systemd --user`, own socket, own `~/.config/mosaic`) | one tenant: `jarvis` = tenant zero | uid-per-tenant; federation aggregates across hosts |
-| **Runtime** | per-tenant tmux session on isolated socket | dogfood stub sessions (live now on `mosaic-factory`) | claude/codex/pi/opencode TUIs |
-| **Liveness** | **heartbeat protocol** every runtime answers | protocol defined + dogfood stub answers it | all runtimes answer; "healthy" ≠ "pane alive" |
-| **Observation** | read-only `watch` (native tmux) + `pipe-pane` stream | CLI `watch`/`ps`; explicit opt-in `attach` for control | + auth-gated webUI streams |
-| **Control plane** | **federation** across hosts × tenants | records already carry `tenant_id` + `host` | federated gateways expose fleet state; webUI in Phase 5 |
+| Layer | Owner | Phase-2 reality | Destination |
+| -------------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------- |
+| **Definition + identity + auth** | gateway / `mosaic-as` (scoped tokens, #541) | `roster.yaml` (tenant-tagged) | one definition; `mosaic agent --new` materializes it |
+| **Tenancy boundary** | **Linux uid per tenant** (linger, own `systemd --user`, own socket, own `~/.config/mosaic`) | one tenant: `jarvis` = tenant zero | uid-per-tenant; federation aggregates across hosts |
+| **Runtime** | per-tenant tmux session on isolated socket | dogfood stub sessions (live now on `mosaic-factory`) | claude/codex/pi/opencode TUIs |
+| **Liveness** | **heartbeat protocol** every runtime answers | protocol defined + dogfood stub answers it | all runtimes answer; "healthy" ≠ "pane alive" |
+| **Observation** | read-only `watch` (native tmux) + `pipe-pane` stream | CLI `watch`/`ps`; explicit opt-in `attach` for control | + auth-gated webUI streams |
+| **Control plane** | **federation** across hosts × tenants | records already carry `tenant_id` + `host` | federated gateways expose fleet state; webUI in Phase 5 |

 ## Operating model (inherited, not reinvented)

 The AI-guide law stands: one accountable **orchestrator**, isolated **workers**
 that stop at PR-open, the serialized **gate chain** (independent review → green CI
 → diff-sanity → squash-merge → verify), **decide-and-inform** cadence, and a durable
-**board** so missions survive session death. The Fleet is the infrastructure *under*
+**board** so missions survive session death. The Fleet is the infrastructure _under_
 this model. See `mosaicstack-aiguide` whitepapers 01 (inter-agent comms) and 03
 (orchestration model) for the rationale.

@@ -84,12 +84,12 @@ Every artifact, starting Phase 2, MUST:

 ## Observation model

-| Verb | Behavior |
-|---|---|
-| `mosaic fleet ps` | one table joining systemd + tmux + process + idle + last-heartbeat, with drift + boot-enable flags |
-| `mosaic agent watch ` | **read-only** join (grouped session / `-r`), no resize tyranny, no keystrokes |
-| `mosaic agent attach ` | explicit interactive takeover (the only path that can type) |
-| `mosaic agent send --verify` | confirms message **accepted**, not merely keystroke-injected |
+| Verb | Behavior |
+| ----------------------------------- | -------------------------------------------------------------------------------------------------- |
+| `mosaic fleet ps` | one table joining systemd + tmux + process + idle + last-heartbeat, with drift + boot-enable flags |
+| `mosaic agent watch ` | **read-only** join (grouped session / `-r`), no resize tyranny, no keystrokes |
+| `mosaic agent attach ` | explicit interactive takeover (the only path that can type) |
+| `mosaic agent send --verify` | confirms message **accepted**, not merely keystroke-injected |

 > Why the current PoC blocks observation: sessions live on the isolated `mosaic-factory`
 > socket (invisible to default `tmux ls`), the only sanctioned read is `capture-pane`
@@ -98,13 +98,13 @@ Every artifact, starting Phase 2, MUST:

 ## Phased roadmap

-| Phase | Outcome | Status |
-|---|---|---|
-| 0–1 | tmux PoC, hardening, published CLI v0.0.34 (#565–#568) | ✅ done |
-| **2 — Observability** | `fleet ps` (host+tenant aware join), heartbeat protocol + dogfood stub answers it, `agent watch` (read-only), `agent send --verify` receipts | ▶ now |
-| 3 — Real runtimes | claude/codex/pi/opencode answer heartbeat; **hybrid lifecycle** (core always-on: orchestrator+reviewer; ephemeral workers per lane) | planned |
-| 4 — Unified definition | one agent schema in gateway; `mosaic agent --new` → materialized per-tenant session; uid-tenant provisioning | planned |
-| 5 — Control plane | federation-backed cross-host × cross-tenant fleet view; **webUI** (surface chosen then) for MVP-X1 parity | planned |
+| Phase | Outcome | Status |
+| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
+| 0–1 | tmux PoC, hardening, published CLI v0.0.34 (#565–#568) | ✅ done |
+| **2 — Observability** | `fleet ps` (host+tenant aware join), heartbeat protocol + dogfood stub answers it, `agent watch` (read-only), `agent send --verify` receipts | ▶ now |
+| 3 — Real runtimes | claude/codex/pi/opencode answer heartbeat; **hybrid lifecycle** (core always-on: orchestrator+reviewer; ephemeral workers per lane) | planned |
+| 4 — Unified definition | one agent schema in gateway; `mosaic agent --new` → materialized per-tenant session; uid-tenant provisioning | planned |
+| 5 — Control plane | federation-backed cross-host × cross-tenant fleet view; **webUI** (surface chosen then) for MVP-X1 parity | planned |

 ## Decisions of record (2026-06-20, with Jason)