# PRD — Fleet Phase 2: Operator Observability

> **Workstream:** W-FLEET under `mvp-20260312` · **Phase:** 2
> **North star:** [docs/fleet/north-star.md](./north-star.md)
> **Source umbrella PRD:** [docs/PRD.md](../PRD.md) (Mosaic Stack v0.1.0)
> **Tracks task:** `fleet-observability-1` — restore operator observability into fleet agent sessions.

## Problem

The durable tmux fleet runs on the isolated `mosaic-factory` socket. That isolation (which protects the operator's default tmux) makes the fleet **invisible** to default tooling, and truth is split across three planes no single command joins — systemd (`systemctl --user`), tmux (`-L mosaic-factory`), and the process tree (`pstree`). `agent tail` (`capture-pane`) returns **blank for full-screen TUIs**, and `agent send` confirms only keystroke injection, not acceptance. Net: the operator has near-zero observability and no safe way to watch a session.

## Goals

1. One command shows the **whole fleet's** real state, joining all three planes.
2. **Liveness is truthful**: healthy = answered a heartbeat, not "pane alive".
3. The operator can **watch** any session read-only without disrupting it.
4. `send` reports **delivered-and-accepted**, not just injected.
5. Every record/address carries **`tenant_id` + `host`** (zero foreclosure for multi-tenant/multi-host).

## Non-goals (this phase)

- No webUI (Phase 5; rides federation for cross-host).
- No `fleetd` daemon or persistent history store.
- No real-runtime swap (Phase 3) — instrument the live **dogfood stub** fleet.
- No cross-host aggregation yet (addressing is host-tagged but queries stay local).

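
To make the three planes from the Problem section concrete, here is a minimal sketch of the one query each plane needs and of how a tmux row might be parsed before joining. The function names, the unit-name argument, and the tab-separated format string are assumptions for illustration, not the shipped CLI. Only the `mosaic-factory` socket name comes from this PRD.

```python
from typing import Dict, List, Union

FLEET_SOCKET = "mosaic-factory"  # isolated tmux socket named in the Problem section

# Assumed tab-separated layout for one pane row; the fields are standard tmux format variables.
PANE_FORMAT = "#{session_name}\t#{pane_pid}\t#{pane_dead}\t#{pane_current_command}"


def systemd_cmd(unit: str) -> List[str]:
    """systemd plane: active state plus boot-enable state of one user unit."""
    return ["systemctl", "--user", "show", unit, "--property=ActiveState,UnitFileState"]


def tmux_cmd() -> List[str]:
    """tmux plane: every pane on the fleet socket, one row per pane."""
    return ["tmux", "-L", FLEET_SOCKET, "list-panes", "-a", "-F", PANE_FORMAT]


def pstree_cmd(pid: int) -> List[str]:
    """Process plane: the process tree rooted at a pane's pid."""
    return ["pstree", "-p", str(pid)]


def parse_pane_row(row: str) -> Dict[str, Union[str, int, bool]]:
    """Split one PANE_FORMAT row into the fields a joined fleet row would need."""
    session, pid, dead, command = row.split("\t")
    return {
        "session": session,
        "pid": int(pid),
        "pane_alive": dead != "1",  # tmux reports pane_dead as 1 for a dead pane
        "command": command,
    }
```

Building the commands as argument lists keeps them easy to assert on in a spec and avoids shell quoting around the format string.
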
## Functional requirements

| ID   | Requirement |
| ---- | ----------- |
| FR-1 | `mosaic fleet ps [--json]` prints one row per roster agent joining: name · tenant · host · runtime · systemd(active/enabled) · pane(alive/dead) · pid · idle · **last-heartbeat age** · **drift** flag (roster runtime ≠ actual pane command) · **boot-enable** warning (active but `UnitFileState=disabled`). |
| FR-2 | **Heartbeat protocol v1** (see below); `dogfood-agent.py` implements the responder. `fleet ps` issues probes (or reads last-seen) and reports health per FR-1. |
| FR-3 | `mosaic agent watch ` opens a **read-only** view of the pane (grouped session or `tmux attach -r`) that cannot send keystrokes and does not shrink the agent's window. |
| FR-4 | `mosaic agent attach ` remains the **explicit** interactive-takeover path (separate verb, documented as the only one that can type). |
| FR-5 | `mosaic agent send --verify` confirms the message was **accepted** (not left as an unsubmitted draft) and returns non-zero if delivery cannot be verified. |
| FR-6 | All structured output (`--json`) includes `tenant_id` and `host` fields. |

## Heartbeat protocol v1

- **Probe:** operator/`fleet ps` writes a sentinel line to the agent's input or a well-known per-agent heartbeat file path `~/.config/mosaic/fleet/run/.hb`.
- **Response:** the runtime updates `.hb` with `ts= pid= status=` on a fixed interval (default 15s) and on demand when probed.
- **Health rule:** `healthy` if `now - ts <= 3 × interval`; else `stale`; missing file = `unknown`.
- **Contract:** every runtime (dogfood stub now; claude/codex/pi/opencode in Phase 3) MUST emit the heartbeat.
  The protocol is file-based so it works for headless stubs and full-screen TUIs alike (no `capture-pane` dependency).
- `ASSUMPTION:` file-based heartbeat (vs in-pane echo) — chosen because it is TUI-safe and uid-scoped, fitting per-tenant isolation. Open to an OTEL-span variant in Phase 3 (MVP-X6).

## Acceptance criteria

- `mosaic fleet ps` shows all 5 live sessions on `mosaic-factory` with correct pane/pid/idle and flags the dogfood **drift** (`canary-pi` runtime=pi but pane runs `dogfood-agent.py`) and the **boot-enable** gap (active but disabled).
- Killing one agent's pane flips its row to dead/stale within one `interval`.
- `agent watch` shows live output and provably cannot type into the pane; detaching leaves the agent's window size unchanged.
- `agent send --verify` returns success on an accepting pane and non-zero on a wedged/draft pane.
- Quality gates green: `pnpm typecheck`, `pnpm lint`, `pnpm format:check`, plus `pnpm --filter @mosaicstack/mosaic test`.
- Independent review passed; dogfood evidence captured against the live fleet.

## Test plan

- Unit/CLI specs in `packages/mosaic/src/commands/fleet.spec.ts` (and a new `fleet-ps`/`watch`/`send-verify` spec) using the injected `CommandRunner` to assert exact tmux/systemd command construction and JSON shape (tenant+host present).
- Situational: run against the live `mosaic-factory` fleet; capture `fleet ps` output, a kill-and-detect cycle, a read-only `watch`, and a `send --verify` pass/fail pair.

## Surfaces & parity (MVP-X1)

CLI lands this phase. TUI surface follows in the `packages/mosaic` wizard; webUI in Phase 5 via federation. PRD records the parity debt explicitly so it is not lost.
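
As a worked illustration of the health rule in Heartbeat protocol v1, the sketch below renders and classifies a heartbeat record. It is not the `dogfood-agent.py` implementation. It assumes `ts` is Unix epoch seconds and uses a placeholder status value `ok`, neither of which this PRD fixes. The function names are hypothetical, and treating a malformed record as `unknown` is an assumption beyond the stated rule.

```python
from typing import Dict, Optional

DEFAULT_INTERVAL_S = 15  # default heartbeat interval stated in the protocol


def format_heartbeat(ts: float, pid: int, status: str) -> str:
    """Responder side: render one record in the `ts= pid= status=` layout."""
    return f"ts={int(ts)} pid={pid} status={status}"


def parse_heartbeat(line: str) -> Dict[str, str]:
    """Reader side: split space-separated key=value tokens into a dict."""
    fields: Dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def classify(record: Optional[str], now: float,
             interval: float = DEFAULT_INTERVAL_S) -> str:
    """Apply the health rule: healthy if now - ts <= 3 * interval, else stale."""
    if record is None:
        return "unknown"  # missing heartbeat file
    try:
        ts = float(parse_heartbeat(record)["ts"])
    except (KeyError, ValueError):
        return "unknown"  # assumption: a malformed record counts as unknown
    return "healthy" if now - ts <= 3 * interval else "stale"


record = format_heartbeat(ts=1000, pid=4242, status="ok")
print(classify(record, now=1045))  # healthy: age 45 s equals 3 x 15 s
print(classify(record, now=1046))  # stale: age 46 s exceeds the threshold
print(classify(None, now=1046))    # unknown: no heartbeat file
```

With the default 15 s interval the threshold is inclusive at 45 s, so a record aged exactly three intervals still counts as healthy.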