# Mosaic Fleet — North Star

> **Workstream:** W-FLEET (Fleet) under mission `mvp-20260312`
>
> **Umbrella:** [docs/MISSION-MANIFEST.md](../MISSION-MANIFEST.md) · [docs/PRD.md](../PRD.md) (Mosaic Stack v0.1.0)
>
> **Status:** doctrine (authored 2026-06-20). Owner of this file: Fleet workstream lead.
>
> This document does **not** modify the MVP rollup; a rollup row is proposed, not written here.

## Vision

A **customizable, multi-tenant fleet of always-on AI agents** — each defined by role,
materialized as a durable, joinable runtime session, coordinated by the proven
orchestrator/worker model, and observable end-to-end across hosts. Coding today;
finance, analytics, research as roster entries tomorrow — same primitives, different
roster. The fleet is the **agent-session execution layer** of the Mosaic Stack MVP:
the thing federation makes reachable across hosts and the webUI/TUI/CLI make visible.

The USC tmux PoC (durable sessions + `agent-send` comms) proved the model. This
workstream makes it an official, observable, multi-tenant Mosaic Stack capability.

## The Fleet as means of production (bootstrapping)

The Fleet has a **dual role**, and that is the point:

- **As product** — a multi-tenant agent-fleet capability of Mosaic Stack (this workstream).
- **As means of production** — the orchestrator/worker fleet that _actually builds the
  entire MVP_ (federation W1, webUI, TUI, CLI, and the Fleet itself).

We are **building the system that builds the system.** Every other MVP workstream is
delivered _by_ the fleet, so fleet observability and control are not merely product
features — they are the **operational floor of the whole delivery effort**. If we cannot
see and steer the agents, we cannot trust what they ship. This is why Phase 2
(observability) leads: it is the instrument panel for the factory, dogfooded on the live
fleet that is, recursively, building Mosaic Stack.

What makes this leverage safe is the same gate chain the fleet enforces: independent
review before merge, green CI, honest completion, decide-and-inform cadence, and no
irreversible action without authority. The bootstrap is only as trustworthy as those
gates.

## Alignment with MVP cross-cutting requirements

The Fleet inherits, and does not re-invent, the MVP's hard requirements:

| MVP req                       | What it means for the Fleet                                                                                             |
| ----------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| MVP-X1 three-surface parity   | fleet observability/control reachable via **CLI + TUI + webUI** (CLI first; webUI is required for parity, not optional) |
| MVP-X2 multi-tenant isolation | one tenant = one **Linux uid** (own `systemd --user`, socket, `~/.config/mosaic`); no cross-tenant leakage              |
| MVP-X3 auth (BetterAuth/SSO)  | operator→fleet and cross-host views are auth-gated through the platform's existing auth                                 |
| MVP-X4 quality gates          | `pnpm typecheck`/`lint`/`format:check` green before any push                                                            |
| MVP-X5 federated topology     | cross-host fleet visibility rides the **federation** boundary (W1), not a bespoke broker                                |
| MVP-X6 OTEL tracing           | heartbeats, sends, and lifecycle events emit spans; `traceparent` crosses the federation boundary                       |
| MVP-X7 trunk merge | branch from `main`, squash-merge via PR, never push to `main` |
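
MVP-X6 requires `traceparent` to cross the federation boundary. As a reference point, here is a minimal TypeScript sketch of what that header carries in the W3C Trace Context version-00 layout (`version-traceid-parentid-flags`): the receiving hop keeps the caller's trace-id and sampling flags and substitutes its own span id as the new parent-id. The function names are hypothetical and only version 00 is handled; in practice an OpenTelemetry propagator (for example `W3CTraceContextPropagator` from `@opentelemetry/core`) would do this rather than hand-written code.

```typescript
// Sketch only: parse and re-emit a W3C `traceparent` header at a federation hop.
// Handles the version-00 layout: version-traceid-parentid-flags.

interface TraceParent {
  version: string;  // "00"
  traceId: string;  // 32 lowercase hex chars, not all zeros
  parentId: string; // 16 lowercase hex chars, not all zeros
  flags: string;    // 2 lowercase hex chars; bit 0 = sampled
}

const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
const SPAN_ID = /^[0-9a-f]{16}$/;
const TRACE_ID = /^[0-9a-f]{32}$/;
const ALL_ZERO = /^0+$/;

function parseTraceparent(header: string): TraceParent | null {
  const m = TRACEPARENT.exec(header.trim());
  if (!m) return null;
  const [, version, traceId, parentId, flags] = m;
  // Version "ff" is forbidden and all-zero ids are invalid.
  if (version === "ff" || ALL_ZERO.test(traceId) || ALL_ZERO.test(parentId)) return null;
  return { version, traceId, parentId, flags };
}

// Build the header for the span this hop emits (heartbeat, send, lifecycle event).
// `spanId` and `freshTraceId` come from the caller's id generator (hypothetical).
function childTraceparent(incoming: string | undefined, spanId: string, freshTraceId: string): string {
  if (!SPAN_ID.test(spanId) || ALL_ZERO.test(spanId)) throw new Error("invalid span id");
  const parent = incoming === undefined ? null : parseTraceparent(incoming);
  if (parent !== null) {
    // Same trace, new parent-id, caller's sampling decision preserved.
    return `00-${parent.traceId}-${spanId}-${parent.flags}`;
  }
  // Missing or invalid upstream header: start a new trace (sampled flag is an assumption).
  if (!TRACE_ID.test(freshTraceId) || ALL_ZERO.test(freshTraceId)) throw new Error("invalid trace id");
  return `00-${freshTraceId}-${spanId}-01`;
}
```

Keeping the trace-id unchanged across the hop is what lets a single trace show a `send` leaving one host and the heartbeat or lifecycle span it triggers on another.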

## The stack — where every concern lives

One **definition** is the source of truth; the **session** is how it runs.

| Layer                            | Owner                                                                                       | Phase-2 reality                                        | Destination                                             |
| -------------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------- |
| **Definition + identity + auth** | gateway / `mosaic-as` (scoped tokens, #541)                                                 | `roster.yaml` (tenant-tagged)                          | one definition; `mosaic agent --new` materializes it    |
| **Tenancy boundary**             | **Linux uid per tenant** (linger, own `systemd --user`, own socket, own `~/.config/mosaic`) | one tenant: `jarvis` = tenant zero                     | uid-per-tenant; federation aggregates across hosts      |
| **Runtime**                      | per-tenant tmux session on isolated socket                                                  | dogfood stub sessions (live now on `mosaic-factory`)   | claude/codex/pi/opencode TUIs                           |
| **Liveness**                     | **heartbeat protocol** every runtime answers                                                | protocol defined + dogfood stub answers it             | all runtimes answer; "healthy" ≠ "pane alive"           |
| **Observation**                  | read-only `watch` (native tmux) + `pipe-pane` stream                                        | CLI `watch`/`ps`; explicit opt-in `attach` for control | + auth-gated webUI streams                              |
| **Control plane** | **federation** across hosts × tenants | records already carry `tenant_id` + `host` | federated gateways expose fleet state; webUI in Phase 5 |
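
The definition and runtime rows meet in one step: a tenant-tagged roster entry is materialized into a per-tenant session, and the result carries `tenant_id` + `host` even while there is only one of each. A TypeScript sketch of that derivation, under loud assumptions: the `RosterEntry` fields, the `tenant/host/name` address format, the `mosaic-<tenant>` socket name, and the `"stub"` runtime tag are all hypothetical, not the gateway schema or the `roster.yaml` format.

```typescript
// Hypothetical shapes; not the gateway's actual schema.
type Runtime = "claude" | "codex" | "pi" | "opencode" | "stub";

interface RosterEntry {
  name: string;     // agent name, unique within a tenant
  tenantId: string; // e.g. "jarvis", tenant zero
  host: string;     // host the session runs on
  runtime: Runtime;
}

interface SessionSpec {
  address: string;     // fleet-wide address carrying tenant and host
  tmuxSocket: string;  // per-tenant socket name, as passed to `tmux -L`
  sessionName: string; // tmux session name on that socket
  runtime: Runtime;
}

// Lowercase slug (assumption): keeps the address unambiguous and the name usable as-is in tmux.
const SLUG = /^[a-z0-9][a-z0-9_-]*$/;

function checkSlug(field: string, value: string): void {
  if (!SLUG.test(value)) throw new Error(`invalid ${field}: ${JSON.stringify(value)}`);
}

// "Config defines, session runs": the session spec is derived from the definition.
function materialize(entry: RosterEntry): SessionSpec {
  checkSlug("name", entry.name);
  checkSlug("tenantId", entry.tenantId);
  checkSlug("host", entry.host);
  return {
    address: `${entry.tenantId}/${entry.host}/${entry.name}`,
    tmuxSocket: `mosaic-${entry.tenantId}`,
    sessionName: entry.name,
    runtime: entry.runtime,
  };
}
```

Deriving the address from the definition, rather than storing it separately, means adding a second host or tenant later changes data, not code paths.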

## Operating model (inherited, not reinvented)

The AI-guide law stands: one accountable **orchestrator**, isolated **workers** that
stop at PR-open, the serialized **gate chain** (independent review → green CI →
diff-sanity → squash-merge → verify), **decide-and-inform** cadence, and a durable
**board** so missions survive session death. The Fleet is the infrastructure _under_
this model. See `mosaicstack-aiguide` whitepapers 01 (inter-agent comms) and 03
(orchestration model) for the rationale.
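
"Serialized" has a precise consequence: a gate runs only after every earlier gate has passed, and the first failure stops the chain, so the squash-merge can never run on a red CI result. A minimal synchronous TypeScript sketch of that ordering; the `Gate` type and gate identifiers are illustrative assumptions, and real gates (review, CI) are asynchronous and external.

```typescript
// Illustrative gate names, in the order the operating model lists them.
type GateName = "independent-review" | "green-ci" | "diff-sanity" | "squash-merge" | "verify";

interface Gate {
  name: GateName;
  run: () => boolean; // true = gate passed (real gates would be async)
}

interface GateResult {
  gate: GateName;
  ok: boolean;
}

// Run gates strictly in order; stop at the first failure so later gates never run.
function runGateChain(gates: Gate[]): GateResult[] {
  const results: GateResult[] = [];
  for (const gate of gates) {
    const ok = gate.run();
    results.push({ gate: gate.name, ok });
    if (!ok) break;
  }
  return results;
}

// The chain counts as complete only if every gate ran and passed.
function chainPassed(gates: Gate[], results: GateResult[]): boolean {
  return results.length === gates.length && results.every((r) => r.ok);
}
```

Keeping `squash-merge` inside the chain, instead of as a separate step after it, is what makes "no merge without green CI" structural rather than a convention.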

## Invariants — "maximal vision, incremental delivery, zero foreclosure"

Starting in Phase 2, every artifact MUST:

1. Carry **`tenant_id` + `host`** in schema and message addressing, even with one of each today.
2. Treat **isolation ≠ invisibility**: anything on an isolated socket is surfaced by a single command.
3. Define **healthy = answered a heartbeat within N seconds**, never just "pane alive".
4. Make **observation read-only by default**; control is an explicit, separate, opt-in verb.
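
Invariant 3 makes health a function of heartbeat age, not of tmux state. A minimal TypeScript sketch; the window N is a parameter because this document does not fix its value, and the function name and three-state result are hypothetical.

```typescript
// Hypothetical health states; "stale" covers a pane that exists but stopped answering.
type Health = "healthy" | "stale" | "never-answered";

// lastAnsweredAtMs: wall-clock time (ms) of the most recent heartbeat answer, or null.
// Pane liveness is deliberately not an input: a live pane alone never means healthy.
function classifyHealth(lastAnsweredAtMs: number | null, nowMs: number, windowSeconds: number): Health {
  if (windowSeconds <= 0) throw new Error("heartbeat window must be positive");
  if (lastAnsweredAtMs === null) return "never-answered";
  const ageMs = nowMs - lastAnsweredAtMs;
  return ageMs <= windowSeconds * 1000 ? "healthy" : "stale";
}
```

With an illustrative 30-second window, an answer 19 s old is healthy and one 39 s old is stale, whatever tmux reports about the pane.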

## Observation model

| Verb                                | Behavior                                                                                           |
| ----------------------------------- | -------------------------------------------------------------------------------------------------- |
| `mosaic fleet ps`                   | one table joining systemd + tmux + process + idle + last-heartbeat, with drift + boot-enable flags |
| `mosaic agent watch <name>`         | **read-only** join (grouped session / `-r`), no resize tyranny, no keystrokes                      |
| `mosaic agent attach <name>`        | explicit interactive takeover (the only path that can type)                                        |
| `mosaic agent send <name> --verify` | confirms message **accepted**, not merely keystroke-injected |
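
`mosaic fleet ps` is a join across sources that can disagree. A minimal TypeScript sketch of that join, assuming each source has already been read into a map keyed by agent name; the field names and the drift rule (an active unit with no tmux session, or a session with no active unit) are hypothetical illustrations, not the CLI's implementation.

```typescript
// Hypothetical per-source views, each keyed by agent name.
interface SystemdView {
  active: boolean;  // unit currently running (systemd --user)
  enabled: boolean; // unit starts at boot/login
}

interface TmuxView {
  panePid: number | null; // process in the agent's pane, if any
  idleSeconds: number;    // time since last pane activity
}

interface PsRow {
  name: string;
  systemdActive: boolean | null; // null = no unit found
  bootEnabled: boolean | null;
  hasSession: boolean;
  pid: number | null;
  idleSeconds: number | null;
  lastHeartbeatMs: number | null;
  drift: boolean; // systemd and tmux disagree about whether the agent is up
}

function fleetPs(
  rosterNames: string[],
  systemd: Map<string, SystemdView>,
  tmux: Map<string, TmuxView>,
  heartbeats: Map<string, number>,
): PsRow[] {
  // Union of all sources, so a stray session outside the roster is still listed.
  const names = Array.from(
    new Set(rosterNames.concat(Array.from(systemd.keys()), Array.from(tmux.keys()))),
  ).sort();
  return names.map((name) => {
    const unit = systemd.get(name);
    const sess = tmux.get(name);
    const systemdActive = unit === undefined ? null : unit.active;
    const hasSession = sess !== undefined;
    const hb = heartbeats.get(name);
    return {
      name,
      systemdActive,
      bootEnabled: unit === undefined ? null : unit.enabled,
      hasSession,
      pid: sess === undefined ? null : sess.panePid,
      idleSeconds: sess === undefined ? null : sess.idleSeconds,
      lastHeartbeatMs: hb === undefined ? null : hb,
      drift: (systemdActive === true) !== hasSession,
    };
  });
}
```

Taking the union of names, rather than iterating only the roster, is the sketch's way of honoring "isolation ≠ invisibility": a session nobody declared still shows up, flagged as drift.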

> Why the current PoC blocks observation: sessions live on the isolated `mosaic-factory`
> socket (invisible to default `tmux ls`), the only sanctioned read is `capture-pane`
> (blank for full-screen TUIs), and `attach` is read-write and resizes the session. The
> verbs above restore "join and observe" safely.
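
The socket is invisible to a default `tmux ls` because tmux's `-L` option selects a named server socket, and a plain `tmux ls` talks only to the default server. To illustrate how `watch` and `attach` could differ, here is a TypeScript sketch that only builds argument vectors: `attach-session -r` attaches a read-only client (tmux lets only keys bound to `detach-client` or `switch-client` take effect), and an `=` prefix on the target requests an exact session-name match. The function names are hypothetical; the sketch does not address resizing, and the grouped-session variant is not shown.

```typescript
// Socket name taken from the PoC; passed to tmux as `-L <name>`.
const FLEET_SOCKET = "mosaic-factory";

// `tmux -L <socket> list-sessions`: what a default `tmux ls` cannot see.
function listArgv(socket: string = FLEET_SOCKET): string[] {
  return ["tmux", "-L", socket, "list-sessions"];
}

// Read-only observer: `-r` makes this client read-only; `=` forces an exact name match.
function watchArgv(session: string, socket: string = FLEET_SOCKET): string[] {
  return ["tmux", "-L", socket, "attach-session", "-r", "-t", `=${session}`];
}

// Explicit takeover: the same attach without -r, so keystrokes reach the agent.
function attachArgv(session: string, socket: string = FLEET_SOCKET): string[] {
  return ["tmux", "-L", socket, "attach-session", "-t", `=${session}`];
}
```

Passing such an array to Node's `child_process.spawn(argv[0], argv.slice(1), { stdio: "inherit" })` avoids shell quoting entirely.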

## Phased roadmap

| Phase                  | Outcome                                                                                                                                      | Status  |
| ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
| 0–1                    | tmux PoC, hardening, published CLI v0.0.34 (#565–#568)                                                                                       | ✅ done |
| **2 — Observability**  | `fleet ps` (host+tenant aware join), heartbeat protocol + dogfood stub answers it, `agent watch` (read-only), `agent send --verify` receipts | ▶ now   |
| 3 — Real runtimes      | claude/codex/pi/opencode answer heartbeat; **hybrid lifecycle** (core always-on: orchestrator+reviewer; ephemeral workers per lane)          | planned |
| 4 — Unified definition | one agent schema in gateway; `mosaic agent --new` → materialized per-tenant session; uid-tenant provisioning                                 | planned |
| 5 — Control plane | federation-backed cross-host × cross-tenant fleet view; **webUI** (surface chosen then) for MVP-X1 parity | planned |
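
Phase 2's `agent send --verify` receipts hinge on one distinction: keystrokes reaching a pane prove nothing, and only an acknowledgement from the addressee counts. A minimal TypeScript sketch, assuming a hypothetical envelope with a unique message id and a receipt that echoes it; the caller is assumed to collect receipts until a timeout before deciding, and the envelope carries `tenantId` and `host` in line with the first invariant.

```typescript
// Hypothetical message envelope; tenant and host travel with every message.
interface Envelope {
  id: string; // unique per message, echoed back in the receipt
  tenantId: string;
  host: string;
  to: string; // addressee agent name
  body: string;
}

// Hypothetical receipt emitted by the receiving runtime once it has taken the message.
interface Receipt {
  messageId: string;
  from: string;
  status: "accepted" | "rejected";
}

type SendOutcome = "accepted" | "rejected" | "unconfirmed";

// Only a receipt for this message id, from the addressee, confirms delivery.
// No matching receipt before the caller's timeout means "unconfirmed", never "accepted".
function verifySend(env: Envelope, receipts: Receipt[]): SendOutcome {
  for (const r of receipts) {
    if (r.messageId === env.id && r.from === env.to) return r.status;
  }
  return "unconfirmed";
}
```

Defaulting to `"unconfirmed"` keeps the failure mode honest: a silent runtime is reported as unverified instead of being counted as a successful send.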

## Decisions of record (2026-06-20, with Jason)

- Agent model: **config defines, session runs** (gateway = definition/identity/auth; tmux = runtime).
- Tenancy: **multi-tenant from the start**; isolation = **per-tenant Linux uid**.
- Health: **heartbeat required** (dogfood stub implements the protocol now).
- Lifecycle: **hybrid** — core always-on + ephemeral workers per lane.
- Observation: **read-only default, opt-in takeover**.
- Multi-host: **designed-for from day one**; control plane **rides federation (W1)**.
- Delivery: **CLI-first now**, dogfood against the live stub fleet; webUI deferred to Phase 5.
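
The hybrid lifecycle decision, with Phase 3 naming orchestrator and reviewer as the always-on core, reduces to a small policy: core roles always run, and a worker runs only while its lane has work. A TypeScript sketch; the `AgentDef` shape, the `lane` field, and the "open lane" signal are assumptions for illustration.

```typescript
type Role = "orchestrator" | "reviewer" | "worker";

interface AgentDef {
  name: string;
  role: Role;
  lane?: string; // workers only: the lane they serve (hypothetical field)
}

type Lifecycle = { kind: "always-on" } | { kind: "ephemeral"; lane: string };

// Core roles are always-on; every other agent is an ephemeral worker bound to a lane.
function lifecycleFor(def: AgentDef): Lifecycle {
  if (def.role === "orchestrator" || def.role === "reviewer") return { kind: "always-on" };
  if (def.lane === undefined) throw new Error(`worker ${def.name} has no lane`);
  return { kind: "ephemeral", lane: def.lane };
}

// Desired running set: every core agent, plus workers whose lane currently has open work.
function desiredRunning(defs: AgentDef[], openLanes: Set<string>): string[] {
  return defs
    .filter((def) => {
      const lc = lifecycleFor(def);
      return lc.kind === "always-on" || openLanes.has(lc.lane);
    })
    .map((def) => def.name);
}
```

A reconciler could diff this desired set against what `fleet ps` reports to decide which sessions to start or stop.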

## Assumptions (veto-able)

- `ASSUMPTION:` first-class runtimes = claude, codex, pi, opencode; a "role" (analyst,
  finance, researcher) = persona + skills + tools on top of a runtime, shipped as a
  starter role library in the framework.
- `ASSUMPTION:` the cross-host control plane is the **federation** layer (W1), not a
  separate `fleetd` daemon.
- `ASSUMPTION:` Fleet is workstream **W-FLEET** under `mvp-20260312`; a rollup row in
  `docs/TASKS.md` and a workstream declaration in `MISSION-MANIFEST.md` are proposed to
the MVP orchestrator, not written by this workstream.
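
The first assumption describes a layering: a role is data (persona, skills, tools) and a runtime executes it, so the same role can be placed on any first-class runtime. A TypeScript sketch of that layering; the `Role` fields follow the assumption's wording, while the `analyst` entry and the `bindRole` function are hypothetical.

```typescript
type Runtime = "claude" | "codex" | "pi" | "opencode";

interface Role {
  name: string;    // e.g. "analyst", "finance", "researcher"
  persona: string; // persona description (hypothetical format)
  skills: string[];
  tools: string[];
}

interface BoundAgent {
  runtime: Runtime;
  role: Role;
}

// Place a role on a runtime, optionally granting extra tools; duplicates are dropped, order kept.
// Returns a copy, so the library's role definition is never mutated.
function bindRole(runtime: Runtime, role: Role, extraTools: string[] = []): BoundAgent {
  const tools: string[] = [];
  for (const tool of role.tools.concat(extraTools)) {
    if (tools.indexOf(tool) === -1) tools.push(tool);
  }
  return { runtime, role: { ...role, tools } };
}

// Hypothetical starter-library entry.
const analyst: Role = {
  name: "analyst",
  persona: "Careful data analyst who states assumptions.",
  skills: ["sql", "charting"],
  tools: ["query"],
};
```

Because the role never names a runtime, swapping `codex` for `claude` is a one-argument change, which is the "same primitives, different roster" claim in code form.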