Mosaic Fleet — North Star
Workstream: W-FLEET (Fleet) under mission `mvp-20260312`
Umbrella: `docs/MISSION-MANIFEST.md` · `docs/PRD.md` (Mosaic Stack v0.1.0)
Status: doctrine, authored 2026-06-20. Owner of this file: Fleet workstream lead.
This document does not modify the MVP rollup; a rollup row is proposed, not written here.
Vision
A customizable, multi-tenant fleet of always-on AI agents — each defined by role, materialized as a durable, joinable runtime session, coordinated by the proven orchestrator/worker model, and observable end-to-end across hosts. Coding today; finance, analytics, research as roster entries tomorrow — same primitives, different roster. The fleet is the agent-session execution layer of the Mosaic Stack MVP: the thing federation makes reachable across hosts and the webUI/TUI/CLI make visible.
The USC tmux PoC (durable sessions + agent-send comms) proved the model. This
workstream makes it an official, observable, multi-tenant Mosaic Stack capability.
The Fleet as means of production (bootstrapping)
The Fleet has a dual role, and that is the point:
- As product — a multi-tenant agent-fleet capability of Mosaic Stack (this workstream).
- As means of production — the orchestrator/worker fleet that actually builds the entire MVP (federation W1, webUI, TUI, CLI, and the Fleet itself).
We are building the system that builds the system. Every other MVP workstream is delivered by the fleet, so fleet observability and control are not merely product features — they are the operational floor of the whole delivery effort. If we cannot see and steer the agents, we cannot trust what they ship. This is why Phase 2 (observability) leads: it is the instrument panel for the factory, dogfooded on the live fleet that is, recursively, building Mosaic Stack.
What keeps a self-building fleet safe is the same gate chain the fleet enforces: independent review before merge, green CI, honest completion, decide-and-inform cadence, and no irreversible action without authority. The bootstrap is only as trustworthy as those gates.
Alignment with MVP cross-cutting requirements
The Fleet inherits — does not re-invent — the MVP's hard requirements:
| MVP req | What it means for the Fleet |
|---|---|
| MVP-X1 three-surface parity | fleet observability/control reachable via CLI + TUI + webUI (CLI first; webUI is required for parity, not optional) |
| MVP-X2 multi-tenant isolation | one tenant = one Linux uid (own systemd --user, socket, ~/.config/mosaic); no cross-tenant leakage |
| MVP-X3 auth (BetterAuth/SSO) | operator→fleet and cross-host views are auth-gated through the platform's existing auth |
| MVP-X4 quality gates | pnpm typecheck/lint/format:check green before any push |
| MVP-X5 federated topology | cross-host fleet visibility rides the federation boundary (W1), not a bespoke broker |
| MVP-X6 OTEL tracing | heartbeats, sends, and lifecycle events emit spans; traceparent crosses the federation boundary |
| MVP-X7 trunk merge | branch from main, squash-merge via PR, never push to main |
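For MVP-X6, a heartbeat, send, or lifecycle event that crosses the federation boundary would carry a W3C `traceparent` value (`version-traceid-parentid-flags`). A minimal sketch of continuing that trace on the receiving host follows, assuming version `00` only; `parseTraceparent` and `childTraceparent` are hypothetical helpers, and a production gateway would more likely delegate this to the OpenTelemetry SDK's W3C propagator.

```typescript
// Sketch: continue a W3C Trace Context across a federation hop.
// Format: version-traceid-parentid-flags, e.g. 00-<32 hex>-<16 hex>-01.
// Helper names are hypothetical; a real service would use the OTEL SDK propagator.

interface TraceContext {
  traceId: string; // 16 bytes, lowercase hex
  parentId: string; // 8 bytes, lowercase hex (the caller's span id)
  flags: string; // 1 byte, e.g. "01" = sampled
}

// Only version 00 is accepted in this sketch.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceContext | null {
  const m = TRACEPARENT_RE.exec(header);
  if (m === null) return null;
  const traceId = m[1];
  const parentId = m[2];
  const flags = m[3];
  // The spec treats all-zero trace and parent ids as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, flags };
}

function randomHex(bytes: number): string {
  // Math.random keeps the sketch dependency-free; use a CSPRNG or the SDK in practice.
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// Receiving side: keep the trace id and flags, mint a new span id for this hop.
// With no valid upstream context, start a fresh sampled trace.
function childTraceparent(incoming: string, spanId: string = randomHex(8)): string {
  const ctx = parseTraceparent(incoming);
  if (ctx === null) return `00-${randomHex(16)}-${spanId}-01`;
  return `00-${ctx.traceId}-${spanId}-${ctx.flags}`;
}

const upstream = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
console.log(childTraceparent(upstream, "b7ad6b7169203331"));
// 00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01
```

Keeping the trace id while replacing the parent id is what lets a span emitted on one host appear as a child of the caller's span on another.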
The stack — where every concern lives
One definition is the source of truth; the session is how it runs.
| Layer | Owner | Phase-2 reality | Destination |
|---|---|---|---|
| Definition + identity + auth | gateway / mosaic-as (scoped tokens, #541) | `roster.yaml` (tenant-tagged) | one definition; `mosaic agent --new` materializes it |
| Tenancy boundary | Linux uid per tenant (linger, own `systemd --user`, own socket, own `~/.config/mosaic`) | one tenant: jarvis = tenant zero | uid-per-tenant; federation aggregates across hosts |
| Runtime | per-tenant tmux session on isolated socket | dogfood stub sessions (live now on `mosaic-factory`) | claude/codex/pi/opencode TUIs |
| Liveness | heartbeat protocol every runtime answers | protocol defined + dogfood stub answers it | all runtimes answer; "healthy" ≠ "pane alive" |
| Observation | read-only watch (native tmux) + `pipe-pane` stream | CLI `watch`/`ps`; explicit opt-in `attach` for control | + auth-gated webUI streams |
| Control plane | federation across hosts × tenants | records already carry `tenant_id` + `host` | federated gateways expose fleet state; webUI in Phase 5 |
Operating model (inherited, not reinvented)
The AI-guide law stands: one accountable orchestrator, isolated workers that
stop at PR-open, the serialized gate chain (independent review → green CI →
diff-sanity → squash-merge → verify), decide-and-inform cadence, and a durable
board so missions survive session death. The Fleet is the infrastructure under
this model. See mosaicstack-aiguide whitepapers 01 (inter-agent comms) and 03
(orchestration model) for the rationale.
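The serialized gate chain can be sketched as an ordered list of steps that halts at the first failure. This is a hypothetical illustration, not the orchestrator's code: `Gate`, `runGateChain`, and the boolean step callbacks are assumptions; only the gate order comes from the text.

```typescript
// Hypothetical sketch of the serialized gate chain. Gate names follow the text;
// `Gate`, `runGateChain`, and the step callbacks are illustrative only.
type GateName = "independent-review" | "green-ci" | "diff-sanity" | "squash-merge" | "verify";

const GATE_ORDER: GateName[] = ["independent-review", "green-ci", "diff-sanity", "squash-merge", "verify"];

interface Gate {
  name: GateName;
  // Checks or performs the step for a PR; returns true on success.
  run: (pr: number) => boolean;
}

interface ChainResult {
  passed: GateName[];
  blockedAt: GateName | null;
}

// Strictly sequential: the first failing gate stops the chain, so nothing can
// reach squash-merge before review and CI have both passed.
function runGateChain(pr: number, gates: Gate[]): ChainResult {
  const passed: GateName[] = [];
  for (const gate of gates) {
    if (!gate.run(pr)) return { passed, blockedAt: gate.name };
    passed.push(gate.name);
  }
  return { passed, blockedAt: null };
}

// Example: CI is red, so the chain halts right after review.
const redCi = GATE_ORDER.map((name): Gate => ({ name, run: () => name !== "green-ci" }));
const redCiResult = runGateChain(42, redCi);
// redCiResult.blockedAt === "green-ci"; redCiResult.passed is ["independent-review"]
```

Serializing the chain is the design choice: a worker that stops at PR-open cannot skip ahead, because each later gate only runs once every earlier one has returned success.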
Invariants — "maximal vision, incremental delivery, zero foreclosure"
Every artifact, starting Phase 2, MUST:
- Carry `tenant_id` + `host` in schema and message addressing, even with one of each today.
- Treat isolation ≠ invisibility: anything on an isolated socket must still be surfaced by one command.
- Define healthy = answered a heartbeat within N seconds, never just "pane alive".
- Make observation read-only by default; control is an explicit, separate, opt-in verb.
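The first and third invariants can be pinned down as types and a predicate. A minimal sketch, assuming heartbeats are recorded as epoch milliseconds; `AgentAddress`, `AgentStatus`, `formatAddress`, `isHealthy`, and the `tenant@host/name` address syntax are hypothetical, and N is left as a parameter because this document does not fix it.

```typescript
// Sketch under stated assumptions: all names here are hypothetical, and the
// tenant@host/name address syntax is illustrative only.
interface AgentAddress {
  tenant_id: string; // e.g. "jarvis" (tenant zero)
  host: string;
  name: string;
}

interface AgentStatus {
  address: AgentAddress;
  paneAlive: boolean; // what tmux reports; deliberately ignored by isHealthy
  lastHeartbeatMs: number | null; // epoch ms of the last answered heartbeat
}

// Always fully qualified, even while there is one tenant and one host.
function formatAddress(a: AgentAddress): string {
  return `${a.tenant_id}@${a.host}/${a.name}`;
}

// healthy = answered a heartbeat within N seconds; a live pane alone never counts.
function isHealthy(s: AgentStatus, nowMs: number, nSeconds: number): boolean {
  if (s.lastHeartbeatMs === null) return false;
  return nowMs - s.lastHeartbeatMs <= nSeconds * 1000;
}
```

Because `isHealthy` never reads `paneAlive`, a wedged TUI with a live pane reports unhealthy as soon as it stops answering.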
Observation model
| Verb | Behavior |
|---|---|
| `mosaic fleet ps` | one table joining systemd + tmux + process + idle + last-heartbeat, with drift + boot-enable flags |
| `mosaic agent watch <name>` | read-only join (grouped session / `-r`), no resize tyranny, no keystrokes |
| `mosaic agent attach <name>` | explicit interactive takeover (the only path that can type) |
| `mosaic agent send <name> --verify` | confirms the message was accepted, not merely keystroke-injected |
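A sketch of the join behind `mosaic fleet ps`, under stated assumptions: the input shapes are guesses at what systemd, tmux, and the heartbeat store would report, the process column is omitted for brevity, and "drift" is assumed to mean that systemd and tmux disagree about whether an agent is running. `fleetPs` and its types are hypothetical.

```typescript
// Hypothetical sketch of the `mosaic fleet ps` join (process column omitted).
interface SystemdUnit { name: string; active: boolean; enabled: boolean; }
interface TmuxSession { name: string; idleSeconds: number; }
interface Heartbeat { name: string; atMs: number; }

interface PsRow {
  name: string;
  unitActive: boolean;
  sessionPresent: boolean;
  idleSeconds: number | null;
  lastHeartbeatMs: number | null;
  drift: boolean; // systemd and tmux disagree about "running"
  bootEnabled: boolean; // unit enabled, so it should return after reboot
}

function fleetPs(units: SystemdUnit[], sessions: TmuxSession[], beats: Heartbeat[]): PsRow[] {
  const unitBy = new Map<string, SystemdUnit>();
  for (const u of units) unitBy.set(u.name, u);
  const sessionBy = new Map<string, TmuxSession>();
  for (const s of sessions) sessionBy.set(s.name, s);
  // Keep only the newest heartbeat per agent.
  const lastBeat = new Map<string, number>();
  for (const b of beats) lastBeat.set(b.name, Math.max(lastBeat.get(b.name) ?? -Infinity, b.atMs));

  // An agent known to either systemd or tmux gets a row, so orphans stay visible.
  const names = Array.from(new Set(units.map((u) => u.name).concat(sessions.map((s) => s.name)))).sort();
  return names.map((name): PsRow => {
    const unit = unitBy.get(name);
    const session = sessionBy.get(name);
    const unitActive = unit?.active ?? false;
    const sessionPresent = session !== undefined;
    return {
      name,
      unitActive,
      sessionPresent,
      idleSeconds: session?.idleSeconds ?? null,
      lastHeartbeatMs: lastBeat.get(name) ?? null,
      drift: unitActive !== sessionPresent,
      bootEnabled: unit?.enabled ?? false,
    };
  });
}
```

Rows come from the union of systemd units and tmux sessions, so a session with no unit (or a unit whose session died) surfaces as drift instead of disappearing.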
Why the current PoC blocks observation: sessions live on the isolated `mosaic-factory` socket (invisible to a default `tmux ls`), the only sanctioned read is `capture-pane` (blank for full-screen TUIs), and `attach` is read-write and resizes the session. The verbs above restore "join and observe" safely.
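The split between observation and control verbs maps onto tmux invocations against the isolated socket. A sketch that only builds argument vectors, assuming `mosaic-factory` is a socket name selected with `-L` rather than a path passed with `-S`; the helper names are hypothetical. The `-f read-only,ignore-size` client flags require a tmux release that supports `attach-session -f`; check `man tmux` on the host.

```typescript
// Sketch: argv builders for tmux on the isolated fleet socket. Assumes the PoC
// socket is a tmux socket name (-L), not a path (-S). Helper names are hypothetical.
const FLEET_SOCKET = "mosaic-factory";

// A plain `tmux ls` talks to the default socket, which is why fleet sessions
// never show up there; listing must name the socket explicitly.
function listArgs(socket: string = FLEET_SOCKET): string[] {
  return ["-L", socket, "list-sessions"];
}

// watch: read-only client that also leaves the session size alone.
function watchArgs(session: string, socket: string = FLEET_SOCKET): string[] {
  return ["-L", socket, "attach-session", "-f", "read-only,ignore-size", "-t", session];
}

// attach: the only read-write client, reachable only through the explicit verb.
function attachArgs(session: string, socket: string = FLEET_SOCKET): string[] {
  return ["-L", socket, "attach-session", "-t", session];
}

// stream: mirror pane output to a file without attaching any client.
// -o opens the pipe only if none is already open. logPath is assumed shell-safe.
function pipeArgs(session: string, logPath: string, socket: string = FLEET_SOCKET): string[] {
  return ["-L", socket, "pipe-pane", "-o", "-t", session, `cat >> ${logPath}`];
}

console.log(["tmux"].concat(watchArgs("coder")).join(" "));
// tmux -L mosaic-factory attach-session -f read-only,ignore-size -t coder
```

Keeping `watch` and `attach` as separate builders makes the read-write path impossible to reach by passing a flag to the read-only one.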
Phased roadmap
| Phase | Outcome | Status |
|---|---|---|
| 0–1 | tmux PoC, hardening, published CLI v0.0.34 (#565–#568) | ✅ done |
| 2: Observability | `fleet ps` (host- and tenant-aware join), heartbeat protocol + dogfood stub answers it, `agent watch` (read-only), `agent send --verify` receipts | ▶ now |
| 3: Real runtimes | claude/codex/pi/opencode answer heartbeat; hybrid lifecycle (core always-on: orchestrator + reviewer; ephemeral workers per lane) | planned |
| 4: Unified definition | one agent schema in gateway; `mosaic agent --new` → materialized per-tenant session; uid-tenant provisioning | planned |
| 5: Control plane | federation-backed cross-host × cross-tenant fleet view; webUI (surface chosen then) for MVP-X1 parity | planned |
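The `agent send --verify` receipts in Phase 2 can be sketched as matching an acknowledgement to a message id: injected keystrokes prove nothing, so a send counts as accepted only when the addressee echoes the id back before a deadline. `SendRequest`, `Receipt`, `verifySend`, and the receipt transport are assumptions; this document does not specify them.

```typescript
// Hypothetical sketch: a message counts as delivered only when the runtime
// echoes back its id in a receipt. The receipt transport is not specified here.
interface SendRequest { id: string; to: string; body: string; sentAtMs: number; }
interface Receipt { messageId: string; from: string; atMs: number; status: "accepted" | "rejected"; }

type VerifyResult = "accepted" | "rejected" | "timeout";

function verifySend(req: SendRequest, receipts: Receipt[], timeoutMs: number): VerifyResult {
  const deadline = req.sentAtMs + timeoutMs;
  // Only a receipt from the addressee, for this message, inside the window counts.
  const match = receipts.find(
    (r) => r.messageId === req.id && r.from === req.to && r.atMs >= req.sentAtMs && r.atMs <= deadline,
  );
  return match === undefined ? "timeout" : match.status;
}
```

Distinguishing `rejected` from `timeout` matters operationally: the first means the runtime saw the message and refused it, the second means nothing is known to have reached it.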
Decisions of record (2026-06-20, with Jason)
- Agent model: config defines, session runs (gateway = definition/identity/auth; tmux = runtime).
- Tenancy: multi-tenant from the start; isolation = per-tenant Linux uid.
- Health: heartbeat required (dogfood stub implements the protocol now).
- Lifecycle: hybrid — core always-on + ephemeral workers per lane.
- Observation: read-only default, opt-in takeover.
- Multi-host: designed-for from day one; control plane rides federation (W1).
- Delivery: CLI-first now, dogfood against the live stub fleet; webUI deferred to Phase 5.
- Runtimes: fleet agents default to Codex / pi-on-Codex; Claude is reserved for Claude Code only (avoid alternate-harness API pricing). Validated durable recipe: `mosaic yolo pi --model openai-codex/gpt-5.5:high`. Durable detached launch requires the runtime bin on `PATH` (baked into the pane command) plus boot survival (`enable` + linger), which `fleet init` should automate.
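The durable recipe above implies a small, ordered plan that `fleet init` could automate. A hypothetical sketch: `InitOptions`, `fleetInitPlan`, the unit name, and the paths are illustrative, and paths are assumed shell-safe; only `loginctl enable-linger`, `systemctl --user enable`, and `tmux new-session -d` are standard commands.

```typescript
// Hypothetical sketch of what `fleet init` could automate. Option names, the
// unit name, and paths are illustrative; paths are assumed shell-safe.
interface InitOptions {
  user: string; // the tenant's Linux user (one uid per tenant)
  unit: string; // systemd --user unit that owns the agent session
  socket: string; // the tenant's isolated tmux socket name
  session: string; // agent session name
  runtimeBinDir: string; // directory containing the runtime binary
  basePath: string; // PATH captured when fleet init runs
}

const DURABLE_RECIPE = "mosaic yolo pi --model openai-codex/gpt-5.5:high";

// Bake PATH into the pane command so a detached, boot-time launch finds the runtime.
function paneCommand(o: InitOptions): string {
  return `env PATH=${o.runtimeBinDir}:${o.basePath} ${DURABLE_RECIPE}`;
}

// Ordered argv lists; executing them is left to the caller.
function fleetInitPlan(o: InitOptions): string[][] {
  return [
    // Boot survival 1: keep the user's systemd instance alive without a login session.
    ["loginctl", "enable-linger", o.user],
    // Boot survival 2: start the unit with the user manager, assuming its
    // [Install] section targets default.target.
    ["systemctl", "--user", "enable", o.unit],
    // The command the unit would run: a detached session on the tenant's socket.
    ["tmux", "-L", o.socket, "new-session", "-d", "-s", o.session, paneCommand(o)],
  ];
}

const examplePlan = fleetInitPlan({
  user: "jarvis",
  unit: "mosaic-agent-coder.service",
  socket: "mosaic-factory",
  session: "coder",
  runtimeBinDir: "/home/jarvis/.local/bin",
  basePath: "/usr/local/bin:/usr/bin:/bin",
});
for (const argv of examplePlan) console.log(argv.join(" "));
```

Capturing `PATH` at init time is the point of the third entry: a session started by the user manager at boot does not inherit the operator's interactive shell environment.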
Assumptions (veto-able)
- ASSUMPTION: first-class runtimes = claude, codex, pi, opencode; a "role" (analyst, finance, researcher) = persona + skills + tools on top of a runtime, shipped as a starter role library in the framework.
- ASSUMPTION: the cross-host control plane is the federation layer (W1), not a separate `fleetd` daemon.
- ASSUMPTION: Fleet is workstream W-FLEET under `mvp-20260312`; a rollup row in `docs/TASKS.md` and a workstream declaration in `MISSION-MANIFEST.md` are proposed to the MVP orchestrator, not written by this workstream.
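The role assumption reads naturally as data layered on a runtime. A hypothetical sketch of a tenant-tagged roster entry and a validating parser; `Role`, `RosterEntry`, and `parseRole` are illustrative names, not the framework's schema, and the input is assumed to arrive already parsed from `roster.yaml`.

```typescript
// Hypothetical sketch: a role is data layered on one of the first-class
// runtimes, not a new runtime. Names are illustrative, not the framework's schema.
const RUNTIMES = ["claude", "codex", "pi", "opencode"] as const;
type Runtime = (typeof RUNTIMES)[number];

interface Role {
  name: string; // e.g. "analyst", "finance", "researcher"
  runtime: Runtime;
  persona: string;
  skills: string[];
  tools: string[];
}

// A roster entry stays tenant- and host-tagged, per the invariants.
interface RosterEntry {
  tenant_id: string;
  host: string;
  agent: string;
  role: Role;
}

function isRuntime(x: string): x is Runtime {
  return (RUNTIMES as readonly string[]).indexOf(x) !== -1;
}

// Validate untyped input (e.g. already parsed from roster.yaml) into a Role.
function parseRole(raw: { name: string; runtime: string; persona: string; skills?: string[]; tools?: string[] }): Role {
  const runtime = raw.runtime;
  if (!isRuntime(runtime)) {
    throw new Error(`unknown runtime "${runtime}" for role "${raw.name}"`);
  }
  return { name: raw.name, runtime, persona: raw.persona, skills: raw.skills ?? [], tools: raw.tools ?? [] };
}

const exampleEntry: RosterEntry = {
  tenant_id: "jarvis",
  host: "host-a",
  agent: "analyst-1",
  role: parseRole({ name: "analyst", runtime: "codex", persona: "Careful data analyst." }),
};
```

Adding a finance or research agent then means adding a roster entry, not a runtime, which is the "same primitives, different roster" claim in the Vision.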