stack/docs/scratchpads/fleet-observability-phase2.md
2026-06-20 22:30:34 -05:00

Scratchpad — Fleet Phase 2: Observability (W-FLEET)

Append-only. Mission mvp-20260312 / workstream W-FLEET. Lead: Jarvis (Claude) at W-jarvis:mos-claude-18. Coordinating with jwoltje@dragon-lin:coder0-0.

Mission prompt (2026-06-20)

Establish the north star for the Mosaic Fleet feature and prepare Phase-2 observability for delivery. The USC tmux PoC is the proven base. Jason granted lead authority: "The fleet is a great way to actually build the MVP — we are building the system that builds the system." Dogfood actual agent construction + ad-hoc deployment; coordinate with a second agent on dragon-lin.

Decisions of record (with Jason, 2026-06-20)

  • Agent model: config defines, session runs (gateway = definition/identity/auth; tmux = runtime).
  • Tenancy: multi-tenant from the start; isolation = per-tenant Linux uid.
  • Health: heartbeat required; dogfood stub implements protocol now.
  • Lifecycle: hybrid (core always-on + ephemeral workers).
  • Observation: read-only default, opt-in takeover.
  • Multi-host: designed-for day one; control plane rides federation (W1), not a bespoke broker.
  • Delivery: CLI-first, dogfood on the live stub fleet; webUI deferred to Phase 5.
  • Fleet is dual-role: product AND means of production (bootstrapping the MVP).
  • Code review = dual-engine: Claude and gpt-5.5/Codex, run together (Jason: the combination produces the best results). Launch reviewers via mosaic yolo pi / codex (proven path) or ~/.config/mosaic/tools/codex/codex-code-review.sh. Applies to all code-review gates incl. FLEET-OBS-008. Per Jason 2026-06-20.
  • Worktree discipline: do fleet work in ~/src/mosaicstack-stack-worktrees/<branch>, NOT the shared main checkout — concurrent processes mutate main there (learned 2026-06-20).
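
The heartbeat decision above implies some way to tell a live agent from a silent one. A minimal sketch in Python (the language of the dogfood stub), assuming a hypothetical record shape (`agent`, `ts`) and a hypothetical 30-second timeout; the actual protocol fields and interval are not recorded in this scratchpad:

```python
from dataclasses import dataclass

# Hypothetical timeout; the real protocol's interval is not recorded here.
STALE_AFTER_SECONDS = 30.0


@dataclass(frozen=True)
class Heartbeat:
    agent: str   # roster name, e.g. "dogfood-coder"
    ts: float    # Unix time at which the agent last reported


def classify(beats, roster, now, stale_after=STALE_AFTER_SECONDS):
    """Return {agent: "ok" | "stale" | "missing"} for every roster entry."""
    latest = {}
    for beat in beats:
        # Keep only the newest heartbeat per agent.
        if beat.agent not in latest or beat.ts > latest[beat.agent]:
            latest[beat.agent] = beat.ts
    status = {}
    for agent in roster:
        if agent not in latest:
            status[agent] = "missing"   # defined in config, never reported
        elif now - latest[agent] > stale_after:
            status[agent] = "stale"     # reported once, then went quiet
        else:
            status[agent] = "ok"
    return status
```

Driving the check from the roster rather than from the heartbeats received fits "config defines, session runs": an agent that is defined but never reports shows up as `missing` instead of being silently absent.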

Environment facts (verified 2026-06-20)

  • Fleet is live on W-jarvis (uid 1000, jarvis, Linger=yes) on tmux socket mosaic-factory: _holder, canary-pi, dogfood-coder, dogfood-orchestrator, dogfood-reviewer. All panes run ~/.config/mosaic/fleet/dogfood-agent.py (stub), including canary-pi (roster says runtime=pi → drift).
  • Holder + mosaic-agent@* units are active (exited) but UnitFileState=disabled, so a reboot loses the fleet (boot-enable gap to surface).
  • Observation blocked by: isolated socket (hidden from default tmux ls), capture-pane returning blank for TUIs, and attach being read-write and resizing the window.
  • Second agent: jwoltje@dragon-lin, session coder0-0 (group coder0), running node, default socket. ssh forward reach confirmed.
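
Two of the observation blockers above (the isolated socket, and attach being read-write and resizing) map onto existing tmux options. A sketch of the command lines, built as argument lists so they can be inspected before being run. Assumptions: `mosaic-factory` is a named socket reachable with `-L` (a socket path would need `-S` instead), the names listed above are session names, and tmux is 3.2 or newer so that `attach-session -f` is available. The function names are hypothetical.

```python
# Assumption: "mosaic-factory" is a named socket (tmux -L). If it is a
# socket path instead, -S <path> would be needed.
SOCKET = "mosaic-factory"


def list_sessions_cmd(socket=SOCKET):
    """List sessions on the isolated socket, which plain `tmux ls` does not show."""
    return ["tmux", "-L", socket, "list-sessions"]


def observe_cmd(session, socket=SOCKET):
    """Attach read-only, without this client's size affecting the window size."""
    return ["tmux", "-L", socket, "attach-session",
            "-f", "read-only,ignore-size", "-t", session]


def takeover_cmd(session, socket=SOCKET):
    """Plain read-write attach; intended only for an explicit, opt-in takeover."""
    return ["tmux", "-L", socket, "attach-session", "-t", session]
```

Passing the result of `observe_cmd("dogfood-coder")` to `subprocess.run` from an interactive terminal would open the read-only view. This does not address the blank `capture-pane` output for TUIs.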

Governance / collision-safety

  • mosaicstack-stack has active mission mvp-20260312 with single-writer locks on docs/MISSION-MANIFEST.md, docs/TASKS.md, docs/scratchpads/mvp-20260312.md.
  • This workstream touches NONE of those. All Fleet docs scoped under docs/fleet/ + this scratchpad. Rollup row proposed, not written.
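
The "touches NONE of those" claim can be checked mechanically before each commit. A minimal sketch, assuming the changed paths are supplied as repo-relative strings (for example from `git diff --name-only`); the function name is hypothetical and the locked set is copied from the list above:

```python
from pathlib import PurePosixPath

# Single-writer files owned by mission mvp-20260312 (from the list above).
LOCKED = frozenset({
    "docs/MISSION-MANIFEST.md",
    "docs/TASKS.md",
    "docs/scratchpads/mvp-20260312.md",
})


def locked_collisions(changed_paths, locked=LOCKED):
    """Return the changed paths that hit a single-writer file, sorted."""
    # Normalise "./docs/TASKS.md" and "docs//TASKS.md" to "docs/TASKS.md".
    normalized = {str(PurePosixPath(p)) for p in changed_paths}
    return sorted(normalized & locked)
```

An empty result means the change set stays inside docs/fleet/ and this scratchpad. Note that docs/fleet/TASKS.md does not collide with the locked docs/TASKS.md, because the comparison is on the full repo-relative path.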

Session log

  • 2026-06-20: Researched AI guide + fleet code + live state. Established north star with Jason (8 forks decided). Branched feat/fleet-observability. Persisted docs/fleet/{north-star.md,PRD.md,TASKS.md} + this scratchpad. Next: establish comms with dragon-lin coder, commit docs, begin Phase-2 delivery (heartbeat + fleet ps).