diff --git a/scratchpads/2026-06-19-tmux-fleet-durable-install-plan.md b/scratchpads/2026-06-19-tmux-fleet-durable-install-plan.md
new file mode 100644
index 0000000..c4c8d1b
--- /dev/null
+++ b/scratchpads/2026-06-19-tmux-fleet-durable-install-plan.md
@@ -0,0 +1,753 @@
+# Durable tmux Fleet Installation Plan
+
+> **For Mosaic/Hermes:** This is an implementation plan for making the tmux-backed Mosaic software-factory fleet durable on this server and reusable in generic Mosaic Stack installs. Keep local USC/Mosaic defaults in profiles; keep framework behavior customizable.
+
+**Goal:** Add a supported Mosaic tmux-fleet installation path: holder-owned tmux server, per-agent reusable sessions, reliable send/reset/status tools, local roster customization, and a documented cutover for this server.
+
+**Architecture:** Mosaic should ship generic tmux fleet primitives in the framework, then layer local rosters through configuration. The holder service owns the tmux socket; each agent service joins the holder-owned server and runs `mosaic yolo RUNTIME`. The orchestrator addresses agents through `mosaic agent ...` abstractions so tmux can later be replaced by Matrix-backed agent comms without changing mission flow.
+
+**Tech Stack:** Bash, tmux, user systemd units, Mosaic CLI/framework installer, JSON/YAML roster config, existing `packages/mosaic/framework/tools/tmux/{agent-send.sh,send-message.sh}`.
+
+---
+
+## Current evidence from this server
+
+Checked 2026-06-19:
+
+- Host: `W-jarvis`
+- User: `jarvis`
+- tmux: `/usr/bin/tmux`, version `3.4`
+- user systemd: active
+- existing tmux sessions: `ai-bma-0`, `dyor-1`, `melaniewoltje-3`, `sage-2`
+- existing Mosaic runtime: `/home/jarvis/.npm-global/bin/mosaic`, version `0.0.31`
+- `~/.config/mosaic/tools/tmux` is not installed, even though the stack repo contains `packages/mosaic/framework/tools/tmux/`
+
+Implication: do not kill the current tmux server casually. This server has active ad-hoc/service sessions. The durable fleet cutover must be planned, with either a separate socket first or a scheduled fleet recycle.
+
+## Design decisions
+
+### 1. Generic framework, local profile
+
+The Mosaic framework should ship:
+
+- systemd unit templates;
+- tmux fleet CLI wrappers;
+- roster schema and examples;
+- install/enable/status/reset commands;
+- docs and verification scripts.
+
+Local environments should provide:
+
+- agent names;
+- runtime per slot (`claude`, `pi`, `codex`, etc.);
+- default role class;
+- launch directory;
+- optional kickstart prompt;
+- model/provider hints;
+- transport selection (`tmux` now, `matrix` later).
+
+Do not bake the USC roster into generic install code. Ship it as an example profile.
+
+### 2. Durable sessions, disposable task context
+
+Session names are durable operational addresses. Task persona is disposable. Reusable worker slots should be reset with `/clear` or `/new` and then receive a fresh task kickstart.
+
+Persistent/semi-persistent personas:
+
+- lead orchestrator;
+- final/adversarial reviewer;
+- architecture/enhancement lane.
+
+Disposable slots:
+
+- implementers;
+- ordinary reviewers;
+- security reviewers unless actively holding a security mission.
+
+### 3. Transport abstraction now
+
+Add commands around tmux instead of calling tmux directly from orchestration:
+
+```bash
+mosaic agent send AGENT --message "..."
+mosaic agent status [--json]
+mosaic agent reset AGENT [--clear|--new]
+mosaic agent roster [--json]
+mosaic fleet install|start|stop|restart|status|verify
+```
+
+Today these call tmux/systemd. Later the same command surface can target Matrix or per-agent gateways.
+
+### 4. Avoid shared-server ownership bug
+
+Use the AI Guide holder pattern:
+
+```text
+mosaic-tmux-holder.service owns the tmux server/socket
+mosaic-agent@.service joins the existing holder-owned socket
+ExecStop kills only session =NAME
+```
+
+Use exact tmux targets: `=NAME`.
+
+### 5. Prefer separate named socket for Mosaic factory
+
+To avoid disturbing existing tmux work, the default fleet should use a named socket such as:
+
+```text
+$XDG_RUNTIME_DIR/mosaic-factory.tmux
+```
+
+or tmux socket name:
+
+```bash
+tmux -L mosaic-factory ...
+```
+
+This avoids collision with ordinary `tmux ls` sessions. The send tools need socket support.
+
+---
+
+## Target USC-style roster example
+
+Ship as example only, not default:
+
+```yaml
+version: 1
+transport: tmux
+tmux:
+  socket_name: mosaic-factory
+  holder_session: _holder
+  working_directory: ~/src
+agents:
+  - name: mos-claude
+    runtime: claude
+    class: orchestrator
+    model_hint: Claude Opus
+    persistent_persona: true
+  - name: coder0
+    runtime: claude
+    class: implementer
+    model_hint: Claude Opus
+    reset_between_tasks: true
+  - name: coder1
+    runtime: claude
+    class: implementer
+    model_hint: Claude Opus
+    reset_between_tasks: true
+  - name: coder2
+    runtime: pi
+    class: implementer
+    model_hint: Pi GPT-5.5
+    reset_between_tasks: true
+  - name: coder3
+    runtime: pi
+    class: implementer
+    model_hint: Pi GPT-5.5
+    reset_between_tasks: true
+  - name: coder4
+    runtime: claude
+    class: implementer
+    model_hint: Claude Opus
+    reset_between_tasks: true
+  - name: coder5
+    runtime: claude
+    class: implementer
+    model_hint: Claude Opus
+    reset_between_tasks: true
+  - name: enhance
+    runtime: claude
+    class: enhancer
+    model_hint: Claude Opus
+    persistent_persona: semi
+  - name: rev0
+    runtime: pi
+    class: reviewer
+    model_hint: Pi GPT-5.5
+    reset_between_tasks: true
+  - name: rev1
+    runtime: pi
+    class: reviewer
+    model_hint: Pi GPT-5.5
+    reset_between_tasks: true
+  - name: secrev0
+    runtime: pi
+    class: security_reviewer
+    model_hint: Pi GPT-5.5
+    reset_between_tasks: true
+  - name: secrev1
+    runtime: pi
+    class: security_reviewer
+    model_hint: Pi GPT-5.5
+    reset_between_tasks: true
+  - name: ultron
+    runtime: pi
+    class: final_reviewer
+    model_hint: Pi GPT-5.5
+    persistent_persona: semi
+```
+
+---
+
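Roster agent names double as tmux session names and systemd instance names, so it may be worth checking them before any units are generated. The following is a minimal sketch, not part of the existing Mosaic tools: the function names `valid_agent_name` and `duplicate_agent_names` are hypothetical, and reserving a leading `_` for infrastructure sessions such as `_holder` is an assumption. It rejects `:` and `.` because tmux uses them as separators in `session:window.pane` targets.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking roster agent names (not existing Mosaic code).
# Assumption: names starting with "_" are reserved for infrastructure sessions
# such as the holder session "_holder".

# Succeeds only for non-empty names made of letters, digits, "_" and "-"
# that do not start with "_". This excludes ":" and ".", which tmux uses as
# separators in session:window.pane targets.
valid_agent_name() {
  case "$1" in
    '' | _* | *[!A-Za-z0-9_-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Prints each name that appears more than once in the argument list.
duplicate_agent_names() {
  printf '%s\n' "$@" | sort | uniq -d
}
```

An installer could run every `agents[].name` value through `valid_agent_name` and refuse to continue if `duplicate_agent_names` prints anything.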
+## Phase 0: Confirm install surfaces
+
+### Task 0.1: Inspect installer copy behavior
+
+**Objective:** Confirm how framework files under `packages/mosaic/framework/` become installed under `~/.config/mosaic/`.
+
+**Files:**
+
+- Read: `tools/install.sh`
+- Read: `packages/mosaic/framework/install.sh`
+- Read: `packages/mosaic/src/runtime/install-manifest.ts`
+
+**Steps:**
+
+1. Verify `packages/mosaic/framework/install.sh` rsyncs `tools/tmux`.
+2. Verify whether npm-packaged installs include `framework/tools/tmux`.
+3. Confirm whether installed hosts should run `mosaic update`, `bash tools/install.sh`, or `packages/mosaic/framework/install.sh` to receive new tmux tools.
+4. Record exact propagation command in docs.
+
+**Verification:**
+
+```bash
+bash packages/mosaic/framework/install.sh --help || true
+npm pack --dry-run --json | jq '.[0].files[].path' | grep 'framework/tools/tmux'
+```
+
+Expected: tmux tools are included in installable package or packaging fix is identified.
+
+### Task 0.2: Inspect current yolo launch semantics
+
+**Objective:** Confirm `mosaic yolo claude` and `mosaic yolo pi` accept optional initial prompt text and behave well under systemd/tmux.
+
+**Files:**
+
+- Read: `packages/mosaic/src/**`
+- Read: `packages/mosaic/framework/runtime/claude/RUNTIME.md`
+- Read: `packages/mosaic/framework/runtime/pi/RUNTIME.md`
+
+**Verification commands:**
+
+```bash
+mosaic yolo claude --help
+mosaic yolo pi --help
+```
+
+Expected: a systemd `ExecStart` can launch the runtime either with no prompt or with a kickstart prompt file/string.
+
+---
+
+## Phase 1: Framework tmux primitives
+
+### Task 1.1: Add socket support to send tools
+
+**Objective:** Allow `agent-send.sh` and `send-message.sh` to target a named Mosaic tmux socket without affecting default tmux sessions.
+
+**Files:**
+
+- Modify: `packages/mosaic/framework/tools/tmux/send-message.sh`
+- Modify: `packages/mosaic/framework/tools/tmux/agent-send.sh`
+- Modify: `packages/mosaic/framework/tools/tmux/README.md`
+- Test: `packages/mosaic/framework/tools/tmux/test-send-message.sh` (new)
+
+**Design:**
+
+Add optional flags:
+
+```bash
+-L SOCKET_NAME # tmux -L socket name
+-SOCKET PATH # optional later if needed; avoid conflict with existing -S source label in agent-send
+```
+
+Because `agent-send.sh` already uses `-S` for the source label, use `-L` for the socket name. Add a socket-path option only if it is needed later: `-T`, or `--socket-path` if long-option parsing is added.
+
+**Implementation notes:**
+
+- Build a tmux command array:
+
+```bash
+tmux_cmd=(tmux)
+if [ -n "$SOCKET_NAME" ]; then tmux_cmd+=( -L "$SOCKET_NAME" ); fi
+```
+
+- Replace raw `tmux ...` calls with `"${tmux_cmd[@]}" ...`.
+- Pass `-L` through remote ssh invocation.
+- Include socket name in verbose output.
+
+**Verification:**
+
+```bash
+tmux -L mosaic-test new-session -d -s target 'cat'
+packages/mosaic/framework/tools/tmux/send-message.sh -L mosaic-test -t target -m 'hello'
+tmux -L mosaic-test capture-pane -t target -p | grep hello
+tmux -L mosaic-test kill-server
+```
+
+Expected: message lands in the named socket session; default `tmux ls` is untouched.
+
+### Task 1.2: Add exact target validation helper
+
+**Objective:** Prevent accidental prefix targeting in all tmux fleet operations.
+
+**Files:**
+
+- Create: `packages/mosaic/framework/tools/tmux/_lib.sh`
+- Modify: `send-message.sh`
+- Modify: `agent-send.sh`
+
+**Behavior:**
+
+- For session-only agent names, normalize target to `=NAME` before kill/status/reset operations.
+- For explicit pane targets like `session:window.pane`, allow as advanced path but document the risk.
+
+**Verification:**
+
+Create sessions `agent` and `agent0`; verify killing/resetting `agent` does not affect `agent0`.
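The target normalization described in Task 1.2 could look roughly like the sketch below. It is one possible shape for a helper in `_lib.sh`, not the existing implementation: the function name `normalize_target` is hypothetical, and passing through anything that contains `:` or `.` is an assumption about how the "advanced path" would be detected.

```shell
#!/usr/bin/env bash
# Sketch of an exact-target helper that _lib.sh could provide.
# The function name is hypothetical, not an existing Mosaic API.

# Prints an exact-match tmux target for a bare session name.
# Targets that already start with "=" or that address a window or pane
# (contain ":" or ".") are printed unchanged; an empty target is an error.
normalize_target() {
  case "$1" in
    '') return 1 ;;
    =* | *:* | *.*) printf '%s\n' "$1" ;;
    *) printf '=%s\n' "$1" ;;
  esac
}
```

A caller would then write something like `tmux -L mosaic-factory kill-session -t "$(normalize_target agent)"`, so that the target becomes `=agent` and tmux's prefix matching cannot select `agent0` instead.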
+
+---
+
+## Phase 2: systemd unit templates
+
+### Task 2.1: Add holder service template
+
+**Objective:** Ship a user systemd unit template that owns the Mosaic factory tmux server.
+
+**Files:**
+
+- Create: `packages/mosaic/framework/systemd/user/mosaic-tmux-holder.service`
+- Create: `packages/mosaic/framework/tools/fleet/install-user-units.sh`
+
+**Unit shape:**
+
+```ini
+[Unit]
+Description=Mosaic tmux fleet holder
+Documentation=https://git.mosaicstack.dev/mosaicstack/aiguide
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+Environment=MOSAIC_TMUX_SOCKET=mosaic-factory
+ExecStart=/usr/bin/tmux -L ${MOSAIC_TMUX_SOCKET} new-session -d -s _holder 'while true; do sleep 3600; done'
+ExecStop=-/usr/bin/tmux -L ${MOSAIC_TMUX_SOCKET} kill-server
+
+[Install]
+WantedBy=default.target
+```
+
+**Important:** systemd environment expansion in `ExecStart` is limited. Verify syntax; if `%E`/environment expansion is awkward, generate concrete units from config instead of relying on dynamic expansion.
+
+**Verification:**
+
+```bash
+systemd-analyze --user verify ~/.config/systemd/user/mosaic-tmux-holder.service
+systemctl --user daemon-reload
+systemctl --user start mosaic-tmux-holder.service
+tmux -L mosaic-factory ls | grep _holder
+```
+
+### Task 2.2: Add agent service template
+
+**Objective:** Ship a user systemd template that starts one configured agent slot.
+
+**Files:**
+
+- Create: `packages/mosaic/framework/systemd/user/mosaic-agent@.service`
+- Modify: `packages/mosaic/framework/tools/fleet/install-user-units.sh`
+
+**Unit shape:**
+
+```ini
+[Unit]
+Description=Mosaic agent session %i
+Requires=mosaic-tmux-holder.service
+After=mosaic-tmux-holder.service
+PartOf=mosaic-tmux-holder.service
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+WorkingDirectory=%h/src
+Environment=MOSAIC_TMUX_SOCKET=mosaic-factory
+ExecStart=/bin/bash -lc 'tmux -L "$MOSAIC_TMUX_SOCKET" new-session -d -s "%i" "mosaic yolo $(mosaic fleet runtime %i)"'
+ExecStop=-/usr/bin/tmux -L mosaic-factory kill-session -t '=%i'
+
+[Install]
+WantedBy=default.target
+```
+
+**Design warning:** command substitution in unit files can become brittle. Prefer a generated per-agent EnvironmentFile:
+
+```text
+~/.config/mosaic/fleet/agents/coder0.env
+```
+
+with:
+
+```bash
+MOSAIC_AGENT_NAME=coder0
+MOSAIC_AGENT_RUNTIME=claude
+MOSAIC_AGENT_WORKDIR=/home/jarvis/src
+MOSAIC_TMUX_SOCKET=mosaic-factory
+```
+
+Then `ExecStart` calls a wrapper:
+
+```bash
+~/.config/mosaic/tools/fleet/start-agent-session.sh
+```
+
+**Verification:**
+
+```bash
+systemd-analyze --user verify ~/.config/systemd/user/mosaic-agent@.service
+systemctl --user start mosaic-agent@coder0.service
+tmux -L mosaic-factory has-session -t '=coder0'
+systemctl --user restart mosaic-agent@coder0.service
+```
+
+Expected: holder server PID remains unchanged; only `coder0` session recycles.
+
+### Task 2.3: Add start-agent wrapper
+
+**Objective:** Keep systemd units simple by moving config lookup and launch command construction into a script.
+
+**Files:**
+
+- Create: `packages/mosaic/framework/tools/fleet/start-agent-session.sh`
+
+**Behavior:**
+
+Inputs:
+
+```bash
+start-agent-session.sh AGENT_NAME
+```
+
+Reads:
+
+```text
+$MOSAIC_HOME/fleet/agents/AGENT_NAME.env
+```
+
+Starts:
+
+```bash
+tmux -L "$MOSAIC_TMUX_SOCKET" new-session -d -s "$MOSAIC_AGENT_NAME" -c "$MOSAIC_AGENT_WORKDIR" "mosaic yolo $MOSAIC_AGENT_RUNTIME"
+```
+
+Guardrails:
+
+- fail if runtime is empty;
+- fail if workdir does not exist;
+- no duplicate sessions unless `--replace` is passed;
+- exact session names only.
+
+---
+
+## Phase 3: roster config and CLI wrappers
+
+### Task 3.1: Add fleet config schema and examples
+
+**Objective:** Define customizable install-time roster without hardcoding USC.
+
+**Files:**
+
+- Create: `packages/mosaic/framework/fleet/roster.schema.json`
+- Create: `packages/mosaic/framework/fleet/examples/minimal.yaml`
+- Create: `packages/mosaic/framework/fleet/examples/usc-software-factory.yaml`
+- Create: `packages/mosaic/framework/fleet/README.md`
+
+**Schema concepts:**
+
+- `transport`: `tmux` now; `matrix` later.
+- `tmux.socket_name`
+- `tmux.holder_session`
+- `defaults.working_directory`
+- `agents[].name`
+- `agents[].runtime`
+- `agents[].class`
+- `agents[].model_hint`
+- `agents[].persistent_persona`
+- `agents[].reset_between_tasks`
+- `agents[].kickstart_template`
+
+**Verification:**
+
+Use `jq` for JSON examples or add a small Python/YAML validator if YAML is chosen. If no YAML parser is guaranteed, store examples as JSON or support both with Python stdlib JSON first.
+
+### Task 3.2: Add `mosaic fleet` commands
+
+**Objective:** Provide operator-safe commands for install/status/start/stop/restart/verify.
+
+**Files:**
+
+- Modify: `packages/mosaic/src/cli.ts` or the current commander entrypoint.
+- Create scripts under: `packages/mosaic/framework/tools/fleet/`
+
+**Commands:**
+
+```bash
+mosaic fleet init --profile minimal|usc --write
+mosaic fleet install-systemd
+mosaic fleet start [agent]
+mosaic fleet stop [agent]
+mosaic fleet restart [agent]
+mosaic fleet status --json
+mosaic fleet verify
+```
+
+**Implementation path:**
+
+Start by wrapping framework shell scripts from the TypeScript CLI. Do not overbuild a TypeScript service manager in the first pass.
+
+### Task 3.3: Add `mosaic agent` commands
+
+**Objective:** Provide transport-stable per-agent operations.
+
+**Files:**
+
+- Modify: Mosaic CLI entrypoint.
+- Create: `packages/mosaic/framework/tools/agent/` or reuse `tools/tmux` + `tools/fleet`.
+
+**Commands:**
+
+```bash
+mosaic agent roster [--json]
+mosaic agent status [agent] [--json]
+mosaic agent send AGENT --message "..."
+mosaic agent reset AGENT --clear|--new
+mosaic agent tail AGENT [-n 80]
+```
+
+**Reset behavior:**
+
+For tmux transport, `reset --clear` sends `/clear` then Enter through `send-message.sh`.
+
+For Claude/Pi differences, keep reset command configurable per runtime:
+
+```yaml
+runtimes:
+  claude:
+    reset_command: /clear
+  pi:
+    reset_command: /new
+```
+
+If a runtime does not support a known reset command, restart the service and send a fresh kickstart.
+
+---
+
+## Phase 4: this-server rollout strategy
+
+### Task 4.1: Install on separate socket first
+
+**Objective:** Prove the holder pattern without disturbing existing sessions.
+
+**Commands after implementation lands locally:**
+
+```bash
+mosaic fleet init --profile minimal --write
+mosaic fleet install-systemd
+systemctl --user daemon-reload
+systemctl --user start mosaic-tmux-holder.service
+mosaic fleet verify
+```
+
+Expected:
+
+- `tmux -L mosaic-factory ls` shows `_holder`.
+- normal `tmux ls` still shows existing sessions unchanged.
+
+### Task 4.2: Start one canary agent
+
+**Objective:** Validate single-agent start/restart isolation.
+
+Use a harmless canary first, not the full fleet.
+
+Example roster addition:
+
+```yaml
+- name: canary-pi
+  runtime: pi
+  class: canary
+  working_directory: /home/jarvis/src
+```
+
+Commands:
+
+```bash
+systemctl --user start mosaic-agent@canary-pi.service
+SRV=$(tmux -L mosaic-factory display-message -p '#{pid}')
+systemctl --user restart mosaic-agent@canary-pi.service
+test "$SRV" = "$(tmux -L mosaic-factory display-message -p '#{pid}')"
+tmux -L mosaic-factory ls
+```
+
+Expected: holder PID unchanged; `_holder` remains; `canary-pi` recreated.
+
+### Task 4.3: Configure local Mosaic factory roster
+
+**Objective:** Create the actual local roster for this server after canary passes.
+
+Do not assume USC exact roster is desired here. Create a local profile such as:
+
+```text
+~/.config/mosaic/fleet/roster.yaml
+```
+
+Initial local recommendation:
+
+- `mos-claude` orchestrator
+- `coder0` / `coder1` implementers
+- `rev0` reviewer
+- `secrev0` security reviewer
+- `ultron` final/adversarial reviewer
+
+Scale to full USC-style pool only after resource/budget behavior is understood.
+
+### Task 4.4: Cut over existing ad-hoc tmux sessions only if desired
+
+**Objective:** Avoid data loss.
+
+Existing sessions on this server are not on the proposed `mosaic-factory` socket. They can remain untouched. If we later want them under Mosaic fleet control:
+
+1. list sessions;
+2. capture logs/handoffs;
+3. stop old processes intentionally;
+4. recreate as configured `mosaic-agent@...` services;
+5. verify comms and state.
+
+Do not run `tmux kill-server` on the default socket unless Jason explicitly approves that outage.
+
+---
+
+## Phase 5: docs and AI Guide backfill
+
+### Task 5.1: Stack docs
+
+**Objective:** Document install and customization for Mosaic Stack users.
+
+**Files:**
+
+- Create: `docs/fleet/tmux-fleet.md` or `packages/mosaic/framework/tools/fleet/README.md`
+- Modify: top-level `README.md` if appropriate.
+
+Must cover:
+
+- what problem holder service solves;
+- install commands;
+- customization file;
+- example rosters;
+- reset/reuse lifecycle;
+- exact-target safety;
+- separate socket default;
+- Matrix migration path.
+
+### Task 5.2: AI Guide docs
+
+**Objective:** Keep generic guidance in AI Guide and implementation details in Stack.
+
+**Files in `mosaicstack/aiguide`:**
+
+- Update: `playbooks/tmux-fleet.md` with named socket, roster/profile, and resettable-slot pattern.
+- Add or update: `reference/agent-role-matrix.md` if PR #5 lands.
+
+Do not put Mosaic install commands as the only path in AI Guide. Present them as one implementation profile.
+
+---
+
+## Phase 6: Matrix migration seam
+
+### Task 6.1: Add transport enum but implement tmux only
+
+**Objective:** Avoid hardcoding tmux into orchestration semantics.
+
+Roster:
+
+```yaml
+transport: tmux
+```
+
+Future:
+
+```yaml
+transport: matrix
+matrix:
+  homeserver: https://matrix.example
+  room_prefix: mosaic-factory
+```
+
+### Task 6.2: Define transport interface docs
+
+**Objective:** Make Matrix plugin work a transport swap, not a rewrite.
+
+Minimum operations:
+
+```text
+send(agent, message)
+reset(agent, mode)
+status(agent)
+tail(agent)
+listAgents()
+```
+
+Any tmux-specific concept must stay below this line.
+
+---
+
+## Acceptance criteria
+
+The implementation is complete when:
+
+- `mosaic fleet init` can write a minimal roster.
+- `mosaic fleet install-systemd` installs holder and agent units without hand editing.
+- `mosaic fleet start` starts the holder and configured agents on a named tmux socket.
+- Restarting one `mosaic-agent@name.service` does not change holder server PID or kill sibling sessions.
+- `mosaic agent send` can deliver a message to a named agent with a self-identifying preamble.
+- `mosaic agent reset` can clear/new a reusable slot and send a fresh kickstart.
+- `mosaic fleet verify` proves holder ownership, exact-target safety, and per-agent restart isolation.
+- Existing default tmux sessions on this server are not disturbed by default install.
+- Docs explain generic customization and include USC-style roster only as an example.
+- AI Guide remains generic; Mosaic Stack docs carry the concrete install path.
+
+## Risks and mitigations
+
+| Risk | Mitigation |
+|---|---|
+| Killing existing tmux sessions | Use named `mosaic-factory` socket; no default `tmux kill-server`. |
+| systemd unit quoting/env expansion bugs | Move logic into shell wrappers; verify with `systemd-analyze --user verify`. |
+| Runtime reset command mismatch | Make reset command runtime-configurable; fallback to service restart + kickstart. |
+| Tool install drift | Ensure npm package includes framework tmux/fleet tools; add packaging test. |
+| Mosaic-specific assumptions leak into generic guide | Keep USC roster as example profile; AI Guide documents pattern/options. |
+| Matrix migration blocked by tmux coupling | Add `mosaic agent` abstraction now; keep tmux details below transport layer. |
+
+## Suggested first PR split
+
+1. **PR A: tmux tool hardening**
+   - socket support;
+   - exact target helpers;
+   - tests/docs.
+
+2. **PR B: fleet systemd primitives**
+   - holder unit;
+   - agent unit;
+   - start-agent wrapper;
+   - install-user-units script;
+   - verify script.
+
+3. **PR C: roster and CLI**
+   - roster schema/examples;
+   - `mosaic fleet ...` commands;
+   - `mosaic agent ...` commands.
+
+4. **PR D: local rollout and docs**
+   - local roster for this server;
+   - run canary;
+   - document verification evidence;
+   - update AI Guide with generic lessons.
+
+## Immediate next action
+
+Implement PR A first. It is low-risk, improves existing tools, and is required for a safe named-socket rollout on this server.