stack/scratchpads/2026-06-19-tmux-fleet-durable-install-plan.md
Jarvis 250d3da12d
docs: plan durable tmux fleet install
2026-06-19 15:10:36 -05:00

Durable tmux Fleet Installation Plan

For Mosaic/Hermes: This is an implementation plan for making the tmux-backed Mosaic software-factory fleet durable on this server and reusable in generic Mosaic Stack installs. Keep local USC/Mosaic defaults in profiles; keep framework behavior customizable.

Goal: Add a supported Mosaic tmux-fleet installation path: holder-owned tmux server, per-agent reusable sessions, reliable send/reset/status tools, local roster customization, and a documented cutover for this server.

Architecture: Mosaic should ship generic tmux fleet primitives in the framework, then layer local rosters through configuration. The holder service owns the tmux socket; each agent service joins the holder-owned server and runs mosaic yolo <runtime>. The orchestrator addresses agents through mosaic agent ... abstractions so tmux can later be replaced by Matrix-backed agent comms without changing mission flow.

Tech Stack: Bash, tmux, user systemd units, Mosaic CLI/framework installer, JSON/YAML roster config, existing packages/mosaic/framework/tools/tmux/{agent-send.sh,send-message.sh}.


Current evidence from this server

Checked 2026-06-19:

  • Host: W-jarvis
  • User: jarvis
  • tmux: /usr/bin/tmux, version 3.4
  • user systemd: active
  • existing tmux sessions: ai-bma-0, dyor-1, melaniewoltje-3, sage-2
  • existing Mosaic runtime: /home/jarvis/.npm-global/bin/mosaic, version 0.0.31
  • installed ~/.config/mosaic/tools/tmux was not present even though the stack repo contains packages/mosaic/framework/tools/tmux/

Implication: do not kill the current tmux server casually. This server has active ad-hoc/service sessions. The durable fleet cutover must be planned, with either a separate socket first or a scheduled fleet recycle.

Design decisions

1. Generic framework, local profile

The Mosaic framework should ship:

  • systemd unit templates;
  • tmux fleet CLI wrappers;
  • roster schema and examples;
  • install/enable/status/reset commands;
  • docs and verification scripts.

Local environments should provide:

  • agent names;
  • runtime per slot (claude, pi, codex, etc.);
  • default role class;
  • launch directory;
  • optional kickstart prompt;
  • model/provider hints;
  • transport selection (tmux now, matrix later).

Do not bake the USC roster into generic install code. Ship it as an example profile.

2. Durable sessions, disposable task context

Session names are durable operational addresses. Task persona is disposable. Reusable worker slots should be reset with /clear or /new and then receive a fresh task kickstart.

Persistent/semi-persistent personas:

  • lead orchestrator;
  • final/adversarial reviewer;
  • architecture/enhancement lane.

Disposable slots:

  • implementers;
  • ordinary reviewers;
  • security reviewers unless actively holding a security mission.

3. Transport abstraction now

Add commands around tmux instead of calling tmux directly from orchestration:

mosaic agent send <agent> --message "..."
mosaic agent status [--json]
mosaic agent reset <agent> [--clear|--new]
mosaic agent roster [--json]
mosaic fleet init|install-systemd|start|stop|restart|status|verify

Today these call tmux/systemd. Later the same command surface can target Matrix or per-agent gateways.

4. Avoid shared-server ownership bug

Use the AI Guide holder pattern:

mosaic-tmux-holder.service owns the tmux server/socket
mosaic-agent@<name>.service joins the existing holder-owned socket
ExecStop kills only session =<name>

Use exact tmux targets: =<session>.

5. Prefer separate named socket for Mosaic factory

To avoid disturbing existing tmux work, the default fleet should use a named socket such as:

$XDG_RUNTIME_DIR/mosaic-factory.tmux

or tmux socket name:

tmux -L mosaic-factory ...

This keeps fleet sessions off the default tmux server, so an ordinary tmux ls neither lists them nor can default-socket commands disturb them. The send tools therefore need socket support.


Target USC-style roster example

Ship as example only, not default:

version: 1
transport: tmux
tmux:
  socket_name: mosaic-factory
  holder_session: _holder
defaults:
  working_directory: ~/src
agents:
  - name: mos-claude
    runtime: claude
    class: orchestrator
    model_hint: Claude Opus
    persistent_persona: true
  - name: coder0
    runtime: claude
    class: implementer
    model_hint: Claude Opus
    reset_between_tasks: true
  - name: coder1
    runtime: claude
    class: implementer
    model_hint: Claude Opus
    reset_between_tasks: true
  - name: coder2
    runtime: pi
    class: implementer
    model_hint: Pi GPT-5.5
    reset_between_tasks: true
  - name: coder3
    runtime: pi
    class: implementer
    model_hint: Pi GPT-5.5
    reset_between_tasks: true
  - name: coder4
    runtime: claude
    class: implementer
    model_hint: Claude Opus
    reset_between_tasks: true
  - name: coder5
    runtime: claude
    class: implementer
    model_hint: Claude Opus
    reset_between_tasks: true
  - name: enhance
    runtime: claude
    class: enhancer
    model_hint: Claude Opus
    persistent_persona: semi
  - name: rev0
    runtime: pi
    class: reviewer
    model_hint: Pi GPT-5.5
    reset_between_tasks: true
  - name: rev1
    runtime: pi
    class: reviewer
    model_hint: Pi GPT-5.5
    reset_between_tasks: true
  - name: secrev0
    runtime: pi
    class: security_reviewer
    model_hint: Pi GPT-5.5
    reset_between_tasks: true
  - name: secrev1
    runtime: pi
    class: security_reviewer
    model_hint: Pi GPT-5.5
    reset_between_tasks: true
  - name: ultron
    runtime: pi
    class: final_reviewer
    model_hint: Pi GPT-5.5
    persistent_persona: semi

Phase 0 — Confirm install surfaces

Task 0.1: Inspect installer copy behavior

Objective: Confirm how framework files under packages/mosaic/framework/ become installed under ~/.config/mosaic/.

Files:

  • Read: tools/install.sh
  • Read: packages/mosaic/framework/install.sh
  • Read: packages/mosaic/src/runtime/install-manifest.ts

Steps:

  1. Verify packages/mosaic/framework/install.sh rsyncs tools/tmux.
  2. Verify whether npm-packaged installs include framework/tools/tmux.
  3. Confirm whether installed hosts should run mosaic update, bash tools/install.sh, or packages/mosaic/framework/install.sh to receive new tmux tools.
  4. Record exact propagation command in docs.

Verification:

bash packages/mosaic/framework/install.sh --help || true
npm pack --dry-run --json | jq '.[0].files[].path' | grep 'framework/tools/tmux'

Expected: tmux tools are included in installable package or packaging fix is identified.

Task 0.2: Inspect current yolo launch semantics

Objective: Confirm mosaic yolo claude and mosaic yolo pi accept optional initial prompt text and behave well under systemd/tmux.

Files:

  • Read: packages/mosaic/src/**
  • Read: packages/mosaic/framework/runtime/claude/RUNTIME.md
  • Read: packages/mosaic/framework/runtime/pi/RUNTIME.md

Verification commands:

mosaic yolo claude --help
mosaic yolo pi --help

Expected: a systemd ExecStart can launch the runtime either with no prompt or with a kickstart prompt file/string.


Phase 1 — Framework tmux primitives

Task 1.1: Add socket support to send tools

Objective: Allow agent-send.sh and send-message.sh to target a named Mosaic tmux socket without affecting default tmux sessions.

Files:

  • Modify: packages/mosaic/framework/tools/tmux/send-message.sh
  • Modify: packages/mosaic/framework/tools/tmux/agent-send.sh
  • Modify: packages/mosaic/framework/tools/tmux/README.md
  • Test: packages/mosaic/framework/tools/tmux/test-send-message.sh (new)

Design:

Add optional flags:

-L SOCKET_NAME        # tmux -L socket name
--socket-path PATH    # optional, later if needed; maps to tmux -S

Because agent-send.sh already uses -S for the source label, use -L for the socket name. Add a socket-path option (-T, or --socket-path once long-option parsing exists) only if a full socket path is actually needed.

Implementation notes:

  • Build a tmux command array:
tmux_cmd=(tmux)
if [ -n "$SOCKET_NAME" ]; then tmux_cmd+=( -L "$SOCKET_NAME" ); fi
  • Replace raw tmux ... calls with "${tmux_cmd[@]}" ....
  • Pass -L through remote ssh invocation.
  • Include socket name in verbose output.
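
The command-array notes above can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper named build_tmux_cmd and a SOCKET_NAME variable populated by the proposed -L flag; the shipped scripts may organize this differently.

```shell
#!/usr/bin/env bash
# Sketch only: build_tmux_cmd is a hypothetical helper, not an existing function.
# It fills the global array tmux_cmd with the base tmux invocation and adds
# "-L <socket>" only when a socket name was supplied.
build_tmux_cmd() {
  local socket_name="${1:-}"
  tmux_cmd=(tmux)
  if [ -n "$socket_name" ]; then
    tmux_cmd+=(-L "$socket_name")
  fi
}

# SOCKET_NAME would normally come from parsing the proposed -L flag.
SOCKET_NAME="mosaic-test"
build_tmux_cmd "$SOCKET_NAME"

# Print the command instead of running it, so the sketch has no side effects.
printf '%s\n' "${tmux_cmd[*]} has-session -t =target"
# prints: tmux -L mosaic-test has-session -t =target
```

Keeping the invocation in an array avoids word-splitting problems when the socket name is empty, because no stray empty argument is ever passed to tmux.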

Verification:

tmux -L mosaic-test new-session -d -s target 'cat'
packages/mosaic/framework/tools/tmux/send-message.sh -L mosaic-test -t target -m 'hello'
tmux -L mosaic-test capture-pane -t target -p | grep hello
tmux -L mosaic-test kill-server

Expected: message lands in the named socket session; default tmux ls is untouched.

Task 1.2: Add exact target validation helper

Objective: Prevent accidental prefix targeting in all tmux fleet operations.

Files:

  • Create: packages/mosaic/framework/tools/tmux/_lib.sh
  • Modify: send-message.sh
  • Modify: agent-send.sh

Behavior:

  • For session-only agent names, normalize target to =<name> before kill/status/reset operations.
  • For explicit pane targets like session:window.pane, allow as advanced path but document the risk.

Verification:

Create sessions agent and agent0; verify killing/resetting agent does not affect agent0.
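
One possible shape for the _lib.sh helper is sketched below. The function name exact_target is an assumption for illustration; the behavior follows the two rules above (bare session names become =<name>, explicit pane targets pass through).

```shell
#!/usr/bin/env bash
# Hypothetical helper for _lib.sh: normalize a bare session name to tmux's
# exact-match form (=name). Targets that already start with '=' and explicit
# targets containing ':' (session:window.pane) are passed through unchanged.
exact_target() {
  local target="${1:-}"
  case "$target" in
    "")
      echo "exact_target: empty target" >&2
      return 1
      ;;
    =*|*:*)
      printf '%s\n' "$target"
      ;;
    *)
      printf '=%s\n' "$target"
      ;;
  esac
}

exact_target agent        # prints: =agent
exact_target =agent0      # prints: =agent0
exact_target agent:0.1    # prints: agent:0.1
```

With this in place, a kill or reset aimed at agent resolves to =agent and cannot fall through to agent0 by prefix match.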


Phase 2 — systemd unit templates

Task 2.1: Add holder service template

Objective: Ship a user systemd unit template that owns the Mosaic factory tmux server.

Files:

  • Create: packages/mosaic/framework/systemd/user/mosaic-tmux-holder.service
  • Create: packages/mosaic/framework/tools/fleet/install-user-units.sh

Unit shape:

[Unit]
Description=Mosaic tmux fleet holder
Documentation=https://git.mosaicstack.dev/mosaicstack/aiguide

[Service]
Type=oneshot
RemainAfterExit=yes
Environment=MOSAIC_TMUX_SOCKET=mosaic-factory
ExecStart=/usr/bin/tmux -L ${MOSAIC_TMUX_SOCKET} new-session -d -s _holder 'while true; do sleep 3600; done'
ExecStop=-/usr/bin/tmux -L ${MOSAIC_TMUX_SOCKET} kill-server

[Install]
WantedBy=default.target

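Important: systemd substitutes ${VAR} and $VAR in ExecStart arguments but does not run a shell, so command substitution and other shell syntax are unavailable there. Note that %E is the specifier for the configuration directory root, not an environment-variable mechanism. Verify the syntax; if substitution proves awkward, generate concrete units from config instead of relying on dynamic expansion.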
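
If concrete units are generated from config, the generator could be as small as the sketch below. render_holder_unit is a hypothetical function name, and the unit body simply mirrors the unit shape above with the socket name written in literally.

```shell
#!/usr/bin/env bash
# Hypothetical generator: prints a holder unit with the socket name substituted
# at generation time, so the unit does not rely on systemd variable expansion.
render_holder_unit() {
  local socket="${1:?socket name required}"
  cat <<EOF
[Unit]
Description=Mosaic tmux fleet holder

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/tmux -L ${socket} new-session -d -s _holder 'while true; do sleep 3600; done'
ExecStop=-/usr/bin/tmux -L ${socket} kill-server

[Install]
WantedBy=default.target
EOF
}

# Print the unit; an installer would redirect this into
# ~/.config/systemd/user/mosaic-tmux-holder.service.
render_holder_unit mosaic-factory
```

The generated file can still be checked with systemd-analyze --user verify before it is enabled.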

Verification:

systemd-analyze --user verify ~/.config/systemd/user/mosaic-tmux-holder.service
systemctl --user daemon-reload
systemctl --user start mosaic-tmux-holder.service
tmux -L mosaic-factory ls | grep _holder

Task 2.2: Add agent service template

Objective: Ship a user systemd template that starts one configured agent slot.

Files:

  • Create: packages/mosaic/framework/systemd/user/mosaic-agent@.service
  • Modify: packages/mosaic/framework/tools/fleet/install-user-units.sh

Unit shape:

[Unit]
Description=Mosaic agent session %i
Requires=mosaic-tmux-holder.service
After=mosaic-tmux-holder.service
PartOf=mosaic-tmux-holder.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=%h/src
Environment=MOSAIC_TMUX_SOCKET=mosaic-factory
ExecStart=/bin/bash -lc 'tmux -L "$MOSAIC_TMUX_SOCKET" new-session -d -s "%i" "mosaic yolo $(mosaic fleet runtime %i)"'
ExecStop=-/usr/bin/tmux -L ${MOSAIC_TMUX_SOCKET} kill-session -t '=%i'

[Install]
WantedBy=default.target

Design warning: command substitution in unit files can become brittle. Prefer a generated per-agent EnvironmentFile:

~/.config/mosaic/fleet/agents/coder0.env

with:

MOSAIC_AGENT_NAME=coder0
MOSAIC_AGENT_RUNTIME=claude
MOSAIC_AGENT_WORKDIR=/home/jarvis/src
MOSAIC_TMUX_SOCKET=mosaic-factory

Then ExecStart calls a wrapper:

~/.config/mosaic/tools/fleet/start-agent-session.sh
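
The per-agent EnvironmentFile described above could be produced by a small helper like this one. write_agent_env and its argument order are assumptions for illustration; only the four variable names come from the plan.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one agent's EnvironmentFile.
# Arguments: <output-dir> <name> <runtime> <workdir> [socket]
write_agent_env() {
  local dir="$1" name="$2" runtime="$3" workdir="$4"
  local socket="${5:-mosaic-factory}"
  mkdir -p "$dir"
  # systemd EnvironmentFile syntax is KEY=VALUE, one assignment per line.
  cat >"$dir/$name.env" <<EOF
MOSAIC_AGENT_NAME=$name
MOSAIC_AGENT_RUNTIME=$runtime
MOSAIC_AGENT_WORKDIR=$workdir
MOSAIC_TMUX_SOCKET=$socket
EOF
}

# Usage: write into a scratch directory and show the result.
out_dir="$(mktemp -d)"
write_agent_env "$out_dir" coder0 claude /home/jarvis/src
cat "$out_dir/coder0.env"
```

Generating these files from the roster keeps the unit template free of lookups; the unit only needs an EnvironmentFile= line and the wrapper call.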

Verification:

systemd-analyze --user verify ~/.config/systemd/user/mosaic-agent@.service
systemctl --user start mosaic-agent@coder0.service
tmux -L mosaic-factory has-session -t '=coder0'
systemctl --user restart mosaic-agent@coder0.service

Expected: holder server PID remains unchanged; only coder0 session recycles.

Task 2.3: Add start-agent wrapper

Objective: Keep systemd units simple by moving config lookup and launch command construction into a script.

Files:

  • Create: packages/mosaic/framework/tools/fleet/start-agent-session.sh

Behavior:

Inputs:

start-agent-session.sh <agent-name>

Reads:

$MOSAIC_HOME/fleet/agents/<agent-name>.env

Starts:

tmux -L "$MOSAIC_TMUX_SOCKET" new-session -d -s "$MOSAIC_AGENT_NAME" -c "$MOSAIC_AGENT_WORKDIR" "mosaic yolo $MOSAIC_AGENT_RUNTIME"

Guardrails:

  • fail if runtime is empty;
  • fail if workdir does not exist;
  • no duplicate sessions unless --replace is passed;
  • exact session names only.
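
A rough sketch of the wrapper with these guardrails follows. It is written as a function (start_agent_session, a hypothetical name) so it can be sourced and tested; the fallback socket name and the MOSAIC_HOME default of ~/.config/mosaic are assumptions, and the session is named after the argument rather than MOSAIC_AGENT_NAME.

```shell
#!/usr/bin/env bash
# Sketch of start-agent-session.sh under the guardrails above; not the shipped script.
# Usage: start_agent_session <agent-name> [--replace]
start_agent_session() {
  local name="${1:-}" replace="${2:-}"
  # Locals keep values from a previously sourced env file from leaking in.
  local MOSAIC_AGENT_NAME="" MOSAIC_AGENT_RUNTIME="" MOSAIC_AGENT_WORKDIR="" MOSAIC_TMUX_SOCKET=""

  [ -n "$name" ] || { echo "usage: start_agent_session <agent-name> [--replace]" >&2; return 2; }

  local env_file="${MOSAIC_HOME:-$HOME/.config/mosaic}/fleet/agents/$name.env"
  [ -r "$env_file" ] || { echo "missing env file: $env_file" >&2; return 1; }
  # shellcheck disable=SC1090
  . "$env_file"

  [ -n "$MOSAIC_AGENT_RUNTIME" ] || { echo "runtime is empty for $name" >&2; return 1; }
  [ -d "$MOSAIC_AGENT_WORKDIR" ] || { echo "workdir does not exist: $MOSAIC_AGENT_WORKDIR" >&2; return 1; }

  local socket="${MOSAIC_TMUX_SOCKET:-mosaic-factory}"

  # "=name" makes tmux match the session name exactly rather than by prefix.
  if tmux -L "$socket" has-session -t "=$name" 2>/dev/null; then
    if [ "$replace" = "--replace" ]; then
      tmux -L "$socket" kill-session -t "=$name"
    else
      echo "session $name already exists (pass --replace to recycle it)" >&2
      return 1
    fi
  fi

  tmux -L "$socket" new-session -d -s "$name" -c "$MOSAIC_AGENT_WORKDIR" \
    "mosaic yolo $MOSAIC_AGENT_RUNTIME"
}
```

A script version would end with start_agent_session "$@", which is what the systemd ExecStart line would invoke.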

Phase 3 — roster config and CLI wrappers

Task 3.1: Add fleet config schema and examples

Objective: Define customizable install-time roster without hardcoding USC.

Files:

  • Create: packages/mosaic/framework/fleet/roster.schema.json
  • Create: packages/mosaic/framework/fleet/examples/minimal.yaml
  • Create: packages/mosaic/framework/fleet/examples/usc-software-factory.yaml
  • Create: packages/mosaic/framework/fleet/README.md

Schema concepts:

  • transport: tmux now; matrix later.
  • tmux.socket_name
  • tmux.holder_session
  • defaults.working_directory
  • agents[].name
  • agents[].runtime
  • agents[].class
  • agents[].model_hint
  • agents[].persistent_persona
  • agents[].reset_between_tasks
  • agents[].kickstart_template

Verification:

Use jq to validate JSON examples, or add a small Python validator if YAML is chosen. If a YAML parser is not guaranteed on target hosts, store examples as JSON, or support both formats and implement JSON first using the Python standard library.

Task 3.2: Add mosaic fleet commands

Objective: Provide operator-safe commands for install/status/start/stop/restart/verify.

Files:

  • Modify: packages/mosaic/src/cli.ts or the current commander entrypoint.
  • Create scripts under: packages/mosaic/framework/tools/fleet/

Commands:

mosaic fleet init --profile minimal|usc --write
mosaic fleet install-systemd
mosaic fleet start [agent]
mosaic fleet stop [agent]
mosaic fleet restart [agent]
mosaic fleet status --json
mosaic fleet verify

Implementation path:

Start by wrapping framework shell scripts from the TypeScript CLI. Do not overbuild a TypeScript service manager in the first pass.

Task 3.3: Add mosaic agent commands

Objective: Provide transport-stable per-agent operations.

Files:

  • Modify: Mosaic CLI entrypoint.
  • Create: packages/mosaic/framework/tools/agent/ or reuse tools/tmux + tools/fleet.

Commands:

mosaic agent roster [--json]
mosaic agent status [agent] [--json]
mosaic agent send <agent> --message "..."
mosaic agent reset <agent> --clear|--new
mosaic agent tail <agent> [-n 80]

Reset behavior:

For tmux transport, reset --clear sends /clear then Enter through send-message.sh.

For Claude/Pi differences, keep reset command configurable per runtime:

runtimes:
  claude:
    reset_command: /clear
  pi:
    reset_command: /new

If a runtime does not support a known reset command, restart the service and send a fresh kickstart.
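
The reset rule above, including the restart fallback, might look like this in a wrapper script. Function names are hypothetical, and the runtime-to-command mapping is hardcoded here only for illustration; the plan calls for reading it from config.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: map a runtime to its reset command. The claude and pi
# values mirror the config above; any other runtime returns non-zero so the
# caller can fall back to a service restart plus a fresh kickstart.
reset_command_for() {
  case "${1:-}" in
    claude) printf '%s\n' "/clear" ;;
    pi)     printf '%s\n' "/new" ;;
    *)      return 1 ;;
  esac
}

# Describe what a reset would do, printing the plan rather than executing it.
plan_reset() {
  local agent="$1" runtime="$2" cmd
  if cmd="$(reset_command_for "$runtime")"; then
    echo "send '$cmd' to $agent, then send kickstart"
  else
    echo "restart mosaic-agent@$agent.service, then send kickstart"
  fi
}

plan_reset coder0 claude   # prints: send '/clear' to coder0, then send kickstart
plan_reset coder9 codex    # prints: restart mosaic-agent@coder9.service, then send kickstart
```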


Phase 4 — this-server rollout strategy

Task 4.1: Install on separate socket first

Objective: Prove the holder pattern without disturbing existing sessions.

Commands after implementation lands locally:

mosaic fleet init --profile minimal --write
mosaic fleet install-systemd
systemctl --user daemon-reload
systemctl --user start mosaic-tmux-holder.service
mosaic fleet verify

Expected:

  • tmux -L mosaic-factory ls shows _holder.
  • normal tmux ls still shows existing sessions unchanged.

Task 4.2: Start one canary agent

Objective: Validate single-agent start/restart isolation.

Use a harmless canary first, not the full fleet.

Example roster addition:

- name: canary-pi
  runtime: pi
  class: canary
  working_directory: /home/jarvis/src

Commands:

systemctl --user start mosaic-agent@canary-pi.service
SRV=$(tmux -L mosaic-factory display-message -p '#{pid}')
systemctl --user restart mosaic-agent@canary-pi.service
test "$SRV" = "$(tmux -L mosaic-factory display-message -p '#{pid}')"
tmux -L mosaic-factory ls

Expected: holder PID unchanged; _holder remains; canary-pi recreated.

Task 4.3: Configure local Mosaic factory roster

Objective: Create the actual local roster for this server after canary passes.

Do not assume the exact USC roster is desired here. Create a local profile such as:

~/.config/mosaic/fleet/roster.yaml

Initial local recommendation:

  • mos-claude orchestrator
  • coder0 / coder1 implementers
  • rev0 reviewer
  • secrev0 security reviewer
  • ultron final/adversarial reviewer

Scale to full USC-style pool only after resource/budget behavior is understood.

Task 4.4: Cut over existing ad-hoc tmux sessions only if desired

Objective: Avoid data loss.

Existing sessions on this server are not on the proposed mosaic-factory socket. They can remain untouched. If we later want them under Mosaic fleet control:

  1. list sessions;
  2. capture logs/handoffs;
  3. stop old processes intentionally;
  4. recreate as configured mosaic-agent@... services;
  5. verify comms and state.

Do not run tmux kill-server on the default socket unless Jason explicitly approves that outage.


Phase 5 — docs and AI Guide backfill

Task 5.1: Stack docs

Objective: Document install and customization for Mosaic Stack users.

Files:

  • Create: docs/fleet/tmux-fleet.md or packages/mosaic/framework/tools/fleet/README.md
  • Modify: top-level README.md if appropriate.

Must cover:

  • what problem the holder service solves;
  • install commands;
  • customization file;
  • example rosters;
  • reset/reuse lifecycle;
  • exact-target safety;
  • separate socket default;
  • Matrix migration path.

Task 5.2: AI Guide docs

Objective: Keep generic guidance in AI Guide and implementation details in Stack.

Files in mosaicstack/aiguide:

  • Update: playbooks/tmux-fleet.md with named socket, roster/profile, and resettable-slot pattern.
  • Add or update: reference/agent-role-matrix.md if PR #5 lands.

Do not present Mosaic install commands as the only path in AI Guide; present them as one implementation profile.


Phase 6 — Matrix migration seam

Task 6.1: Add transport enum but implement tmux only

Objective: Avoid hardcoding tmux into orchestration semantics.

Roster:

transport: tmux

Future:

transport: matrix
matrix:
  homeserver: https://matrix.example
  room_prefix: mosaic-factory

Task 6.2: Define transport interface docs

Objective: Make Matrix plugin work a transport swap, not a rewrite.

Minimum operations:

send(agent, message)
reset(agent, mode)
status(agent)
tail(agent)
listAgents()

Any tmux-specific concept must stay below this line.
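
To make the seam concrete, here is a minimal dispatch sketch. Everything in it is hypothetical (agent_op, the MOSAIC_TRANSPORT variable, the tmux_send stub); it only illustrates how orchestration code could call one entry point while transport-specific functions stay underneath.

```shell
#!/usr/bin/env bash
# Sketch of a transport seam. All names here are hypothetical.
# Orchestration calls agent_op; only the <transport>_<op> functions know
# anything about tmux (or, later, Matrix).
MOSAIC_TRANSPORT="${MOSAIC_TRANSPORT:-tmux}"

agent_op() {
  local op="${1:-}"
  [ "$#" -gt 0 ] && shift
  case "$op" in
    send|reset|status|tail|list) ;;
    *) echo "unknown operation: $op" >&2; return 2 ;;
  esac
  local impl="${MOSAIC_TRANSPORT}_${op}"
  # declare -F succeeds only if a shell function with that name exists.
  if ! declare -F "$impl" >/dev/null; then
    echo "transport '$MOSAIC_TRANSPORT' does not implement '$op'" >&2
    return 1
  fi
  "$impl" "$@"
}

# Stub tmux implementation of one operation. A real version would call
# send-message.sh; this one only reports what it was asked to do.
tmux_send() {
  local agent="$1" message="$2"
  echo "[tmux] send to $agent: $message"
}

agent_op send coder0 "hello"   # prints: [tmux] send to coder0: hello
```

Adding a Matrix transport would then mean defining matrix_send, matrix_reset, and so on, without touching the callers of agent_op.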


Acceptance criteria

The implementation is complete when:

  • mosaic fleet init can write a minimal roster.
  • mosaic fleet install-systemd installs holder and agent units without hand editing.
  • mosaic fleet start starts the holder and configured agents on a named tmux socket.
  • Restarting one mosaic-agent@name.service does not change holder server PID or kill sibling sessions.
  • mosaic agent send can deliver a message to a named agent with a self-identifying preamble.
  • mosaic agent reset can clear/new a reusable slot and send a fresh kickstart.
  • mosaic fleet verify proves holder ownership, exact-target safety, and per-agent restart isolation.
  • Existing default tmux sessions on this server are not disturbed by default install.
  • Docs explain generic customization and include USC-style roster only as an example.
  • AI Guide remains generic; Mosaic Stack docs carry the concrete install path.

Risks and mitigations

| Risk | Mitigation |
| --- | --- |
| Killing existing tmux sessions | Use named mosaic-factory socket; no default tmux kill-server. |
| systemd unit quoting/env expansion bugs | Move logic into shell wrappers; verify with systemd-analyze --user verify. |
| Runtime reset command mismatch | Make reset command runtime-configurable; fallback to service restart + kickstart. |
| Tool install drift | Ensure npm package includes framework tmux/fleet tools; add packaging test. |
| Mosaic-specific assumptions leak into generic guide | Keep USC roster as example profile; AI Guide documents pattern/options. |
| Matrix migration blocked by tmux coupling | Add mosaic agent abstraction now; keep tmux details below transport layer. |

Suggested first PR split

  1. PR A — tmux tool hardening

    • socket support;
    • exact target helpers;
    • tests/docs.
  2. PR B — fleet systemd primitives

    • holder unit;
    • agent unit;
    • start-agent wrapper;
    • install-user-units script;
    • verify script.
  3. PR C — roster and CLI

    • roster schema/examples;
    • mosaic fleet ... commands;
    • mosaic agent ... commands.
  4. PR D — local rollout and docs

    • local roster for this server;
    • run canary;
    • document verification evidence;
    • update AI Guide with generic lessons.

Immediate next action

Implement PR A first. It is low-risk, improves existing tools, and is required for a safe named-socket rollout on this server.