Compare commits

5 Commits

Author SHA1 Message Date
e30293950a fix(fleet): complete heartbeat reader/writer consistency + sidecar hardening
Some checks failed
ci/woodpecker/pr/ci Pipeline is pending
ci/woodpecker/push/ci Pipeline failed
F3 follow-on to #595 (HB consistency) — the items flagged in the #595 review:
- defaultMosaicHome() honors MOSAIC_HOME env (not just --mosaic-home flag), so the
  reader matches the writer/launcher when MOSAIC_HOME is set in the shell. The
  systemd guard now checks the LITERAL ~/.config/mosaic (units use %h paths).
- heartbeatPath() honors MOSAIC_HEARTBEAT_RUN_DIR (the writer sidecar's override).
- sidecar: printf %q the interpolated hb path / pid / interval (defense-in-depth).
- vitest: heartbeatPath env-resolution coverage.

Deferred to next F3 milestone (need deeper code work): agent-watch viewer-leak
try/finally fix, and the test-start-agent-session.sh workdir-assumption fix.

Refs #588 #542

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 18:31:19 -05:00
130837365f chore(release): bump @mosaicstack/mosaic 0.0.36 -> 0.0.37 (#597)
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/push/publish Pipeline was successful
2026-06-21 23:27:14 +00:00
67df06f1c4 feat(fleet): orchestrator-mutable fleet — fleet add/remove (F5/R9) (#596)
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/publish Pipeline is pending
2026-06-21 23:26:21 +00:00
60a309d5a4 fix(fleet): heartbeat consistency — MOSAIC_HOME path + configurable interval (#595)
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/publish Pipeline is pending
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-21 23:25:53 +00:00
2dc0f24828 docs(fleet): Fleet Suite PRD (init/configure/operate + Mos-on-Discord) (#588)
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/publish Pipeline is pending
2026-06-21 23:17:10 +00:00
5 changed files with 176 additions and 8 deletions
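
The heartbeat fixes in the commits above come down to a path-precedence rule and a freshness threshold. The sketch below restates both as pure functions. It is not the repository's code: `resolveHeartbeatPath`, `intervalMs`, `classify`, and the `Env` type are hypothetical names, the environment is passed in explicitly instead of being read from `process.env`, and the `<=` comparison at the threshold boundary is an assumption.

```typescript
// Illustrative sketch, not the repository's code. The environment is passed in
// explicitly so the example runs without Node typings.
type Env = Record<string, string | undefined>;

const DEFAULT_INTERVAL_MS = 15_000; // writer default: 15 s
const HEALTHY_MULTIPLIER = 3; // a beat counts as fresh for 3 intervals

// Precedence: MOSAIC_HEARTBEAT_RUN_DIR, then MOSAIC_HOME, then the default home.
function resolveHeartbeatPath(agent: string, env: Env, defaultHome: string): string {
  const mosaicHome = env['MOSAIC_HOME'] ?? defaultHome;
  const runDir = env['MOSAIC_HEARTBEAT_RUN_DIR'] ?? `${mosaicHome}/fleet/run`;
  return `${runDir}/${agent}.hb`;
}

// MOSAIC_HEARTBEAT_INTERVAL is in seconds; invalid or non-positive values fall back.
function intervalMs(env: Env): number {
  const sec = Number.parseInt(env['MOSAIC_HEARTBEAT_INTERVAL'] ?? '', 10);
  return Number.isFinite(sec) && sec > 0 ? sec * 1000 : DEFAULT_INTERVAL_MS;
}

// Assumption: a beat exactly at the threshold still counts as healthy.
function classify(ageMs: number, env: Env): 'healthy' | 'stale' {
  return ageMs <= intervalMs(env) * HEALTHY_MULTIPLIER ? 'healthy' : 'stale';
}

const defaultHome = '/home/u/.config/mosaic';
console.log(resolveHeartbeatPath('agent-x', { MOSAIC_HEARTBEAT_RUN_DIR: '/run/hb' }, defaultHome));
// -> /run/hb/agent-x.hb
console.log(resolveHeartbeatPath('agent-y', { MOSAIC_HOME: '/custom/mhome' }, defaultHome));
// -> /custom/mhome/fleet/run/agent-y.hb
console.log(classify(60_000, {})); // -> stale (60 s > 3 x 15 s)
console.log(classify(60_000, { MOSAIC_HEARTBEAT_INTERVAL: '30' })); // -> healthy (60 s <= 3 x 30 s)
```

Passing the environment as an argument keeps the sketch testable without mutating process state, which is the part the vitest cases in this diff have to handle with save/restore blocks.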

@@ -0,0 +1,105 @@
# PRD — Mosaic Fleet Suite (init, configure, operate)
> **Workstream:** W-FLEET (Fleet) under mission `mvp-20260312` · **Phase:** 3→4 productization
> **North star:** [docs/fleet/north-star.md](./north-star.md) · prior: Phase-2 observability (#579), durable launch (#581), real-agent enablement (#583/#584/#586), releases 0.0.35–0.0.37
> **Lead:** Jarvis @ `w-jarvis`. **Collaborator:** coder agent @ `dragon-lin` (jwoltje@10.1.10.37:coder0-0).
> Owner of this file: Fleet workstream lead. Does not modify MVP single-writer control-plane files.
## Mission
Turn the proven fleet primitives into a **user-installable, AI-free-configurable fleet product**:
a user runs `mosaic fleet init`, answers a few questions (general / coding / research / hybrid),
gets a recommended set of agents plus one always-on orchestrator wired for chat-ops, and can
operate, mutate, re-create, and observe the fleet — over tmux today and Matrix tomorrow — from
CLI/TUI and (designed-for) the webUI.
**Immediate tangible goal:** the **"Mos"** orchestrator agent running on `w-jarvis`, reachable
in **Discord channel `1517622518662434996`** (server `1112631390438166618`). Once the fleet is
functional, we use the fleet itself to continue the work.
## Requirements
### A. Configure-without-AI CLI
| ID | Requirement |
|---|---|
| R1 | `mosaic fleet` command set is functional end-to-end (init/install/start/stop/status/ps/verify + agent verbs). |
| R2 | `mosaic fleet init` is an interactive, **AI-free** CLI wizard. |
| R3 | Init asks the **configuration type**: `general`, `coding`, `research`, `hybrid`, … (extensible). |
| R4 | Based on the answer, the fleet is populated with a **recommended set of agents** (a preset). |
| R5 | **Exactly one main orchestrator agent** is always configured, regardless of type. |
| R10 | A set of **recommended configurations (presets)** ships for easy duplication. |
| R8 | User can **re-create** the fleet when config needs change (idempotent re-init / reconfigure). |
| R17 | Fleet controls are **simple and intuitive**. |
### B. Comms & orchestrator chat-ops
| ID | Requirement |
|---|---|
| R6 | Init can wire the orchestrator to a chat connector — **Telegram / Discord / Matrix / Slack** — for command + comms. |
| R7 | Designed with the end-goal of **Matrix comms on a locally-controlled server**. |
| R16 | Fleet supports **tmux AND Matrix** comms, **user-configurable** at init or any time. Not all users want Matrix. |
| R19 | **"Mos" orchestrator on Discord** (`chan 1517622518662434996` / `srv 1112631390438166618`) on `w-jarvis` — the first live target. |
### C. Runtime, health, lifecycle
| ID | Requirement |
|---|---|
| R9 | Fleet is **mutable by the orchestrator agent** — add/remove agents per need. |
| R13 | Fleet **gracefully handles Pi + Claude harness updates** — keep harnesses current. |
| R14 | The **Pi harness is customized** for proper tool usage, etc. |
| R15 | **Agent heartbeat** properly configured for **Claude AND GPT/Pi** agents. |
### D. Surfaces, testing, docs
| ID | Requirement |
|---|---|
| R18 | Fleet built so the **webUI can view / monitor / terminate / butt-in** on a session. |
| R11 | Installed and **tested on both `w-jarvis` and `dragon-lin`**. |
| R12 | **Documentation**: how to install, configure, and use the fleet. |
## Architecture / approach
- **Config model:** `roster.yaml` is the source of truth (already exists). Add **presets** (`general`/`coding`/`research`/`hybrid`) as shipped example rosters; `init` selects a preset, always injects the orchestrator, and writes the roster. Re-init = regenerate roster (preserve user/site overrides — mirrors install env-merge from #567).
- **Orchestrator agent:** always present; carries the chat connector config (connector type + target IDs) so it can be commanded over chat. tmux is the substrate; the connector bridges chat ↔ the orchestrator session.
- **Comms layers (R16):** (1) **tmux** inter-agent (`agent-send`, proven) — default, always available. (2) **chat connector** for human↔orchestrator (Discord now; Matrix the strategic target). (3) **Matrix** as the locally-controlled cross-agent bus (future). Connector is pluggable + reconfigurable.
- **Heartbeat (R15):** runtime-agnostic launcher sidecar already covers pi/claude/codex (#584). Refine per-runtime (native HB) with the **custom Pi harness** (R14) + a Claude path.
- **Updates (R13):** `mosaic update` (CLI) + a fleet-aware harness-update step that refreshes pi/claude/codex and re-launches agents safely (drain → update → relaunch via the durable launcher).
- **webUI (R18):** the fleet exposes machine-readable state (`fleet ps --json` already carries tenant/host/heartbeat/managed) + control verbs (start/stop/watch/send); webUI consumes these (control plane rides federation per north star). Ensure a stable JSON contract + a terminate/attach(butt-in) path.
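
A minimal sketch of the config model above (preset selection, always-injected orchestrator, overrides preserved on re-init), under stated assumptions. Everything here is hypothetical: the `AgentSpec` shape, the preset contents, the agent names other than `coder0`, and `buildRoster` itself are illustrations, not the shipped roster schema or CLI code.

```typescript
// Hypothetical types and preset contents, for illustration only.
type PresetName = 'general' | 'coding' | 'research' | 'hybrid';

interface AgentSpec {
  name: string;
  role: 'orchestrator' | 'worker';
  runtime: string;
}

const PRESETS: Record<PresetName, AgentSpec[]> = {
  general: [{ name: 'helper0', role: 'worker', runtime: 'pi' }],
  coding: [
    { name: 'coder0', role: 'worker', runtime: 'pi' },
    { name: 'reviewer0', role: 'worker', runtime: 'pi' },
  ],
  research: [{ name: 'researcher0', role: 'worker', runtime: 'pi' }],
  hybrid: [
    { name: 'coder0', role: 'worker', runtime: 'pi' },
    { name: 'researcher0', role: 'worker', runtime: 'pi' },
  ],
};

// Assumed orchestrator entry; R5 requires exactly one regardless of preset.
const ORCHESTRATOR: AgentSpec = { name: 'mos', role: 'orchestrator', runtime: 'claude' };

function buildRoster(preset: PresetName, overrides: AgentSpec[] = []): AgentSpec[] {
  const workers = new Map<string, AgentSpec>();
  for (const agent of PRESETS[preset]) workers.set(agent.name, agent);
  // User/site overrides win over preset entries with the same name.
  for (const agent of overrides) workers.set(agent.name, agent);
  // Drop anything claiming the orchestrator role or name, then inject exactly one.
  const rest = Array.from(workers.values()).filter(
    (a) => a.role !== 'orchestrator' && a.name !== ORCHESTRATOR.name,
  );
  return [ORCHESTRATOR, ...rest];
}

const roster = buildRoster('coding', [{ name: 'coder0', role: 'worker', runtime: 'codex' }]);
console.log(roster.map((a) => `${a.name}:${a.runtime}`).join(' '));
// -> mos:claude coder0:codex reviewer0:pi
```

Because `buildRoster` is a pure function of the preset and the overrides, calling it again with the same inputs regenerates the same roster, which is one way to read the "re-init idempotent" requirement (R8).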
## Phases (incremental, each shippable)
| Phase | Deliverable | Notes |
|---|---|---|
| **F1 Presets + init wizard** | preset rosters (general/coding/research/hybrid) + always-orchestrator + AI-free `fleet init` selecting a preset; re-init idempotent | R1–R5, R8, R10, R17 |
| **F2 Connector + Mos-on-Discord** | orchestrator chat-connector config (Discord first) + **Mos live on Discord `1517…`/`1112…`** on w-jarvis | R6, R19, partial R16 |
| **F3 Heartbeat + harness** | HB confirmed for claude + pi/gpt; **custom Pi harness** (tool usage, native HB, model self-report); graceful harness updates | R13, R14, R15 |
| **F4 Matrix + comms toggle** | Matrix connector (local server) + user toggle tmux/Matrix at init/anytime | R7, R16 |
| **F5 Orchestrator-mutable fleet** | orchestrator can add/remove agents at runtime | R9 |
| **F6 webUI hooks** | stable JSON contract + terminate/attach surface for webUI view/monitor/terminate/butt-in | R18 |
| **F7 Test + docs** | install+test on w-jarvis AND dragon-lin; user docs (install/configure/use) | R11, R12 (runs alongside every phase) |
## Work division (proposed — confirm with dragon-lin)
- **Jarvis @ w-jarvis (Lead):** F1 presets+wizard, F2 connector+Mos-on-Discord, F5 mutability, F6 webUI hooks; merge authority + dual-engine reviews; co-testing on w-jarvis.
- **coder @ dragon-lin:** F3 custom Pi harness + harness-update flow (pi/codex-savvy); plus its in-flight constitution P4–P6 (P4 installer rework underpins `fleet init`/updates — coordinate the install path). Co-testing on dragon-lin (R11).
- **Shared:** F4 Matrix (whoever has bandwidth); F7 testing/docs continuous.
## Immediate target: Mos on Discord (F2 first slice)
The discord plugin is available (`~/.claude.json`). Path: configure the **orchestrator** as a durable
fleet session running Claude Code with the discord plugin bridged to channel `1517622518662434996`
(server `1112631390438166618`) on w-jarvis, with the existing Discord Bridge Protocol (ack within
~3s, reply via `mcp__discord__reply`, no `AskUserQuestion`). Heartbeat via the launcher sidecar.
## Success criteria
- A non-AI user can `mosaic fleet init`, pick a type, and get a working fleet + orchestrator.
- **Mos answers in Discord `1517…`** on w-jarvis.
- Fleet runs + is observable (`fleet ps`) on **both** w-jarvis and dragon-lin.
- Harness updates handled gracefully; HB healthy for claude + pi/gpt agents.
- Docs let a new operator install/configure/use the fleet.
- Re-init + orchestrator mutation work.
## Assumptions (veto-able)
- `ASSUMPTION:` presets ship as example rosters under the framework (`fleet/examples/*.yaml`), selected by `init`.
- `ASSUMPTION:` chat connectors are pluggable; Discord first (target exists), Matrix is the strategic default later.
- `ASSUMPTION:` "Mos" = a Claude Code orchestrator session with the discord plugin (reuses the documented Discord Bridge Protocol).
- `ASSUMPTION:` per north star, runtimes default to Codex/pi-on-Codex for workers; the orchestrator "Mos" runs Claude Code (in Claude Code, which is allowed).

@@ -6,7 +6,7 @@ MOSAIC_TMUX_SOCKET=${MOSAIC_TMUX_SOCKET:-mosaic-factory}
 MOSAIC_AGENT_RUNTIME=${MOSAIC_AGENT_RUNTIME:-pi}
 MOSAIC_AGENT_WORKDIR=${MOSAIC_AGENT_WORKDIR:-$HOME}
 MOSAIC_AGENT_COMMAND=${MOSAIC_AGENT_COMMAND:-}
-MOSAIC_HEARTBEAT_RUN_DIR=${MOSAIC_HEARTBEAT_RUN_DIR:-$HOME/.config/mosaic/fleet/run}
+MOSAIC_HEARTBEAT_RUN_DIR=${MOSAIC_HEARTBEAT_RUN_DIR:-${MOSAIC_HOME:-$HOME/.config/mosaic}/fleet/run}
 MOSAIC_HEARTBEAT_INTERVAL=${MOSAIC_HEARTBEAT_INTERVAL:-15}
 if [ -z "$AGENT_NAME" ]; then
@@ -129,7 +129,7 @@ _start_heartbeat_sidecar() {
   # references to any variables from this script's environment.
   local sidecar_script
   sidecar_script=$(printf \
-    'hb=%s; pid=%s; iv=%s; mkdir -p "$(dirname "$hb")"; while kill -0 "$pid" 2>/dev/null; do tmp="$hb.tmp.$$"; printf "ts=%%s\npid=%%s\nstatus=ok\n" "$(date +%%Y-%%m-%%dT%%H:%%M:%%S%%z)" "$pid" > "$tmp" && mv "$tmp" "$hb"; sleep "$iv"; done' \
+    'hb=%q; pid=%q; iv=%q; mkdir -p "$(dirname "$hb")"; while kill -0 "$pid" 2>/dev/null; do tmp="$hb.tmp.$$"; printf "ts=%%s\npid=%%s\nstatus=ok\n" "$(date +%%Y-%%m-%%dT%%H:%%M:%%S%%z)" "$pid" > "$tmp" && mv "$tmp" "$hb"; sleep "$iv"; done' \
     "$hb_file" "$pane_pid" "$interval")
   # setsid + disown ensures the sidecar survives this script exiting.

@@ -1,6 +1,6 @@
 {
   "name": "@mosaicstack/mosaic",
-  "version": "0.0.36",
+  "version": "0.0.37",
   "repository": {
     "type": "git",
     "url": "https://git.mosaicstack.dev/mosaicstack/stack.git",

@@ -856,6 +856,23 @@ describe('fleet ps — heartbeat parsing', () => {
     expect(hb.health).toBe('unknown');
     expect(hb.ts).toBeNull();
   });
+  it('honors MOSAIC_HEARTBEAT_INTERVAL for the freshness threshold', () => {
+    const prev = process.env.MOSAIC_HEARTBEAT_INTERVAL;
+    try {
+      // A 60s-old beat is STALE at the default 15s interval (3x15 = 45s)...
+      const ts = new Date(NOW - 60_000).toISOString();
+      const content = `ts=${ts}\npid=1\nstatus=ok\n`;
+      delete process.env.MOSAIC_HEARTBEAT_INTERVAL;
+      expect(parseHeartbeat(content, NOW).health).toBe('stale');
+      // ...but HEALTHY when the operator widened the interval to 30s (3x30 = 90s).
+      process.env.MOSAIC_HEARTBEAT_INTERVAL = '30';
+      expect(parseHeartbeat(content, NOW).health).toBe('healthy');
+    } finally {
+      if (prev === undefined) delete process.env.MOSAIC_HEARTBEAT_INTERVAL;
+      else process.env.MOSAIC_HEARTBEAT_INTERVAL = prev;
+    }
+  });
 });
 describe('fleet ps — systemd show parsing', () => {
@@ -2875,3 +2892,33 @@ describe('fleet init wizard', () => {
     expect(content).toContain('name: coder0');
   });
 });
+describe('fleet ps — heartbeat path resolution', () => {
+  const savedRunDir = process.env.MOSAIC_HEARTBEAT_RUN_DIR;
+  const savedHome = process.env.MOSAIC_HOME;
+  afterEach(() => {
+    if (savedRunDir === undefined) delete process.env.MOSAIC_HEARTBEAT_RUN_DIR;
+    else process.env.MOSAIC_HEARTBEAT_RUN_DIR = savedRunDir;
+    if (savedHome === undefined) delete process.env.MOSAIC_HOME;
+    else process.env.MOSAIC_HOME = savedHome;
+  });
+  it('honors MOSAIC_HEARTBEAT_RUN_DIR (matches the writer sidecar override)', () => {
+    process.env.MOSAIC_HEARTBEAT_RUN_DIR = '/run/hb';
+    expect(heartbeatPath('agent-x', '/any/home')).toBe(join('/run/hb', 'agent-x.hb'));
+  });
+  it('honors MOSAIC_HOME when no explicit mosaicHome is given', () => {
+    delete process.env.MOSAIC_HEARTBEAT_RUN_DIR;
+    process.env.MOSAIC_HOME = '/custom/mhome';
+    expect(heartbeatPath('agent-y')).toBe(join('/custom/mhome', 'fleet', 'run', 'agent-y.hb'));
+  });
+  it('falls back to <mosaicHome>/fleet/run by default', () => {
+    delete process.env.MOSAIC_HEARTBEAT_RUN_DIR;
+    delete process.env.MOSAIC_HOME;
+    expect(heartbeatPath('agent-z', '/home/u/.config/mosaic')).toBe(
+      join('/home/u/.config/mosaic', 'fleet', 'run', 'agent-z.hb'),
+    );
+  });
+});

@@ -152,13 +152,16 @@ export function resolveFleetPaths(mosaicHome = defaultMosaicHome()): FleetPaths
 }
 function defaultMosaicHome(): string {
-  return join(homedir(), '.config', 'mosaic');
+  // Honor MOSAIC_HOME so the reader matches the writer sidecar (and the launcher),
+  // even when MOSAIC_HOME is set in the shell without an explicit --mosaic-home flag.
+  return process.env.MOSAIC_HOME ?? join(homedir(), '.config', 'mosaic');
 }
 function assertDefaultMosaicHomeForSystemd(mosaicHome: string): void {
-  if (resolve(mosaicHome) !== resolve(defaultMosaicHome())) {
+  const literalHome = join(homedir(), '.config', 'mosaic');
+  if (resolve(mosaicHome) !== resolve(literalHome)) {
     throw new Error(
-      `install-systemd only supports the default Mosaic home (${defaultMosaicHome()}) because the user systemd units use %h/.config/mosaic paths.`,
+      `install-systemd only supports the default Mosaic home (${literalHome}) because the user systemd units use %h/.config/mosaic paths.`,
     );
   }
 }
@@ -368,6 +371,16 @@ export function buildAgentTailCommand(
 // ---------------------------------------------------------------------------
 export const HEARTBEAT_INTERVAL_MS = 15_000;
+/**
+ * Heartbeat interval in ms, honoring MOSAIC_HEARTBEAT_INTERVAL (seconds) so the
+ * `fleet ps` freshness threshold matches the writer sidecar's actual cadence
+ * (start-agent-session.sh). Falls back to HEARTBEAT_INTERVAL_MS (15s).
+ */
+export function heartbeatIntervalMs(): number {
+  const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_INTERVAL ?? '', 10);
+  return Number.isFinite(sec) && sec > 0 ? sec * 1000 : HEARTBEAT_INTERVAL_MS;
+}
 export const HEARTBEAT_HEALTHY_MULTIPLIER = 3;
 export interface HeartbeatInfo {
@@ -465,7 +478,10 @@ export function parseTmuxListSessions(output: string): string[] {
  * Returns the heartbeat file path for an agent.
  */
 export function heartbeatPath(agentName: string, mosaicHome = defaultMosaicHome()): string {
-  return join(mosaicHome, 'fleet', 'run', `${agentName}.hb`);
+  // Honor MOSAIC_HEARTBEAT_RUN_DIR (the writer sidecar's override); otherwise the
+  // canonical <mosaicHome>/fleet/run. Keeps reader and writer on the same path.
+  const runDir = process.env.MOSAIC_HEARTBEAT_RUN_DIR ?? join(mosaicHome, 'fleet', 'run');
+  return join(runDir, `${agentName}.hb`);
 }
 /**
@@ -496,7 +512,7 @@ export function parseHeartbeat(content: string | null, nowMs = Date.now()): Hear
       status = val;
     }
   }
-  const thresholdMs = HEARTBEAT_INTERVAL_MS * HEARTBEAT_HEALTHY_MULTIPLIER;
+  const thresholdMs = heartbeatIntervalMs() * HEARTBEAT_HEALTHY_MULTIPLIER;
   let health: 'healthy' | 'stale' | 'unknown' = 'unknown';
   let ageMs: number | null = null;
   if (ts !== null) {