Compare commits
9 Commits
feat/fleet...feat/f3-m2
| Author | SHA1 | Date |
|---|---|---|
| | 5c643cd54e | |
| | b26bbb02e9 | |
| | bda38bddc1 | |
| | 56e5c35678 | |
| | 6ffb27787e | |
| | 130837365f | |
| | 67df06f1c4 | |
| | 60a309d5a4 | |
| | 2dc0f24828 | |
105
docs/fleet/PRD-fleet-suite.md
Normal file
@@ -0,0 +1,105 @@
# PRD — Mosaic Fleet Suite (init, configure, operate)

> **Workstream:** W-FLEET (Fleet) under mission `mvp-20260312` · **Phase:** 3→4 productization
> **North star:** [docs/fleet/north-star.md](./north-star.md) · prior: Phase-2 observability (#579), durable launch (#581), real-agent enablement (#583/#584/#586), releases 0.0.35–0.0.37
> **Lead:** Jarvis @ `w-jarvis`. **Collaborator:** coder agent @ `dragon-lin` (jwoltje@10.1.10.37:coder0-0).
> Owner of this file: Fleet workstream lead. Does not modify MVP single-writer control-plane files.

## Mission

Turn the proven fleet primitives into a **user-installable, AI-free-configurable fleet product**:
a user runs `mosaic fleet init`, answers a few questions (general / coding / research / hybrid),
gets a recommended set of agents plus one always-on orchestrator wired for chat-ops, and can
operate, mutate, re-create, and observe the fleet — over tmux today and Matrix tomorrow — from
CLI/TUI and (designed-for) the webUI.

**Immediate tangible goal:** the **"Mos"** orchestrator agent running on `w-jarvis`, reachable
in **Discord channel `1517622518662434996`** (server `1112631390438166618`). Once the fleet is
functional, we use the fleet itself to continue the work.

## Requirements

### A. Configure-without-AI CLI
| ID | Requirement |
|---|---|
| R1 | `mosaic fleet` command set is functional end-to-end (init/install/start/stop/status/ps/verify + agent verbs). |
| R2 | `mosaic fleet init` is an interactive, **AI-free** CLI wizard. |
| R3 | Init asks the **configuration type**: `general`, `coding`, `research`, `hybrid`, … (extensible). |
| R4 | Based on the answer, the fleet is populated with a **recommended set of agents** (a preset). |
| R5 | **Exactly one main orchestrator agent** is always configured, regardless of type. |
| R10 | A set of **recommended configurations (presets)** ships for easy duplication. |
| R8 | User can **re-create** the fleet when config needs change (idempotent re-init / reconfigure). |
| R17 | Fleet controls are **simple and intuitive**. |

### B. Comms & orchestrator chat-ops
| ID | Requirement |
|---|---|
| R6 | Init can wire the orchestrator to a chat connector — **Telegram / Discord / Matrix / Slack** — for command + comms. |
| R7 | Designed with the end-goal of **Matrix comms on a locally-controlled server**. |
| R16 | Fleet supports **tmux AND Matrix** comms, **user-configurable** at init or any time. Not all users want Matrix. |
| R19 | **"Mos" orchestrator on Discord** (`chan 1517622518662434996` / `srv 1112631390438166618`) on `w-jarvis` — the first live target. |

### C. Runtime, health, lifecycle
| ID | Requirement |
|---|---|
| R9 | Fleet is **mutable by the orchestrator agent** — add/remove agents per need. |
| R13 | Fleet **gracefully handles Pi + Claude harness updates** — keep harnesses current. |
| R14 | The **Pi harness is customized** for proper tool usage, etc. |
| R15 | **Agent heartbeat** properly configured for **Claude AND GPT/Pi** agents. |

### D. Surfaces, testing, docs
| ID | Requirement |
|---|---|
| R18 | Fleet built so the **webUI can view / monitor / terminate / butt-in** on a session. |
| R11 | Installed and **tested on both `w-jarvis` and `dragon-lin`**. |
| R12 | **Documentation**: how to install, configure, and use the fleet. |

## Architecture / approach

- **Config model:** `roster.yaml` is the source of truth (already exists). Add **presets** (`general`/`coding`/`research`/`hybrid`) as shipped example rosters; `init` selects a preset, always injects the orchestrator, and writes the roster. Re-init = regenerate roster (preserve user/site overrides — mirrors install env-merge from #567).
- **Orchestrator agent:** always present; carries the chat connector config (connector type + target IDs) so it can be commanded over chat. tmux is the substrate; the connector bridges chat ↔ the orchestrator session.
- **Comms layers (R16):** (1) **tmux** inter-agent (`agent-send`, proven) — default, always available. (2) **chat connector** for human↔orchestrator (Discord now; Matrix the strategic target). (3) **Matrix** as the locally-controlled cross-agent bus (future). Connector is pluggable + reconfigurable.
- **Heartbeat (R15):** runtime-agnostic launcher sidecar already covers pi/claude/codex (#584). Refine per-runtime (native HB) with the **custom Pi harness** (R14) + a Claude path.
- **Updates (R13):** `mosaic update` (CLI) + a fleet-aware harness-update step that refreshes pi/claude/codex and re-launches agents safely (drain → update → relaunch via the durable launcher).
- **webUI (R18):** the fleet exposes machine-readable state (`fleet ps --json` already carries tenant/host/heartbeat/managed) + control verbs (start/stop/watch/send); webUI consumes these (control plane rides federation per north star). Ensure a stable JSON contract + a terminate/attach(butt-in) path.

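The config-model bullet above (preset selection, an always-injected orchestrator, and a re-init that preserves overrides) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the shipped `fleet init` code: the PRD does not define the roster schema, so the `AgentEntry`/`Roster` shapes, the preset contents, the `mos` default entry, and `buildRoster` itself are all hypothetical.

```typescript
// Hypothetical roster shapes for illustration; the real roster.yaml schema is not shown in this PRD.
type PresetName = 'general' | 'coding' | 'research' | 'hybrid';
type Role = 'orchestrator' | 'worker';

interface AgentEntry {
  name: string;
  role: Role;
  runtime: string;
}

interface Roster {
  preset: PresetName;
  agents: AgentEntry[];
}

// Placeholder preset contents; the actual recommended agent sets are an assumption.
const PRESETS: Record<PresetName, AgentEntry[]> = {
  general: [{ name: 'helper0', role: 'worker', runtime: 'pi' }],
  coding: [
    { name: 'coder0', role: 'worker', runtime: 'pi' },
    { name: 'reviewer0', role: 'worker', runtime: 'pi' },
  ],
  research: [{ name: 'research0', role: 'worker', runtime: 'pi' }],
  hybrid: [
    { name: 'coder0', role: 'worker', runtime: 'pi' },
    { name: 'research0', role: 'worker', runtime: 'pi' },
  ],
};

// Default orchestrator injected for every configuration type (R5). Name and runtime are assumed.
const DEFAULT_ORCHESTRATOR: AgentEntry = { name: 'mos', role: 'orchestrator', runtime: 'claude' };

function buildRoster(preset: PresetName, overrides: AgentEntry[] = []): Roster {
  // Later entries win by name, so user/site overrides replace preset defaults.
  const byName = new Map<string, AgentEntry>();
  for (const agent of PRESETS[preset]) byName.set(agent.name, { ...agent });
  for (const agent of overrides) byName.set(agent.name, { ...agent });

  const merged = Array.from(byName.values());
  const workers = merged.filter((a) => a.role === 'worker');
  // Keep the first orchestrator an override supplies, otherwise inject the default.
  // Any additional orchestrator entries are dropped so exactly one remains.
  const orchestrator = merged.find((a) => a.role === 'orchestrator') ?? DEFAULT_ORCHESTRATOR;
  return { preset, agents: [{ ...orchestrator }, ...workers] };
}
```

Because the function is pure, calling it again with the same preset and overrides yields the same roster, which is one way to read the "idempotent re-init" requirement (R8).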
## Phases (incremental, each shippable)

| Phase | Deliverable | Notes |
|---|---|---|
| **F1 Presets + init wizard** | preset rosters (general/coding/research/hybrid) + always-orchestrator + AI-free `fleet init` selecting a preset; re-init idempotent | R1–R5, R8, R10, R17 |
| **F2 Connector + Mos-on-Discord** | orchestrator chat-connector config (Discord first) + **Mos live on Discord `1517…`/`1112…`** on w-jarvis | R6, R19, partial R16 |
| **F3 Heartbeat + harness** | HB confirmed for claude + pi/gpt; **custom Pi harness** (tool usage, native HB, model self-report); graceful harness updates | R13, R14, R15 |
| **F4 Matrix + comms toggle** | Matrix connector (local server) + user toggle tmux/Matrix at init/anytime | R7, R16 |
| **F5 Orchestrator-mutable fleet** | orchestrator can add/remove agents at runtime | R9 |
| **F6 webUI hooks** | stable JSON contract + terminate/attach surface for webUI view/monitor/terminate/butt-in | R18 |
| **F7 Test + docs** | install+test on w-jarvis AND dragon-lin; user docs (install/configure/use) | R11, R12 (runs alongside every phase) |

## Work division (proposed — confirm with dragon-lin)

- **Jarvis @ w-jarvis (Lead):** F1 presets+wizard, F2 connector+Mos-on-Discord, F5 mutability, F6 webUI hooks; merge authority + dual-engine reviews; co-testing on w-jarvis.
- **coder @ dragon-lin:** F3 custom Pi harness + harness-update flow (pi/codex-savvy); plus its in-flight constitution P4–P6 (P4 installer rework underpins `fleet init`/updates — coordinate the install path). Co-testing on dragon-lin (R11).
- **Shared:** F4 Matrix (whoever has bandwidth); F7 testing/docs continuous.

## Immediate target: Mos on Discord (F2 first slice)

The discord plugin is available (`~/.claude.json`). Path: configure the **orchestrator** as a durable
fleet session running Claude Code with the discord plugin bridged to channel `1517622518662434996`
(server `1112631390438166618`) on w-jarvis, with the existing Discord Bridge Protocol (ack within
~3s, reply via `mcp__discord__reply`, no `AskUserQuestion`). Heartbeat via the launcher sidecar.

## Success criteria

- A non-AI user can `mosaic fleet init`, pick a type, and get a working fleet + orchestrator.
- **Mos answers in Discord `1517…`** on w-jarvis.
- Fleet runs + is observable (`fleet ps`) on **both** w-jarvis and dragon-lin.
- Harness updates handled gracefully; HB healthy for claude + pi/gpt agents.
- Docs let a new operator install/configure/use the fleet.
- Re-init + orchestrator mutation work.

## Assumptions (veto-able)

- `ASSUMPTION:` presets ship as example rosters under the framework (`fleet/examples/*.yaml`), selected by `init`.
- `ASSUMPTION:` chat connectors are pluggable; Discord first (target exists), Matrix is the strategic default later.
- `ASSUMPTION:` "Mos" = a Claude Code orchestrator session with the discord plugin (reuses the documented Discord Bridge Protocol).
- `ASSUMPTION:` per north star, runtimes default to Codex/pi-on-Codex for workers; the orchestrator "Mos" runs Claude Code (in Claude Code, which is allowed).
@@ -9,8 +9,16 @@
  * 4. Memory routing — remind agent to use ~/.config/mosaic/memory/
  */
 
-import type { ExtensionAPI } from '@mariozechner/pi-coding-agent';
-import { existsSync, readFileSync, writeFileSync, unlinkSync, mkdirSync } from 'node:fs';
+import type { ExtensionAPI, ExtensionContext } from '@earendil-works/pi-coding-agent';
+import { Type } from 'typebox';
+import {
+  existsSync,
+  readFileSync,
+  writeFileSync,
+  unlinkSync,
+  mkdirSync,
+  renameSync,
+} from 'node:fs';
 import { join, basename } from 'node:path';
 import { homedir } from 'node:os';
 import { execSync, spawnSync } from 'node:child_process';
@@ -25,6 +33,57 @@ const MOSAIC_HOME = process.env['MOSAIC_HOME'] ?? join(homedir(), '.config', 'mo
 // Helpers
 // ---------------------------------------------------------------------------
 
+// ---------------------------------------------------------------------------
+// Native heartbeat (fleet R14/R15)
+// ---------------------------------------------------------------------------
+// When this agent runs under the Mosaic fleet (MOSAIC_AGENT_NAME set), the
+// extension writes its OWN heartbeat in the same .hb contract `fleet ps` reads
+// (ts/pid/status[/model]) and touches a `.hb.native` precedence marker so the
+// shell sidecar defers. Native HB knows the real turn state (busy/ok), so it is
+// more accurate than the pane-PID-only sidecar fallback.
+const HB_AGENT_NAME = process.env['MOSAIC_AGENT_NAME'] ?? '';
+const HB_RUN_DIR = process.env['MOSAIC_HEARTBEAT_RUN_DIR'] ?? join(MOSAIC_HOME, 'fleet', 'run');
+const HB_INTERVAL_MS = (() => {
+  const s = Number.parseInt(process.env['MOSAIC_HEARTBEAT_INTERVAL'] ?? '', 10);
+  return Number.isFinite(s) && s > 0 ? s * 1000 : 15_000;
+})();
+
+function nativeHbEnabled(): boolean {
+  return HB_AGENT_NAME.length > 0;
+}
+
+function readModelId(ctx: ExtensionContext): string | null {
+  const m = ctx.model as unknown as { id?: string; name?: string } | undefined;
+  return m?.id ?? m?.name ?? null;
+}
+
+function writeNativeHeartbeat(status: 'ok' | 'busy', model: string | null): void {
+  if (!nativeHbEnabled()) return;
+  try {
+    mkdirSync(HB_RUN_DIR, { recursive: true });
+    const hb = join(HB_RUN_DIR, `${HB_AGENT_NAME}.hb`);
+    const lines = [`ts=${nowIso()}`, `pid=${process.pid}`, `status=${status}`];
+    if (model) lines.push(`model=${model}`);
+    const tmp = `${hb}.tmp.${process.pid}`;
+    writeFileSync(tmp, lines.join('\n') + '\n');
+    renameSync(tmp, hb); // atomic replace — fleet ps never reads a partial file
+    // Precedence marker: tells the shell sidecar that native HB is authoritative.
+    writeFileSync(join(HB_RUN_DIR, `${HB_AGENT_NAME}.hb.native`), nowIso() + '\n');
+  } catch {
+    // Best-effort: never let heartbeat I/O disrupt the Pi session.
+  }
+}
+
+function clearNativeMarker(): void {
+  if (!nativeHbEnabled()) return;
+  try {
+    const m = join(HB_RUN_DIR, `${HB_AGENT_NAME}.hb.native`);
+    if (existsSync(m)) unlinkSync(m); // native stopping — let the sidecar take over
+  } catch {
+    /* ignore */
+  }
+}
+
 function safeRead(filePath: string): string | null {
   try {
     return readFileSync(filePath, 'utf-8');
@@ -187,6 +246,9 @@ function buildMissionSummary(cwd: string, mission: ActiveMission): string {
 
 export default function register(pi: ExtensionAPI) {
   let sessionCwd = process.cwd();
+  let hbStatus: 'ok' | 'busy' = 'ok';
+  let hbModel: string | null = null;
+  let hbTimer: ReturnType<typeof setInterval> | null = null;
 
   // ── Session Start ─────────────────────────────────────────────────────
   pi.on('session_start', async (_event, ctx) => {
@@ -207,10 +269,39 @@ export default function register(pi: ExtensionAPI) {
     } else {
       ctx.ui.notify('Mosaic framework loaded', 'info');
     }
+
+    // Native heartbeat: write immediately, then on an interval. Idle = 'ok';
+    // turn_start/turn_end flip the status so `fleet ps` reflects real activity.
+    if (nativeHbEnabled()) {
+      hbModel = readModelId(ctx);
+      writeNativeHeartbeat('ok', hbModel);
+      hbTimer = setInterval(() => writeNativeHeartbeat(hbStatus, hbModel), HB_INTERVAL_MS);
+      if (typeof hbTimer.unref === 'function') hbTimer.unref();
+    }
   });
 
-  // ── Session End ───────────────────────────────────────────────────────
-  pi.on('session_end', async (_event, _ctx) => {
+  // ── Turn lifecycle → accurate busy/ok heartbeat ───────────────────────
+  pi.on('turn_start', async (_event, ctx) => {
+    hbStatus = 'busy';
+    hbModel = readModelId(ctx) ?? hbModel;
+    writeNativeHeartbeat('busy', hbModel);
+  });
+  pi.on('turn_end', async (_event, ctx) => {
+    hbStatus = 'ok';
+    hbModel = readModelId(ctx) ?? hbModel;
+    writeNativeHeartbeat('ok', hbModel);
+  });
+
+  // ── Session Shutdown ──────────────────────────────────────────────────
+  // (The pi API event is 'session_shutdown'; the prior 'session_end' handler
+  // never fired — fixed here so repo hooks + lock cleanup actually run.)
+  pi.on('session_shutdown', async (_event, _ctx) => {
+    if (hbTimer) {
+      clearInterval(hbTimer);
+      hbTimer = null;
+    }
+    clearNativeMarker();
+
     // Run repo session-end hook
     runRepoHook(sessionCwd, 'session-end');
 
@@ -252,4 +343,32 @@ export default function register(pi: ExtensionAPI) {
       }
     },
   });
+
+  // ── Register mosaic_mission_status tool (model-callable) ──────────────
+  // R14 "proper tool usage": give the agent a first-class tool to load its
+  // active Mosaic mission, milestone progress, task counts, and latest
+  // scratchpad — so it self-orients on in-flight work before planning,
+  // instead of shelling out or guessing. Mirrors the /mosaic-status command
+  // but returns the summary as tool output the LLM can read.
+  pi.registerTool({
+    name: 'mosaic_mission_status',
+    label: 'Mosaic Mission Status',
+    description:
+      'Return the active Mosaic mission, milestone progress, task counts, and latest scratchpad for the current project. Returns a note when no mission is active.',
+    promptSnippet: 'Read the active Mosaic mission + task state for the current project',
+    promptGuidelines: [
+      'Use mosaic_mission_status at the start of a session or task to load the active mission, milestone progress, and open tasks before planning work.',
+    ],
+    parameters: Type.Object({}),
+    async execute(_toolCallId, _params, _signal, _onUpdate, _ctx) {
+      const mission = detectMission(sessionCwd);
+      const text = mission
+        ? buildMissionSummary(sessionCwd, mission)
+        : 'No active Mosaic mission in this project.';
+      return {
+        content: [{ type: 'text', text }],
+        details: mission ? { ...mission } : { active: false },
+      };
+    },
+  });
 }
@@ -6,7 +6,7 @@ MOSAIC_TMUX_SOCKET=${MOSAIC_TMUX_SOCKET:-mosaic-factory}
 MOSAIC_AGENT_RUNTIME=${MOSAIC_AGENT_RUNTIME:-pi}
 MOSAIC_AGENT_WORKDIR=${MOSAIC_AGENT_WORKDIR:-$HOME}
 MOSAIC_AGENT_COMMAND=${MOSAIC_AGENT_COMMAND:-}
-MOSAIC_HEARTBEAT_RUN_DIR=${MOSAIC_HEARTBEAT_RUN_DIR:-$HOME/.config/mosaic/fleet/run}
+MOSAIC_HEARTBEAT_RUN_DIR=${MOSAIC_HEARTBEAT_RUN_DIR:-${MOSAIC_HOME:-$HOME/.config/mosaic}/fleet/run}
 MOSAIC_HEARTBEAT_INTERVAL=${MOSAIC_HEARTBEAT_INTERVAL:-15}
 
 if [ -z "$AGENT_NAME" ]; then
@@ -90,11 +90,18 @@ MOSAIC_RUNTIME_BIN_PREFIX=$(_build_runtime_bin_prefix)
 #
 # We build the snippet as a double-quoted here-string embedded in a printf call
 # to avoid nested quoting problems.
+#
+# MOSAIC_AGENT_NAME must also be exported INTO the pane: panes inherit the tmux
+# server environment (not this script's, and not the systemd unit's), so the
+# name would otherwise be empty in-pane and the runtime's native heartbeat
+# (which gates on MOSAIC_AGENT_NAME) would never fire. %q-quote it so it is a
+# safe single bash token regardless of the name's characters.
+AGENT_NAME_Q=$(printf '%q' "$AGENT_NAME")
 
 if [ -n "$MOSAIC_RUNTIME_BIN_PREFIX" ]; then
-  PANE_SHELL_SNIPPET="export PATH=\"${MOSAIC_RUNTIME_BIN_PREFIX}:\${PATH}\"; exec ${MOSAIC_AGENT_COMMAND}"
+  PANE_SHELL_SNIPPET="export MOSAIC_AGENT_NAME=${AGENT_NAME_Q}; export PATH=\"${MOSAIC_RUNTIME_BIN_PREFIX}:\${PATH}\"; exec ${MOSAIC_AGENT_COMMAND}"
 else
-  PANE_SHELL_SNIPPET="exec ${MOSAIC_AGENT_COMMAND}"
+  PANE_SHELL_SNIPPET="export MOSAIC_AGENT_NAME=${AGENT_NAME_Q}; exec ${MOSAIC_AGENT_COMMAND}"
 fi
 
 mkdir -p "$MOSAIC_AGENT_WORKDIR"
@@ -129,7 +136,7 @@ _start_heartbeat_sidecar() {
   # references to any variables from this script's environment.
   local sidecar_script
   sidecar_script=$(printf \
-    'hb=%s; pid=%s; iv=%s; mkdir -p "$(dirname "$hb")"; while kill -0 "$pid" 2>/dev/null; do tmp="$hb.tmp.$$"; printf "ts=%%s\npid=%%s\nstatus=ok\n" "$(date +%%Y-%%m-%%dT%%H:%%M:%%S%%z)" "$pid" > "$tmp" && mv "$tmp" "$hb"; sleep "$iv"; done' \
+    'hb=%q; pid=%q; iv=%q; mkdir -p "$(dirname "$hb")"; while kill -0 "$pid" 2>/dev/null; do nat="$hb.native"; if [ -f "$nat" ] && [ "$(( $(date +%%s) - $(stat -c %%Y "$nat" 2>/dev/null || echo 0) ))" -lt "$(( iv * 2 ))" ]; then sleep "$iv"; continue; fi; tmp="$hb.tmp.$$"; printf "ts=%%s\npid=%%s\nstatus=ok\n" "$(date +%%Y-%%m-%%dT%%H:%%M:%%S%%z)" "$pid" > "$tmp" && mv "$tmp" "$hb"; sleep "$iv"; done' \
     "$hb_file" "$pane_pid" "$interval")
 
   # setsid + disown ensures the sidecar survives this script exiting.
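The sidecar one-liner in the hunk above is dense, so here is the deferral rule it appears to encode, restated in TypeScript. This is a sketch of the rule only, assuming the `.hb.native` marker's mtime is refreshed on every native heartbeat write (as the extension diff suggests); `sidecarShouldDefer` and its parameter names are hypothetical and not part of the change.

```typescript
/**
 * Mirrors the shell test: defer when the marker exists and
 * (now - markerMtime) < 2 * interval. All values are in seconds.
 */
function sidecarShouldDefer(
  markerMtimeSec: number | null, // null when <agent>.hb.native does not exist
  nowSec: number,
  intervalSec: number,
): boolean {
  if (markerMtimeSec === null) return false; // no native writer: sidecar writes the .hb file
  return nowSec - markerMtimeSec < intervalSec * 2;
}

console.log(sidecarShouldDefer(100, 110, 15)); // true: marker is 10s old, under the 30s window
console.log(sidecarShouldDefer(100, 130, 15)); // false: marker is 30s old, window has elapsed
console.log(sidecarShouldDefer(null, 130, 15)); // false: no marker present
```

With the default 15-second interval, a marker that has gone untouched for 30 seconds or more no longer suppresses the sidecar, so a native writer that dies without clearing its marker is covered again after roughly two intervals.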
@@ -1,6 +1,6 @@
 {
   "name": "@mosaicstack/mosaic",
-  "version": "0.0.36",
+  "version": "0.0.37",
   "repository": {
     "type": "git",
     "url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
@@ -833,6 +833,17 @@ describe('fleet ps — heartbeat parsing', () => {
     expect(hb.pid).toBe(12345);
     expect(hb.status).toBe('ok');
     expect(hb.ageMs).toBe(10_000);
+    // No model= line in a legacy/sidecar heartbeat → model is null.
+    expect(hb.model).toBeNull();
+  });
+
+  it('parses a self-reported model id from a native heartbeat (model= line)', () => {
+    const ts = new Date(NOW - 5_000).toISOString();
+    const content = `ts=${ts}\npid=42\nstatus=busy\nmodel=openai-codex/gpt-5.5:high\n`;
+    const hb = parseHeartbeat(content, NOW);
+    expect(hb.model).toBe('openai-codex/gpt-5.5:high');
+    expect(hb.status).toBe('busy');
+    expect(hb.health).toBe('healthy');
   });
 
   it('reports stale when heartbeat is older than 3×interval', () => {
@@ -856,6 +867,23 @@ describe('fleet ps — heartbeat parsing', () => {
     expect(hb.health).toBe('unknown');
     expect(hb.ts).toBeNull();
   });
+
+  it('honors MOSAIC_HEARTBEAT_INTERVAL for the freshness threshold', () => {
+    const prev = process.env.MOSAIC_HEARTBEAT_INTERVAL;
+    try {
+      // A 60s-old beat is STALE at the default 15s interval (3x15 = 45s)...
+      const ts = new Date(NOW - 60_000).toISOString();
+      const content = `ts=${ts}\npid=1\nstatus=ok\n`;
+      delete process.env.MOSAIC_HEARTBEAT_INTERVAL;
+      expect(parseHeartbeat(content, NOW).health).toBe('stale');
+      // ...but HEALTHY when the operator widened the interval to 30s (3x30 = 90s).
+      process.env.MOSAIC_HEARTBEAT_INTERVAL = '30';
+      expect(parseHeartbeat(content, NOW).health).toBe('healthy');
+    } finally {
+      if (prev === undefined) delete process.env.MOSAIC_HEARTBEAT_INTERVAL;
+      else process.env.MOSAIC_HEARTBEAT_INTERVAL = prev;
+    }
+  });
 });
 
 describe('fleet ps — systemd show parsing', () => {
@@ -2875,3 +2903,33 @@ describe('fleet init wizard', () => {
     expect(content).toContain('name: coder0');
   });
 });
+
+describe('fleet ps — heartbeat path resolution', () => {
+  const savedRunDir = process.env.MOSAIC_HEARTBEAT_RUN_DIR;
+  const savedHome = process.env.MOSAIC_HOME;
+  afterEach(() => {
+    if (savedRunDir === undefined) delete process.env.MOSAIC_HEARTBEAT_RUN_DIR;
+    else process.env.MOSAIC_HEARTBEAT_RUN_DIR = savedRunDir;
+    if (savedHome === undefined) delete process.env.MOSAIC_HOME;
+    else process.env.MOSAIC_HOME = savedHome;
+  });
+
+  it('honors MOSAIC_HEARTBEAT_RUN_DIR (matches the writer sidecar override)', () => {
+    process.env.MOSAIC_HEARTBEAT_RUN_DIR = '/run/hb';
+    expect(heartbeatPath('agent-x', '/any/home')).toBe(join('/run/hb', 'agent-x.hb'));
+  });
+
+  it('honors MOSAIC_HOME when no explicit mosaicHome is given', () => {
+    delete process.env.MOSAIC_HEARTBEAT_RUN_DIR;
+    process.env.MOSAIC_HOME = '/custom/mhome';
+    expect(heartbeatPath('agent-y')).toBe(join('/custom/mhome', 'fleet', 'run', 'agent-y.hb'));
+  });
+
+  it('falls back to <mosaicHome>/fleet/run by default', () => {
+    delete process.env.MOSAIC_HEARTBEAT_RUN_DIR;
+    delete process.env.MOSAIC_HOME;
+    expect(heartbeatPath('agent-z', '/home/u/.config/mosaic')).toBe(
+      join('/home/u/.config/mosaic', 'fleet', 'run', 'agent-z.hb'),
+    );
+  });
+});
@@ -152,13 +152,16 @@ export function resolveFleetPaths(mosaicHome = defaultMosaicHome()): FleetPaths
 }
 
 function defaultMosaicHome(): string {
-  return join(homedir(), '.config', 'mosaic');
+  // Honor MOSAIC_HOME so the reader matches the writer sidecar (and the launcher),
+  // even when MOSAIC_HOME is set in the shell without an explicit --mosaic-home flag.
+  return process.env.MOSAIC_HOME ?? join(homedir(), '.config', 'mosaic');
 }
 
 function assertDefaultMosaicHomeForSystemd(mosaicHome: string): void {
-  if (resolve(mosaicHome) !== resolve(defaultMosaicHome())) {
+  const literalHome = join(homedir(), '.config', 'mosaic');
+  if (resolve(mosaicHome) !== resolve(literalHome)) {
     throw new Error(
-      `install-systemd only supports the default Mosaic home (${defaultMosaicHome()}) because the user systemd units use %h/.config/mosaic paths.`,
+      `install-systemd only supports the default Mosaic home (${literalHome}) because the user systemd units use %h/.config/mosaic paths.`,
     );
   }
 }
@@ -368,6 +371,16 @@ export function buildAgentTailCommand(
 // ---------------------------------------------------------------------------
 
 export const HEARTBEAT_INTERVAL_MS = 15_000;
+
+/**
+ * Heartbeat interval in ms, honoring MOSAIC_HEARTBEAT_INTERVAL (seconds) so the
+ * `fleet ps` freshness threshold matches the writer sidecar's actual cadence
+ * (start-agent-session.sh). Falls back to HEARTBEAT_INTERVAL_MS (15s).
+ */
+export function heartbeatIntervalMs(): number {
+  const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_INTERVAL ?? '', 10);
+  return Number.isFinite(sec) && sec > 0 ? sec * 1000 : HEARTBEAT_INTERVAL_MS;
+}
 export const HEARTBEAT_HEALTHY_MULTIPLIER = 3;
 
 export interface HeartbeatInfo {
@@ -377,6 +390,8 @@ export interface HeartbeatInfo {
   /** healthy | stale | unknown */
   health: 'healthy' | 'stale' | 'unknown';
   ageMs: number | null;
+  /** Model id the runtime self-reported in its heartbeat (native HB only), else null. */
+  model: string | null;
 }
 
 export interface AgentPsRow {
@@ -465,7 +480,10 @@ export function parseTmuxListSessions(output: string): string[] {
  * Returns the heartbeat file path for an agent.
  */
 export function heartbeatPath(agentName: string, mosaicHome = defaultMosaicHome()): string {
-  return join(mosaicHome, 'fleet', 'run', `${agentName}.hb`);
+  // Honor MOSAIC_HEARTBEAT_RUN_DIR (the writer sidecar's override); otherwise the
+  // canonical <mosaicHome>/fleet/run. Keeps reader and writer on the same path.
+  const runDir = process.env.MOSAIC_HEARTBEAT_RUN_DIR ?? join(mosaicHome, 'fleet', 'run');
+  return join(runDir, `${agentName}.hb`);
 }
 
 /**
@@ -474,15 +492,17 @@ export function heartbeatPath(agentName: string, mosaicHome = defaultMosaicHome(
  * ts=<iso8601>
  * pid=<pid>
  * status=<ok|busy>
+ * model=<model-id> (optional — native runtime heartbeats self-report it)
  */
 export function parseHeartbeat(content: string | null, nowMs = Date.now()): HeartbeatInfo {
   if (content === null) {
-    return { ts: null, pid: null, status: null, health: 'unknown', ageMs: null };
+    return { ts: null, pid: null, status: null, health: 'unknown', ageMs: null, model: null };
   }
   const lines = content.split('\n');
   let ts: Date | null = null;
   let pid: number | null = null;
   let status: 'ok' | 'busy' | null = null;
+  let model: string | null = null;
   for (const line of lines) {
     const [key, ...rest] = line.split('=');
     const val = rest.join('=').trim();
@@ -494,16 +514,18 @@ export function parseHeartbeat(content: string | null, nowMs = Date.now()): Hear
       if (Number.isFinite(n)) pid = n;
     } else if (key === 'status' && (val === 'ok' || val === 'busy')) {
       status = val;
+    } else if (key === 'model' && val) {
+      model = val;
     }
   }
-  const thresholdMs = HEARTBEAT_INTERVAL_MS * HEARTBEAT_HEALTHY_MULTIPLIER;
+  const thresholdMs = heartbeatIntervalMs() * HEARTBEAT_HEALTHY_MULTIPLIER;
   let health: 'healthy' | 'stale' | 'unknown' = 'unknown';
   let ageMs: number | null = null;
   if (ts !== null) {
     ageMs = nowMs - ts.getTime();
     health = ageMs <= thresholdMs ? 'healthy' : 'stale';
   }
-  return { ts, pid, status, health, ageMs };
+  return { ts, pid, status, health, ageMs, model };
 }
 
 /**
@@ -1107,6 +1129,7 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
         'PID'.padEnd(8),
         'IDLE'.padEnd(8),
         'HB'.padEnd(12),
+        'MODEL'.padEnd(22),
         'FLAGS',
       ].join(' ');
       console.log(header);
@@ -1121,6 +1144,7 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
           row.heartbeat.ageMs !== null
             ? `${Math.round(row.heartbeat.ageMs / 1000)}s/${row.heartbeat.health}`
             : `unknown`;
+        const model = row.heartbeat.model ?? '-';
         const flags: string[] = [];
         if (!row.managed) flags.push('UNMANAGED');
         if (row.driftFlag) flags.push('DRIFT');
@@ -1137,6 +1161,7 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
             pid.padEnd(8),
             idle.padEnd(8),
             hbAge.padEnd(12),
+            model.padEnd(22),
             flags.join(','),
           ].join(' '),
         );