Compare commits

..

2 Commits

Author SHA1 Message Date
Jarvis
7633bec2b4 ci: re-trigger pipeline (flaky pglite WASM OOM in packages/db, unrelated)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline failed
The push/ci run for the prior commit failed only in packages/db's
src/migrate.test.ts with 'memory access out of bounds' inside the pglite
WASM module — a known-flaky in-memory-Postgres crash under CI memory
pressure. The pr/ci pipeline passed on the identical tree, and this PR
changes only a bash launcher script (no TS / no db package), so the
failure cannot originate here. Empty commit to re-run CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 23:15:33 -05:00
Jarvis
9a183fcd4f fix(fleet): pre-trust claude agent workdir to clear the folder-trust gate (#644)
Some checks failed
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/push/ci Pipeline failed
Fleet-launched Claude agents stall forever at Claude Code's one-time
"Is this a project you trust?" folder-trust prompt: there is no human in
the pane to answer it, yet the heartbeat keeps reporting "healthy" because
the pane process is alive — it's just blocked. This is the most common
fleet outage (F1 / premature stop).

--dangerously-skip-permissions does NOT bypass this gate, and neither does
`trustedProjectDirectories` in settings.json (both verified empirically on
2026-06-24). The only record the gate honors is the per-project entry in
~/.claude.json: projects["<dir>"].hasTrustDialogAccepted == true — exactly
what answering the prompt writes.

start-agent-session.sh now pre-seeds that record for the claude runtime
before launching the pane. The seeding is:
- claude-only (codex/pi have no such gate),
- idempotent (no-op when already trusted),
- atomic (tempfile + os.replace; never corrupts a partial/unreadable file),
- flock-serialized across concurrent agent launches sharing ~/.claude.json,
- best-effort (any failure is non-fatal — the agent still launches, worst
  case it falls back to the pre-fix behavior).

Verified end-to-end: with /home/jarvis untrusted, the modified launcher
flips hasTrustDialogAccepted to true and Claude boots straight to the ready
prompt with no gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 22:59:22 -05:00
10 changed files with 46 additions and 606 deletions

7
.gitignore vendored
View File

@@ -15,10 +15,3 @@ infra/step-ca/dev-password
# Scratch dirs created by the framework git-wrapper shell test harnesses
.mosaic-test-work/
# Transient config files vite/vitest/esbuild write next to a *.config.ts while
# loading it, then unlink. They are untracked but were not ignored, so turbo's
# package traversal hashed them and intermittently failed CI with "Package
# traversal error: ... .timestamp-*.mjs: No such file or directory" when the
# file vanished mid-scan. Ignoring them removes the race.
*.timestamp-*.mjs

View File

@@ -1,66 +0,0 @@
# H1 — heartbeat readiness detection
## Objective
Add runtime-agnostic readiness classification to `mosaic fleet ps` so an agent can be reported as working/idle/stuck/stale/dead/unknown instead of treating pane liveness as progress.
## Scope
- `packages/mosaic/src/commands/fleet.ts`
- exported readiness state/types/default thresholds/helpers/classifier
- `AgentPsRow.readiness` additive JSON field
- table HB column and IDLE/STUCK flags
- `packages/mosaic/src/commands/fleet.spec.ts`
- pure classifier branch/boundary coverage
- threshold helper coverage
- legitimate render/JSON assertion updates for new HB text
## Acceptance Criteria
- Branches covered: dead, unknown, stale, busy working, null-idle working, stuck boundary, idle boundary, working below idle.
- Threshold env helpers default to 300s/900s and honor positive integer env values.
- `fleet ps` rows populate `readiness` for roster and unmanaged socket sessions.
- Table HB text becomes `<age>s/<readiness>` when heartbeat age exists; remains `unknown` when absent.
- Flags include `IDLE`/`STUCK` for matching readiness.
- Local gates green: `pnpm typecheck`, `pnpm lint`, `pnpm format:check`, fleet vitest.
- Pre-push queue guard passes; PR opened off `origin/main`; no merge by worker.
## Constraints / Assumptions
- Source branch: `origin/main` @ `e3adc6a`.
- No scope creep beyond readiness detection.
- `docs/TASKS.md` and `docs/fleet/TASKS.md` are orchestrator-owned; worker will not modify them.
- PRD alignment source: `docs/fleet/PRD.md` Phase 2 observability; this is a refinement of heartbeat observability, preserving existing unknown/stale behavior.
## Plan
1. Install dependencies with requested PNPM environment.
2. Add readiness types/helpers/classifier near heartbeat constants.
3. Add `readiness` to `AgentPsRow` and populate both row paths.
4. Update table render and flags.
5. Add unit tests and update affected ps render/JSON assertions.
6. Run build precheck + required gates.
7. Run automated independent review, remediate findings.
8. Queue guard, push, open PR.
## Progress
- 2026-06-24: Branch created from `origin/main` @ `e3adc6a`.
- 2026-06-24: Implemented readiness thresholds/classifier, JSON row field, HB column label, and IDLE/STUCK flags.
- 2026-06-24: Added classifier branch/boundary tests, threshold helper tests, JSON shape assertions, and readiness table rendering assertions.
## Verification Evidence
- `pnpm install --store-dir "$HOME/.pnpm-store"` — pass.
- `npx turbo build --filter=@mosaicstack/mosaic^...` — pass, 12/12 tasks successful.
- `pnpm typecheck` — pass, 41/41 tasks successful.
- `pnpm lint` — pass, 23/23 tasks successful.
- `pnpm format:check` — pass, all matched files use Prettier style.
- `pnpm --filter @mosaicstack/mosaic exec vitest run src/commands/fleet.spec.ts` — pass, 171 tests.
- `pnpm --filter @mosaicstack/mosaic test` — pass, 39 files / 547 tests; `fleet.spec.ts` 171 tests.
- `~/.config/mosaic/tools/codex/codex-code-review.sh --uncommitted` — approve, 0 findings (reviewed supplied diff; sandbox file-inspection limitation noted by tool).
## Risks / Blockers
- No current blocker.
- Review tool could not inspect repo files directly due sandbox wrapper limitation, but it reviewed the supplied diff and approved with no findings.

View File

@@ -28,7 +28,6 @@ export default tseslint.config(
'apps/web/e2e/helpers/*.ts',
'apps/web/playwright.config.ts',
'apps/gateway/vitest.config.ts',
'packages/db/vitest.config.ts',
'packages/storage/vitest.config.ts',
'packages/mosaic/__tests__/*.ts',
'tools/federation-harness/*.ts',

View File

@@ -4,22 +4,5 @@ export default defineConfig({
test: {
globals: true,
environment: 'node',
// The migration suite spins up a real PGlite (WASM Postgres) instance per
// test and applies the full drizzle migration set. Each case legitimately
// takes ~5s locally and considerably longer on CI, where turbo runs many
// packages' test suites concurrently. The 5s vitest default then expires
// mid-migration and the run fails as a phantom "Test timed out in 5000ms"
// (often surfacing the underlying WASM `memory access out of bounds` when
// the heap is starved). Give migrations real headroom.
testTimeout: 120_000,
hookTimeout: 120_000,
// Each PGlite instance carries a multi-hundred-MB WASM heap. Running test
// files in parallel forks multiplies that peak and is what tips the CI
// runner into the WASM OOM. A single fork keeps only one instance resident
// at a time — slightly slower, but deterministic.
pool: 'forks',
poolOptions: {
forks: { singleFork: true },
},
},
});

View File

@@ -1,6 +1,6 @@
{
"name": "@mosaicstack/mosaic",
"version": "0.0.43",
"version": "0.0.41",
"repository": {
"type": "git",
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",

View File

@@ -30,7 +30,6 @@ import {
refreshActiveFleetUnits,
readRosterAgentNames,
buildRelaunchCommands,
checkFrameworkDrift,
FRAMEWORK_RESEED_PACKAGE,
} from './runtime/update-checker.js';
import { runWizard } from './wizard.js';
@@ -419,48 +418,6 @@ program
// checkForAllUpdates imported statically above
const { execSync } = await import('node:child_process');
// Re-seed the framework from the freshly-installed package, propagate shipped
// systemd unit fixes to the active units, and (opt-in) relaunch durable
// agents. Shared by the "packages updated" and the "framework drift" paths.
const reseedFramework = (reason: string): void => {
console.log(reason);
const reseed = runFrameworkReseed();
if (!reseed.ok) {
console.error(
`\n⚠ Framework re-seed skipped: ${reseed.reason ?? 'unknown'}.\n` +
' Activate manually: bash "$(npm root -g)/@mosaicstack/mosaic/framework/install.sh" ' +
'(MOSAIC_SYNC_ONLY=1 MOSAIC_INSTALL_MODE=keep)',
);
return;
}
console.log('✔ Framework re-seeded.');
// Propagate shipped systemd unit fixes to the ACTIVE units (re-seed only
// touches ~/.config/mosaic/systemd/user; systemd runs ~/.config/systemd/user).
const units = refreshActiveFleetUnits();
if (units.refreshed.length > 0) {
console.log(`✔ Refreshed ${units.refreshed.length} active systemd unit(s).`);
}
const agents = readRosterAgentNames();
if (agents.length === 0) return;
if (opts.relaunch) {
console.log(`\nRelaunching ${agents.length} fleet agent(s) to pick up the new runtime…`);
for (const restart of buildRelaunchCommands(agents)) {
try {
execSync(restart.join(' '), { stdio: 'inherit', timeout: 30_000 });
} catch {
console.error(` ⚠ failed to restart agent — run: ${restart.join(' ')}`);
}
}
console.log('✔ Agents relaunched.');
} else {
console.log(
`\n ${agents.length} fleet agent(s) are still running the previous runtime. ` +
'Restart them to activate the update:\n mosaic update --relaunch ' +
'(or: mosaic fleet restart <agent>)',
);
}
};
console.log('Checking for updates…');
const results = checkForAllUpdates({ skipCache: true });
@@ -475,18 +432,6 @@ program
process.exit(1);
}
console.log('\n✔ All packages up to date.');
// #642: the CLI may have been upgraded outside `mosaic update` (e.g. a
// direct `npm i -g`), leaving the framework files stale even though no
// package is reported outdated. Detect that via the framework version and
// re-seed so shipped launcher/runtime fixes still activate.
const drift = checkFrameworkDrift();
if (drift.drifted && opts.reseed !== false) {
reseedFramework(
`\nFramework drift detected (on-disk v${drift.installed} < bundled v${drift.bundled}) — ` +
'the CLI was updated outside `mosaic update`. Re-seeding framework files into ' +
'~/.config/mosaic (data-safe; keeps your edits)…',
);
}
return;
}
@@ -511,17 +456,52 @@ program
// F3-m3 / R13: the CLI is updated, but the framework files in
// ~/.config/mosaic/ are still the previous version. Re-seed them from the
// freshly-installed package so shipped launcher/runtime changes ACTIVATE.
// Re-seed when the framework-bearing package itself updated OR the on-disk
// framework is older than the freshly-installed one (#642 — e.g. only
// sibling packages were outdated but the CLI was already ahead).
// Only when the framework-bearing package itself updated.
const mosaicUpdated = outdated.some(
(r: { package: string }) => r.package === FRAMEWORK_RESEED_PACKAGE,
);
const drift = checkFrameworkDrift();
if ((mosaicUpdated || drift.drifted) && opts.reseed !== false) {
reseedFramework(
if (mosaicUpdated && opts.reseed !== false) {
console.log(
'\nRe-seeding framework files into ~/.config/mosaic (data-safe; keeps your edits)…',
);
const reseed = runFrameworkReseed();
if (reseed.ok) {
console.log('✔ Framework re-seeded.');
// Propagate shipped systemd unit fixes to the ACTIVE units (re-seed only
// touches ~/.config/mosaic/systemd/user; systemd runs ~/.config/systemd/user).
const units = refreshActiveFleetUnits();
if (units.refreshed.length > 0) {
console.log(`✔ Refreshed ${units.refreshed.length} active systemd unit(s).`);
}
const agents = readRosterAgentNames();
if (agents.length > 0) {
if (opts.relaunch) {
console.log(
`\nRelaunching ${agents.length} fleet agent(s) to pick up the new runtime…`,
);
for (const restart of buildRelaunchCommands(agents)) {
try {
execSync(restart.join(' '), { stdio: 'inherit', timeout: 30_000 });
} catch {
console.error(` ⚠ failed to restart agent — run: ${restart.join(' ')}`);
}
}
console.log('✔ Agents relaunched.');
} else {
console.log(
`\n ${agents.length} fleet agent(s) are still running the previous runtime. ` +
'Restart them to activate the update:\n mosaic update --relaunch ' +
'(or: mosaic fleet restart <agent>)',
);
}
}
} else {
console.error(
`\n⚠ Framework re-seed skipped: ${reseed.reason ?? 'unknown'}.\n` +
' Activate manually: bash "$(npm root -g)/@mosaicstack/mosaic/framework/install.sh" ' +
'(MOSAIC_SYNC_ONLY=1 MOSAIC_INSTALL_MODE=keep)',
);
}
}
});

View File

@@ -19,21 +19,17 @@ import {
buildSystemdShowCommand,
buildTmuxListPanesCommand,
buildTmuxListSessionsCommand,
classifyReadiness,
classifySendResult,
countOrchestrators,
countEnhancers,
detectDrift,
enableFleetUnits,
FLEET_PROFILES,
HEARTBEAT_IDLE_THRESHOLD_SECONDS,
HEARTBEAT_STUCK_THRESHOLD_SECONDS,
generateAgentEnv,
getDefaultOperatorSourceLabel,
getDefaultTenantAndHost,
getRosterAgent,
heartbeatPath,
idleThresholdSeconds,
isSendAccepted,
loadFleetRoster,
mergeAgentEnv,
@@ -48,7 +44,6 @@ import {
resolvePresetFilename,
RUNTIME_ACCEPTABLE_COMMANDS,
serializeRosterToYaml,
stuckThresholdSeconds,
VERIFY_DEFAULT_TIMEOUT_MS,
VERIFY_POLL_INTERVAL_MS,
type AgentPsRow,
@@ -938,127 +933,6 @@ describe('fleet ps — heartbeat parsing', () => {
});
});
describe('fleet ps — readiness thresholds', () => {
const savedIdle = process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
const savedStuck = process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
afterEach(() => {
if (savedIdle === undefined) delete process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
else process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = savedIdle;
if (savedStuck === undefined) delete process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
else process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = savedStuck;
});
it('uses default readiness thresholds when env is unset', () => {
delete process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
delete process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
expect(idleThresholdSeconds()).toBe(HEARTBEAT_IDLE_THRESHOLD_SECONDS);
expect(stuckThresholdSeconds()).toBe(HEARTBEAT_STUCK_THRESHOLD_SECONDS);
});
it('honors positive integer readiness thresholds from env', () => {
process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = '120';
process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = '480';
expect(idleThresholdSeconds()).toBe(120);
expect(stuckThresholdSeconds()).toBe(480);
});
it('falls back to defaults for invalid readiness thresholds', () => {
process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = '0';
process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = 'not-a-number';
expect(idleThresholdSeconds()).toBe(HEARTBEAT_IDLE_THRESHOLD_SECONDS);
expect(stuckThresholdSeconds()).toBe(HEARTBEAT_STUCK_THRESHOLD_SECONDS);
});
});
describe('fleet ps — readiness classification', () => {
const thresholds = { idleThresholdSeconds: 300, stuckThresholdSeconds: 900 };
it('reports dead when the pane is not alive', () => {
expect(
classifyReadiness(
{ paneAlive: false, hbHealth: 'healthy', hbStatus: 'busy', idleSeconds: 0 },
thresholds,
),
).toBe('dead');
});
it('reports unknown when heartbeat health is unknown', () => {
expect(
classifyReadiness(
{ paneAlive: true, hbHealth: 'unknown', hbStatus: null, idleSeconds: 0 },
thresholds,
),
).toBe('unknown');
});
it('reports stale when heartbeat health is stale', () => {
expect(
classifyReadiness(
{ paneAlive: true, hbHealth: 'stale', hbStatus: 'busy', idleSeconds: 1_000 },
thresholds,
),
).toBe('stale');
});
it('reports working when heartbeat status is busy, even past stuck threshold', () => {
expect(
classifyReadiness(
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'busy', idleSeconds: 2_000 },
thresholds,
),
).toBe('working');
});
it('reports working when pane idle seconds are unavailable', () => {
expect(
classifyReadiness(
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: null },
thresholds,
),
).toBe('working');
});
it('reports stuck at the stuck threshold boundary', () => {
expect(
classifyReadiness(
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 900 },
thresholds,
),
).toBe('stuck');
});
it('reports idle at the idle threshold boundary', () => {
expect(
classifyReadiness(
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 300 },
thresholds,
),
).toBe('idle');
});
it('reports working below the idle threshold', () => {
expect(
classifyReadiness(
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 299 },
thresholds,
),
).toBe('working');
});
it('checks stuck before idle when thresholds are inverted', () => {
expect(
classifyReadiness(
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 350 },
{ idleThresholdSeconds: 900, stuckThresholdSeconds: 300 },
),
).toBe('stuck');
});
});
describe('fleet ps — systemd show parsing', () => {
it('parses ActiveState, SubState, UnitFileState from systemctl show output', () => {
const output = 'ActiveState=active\nSubState=running\nUnitFileState=enabled\n';
@@ -1450,9 +1324,8 @@ describe('fleet ps — JSON output shape (FR-6)', () => {
// boot-enable warning: active + disabled
expect(row.bootEnableWarning).toBe(true);
// heartbeat missing → unknown readiness preserves existing display semantics
// heartbeat missing → unknown
expect(row.heartbeat.health).toBe('unknown');
expect(row.readiness).toBe('unknown');
expect(row.name).toBe('canary-pi');
expect(row.runtime).toBe('pi');
@@ -1514,92 +1387,6 @@ describe('fleet ps — command sequences issued', () => {
});
});
describe('fleet ps — readiness table output', () => {
it('renders readiness in HB column and flags idle/stuck rows', async () => {
const home = await mkdtemp(join(tmpdir(), 'mosaic-fleet-'));
const rosterPath = join(home, 'fleet', 'roster.yaml');
const runDir = join(home, 'fleet', 'run');
await mkdir(runDir, { recursive: true });
await writeFile(
rosterPath,
[
'version: 1',
'transport: tmux',
'agents:',
' - name: idle-agent',
' runtime: pi',
' - name: stuck-agent',
' runtime: pi',
].join('\n'),
);
const nowMs = 1_700_000_000_000;
const idleActivityEpoch = Math.floor((nowMs - 10_000) / 1000);
const stuckActivityEpoch = Math.floor((nowMs - 40_000) / 1000);
const hbTs = new Date(nowMs - 1_000).toISOString();
await writeFile(join(runDir, 'idle-agent.hb'), `ts=${hbTs}\npid=111\nstatus=ok\n`);
await writeFile(join(runDir, 'stuck-agent.hb'), `ts=${hbTs}\npid=222\nstatus=ok\n`);
const savedIdle = process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
const savedStuck = process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = '5';
process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = '30';
const dateNow = vi.spyOn(Date, 'now').mockReturnValue(nowMs);
const runner: CommandRunner = async (command, args) => {
const full = [command, ...args].join(' ');
if (full.includes('list-sessions')) {
return { stdout: 'idle-agent\nstuck-agent\n', stderr: '', exitCode: 0 };
}
if (full.includes('=idle-agent:0.0')) {
return { stdout: `111 pi 0 ${idleActivityEpoch}\n`, stderr: '', exitCode: 0 };
}
if (full.includes('=stuck-agent:0.0')) {
return { stdout: `222 pi 0 ${stuckActivityEpoch}\n`, stderr: '', exitCode: 0 };
}
if (full.includes('systemctl') && full.includes('show')) {
return {
stdout: 'ActiveState=active\nSubState=running\nUnitFileState=enabled\n',
stderr: '',
exitCode: 0,
};
}
return { stdout: '', stderr: '', exitCode: 0 };
};
const lines: string[] = [];
const origLog = console.log;
console.log = (msg: string) => {
lines.push(msg);
};
const program = new Command();
program.exitOverride();
registerFleetCommand(program, { runner, mosaicHome: home });
try {
await program.parseAsync(['node', 'mosaic', 'fleet', 'ps']);
} finally {
console.log = origLog;
dateNow.mockRestore();
if (savedIdle === undefined) delete process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
else process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = savedIdle;
if (savedStuck === undefined) delete process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
else process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = savedStuck;
await rm(home, { recursive: true, force: true });
}
const idleLine = lines.find((line) => line.includes('idle-agent'));
const stuckLine = lines.find((line) => line.includes('stuck-agent'));
expect(idleLine).toBeDefined();
expect(idleLine).toContain('1s/idle');
expect(idleLine).toMatch(/\bIDLE\b/);
expect(stuckLine).toBeDefined();
expect(stuckLine).toContain('1s/stuck');
expect(stuckLine).toMatch(/\bSTUCK\b/);
});
});
describe('buildTmuxListSessionsCommand', () => {
it('builds exact list-sessions command with session_name format', () => {
expect(buildTmuxListSessionsCommand('mosaic-fleet')).toEqual([
@@ -1727,7 +1514,6 @@ describe('fleet ps — unmanaged socket sessions', () => {
// driftFlag must be false for unmanaged (no roster runtime to compare)
expect(unmanagedRow.driftFlag).toBe(false);
expect(unmanagedRow.readiness).toBe('unknown');
});
it('shows UNMANAGED flag in table output for unmanaged sessions', async () => {

View File

@@ -394,8 +394,6 @@ export function buildAgentTailCommand(agentName: string, lines: number, socketNa
// ---------------------------------------------------------------------------
export const HEARTBEAT_INTERVAL_MS = 15_000;
export const HEARTBEAT_IDLE_THRESHOLD_SECONDS = 300;
export const HEARTBEAT_STUCK_THRESHOLD_SECONDS = 900;
/**
* Heartbeat interval in ms, honoring MOSAIC_HEARTBEAT_INTERVAL (seconds) so the
@@ -406,68 +404,8 @@ export function heartbeatIntervalMs(): number {
const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_INTERVAL ?? '', 10);
return Number.isFinite(sec) && sec > 0 ? sec * 1000 : HEARTBEAT_INTERVAL_MS;
}
/** Idle threshold in seconds, honoring MOSAIC_HEARTBEAT_IDLE_THRESHOLD. */
export function idleThresholdSeconds(): number {
const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD ?? '', 10);
return Number.isFinite(sec) && sec > 0 ? sec : HEARTBEAT_IDLE_THRESHOLD_SECONDS;
}
/** Stuck threshold in seconds, honoring MOSAIC_HEARTBEAT_STUCK_THRESHOLD. */
export function stuckThresholdSeconds(): number {
const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD ?? '', 10);
return Number.isFinite(sec) && sec > 0 ? sec : HEARTBEAT_STUCK_THRESHOLD_SECONDS;
}
export const HEARTBEAT_HEALTHY_MULTIPLIER = 3;
export type ReadinessState = 'working' | 'idle' | 'stuck' | 'stale' | 'dead' | 'unknown';
export interface ReadinessSignals {
paneAlive: boolean;
hbHealth: 'healthy' | 'stale' | 'unknown';
hbStatus: 'ok' | 'busy' | null;
idleSeconds: number | null;
}
export interface ReadinessThresholds {
idleThresholdSeconds: number;
stuckThresholdSeconds: number;
}
/**
* Classify whether an agent is progressing based on already-parsed heartbeat/tmux signals.
* Best-effort and runtime-agnostic: it never probes, never throws, and preserves existing
* unknown/stale behavior when heartbeat data is absent or old.
*/
export function classifyReadiness(
signals: Partial<ReadinessSignals> | null | undefined,
thresholds: Partial<ReadinessThresholds> | null | undefined = {},
): ReadinessState {
try {
if (signals?.paneAlive !== true) return 'dead';
if (signals.hbHealth === 'unknown' || signals.hbHealth === undefined) return 'unknown';
if (signals.hbHealth === 'stale') return 'stale';
if (signals.hbStatus === 'busy') return 'working';
if (signals.idleSeconds === null || signals.idleSeconds === undefined) return 'working';
const idleSeconds = Number.isFinite(signals.idleSeconds) ? signals.idleSeconds : null;
if (idleSeconds === null) return 'working';
const idleThreshold = Number.isFinite(thresholds?.idleThresholdSeconds)
? Number(thresholds?.idleThresholdSeconds)
: idleThresholdSeconds();
const stuckThreshold = Number.isFinite(thresholds?.stuckThresholdSeconds)
? Number(thresholds?.stuckThresholdSeconds)
: stuckThresholdSeconds();
if (idleSeconds >= stuckThreshold) return 'stuck';
if (idleSeconds >= idleThreshold) return 'idle';
return 'working';
} catch {
return 'unknown';
}
}
export interface HeartbeatInfo {
ts: Date | null;
pid: number | null;
@@ -491,7 +429,6 @@ export interface AgentPsRow {
paneCommand: string | null;
idleSeconds: number | null;
heartbeat: HeartbeatInfo;
readiness: ReadinessState;
/** roster runtime !== actual pane command */
driftFlag: boolean;
/** active but UnitFileState=disabled */
@@ -1085,10 +1022,6 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
const nowMs = Date.now();
const rows: AgentPsRow[] = [];
const readinessThresholds = {
idleThresholdSeconds: idleThresholdSeconds(),
stuckThresholdSeconds: stuckThresholdSeconds(),
};
// Build the set of roster agent names for quick lookup when filtering socket sessions.
const rosterAgentNames = new Set(roster.agents.map((a) => a.name));
@@ -1119,17 +1052,6 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
const bootEnableWarning =
sysInfo.ActiveState === 'active' && sysInfo.UnitFileState === 'disabled';
const paneAlive = !paneInfo.dead;
const readiness = classifyReadiness(
{
paneAlive,
hbHealth: hb.health,
hbStatus: hb.status,
idleSeconds: paneInfo.idleSeconds,
},
readinessThresholds,
);
rows.push({
name: agent.name,
tenant_id,
@@ -1137,12 +1059,11 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
runtime: agent.runtime,
systemdActive: sysInfo.ActiveState,
systemdEnabled: sysInfo.UnitFileState,
paneAlive,
paneAlive: !paneInfo.dead,
panePid: paneInfo.pid,
paneCommand: paneInfo.command,
idleSeconds: paneInfo.idleSeconds,
heartbeat: hb,
readiness,
driftFlag,
bootEnableWarning,
managed: true,
@@ -1189,17 +1110,6 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
const bootEnableWarning =
sysInfo.ActiveState === 'active' && sysInfo.UnitFileState === 'disabled';
const paneAlive = !paneInfo.dead;
const readiness = classifyReadiness(
{
paneAlive,
hbHealth: hb.health,
hbStatus: hb.status,
idleSeconds: paneInfo.idleSeconds,
},
readinessThresholds,
);
rows.push({
name: sessionName,
tenant_id,
@@ -1208,12 +1118,11 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
runtime: 'unknown',
systemdActive: sysInfo.ActiveState,
systemdEnabled: sysInfo.UnitFileState,
paneAlive,
paneAlive: !paneInfo.dead,
panePid: paneInfo.pid,
paneCommand: paneInfo.command,
idleSeconds: paneInfo.idleSeconds,
heartbeat: hb,
readiness,
// No roster runtime to compare — drift is not meaningful for unmanaged sessions
driftFlag: false,
bootEnableWarning,
@@ -1255,15 +1164,13 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
const idle = row.idleSeconds !== null ? `${row.idleSeconds}s` : '-';
const hbAge =
row.heartbeat.ageMs !== null
? `${Math.round(row.heartbeat.ageMs / 1000)}s/${row.readiness}`
? `${Math.round(row.heartbeat.ageMs / 1000)}s/${row.heartbeat.health}`
: `unknown`;
const model = row.heartbeat.model ?? '-';
const flags: string[] = [];
if (!row.managed) flags.push('UNMANAGED');
if (row.driftFlag) flags.push('DRIFT');
if (row.bootEnableWarning) flags.push('BOOT-ENABLE');
if (row.readiness === 'idle') flags.push('IDLE');
if (row.readiness === 'stuck') flags.push('STUCK');
console.log(
[

View File

@@ -8,9 +8,6 @@ import {
readRosterAgentNames,
runFrameworkReseed,
refreshActiveFleetUnits,
readInstalledFrameworkVersion,
readBundledFrameworkVersion,
checkFrameworkDrift,
} from './update-checker.js';
import { existsSync, readFileSync } from 'node:fs';
@@ -126,73 +123,3 @@ describe('refreshActiveFleetUnits', () => {
expect(existsSync(join(configHome, 'systemd', 'user', 'mosaic-agent@.service'))).toBe(false);
});
});
/**
* #642: re-seed when the on-disk framework is older than the bundled one even
* if no package is reported outdated (CLI upgraded outside `mosaic update`).
*/
describe('framework drift detection', () => {
let home: string; // stand-in for ~/.config/mosaic
let fw: string; // stand-in for the bundled framework root
beforeEach(() => {
const root = mkdtempSync(join(tmpdir(), 'mosaic-drift-'));
home = join(root, 'mosaic');
fw = join(root, 'framework');
mkdirSync(home, { recursive: true });
mkdirSync(fw, { recursive: true });
});
afterEach(() => {
rmSync(join(home, '..'), { recursive: true, force: true });
});
const writeInstalled = (v: string) => writeFileSync(join(home, '.framework-version'), v);
const writeBundled = (v: string) =>
writeFileSync(join(fw, 'install.sh'), `#!/usr/bin/env bash\nFRAMEWORK_VERSION=${v}\n`);
describe('readInstalledFrameworkVersion', () => {
it('returns undefined when the version file is absent', () => {
expect(readInstalledFrameworkVersion(home)).toBeUndefined();
});
it('parses the integer (tolerating surrounding whitespace)', () => {
writeInstalled(' 3\n');
expect(readInstalledFrameworkVersion(home)).toBe(3);
});
it('returns undefined for non-numeric content', () => {
writeInstalled('not-a-number\n');
expect(readInstalledFrameworkVersion(home)).toBeUndefined();
});
});
describe('readBundledFrameworkVersion', () => {
it('returns undefined when install.sh is absent', () => {
expect(readBundledFrameworkVersion(fw)).toBeUndefined();
});
it('parses FRAMEWORK_VERSION=<n> from install.sh', () => {
writeBundled('4');
expect(readBundledFrameworkVersion(fw)).toBe(4);
});
});
describe('checkFrameworkDrift', () => {
it('reports drift when on-disk is older than bundled', () => {
writeInstalled('3');
writeBundled('4');
expect(checkFrameworkDrift(home, fw)).toEqual({ drifted: true, installed: 3, bundled: 4 });
});
it('no drift when versions match', () => {
writeInstalled('4');
writeBundled('4');
expect(checkFrameworkDrift(home, fw)).toMatchObject({ drifted: false });
});
it('no drift when on-disk is newer than bundled', () => {
writeInstalled('5');
writeBundled('4');
expect(checkFrameworkDrift(home, fw)).toMatchObject({ drifted: false });
});
it('no drift (conservative) when a version cannot be read', () => {
writeBundled('4'); // installed version file missing
expect(checkFrameworkDrift(home, fw)).toMatchObject({ drifted: false, bundled: 4 });
});
});
});

View File

@@ -521,75 +521,6 @@ export function runFrameworkReseed(
}
}
// ─── Framework drift detection (#642) ────────────────────────────────────────
//
// `mosaic update` only re-seeds the framework when the @mosaicstack/mosaic
// package itself is upgraded *within that command*. When the CLI is upgraded
// some OTHER way — a direct `npm i -g @mosaicstack/mosaic`, or an upgrade run
// where only sibling packages were outdated — the framework files in
// ~/.config/mosaic stay stale and shipped launcher/runtime fixes never
// activate. Comparing the on-disk framework schema version against the version
// bundled in the installed package detects exactly that situation.
/** Read the framework schema version recorded on disk (~/.config/mosaic/.framework-version). */
export function readInstalledFrameworkVersion(
mosaicHome = join(homedir(), '.config', 'mosaic'),
): number | undefined {
const vf = join(mosaicHome, '.framework-version');
if (!existsSync(vf)) return undefined;
try {
const n = parseInt(readFileSync(vf, 'utf-8').trim(), 10);
return Number.isFinite(n) ? n : undefined;
} catch {
return undefined;
}
}
/**
* Read the framework schema version shipped in the installed package by parsing
* `FRAMEWORK_VERSION=<n>` out of the bundled install.sh (the authoritative
* source the installer writes to .framework-version).
*/
export function readBundledFrameworkVersion(
frameworkRoot = resolveBundledFrameworkRoot(),
): number | undefined {
const installer = join(frameworkRoot, 'install.sh');
if (!existsSync(installer)) return undefined;
try {
const m = readFileSync(installer, 'utf-8').match(/^\s*FRAMEWORK_VERSION=(\d+)/m);
const raw = m?.[1];
if (!raw) return undefined;
const n = parseInt(raw, 10);
return Number.isFinite(n) ? n : undefined;
} catch {
return undefined;
}
}
export interface FrameworkDrift {
/** True only when both versions are known AND the on-disk one is older. */
drifted: boolean;
installed?: number;
bundled?: number;
}
/**
* Detect whether the on-disk framework is older than the framework bundled in
* the installed CLI (#642). Conservative: if either version can't be read the
* result is no-drift, so a missing/unreadable version file never triggers an
* unexpected re-seed.
*/
export function checkFrameworkDrift(
mosaicHome = join(homedir(), '.config', 'mosaic'),
frameworkRoot = resolveBundledFrameworkRoot(),
): FrameworkDrift {
const installed = readInstalledFrameworkVersion(mosaicHome);
const bundled = readBundledFrameworkVersion(frameworkRoot);
const drifted =
typeof installed === 'number' && typeof bundled === 'number' && installed < bundled;
return { drifted, installed, bundled };
}
/**
* Best-effort parse of the fleet roster for agent names (used to relaunch
* durable agents after a re-seed). Returns [] when no roster exists.