Compare commits
6 Commits
af7dd3fa7c
...
release/mo
| Author | SHA1 | Date |
|---|---|---|
| | a60b740ad0 | |
| | 70661e3fab | |
| | ec8dd7ca86 | |
| | d887555852 | |
| | e3adc6a1bc | |
| | aa27c42129 | |
66
docs/scratchpads/h1-heartbeat-readiness.md
Normal file
@@ -0,0 +1,66 @@
# H1 — heartbeat readiness detection

## Objective

Add runtime-agnostic readiness classification to `mosaic fleet ps` so an agent can be reported as working/idle/stuck/stale/dead/unknown instead of treating pane liveness as progress.

## Scope

- `packages/mosaic/src/commands/fleet.ts`
  - exported readiness state/types/default thresholds/helpers/classifier
  - `AgentPsRow.readiness` additive JSON field
  - table HB column and IDLE/STUCK flags
- `packages/mosaic/src/commands/fleet.spec.ts`
  - pure classifier branch/boundary coverage
  - threshold helper coverage
  - legitimate render/JSON assertion updates for new HB text

## Acceptance Criteria

- Branches covered: dead, unknown, stale, busy working, null-idle working, stuck boundary, idle boundary, working below idle.
- Threshold env helpers default to 300s/900s and honor positive integer env values.
- `fleet ps` rows populate `readiness` for roster and unmanaged socket sessions.
- Table HB text becomes `<age>s/<readiness>` when heartbeat age exists; remains `unknown` when absent.
- Flags include `IDLE`/`STUCK` for matching readiness.
- Local gates green: `pnpm typecheck`, `pnpm lint`, `pnpm format:check`, fleet vitest.
- Pre-push queue guard passes; PR opened off `origin/main`; no merge by worker.

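The branch list in the acceptance criteria implies a fixed precedence: dead, then unknown, then stale, then busy, then missing idle data, then the stuck and idle thresholds. The following is a minimal TypeScript sketch of that order, assuming the signal and threshold shapes named in this PR. The function name `sketchClassifyReadiness` is hypothetical, and the real `classifyReadiness()` in `fleet.ts` additionally tolerates partial or missing input.

```typescript
// Illustrative sketch only; not the shipped implementation.
type ReadinessState = 'working' | 'idle' | 'stuck' | 'stale' | 'dead' | 'unknown';

interface ReadinessSignals {
  paneAlive: boolean;
  hbHealth: 'healthy' | 'stale' | 'unknown';
  hbStatus: 'ok' | 'busy' | null;
  idleSeconds: number | null;
}

interface ReadinessThresholds {
  idleThresholdSeconds: number;
  stuckThresholdSeconds: number;
}

// Hypothetical name: mirrors the precedence described in the acceptance criteria.
function sketchClassifyReadiness(s: ReadinessSignals, t: ReadinessThresholds): ReadinessState {
  if (!s.paneAlive) return 'dead'; // pane liveness is checked first
  if (s.hbHealth === 'unknown') return 'unknown'; // no heartbeat data
  if (s.hbHealth === 'stale') return 'stale'; // heartbeat too old
  if (s.hbStatus === 'busy') return 'working'; // busy wins over any idle time
  if (s.idleSeconds === null) return 'working'; // no idle signal: assume progress
  if (s.idleSeconds >= t.stuckThresholdSeconds) return 'stuck'; // stuck is checked before idle
  if (s.idleSeconds >= t.idleThresholdSeconds) return 'idle';
  return 'working';
}

const defaults: ReadinessThresholds = { idleThresholdSeconds: 300, stuckThresholdSeconds: 900 };
const healthy = { paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok' } as const;

console.log(sketchClassifyReadiness({ ...healthy, idleSeconds: 299 }, defaults)); // working
console.log(sketchClassifyReadiness({ ...healthy, idleSeconds: 300 }, defaults)); // idle
console.log(sketchClassifyReadiness({ ...healthy, idleSeconds: 900 }, defaults)); // stuck
```

Both boundaries are inclusive (`>=`), which is why 300 seconds already reports `idle` and 900 seconds already reports `stuck`.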
## Constraints / Assumptions

- Source branch: `origin/main` @ `e3adc6a`.
- No scope creep beyond readiness detection.
- `docs/TASKS.md` and `docs/fleet/TASKS.md` are orchestrator-owned; worker will not modify them.
- PRD alignment source: `docs/fleet/PRD.md` Phase 2 observability; this is a refinement of heartbeat observability, preserving existing unknown/stale behavior.

## Plan

1. Install dependencies with requested PNPM environment.
2. Add readiness types/helpers/classifier near heartbeat constants.
3. Add `readiness` to `AgentPsRow` and populate both row paths.
4. Update table render and flags.
5. Add unit tests and update affected ps render/JSON assertions.
6. Run build precheck + required gates.
7. Run automated independent review, remediate findings.
8. Queue guard, push, open PR.

## Progress

- 2026-06-24: Branch created from `origin/main` @ `e3adc6a`.
- 2026-06-24: Implemented readiness thresholds/classifier, JSON row field, HB column label, and IDLE/STUCK flags.
- 2026-06-24: Added classifier branch/boundary tests, threshold helper tests, JSON shape assertions, and readiness table rendering assertions.

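The threshold helpers, HB column label, and flags listed under Progress follow simple rules that can be sketched in a few lines. This is a hedged TypeScript illustration, not the code in `fleet.ts`: the helper names `positiveIntFromEnv`, `sketchHbLabel`, and `sketchReadinessFlags` are hypothetical, and only the defaults (300s/900s), the `<age>s/<readiness>` label format, and the `IDLE`/`STUCK` flag names come from these notes.

```typescript
// Hypothetical helper: parse a positive integer from a raw env string, else fall back.
function positiveIntFromEnv(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Hypothetical helper: HB column text is "<age>s/<readiness>" when a heartbeat age exists.
function sketchHbLabel(ageMs: number | null, readiness: string): string {
  return ageMs !== null ? `${Math.round(ageMs / 1000)}s/${readiness}` : 'unknown';
}

// Hypothetical helper: only idle and stuck rows get an extra flag.
function sketchReadinessFlags(readiness: string): string[] {
  const flags: string[] = [];
  if (readiness === 'idle') flags.push('IDLE');
  if (readiness === 'stuck') flags.push('STUCK');
  return flags;
}

console.log(positiveIntFromEnv('120', 300)); // 120
console.log(positiveIntFromEnv('0', 300)); // 300 (zero is not a positive integer)
console.log(positiveIntFromEnv('not-a-number', 900)); // 900
console.log(sketchHbLabel(1000, 'idle')); // 1s/idle
console.log(sketchHbLabel(null, 'unknown')); // unknown
console.log(sketchReadinessFlags('stuck')); // [ 'STUCK' ]
```

Passing the raw string in, rather than reading `process.env` inside the helper, keeps the sketch free of Node-specific types; the real helpers read `MOSAIC_HEARTBEAT_IDLE_THRESHOLD` and `MOSAIC_HEARTBEAT_STUCK_THRESHOLD` directly.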
## Verification Evidence

- `pnpm install --store-dir "$HOME/.pnpm-store"` — pass.
- `npx turbo build --filter=@mosaicstack/mosaic^...` — pass, 12/12 tasks successful.
- `pnpm typecheck` — pass, 41/41 tasks successful.
- `pnpm lint` — pass, 23/23 tasks successful.
- `pnpm format:check` — pass, all matched files use Prettier style.
- `pnpm --filter @mosaicstack/mosaic exec vitest run src/commands/fleet.spec.ts` — pass, 171 tests.
- `pnpm --filter @mosaicstack/mosaic test` — pass, 39 files / 547 tests; `fleet.spec.ts` 171 tests.
- `~/.config/mosaic/tools/codex/codex-code-review.sh --uncommitted` — approve, 0 findings (reviewed supplied diff; sandbox file-inspection limitation noted by tool).

## Risks / Blockers

- No current blocker.
- The review tool could not inspect repo files directly due to a sandbox wrapper limitation, but it reviewed the supplied diff and approved with no findings.

53
docs/scratchpads/h1b-pane-idle-signal.md
Normal file
@@ -0,0 +1,53 @@
# H1b — tmux pane idle signal wiring

## Objective

Feed `classifyReadiness()` a real idle signal on tmux 3.4 by deriving `idleSeconds` from the first available tmux timestamp source: pane activity, then window activity, then session activity.

## Scope

- `packages/mosaic/src/commands/fleet.ts`
  - Extend `buildTmuxListPanesCommand()` format to include `#{window_activity}` and `#{session_activity}` after the existing fields.
  - Update `parseTmuxListPanes()` to choose the first non-empty finite positive timestamp and clamp future idle values to 0.
- `packages/mosaic/src/commands/fleet.spec.ts`
  - Cover pane/window/session activity parsing behavior, empty-field index alignment, null idle, future clamping, math correctness, and exact tmux format.

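The fallback described in the scope depends on splitting on single spaces so that an empty `pane_activity` field keeps its position in the array. A minimal TypeScript sketch of that selection and clamping follows, assuming the six-field line format from this PR. The function name `sketchIdleSeconds` is hypothetical, and the real `parseTmuxListPanes()` also returns pid, command, and dead state.

```typescript
// Illustrative sketch only; field order assumed to be:
// <pid> <command> <dead> <pane_activity> <window_activity> <session_activity>
function sketchIdleSeconds(line: string, nowMs: number): number | null {
  // Splitting on a single space keeps empty fields as '' so indexes stay aligned.
  const parts = line.split(' ');
  const epoch = parts
    .slice(3, 6)
    .map((part) => (part ? Number(part) : NaN))
    .find((value) => Number.isFinite(value) && value > 0);
  if (epoch === undefined) return null; // no usable activity source
  // Clamp so a timestamp slightly in the future never yields negative idle time.
  return Math.max(0, Math.floor((nowMs - epoch * 1000) / 1000));
}

const NOW_MS = 1700000000000;

// pane_activity empty (two consecutive spaces), window_activity 45s ago.
console.log(sketchIdleSeconds('12345 node 0  1699999955 1699999910', NOW_MS)); // 45
// Activity 30s in the future is clamped to 0.
console.log(sketchIdleSeconds('12345 claude 0 1700000030 0 0', NOW_MS)); // 0
// All three activity fields empty.
console.log(sketchIdleSeconds('12345 node 0   ', NOW_MS)); // null
```

Because zero and empty values are both skipped, a literal `0` in `pane_activity` falls through to the next source the same way an empty field does.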
## Out of Scope

- No changes to `classifyReadiness()`, thresholds, `AgentPsRow`, or `fleet ps` rendering.
- No merge by worker; orchestrator routes review/merge.
- Workers do not modify `docs/TASKS.md`.

## PRD Alignment

Aligned with `docs/fleet/PRD.md` FR-1 and acceptance criteria for truthful `mosaic fleet ps` pane/pid/idle observability.

## Plan

1. Sync branch from latest `origin/main` and install dependencies with required pnpm env.
2. Add/confirm reproducer tests for tmux 3.4 empty `pane_activity` and new fallback behavior.
3. Implement the focused parser/format change only.
4. Run required build, baseline gates, fleet vitest, and independent review.
5. Run pre-push queue guard, push branch, and open PR to `main` with Mosaic wrapper.

## Progress

- 2026-06-24: Branch `fix/fleet-pane-idle-activity` created from `origin/main` @ `ec8dd7c` after fetching.
- 2026-06-24: Session-start generated local `.mosaic/orchestrator/*` changes on the previous release branch; stashed as `coder1 session-start state before H1b` to keep this branch clean.
- 2026-06-24: Added TDD coverage for the tmux 3.4 production case (`pane_activity` empty, `window_activity` populated), exact new list-panes format, null/future/multiple-source behavior.
- 2026-06-24: Implemented parser fallback without changing readiness classifier thresholds or render shape.

## Verification Evidence

- `pnpm install --store-dir "$HOME/.pnpm-store"` — pass.
- Reproducer before implementation: `pnpm --filter @mosaicstack/mosaic exec vitest run src/commands/fleet.spec.ts` — failed as expected (old format, no fallback, negative future idle).
- `npx turbo build --filter=@mosaicstack/mosaic^...` — pass, 12/12 tasks successful.
- `pnpm typecheck` — pass, 41/41 tasks successful.
- `pnpm lint` — pass, 23/23 tasks successful.
- `pnpm format:check` — pass, all matched files use Prettier style.
- `pnpm --filter @mosaicstack/mosaic exec vitest run src/commands/fleet.spec.ts` — pass, 176 tests.
- `~/.config/mosaic/tools/codex/codex-code-review.sh --uncommitted` — approve, 0 findings (reviewed supplied diff; sandbox file-inspection limitation noted by tool).

## Risks / Blockers

- No current blocker.

@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@mosaicstack/mosaic",
|
- "version": "0.0.41",
+ "version": "0.0.44",
|
||||
"repository": {
|
||||
"type": "git",
|
||||
"url": "https://git.mosaicstack.dev/mosaicstack/stack.git",
|
||||
|
||||
@@ -19,17 +19,21 @@ import {
|
||||
buildSystemdShowCommand,
|
||||
buildTmuxListPanesCommand,
|
||||
buildTmuxListSessionsCommand,
|
||||
classifyReadiness,
|
||||
classifySendResult,
|
||||
countOrchestrators,
|
||||
countEnhancers,
|
||||
detectDrift,
|
||||
enableFleetUnits,
|
||||
FLEET_PROFILES,
|
||||
HEARTBEAT_IDLE_THRESHOLD_SECONDS,
|
||||
HEARTBEAT_STUCK_THRESHOLD_SECONDS,
|
||||
generateAgentEnv,
|
||||
getDefaultOperatorSourceLabel,
|
||||
getDefaultTenantAndHost,
|
||||
getRosterAgent,
|
||||
heartbeatPath,
|
||||
idleThresholdSeconds,
|
||||
isSendAccepted,
|
||||
loadFleetRoster,
|
||||
mergeAgentEnv,
|
||||
@@ -44,6 +48,7 @@ import {
|
||||
resolvePresetFilename,
|
||||
RUNTIME_ACCEPTABLE_COMMANDS,
|
||||
serializeRosterToYaml,
|
||||
stuckThresholdSeconds,
|
||||
VERIFY_DEFAULT_TIMEOUT_MS,
|
||||
VERIFY_POLL_INTERVAL_MS,
|
||||
type AgentPsRow,
|
||||
@@ -850,7 +855,7 @@ describe('fleet ps — command construction', () => {
|
||||
'-t',
|
||||
'=canary-pi:0.0',
|
||||
'-F',
|
||||
- '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity}',
+ '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity} #{window_activity} #{session_activity}',
|
||||
]);
|
||||
});
|
||||
|
||||
@@ -933,6 +938,127 @@ describe('fleet ps — heartbeat parsing', () => {
|
||||
});
|
||||
});
|
||||
|
||||
describe('fleet ps — readiness thresholds', () => {
|
||||
const savedIdle = process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
|
||||
const savedStuck = process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
|
||||
|
||||
afterEach(() => {
|
||||
if (savedIdle === undefined) delete process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
|
||||
else process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = savedIdle;
|
||||
if (savedStuck === undefined) delete process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
|
||||
else process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = savedStuck;
|
||||
});
|
||||
|
||||
it('uses default readiness thresholds when env is unset', () => {
|
||||
delete process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
|
||||
delete process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
|
||||
|
||||
expect(idleThresholdSeconds()).toBe(HEARTBEAT_IDLE_THRESHOLD_SECONDS);
|
||||
expect(stuckThresholdSeconds()).toBe(HEARTBEAT_STUCK_THRESHOLD_SECONDS);
|
||||
});
|
||||
|
||||
it('honors positive integer readiness thresholds from env', () => {
|
||||
process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = '120';
|
||||
process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = '480';
|
||||
|
||||
expect(idleThresholdSeconds()).toBe(120);
|
||||
expect(stuckThresholdSeconds()).toBe(480);
|
||||
});
|
||||
|
||||
it('falls back to defaults for invalid readiness thresholds', () => {
|
||||
process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = '0';
|
||||
process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = 'not-a-number';
|
||||
|
||||
expect(idleThresholdSeconds()).toBe(HEARTBEAT_IDLE_THRESHOLD_SECONDS);
|
||||
expect(stuckThresholdSeconds()).toBe(HEARTBEAT_STUCK_THRESHOLD_SECONDS);
|
||||
});
|
||||
});
|
||||
|
||||
describe('fleet ps — readiness classification', () => {
|
||||
const thresholds = { idleThresholdSeconds: 300, stuckThresholdSeconds: 900 };
|
||||
|
||||
it('reports dead when the pane is not alive', () => {
|
||||
expect(
|
||||
classifyReadiness(
|
||||
{ paneAlive: false, hbHealth: 'healthy', hbStatus: 'busy', idleSeconds: 0 },
|
||||
thresholds,
|
||||
),
|
||||
).toBe('dead');
|
||||
});
|
||||
|
||||
it('reports unknown when heartbeat health is unknown', () => {
|
||||
expect(
|
||||
classifyReadiness(
|
||||
{ paneAlive: true, hbHealth: 'unknown', hbStatus: null, idleSeconds: 0 },
|
||||
thresholds,
|
||||
),
|
||||
).toBe('unknown');
|
||||
});
|
||||
|
||||
it('reports stale when heartbeat health is stale', () => {
|
||||
expect(
|
||||
classifyReadiness(
|
||||
{ paneAlive: true, hbHealth: 'stale', hbStatus: 'busy', idleSeconds: 1_000 },
|
||||
thresholds,
|
||||
),
|
||||
).toBe('stale');
|
||||
});
|
||||
|
||||
it('reports working when heartbeat status is busy, even past stuck threshold', () => {
|
||||
expect(
|
||||
classifyReadiness(
|
||||
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'busy', idleSeconds: 2_000 },
|
||||
thresholds,
|
||||
),
|
||||
).toBe('working');
|
||||
});
|
||||
|
||||
it('reports working when pane idle seconds are unavailable', () => {
|
||||
expect(
|
||||
classifyReadiness(
|
||||
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: null },
|
||||
thresholds,
|
||||
),
|
||||
).toBe('working');
|
||||
});
|
||||
|
||||
it('reports stuck at the stuck threshold boundary', () => {
|
||||
expect(
|
||||
classifyReadiness(
|
||||
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 900 },
|
||||
thresholds,
|
||||
),
|
||||
).toBe('stuck');
|
||||
});
|
||||
|
||||
it('reports idle at the idle threshold boundary', () => {
|
||||
expect(
|
||||
classifyReadiness(
|
||||
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 300 },
|
||||
thresholds,
|
||||
),
|
||||
).toBe('idle');
|
||||
});
|
||||
|
||||
it('reports working below the idle threshold', () => {
|
||||
expect(
|
||||
classifyReadiness(
|
||||
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 299 },
|
||||
thresholds,
|
||||
),
|
||||
).toBe('working');
|
||||
});
|
||||
|
||||
it('checks stuck before idle when thresholds are inverted', () => {
|
||||
expect(
|
||||
classifyReadiness(
|
||||
{ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 350 },
|
||||
{ idleThresholdSeconds: 900, stuckThresholdSeconds: 300 },
|
||||
),
|
||||
).toBe('stuck');
|
||||
});
|
||||
});
|
||||
|
||||
describe('fleet ps — systemd show parsing', () => {
|
||||
it('parses ActiveState, SubState, UnitFileState from systemctl show output', () => {
|
||||
const output = 'ActiveState=active\nSubState=running\nUnitFileState=enabled\n';
|
||||
@@ -953,9 +1079,11 @@ describe('fleet ps — systemd show parsing', () => {
|
||||
describe('fleet ps — tmux list-panes parsing', () => {
|
||||
const NOW_MS = 1_700_000_000_000;
|
||||
|
||||
- it('parses alive pane with pid, command, and idle time', () => {
- const activityEpoch = Math.floor((NOW_MS - 30_000) / 1000); // 30s ago
- const output = `12345 claude 0 ${activityEpoch}\n`;
+ it('uses pane_activity when present', () => {
+ const paneActivityEpoch = Math.floor((NOW_MS - 30_000) / 1000); // 30s ago
+ const windowActivityEpoch = Math.floor((NOW_MS - 60_000) / 1000); // 60s ago
+ const sessionActivityEpoch = Math.floor((NOW_MS - 90_000) / 1000); // 90s ago
+ const output = `12345 claude 0 ${paneActivityEpoch} ${windowActivityEpoch} ${sessionActivityEpoch}\n`;
|
||||
const result = parseTmuxListPanes(output, NOW_MS);
|
||||
expect(result.pid).toBe(12345);
|
||||
expect(result.command).toBe('claude');
|
||||
@@ -963,8 +1091,45 @@ describe('fleet ps — tmux list-panes parsing', () => {
|
||||
expect(result.idleSeconds).toBe(30);
|
||||
});
|
||||
|
||||
it('uses window_activity when pane_activity is empty', () => {
|
||||
const windowActivityEpoch = Math.floor((NOW_MS - 45_000) / 1000); // 45s ago
|
||||
const sessionActivityEpoch = Math.floor((NOW_MS - 90_000) / 1000); // 90s ago
|
||||
const output = `12345 node 0  ${windowActivityEpoch} ${sessionActivityEpoch}\n`;
expect(output).toContain('0  '); // empty pane_activity preserves index alignment
|
||||
const result = parseTmuxListPanes(output, NOW_MS);
|
||||
expect(result.pid).toBe(12345);
|
||||
expect(result.command).toBe('node');
|
||||
expect(result.dead).toBe(false);
|
||||
expect(result.idleSeconds).toBe(45);
|
||||
});
|
||||
|
||||
it('uses session_activity when pane_activity and window_activity are empty', () => {
|
||||
const sessionActivityEpoch = Math.floor((NOW_MS - 75_000) / 1000); // 75s ago
|
||||
const output = `12345 node 0   ${sessionActivityEpoch}\n`;
|
||||
const result = parseTmuxListPanes(output, NOW_MS);
|
||||
expect(result.idleSeconds).toBe(75);
|
||||
});
|
||||
|
||||
it('reports null idleSeconds when all activity sources are empty', () => {
|
||||
const output = '12345 node 0   \n';
|
||||
const result = parseTmuxListPanes(output, NOW_MS);
|
||||
expect(result.idleSeconds).toBeNull();
|
||||
});
|
||||
|
||||
it('computes exact idle seconds from now minus epoch seconds', () => {
|
||||
const activityEpoch = 1_699_999_877;
|
||||
const result = parseTmuxListPanes(`12345 claude 0 ${activityEpoch} 0 0\n`, NOW_MS);
|
||||
expect(result.idleSeconds).toBe(123);
|
||||
});
|
||||
|
||||
it('clamps future activity epochs to 0 idle seconds', () => {
|
||||
const futureActivityEpoch = Math.floor((NOW_MS + 30_000) / 1000);
|
||||
const result = parseTmuxListPanes(`12345 claude 0 ${futureActivityEpoch} 0 0\n`, NOW_MS);
|
||||
expect(result.idleSeconds).toBe(0);
|
||||
});
|
||||
|
||||
it('reports dead pane when pane_dead=1', () => {
|
||||
- const output = `0 bash 1 0\n`;
+ const output = `0 bash 1 0 0 0\n`;
|
||||
const result = parseTmuxListPanes(output, NOW_MS);
|
||||
expect(result.dead).toBe(true);
|
||||
});
|
||||
@@ -1324,8 +1489,9 @@ describe('fleet ps — JSON output shape (FR-6)', () => {
|
||||
// boot-enable warning: active + disabled
|
||||
expect(row.bootEnableWarning).toBe(true);
|
||||
|
||||
- // heartbeat missing → unknown
+ // heartbeat missing → unknown readiness preserves existing display semantics
|
||||
expect(row.heartbeat.health).toBe('unknown');
|
||||
expect(row.readiness).toBe('unknown');
|
||||
|
||||
expect(row.name).toBe('canary-pi');
|
||||
expect(row.runtime).toBe('pi');
|
||||
@@ -1387,6 +1553,92 @@ describe('fleet ps — command sequences issued', () => {
|
||||
});
|
||||
});
|
||||
|
||||
describe('fleet ps — readiness table output', () => {
|
||||
it('renders readiness in HB column and flags idle/stuck rows', async () => {
|
||||
const home = await mkdtemp(join(tmpdir(), 'mosaic-fleet-'));
|
||||
const rosterPath = join(home, 'fleet', 'roster.yaml');
|
||||
const runDir = join(home, 'fleet', 'run');
|
||||
await mkdir(runDir, { recursive: true });
|
||||
await writeFile(
|
||||
rosterPath,
|
||||
[
|
||||
'version: 1',
|
||||
'transport: tmux',
|
||||
'agents:',
|
||||
' - name: idle-agent',
|
||||
' runtime: pi',
|
||||
' - name: stuck-agent',
|
||||
' runtime: pi',
|
||||
].join('\n'),
|
||||
);
|
||||
|
||||
const nowMs = 1_700_000_000_000;
|
||||
const idleActivityEpoch = Math.floor((nowMs - 10_000) / 1000);
|
||||
const stuckActivityEpoch = Math.floor((nowMs - 40_000) / 1000);
|
||||
const hbTs = new Date(nowMs - 1_000).toISOString();
|
||||
await writeFile(join(runDir, 'idle-agent.hb'), `ts=${hbTs}\npid=111\nstatus=ok\n`);
|
||||
await writeFile(join(runDir, 'stuck-agent.hb'), `ts=${hbTs}\npid=222\nstatus=ok\n`);
|
||||
|
||||
const savedIdle = process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
|
||||
const savedStuck = process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
|
||||
process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = '5';
|
||||
process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = '30';
|
||||
|
||||
const dateNow = vi.spyOn(Date, 'now').mockReturnValue(nowMs);
|
||||
const runner: CommandRunner = async (command, args) => {
|
||||
const full = [command, ...args].join(' ');
|
||||
if (full.includes('list-sessions')) {
|
||||
return { stdout: 'idle-agent\nstuck-agent\n', stderr: '', exitCode: 0 };
|
||||
}
|
||||
if (full.includes('=idle-agent:0.0')) {
|
||||
return { stdout: `111 pi 0 ${idleActivityEpoch}\n`, stderr: '', exitCode: 0 };
|
||||
}
|
||||
if (full.includes('=stuck-agent:0.0')) {
|
||||
return { stdout: `222 pi 0 ${stuckActivityEpoch}\n`, stderr: '', exitCode: 0 };
|
||||
}
|
||||
if (full.includes('systemctl') && full.includes('show')) {
|
||||
return {
|
||||
stdout: 'ActiveState=active\nSubState=running\nUnitFileState=enabled\n',
|
||||
stderr: '',
|
||||
exitCode: 0,
|
||||
};
|
||||
}
|
||||
return { stdout: '', stderr: '', exitCode: 0 };
|
||||
};
|
||||
|
||||
const lines: string[] = [];
|
||||
const origLog = console.log;
|
||||
console.log = (msg: string) => {
|
||||
lines.push(msg);
|
||||
};
|
||||
|
||||
const program = new Command();
|
||||
program.exitOverride();
|
||||
registerFleetCommand(program, { runner, mosaicHome: home });
|
||||
|
||||
try {
|
||||
await program.parseAsync(['node', 'mosaic', 'fleet', 'ps']);
|
||||
} finally {
|
||||
console.log = origLog;
|
||||
dateNow.mockRestore();
|
||||
if (savedIdle === undefined) delete process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
|
||||
else process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = savedIdle;
|
||||
if (savedStuck === undefined) delete process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
|
||||
else process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = savedStuck;
|
||||
await rm(home, { recursive: true, force: true });
|
||||
}
|
||||
|
||||
const idleLine = lines.find((line) => line.includes('idle-agent'));
|
||||
const stuckLine = lines.find((line) => line.includes('stuck-agent'));
|
||||
expect(idleLine).toBeDefined();
|
||||
expect(idleLine).toContain('1s/idle');
|
||||
expect(idleLine).toMatch(/\bIDLE\b/);
|
||||
expect(stuckLine).toBeDefined();
|
||||
expect(stuckLine).toContain('1s/stuck');
|
||||
expect(stuckLine).toMatch(/\bSTUCK\b/);
|
||||
});
|
||||
});
|
||||
|
||||
describe('buildTmuxListSessionsCommand', () => {
|
||||
it('builds exact list-sessions command with session_name format', () => {
|
||||
expect(buildTmuxListSessionsCommand('mosaic-fleet')).toEqual([
|
||||
@@ -1514,6 +1766,7 @@ describe('fleet ps — unmanaged socket sessions', () => {
|
||||
|
||||
// driftFlag must be false for unmanaged (no roster runtime to compare)
|
||||
expect(unmanagedRow.driftFlag).toBe(false);
|
||||
expect(unmanagedRow.readiness).toBe('unknown');
|
||||
});
|
||||
|
||||
it('shows UNMANAGED flag in table output for unmanaged sessions', async () => {
|
||||
|
||||
@@ -394,6 +394,8 @@ export function buildAgentTailCommand(agentName: string, lines: number, socketNa
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
export const HEARTBEAT_INTERVAL_MS = 15_000;
|
||||
export const HEARTBEAT_IDLE_THRESHOLD_SECONDS = 300;
|
||||
export const HEARTBEAT_STUCK_THRESHOLD_SECONDS = 900;
|
||||
|
||||
/**
|
||||
* Heartbeat interval in ms, honoring MOSAIC_HEARTBEAT_INTERVAL (seconds) so the
|
||||
@@ -404,8 +406,68 @@ export function heartbeatIntervalMs(): number {
|
||||
const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_INTERVAL ?? '', 10);
|
||||
return Number.isFinite(sec) && sec > 0 ? sec * 1000 : HEARTBEAT_INTERVAL_MS;
|
||||
}
|
||||
|
||||
/** Idle threshold in seconds, honoring MOSAIC_HEARTBEAT_IDLE_THRESHOLD. */
|
||||
export function idleThresholdSeconds(): number {
|
||||
const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD ?? '', 10);
|
||||
return Number.isFinite(sec) && sec > 0 ? sec : HEARTBEAT_IDLE_THRESHOLD_SECONDS;
|
||||
}
|
||||
|
||||
/** Stuck threshold in seconds, honoring MOSAIC_HEARTBEAT_STUCK_THRESHOLD. */
|
||||
export function stuckThresholdSeconds(): number {
|
||||
const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD ?? '', 10);
|
||||
return Number.isFinite(sec) && sec > 0 ? sec : HEARTBEAT_STUCK_THRESHOLD_SECONDS;
|
||||
}
|
||||
export const HEARTBEAT_HEALTHY_MULTIPLIER = 3;
|
||||
|
||||
export type ReadinessState = 'working' | 'idle' | 'stuck' | 'stale' | 'dead' | 'unknown';
|
||||
|
||||
export interface ReadinessSignals {
|
||||
paneAlive: boolean;
|
||||
hbHealth: 'healthy' | 'stale' | 'unknown';
|
||||
hbStatus: 'ok' | 'busy' | null;
|
||||
idleSeconds: number | null;
|
||||
}
|
||||
|
||||
export interface ReadinessThresholds {
|
||||
idleThresholdSeconds: number;
|
||||
stuckThresholdSeconds: number;
|
||||
}
|
||||
|
||||
/**
|
||||
* Classify whether an agent is progressing based on already-parsed heartbeat/tmux signals.
|
||||
* Best-effort and runtime-agnostic: it never probes, never throws, and preserves existing
|
||||
* unknown/stale behavior when heartbeat data is absent or old.
|
||||
*/
|
||||
export function classifyReadiness(
|
||||
signals: Partial<ReadinessSignals> | null | undefined,
|
||||
thresholds: Partial<ReadinessThresholds> | null | undefined = {},
|
||||
): ReadinessState {
|
||||
try {
|
||||
if (signals?.paneAlive !== true) return 'dead';
|
||||
if (signals.hbHealth === 'unknown' || signals.hbHealth === undefined) return 'unknown';
|
||||
if (signals.hbHealth === 'stale') return 'stale';
|
||||
if (signals.hbStatus === 'busy') return 'working';
|
||||
if (signals.idleSeconds === null || signals.idleSeconds === undefined) return 'working';
|
||||
|
||||
const idleSeconds = Number.isFinite(signals.idleSeconds) ? signals.idleSeconds : null;
|
||||
if (idleSeconds === null) return 'working';
|
||||
|
||||
const idleThreshold = Number.isFinite(thresholds?.idleThresholdSeconds)
|
||||
? Number(thresholds?.idleThresholdSeconds)
|
||||
: idleThresholdSeconds();
|
||||
const stuckThreshold = Number.isFinite(thresholds?.stuckThresholdSeconds)
|
||||
? Number(thresholds?.stuckThresholdSeconds)
|
||||
: stuckThresholdSeconds();
|
||||
|
||||
if (idleSeconds >= stuckThreshold) return 'stuck';
|
||||
if (idleSeconds >= idleThreshold) return 'idle';
|
||||
return 'working';
|
||||
} catch {
|
||||
return 'unknown';
|
||||
}
|
||||
}
|
||||
|
||||
export interface HeartbeatInfo {
|
||||
ts: Date | null;
|
||||
pid: number | null;
|
||||
@@ -429,6 +491,7 @@ export interface AgentPsRow {
|
||||
paneCommand: string | null;
|
||||
idleSeconds: number | null;
|
||||
heartbeat: HeartbeatInfo;
|
||||
readiness: ReadinessState;
|
||||
/** roster runtime !== actual pane command */
|
||||
driftFlag: boolean;
|
||||
/** active but UnitFileState=disabled */
|
||||
@@ -461,7 +524,7 @@ export function buildSystemdShowCommand(agentName: string): string[] {
|
||||
|
||||
/**
|
||||
* Returns the tmux list-panes command for an agent pane.
|
||||
- * Format: `#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity}`
+ * Format: `#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity} #{window_activity} #{session_activity}`
|
||||
*/
|
||||
export function buildTmuxListPanesCommand(agentName: string, socketName = ''): string[] {
|
||||
return [
|
||||
@@ -471,7 +534,7 @@ export function buildTmuxListPanesCommand(agentName: string, socketName = ''): s
|
||||
'-t',
|
||||
`=${agentName}:0.0`,
|
||||
'-F',
|
||||
- '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity}',
+ '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity} #{window_activity} #{session_activity}',
|
||||
];
|
||||
}
|
||||
|
||||
@@ -571,8 +634,8 @@ export function parseSystemdShow(output: string): {
|
||||
}
|
||||
|
||||
/**
|
||||
- * Parse the output of `tmux list-panes -F '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity}'`
- * pane_activity is a Unix epoch timestamp (seconds).
+ * Parse the output of `tmux list-panes -F '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity} #{window_activity} #{session_activity}'`
+ * Activity fields are Unix epoch timestamps (seconds), ordered most precise to coarsest.
|
||||
*/
|
||||
export function parseTmuxListPanes(
|
||||
output: string,
|
||||
@@ -582,16 +645,18 @@ export function parseTmuxListPanes(
|
||||
if (!line) {
|
||||
return { pid: null, command: null, dead: true, idleSeconds: null };
|
||||
}
|
||||
- // format: <pid> <command> <dead(0|1)> <activity_epoch>
+ // format: <pid> <command> <dead(0|1)> <pane_activity> <window_activity> <session_activity>
|
||||
const parts = line.split(' ');
|
||||
const pid = parts[0] ? (Number.isFinite(Number(parts[0])) ? Number(parts[0]) : null) : null;
|
||||
const command = parts[1] ?? null;
|
||||
const dead = parts[2] === '1';
|
||||
- const activityEpoch = parts[3] ? Number(parts[3]) : NaN;
- const idleSeconds =
-   Number.isFinite(activityEpoch) && activityEpoch > 0
-     ? Math.floor((nowMs - activityEpoch * 1000) / 1000)
-     : null;
+ const activityEpoch = parts
+   .slice(3, 6)
+   .map((part) => (part ? Number(part) : NaN))
+   .find((epoch) => Number.isFinite(epoch) && epoch > 0);
+ const idleSeconds = activityEpoch
+   ? Math.max(0, Math.floor((nowMs - activityEpoch * 1000) / 1000))
+   : null;
|
||||
return { pid, command, dead, idleSeconds };
|
||||
}
|
||||
|
||||
@@ -1022,6 +1087,10 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
|
||||
const nowMs = Date.now();
|
||||
|
||||
const rows: AgentPsRow[] = [];
|
||||
const readinessThresholds = {
|
||||
idleThresholdSeconds: idleThresholdSeconds(),
|
||||
stuckThresholdSeconds: stuckThresholdSeconds(),
|
||||
};
|
||||
|
||||
// Build the set of roster agent names for quick lookup when filtering socket sessions.
|
||||
const rosterAgentNames = new Set(roster.agents.map((a) => a.name));
|
||||
@@ -1052,6 +1121,17 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
|
||||
const bootEnableWarning =
|
||||
sysInfo.ActiveState === 'active' && sysInfo.UnitFileState === 'disabled';
|
||||
|
||||
const paneAlive = !paneInfo.dead;
|
||||
const readiness = classifyReadiness(
|
||||
{
|
||||
paneAlive,
|
||||
hbHealth: hb.health,
|
||||
hbStatus: hb.status,
|
||||
idleSeconds: paneInfo.idleSeconds,
|
||||
},
|
||||
readinessThresholds,
|
||||
);
|
||||
|
||||
rows.push({
|
||||
name: agent.name,
|
||||
tenant_id,
|
||||
@@ -1059,11 +1139,12 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
|
||||
runtime: agent.runtime,
|
||||
systemdActive: sysInfo.ActiveState,
|
||||
systemdEnabled: sysInfo.UnitFileState,
|
||||
- paneAlive: !paneInfo.dead,
+ paneAlive,
|
||||
panePid: paneInfo.pid,
|
||||
paneCommand: paneInfo.command,
|
||||
idleSeconds: paneInfo.idleSeconds,
|
||||
heartbeat: hb,
|
||||
readiness,
|
||||
driftFlag,
|
||||
bootEnableWarning,
|
||||
managed: true,
|
||||
@@ -1110,6 +1191,17 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
|
||||
const bootEnableWarning =
|
||||
sysInfo.ActiveState === 'active' && sysInfo.UnitFileState === 'disabled';
|
||||
|
||||
const paneAlive = !paneInfo.dead;
|
||||
const readiness = classifyReadiness(
|
||||
{
|
||||
paneAlive,
|
||||
hbHealth: hb.health,
|
||||
hbStatus: hb.status,
|
||||
idleSeconds: paneInfo.idleSeconds,
|
||||
},
|
||||
readinessThresholds,
|
||||
);
|
||||
|
||||
rows.push({
|
||||
name: sessionName,
|
||||
tenant_id,
|
||||
@@ -1118,11 +1210,12 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
|
||||
runtime: 'unknown',
|
||||
systemdActive: sysInfo.ActiveState,
|
||||
systemdEnabled: sysInfo.UnitFileState,
|
||||
- paneAlive: !paneInfo.dead,
+ paneAlive,
|
||||
panePid: paneInfo.pid,
|
||||
paneCommand: paneInfo.command,
|
||||
idleSeconds: paneInfo.idleSeconds,
|
||||
heartbeat: hb,
|
||||
readiness,
|
||||
// No roster runtime to compare — drift is not meaningful for unmanaged sessions
|
||||
driftFlag: false,
|
||||
bootEnableWarning,
|
||||
@@ -1164,13 +1257,15 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
|
||||
const idle = row.idleSeconds !== null ? `${row.idleSeconds}s` : '-';
|
||||
const hbAge =
|
||||
row.heartbeat.ageMs !== null
|
||||
- ? `${Math.round(row.heartbeat.ageMs / 1000)}s/${row.heartbeat.health}`
+ ? `${Math.round(row.heartbeat.ageMs / 1000)}s/${row.readiness}`
|
||||
: `unknown`;
|
||||
const model = row.heartbeat.model ?? '-';
|
||||
const flags: string[] = [];
|
||||
if (!row.managed) flags.push('UNMANAGED');
|
||||
if (row.driftFlag) flags.push('DRIFT');
|
||||
if (row.bootEnableWarning) flags.push('BOOT-ENABLE');
|
||||
if (row.readiness === 'idle') flags.push('IDLE');
|
||||
if (row.readiness === 'stuck') flags.push('STUCK');
|
||||
|
||||
console.log(
|
||||
[
|
||||
|
||||