Compare commits

4 Commits

Author SHA1 Message Date
Jarvis
6661e33ca9 chore(release): mosaic CLI 0.0.45
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Roll out H2 readiness semantics (#653): idle healthy agents report as
available, not stuck/idle; stuck reserved for genuine blocks.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-24 08:58:40 -05:00
937077f6be fix(fleet): report idle agents as available, reserve stuck for genuine blocks (#653)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 13:58:22 +00:00
1020cfaf9b chore(release): mosaic CLI 0.0.44 (#652)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 06:49:04 +00:00
70661e3fab fix(fleet): derive pane idle from window activity fallback (#651)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 06:37:45 +00:00
5 changed files with 233 additions and 92 deletions

@@ -0,0 +1,53 @@
# H1b — tmux pane idle signal wiring
## Objective
Feed `classifyReadiness()` a real idle signal on tmux 3.4 by deriving `idleSeconds` from the first available tmux timestamp source: pane activity, then window activity, then session activity.
## Scope
- `packages/mosaic/src/commands/fleet.ts`
- Extend `buildTmuxListPanesCommand()` format to include `#{window_activity}` and `#{session_activity}` after the existing fields.
- Update `parseTmuxListPanes()` to choose the first non-empty finite positive timestamp and clamp future idle values to 0.
- `packages/mosaic/src/commands/fleet.spec.ts`
- Cover pane/window/session activity parsing behavior, empty-field index alignment, null idle, future clamping, math correctness, and exact tmux format.
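The fallback order described in the scope above can be sketched roughly as follows. This is a minimal illustration, not the code in `fleet.ts`: the function name `idleSecondsFromPaneLine` is hypothetical, and it assumes the six space-separated fields of the extended list-panes format with the three activity fields in positions 3 to 5.

```typescript
// Illustrative sketch only: `idleSecondsFromPaneLine` is a hypothetical name,
// not the parser in fleet.ts.
function idleSecondsFromPaneLine(line: string, nowMs: number): number | null {
  // Splitting on a single space keeps empty strings for empty tmux fields,
  // so indexes 3..5 always mean pane, window, and session activity.
  const activityFields = line.split(' ').slice(3, 6);

  for (const field of activityFields) {
    const epochSeconds = field === '' ? Number.NaN : Number(field);
    if (Number.isFinite(epochSeconds) && epochSeconds > 0) {
      // Clamp so a timestamp in the future reads as 0s idle, never negative.
      return Math.max(0, Math.floor((nowMs - epochSeconds * 1000) / 1000));
    }
  }
  return null; // no usable timestamp source
}

// pane_activity is empty (note the double space), so window_activity is used.
console.log(idleSecondsFromPaneLine('12345 node 0  1699999955 1699999910', 1_700_000_000_000)); // 45
```

Splitting on a single space rather than on runs of whitespace is what keeps the field indexes aligned when `pane_activity` is empty.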
## Out of Scope
- No changes to `classifyReadiness()`, thresholds, `AgentPsRow`, or `fleet ps` rendering.
- No merge by worker; orchestrator routes review/merge.
- Workers do not modify `docs/TASKS.md`.
## PRD Alignment
Aligned with `docs/fleet/PRD.md` FR-1 and acceptance criteria for truthful `mosaic fleet ps` pane/pid/idle observability.
## Plan
1. Sync branch from latest `origin/main` and install dependencies with required pnpm env.
2. Add/confirm reproducer tests for tmux 3.4 empty `pane_activity` and new fallback behavior.
3. Implement the focused parser/format change only.
4. Run required build, baseline gates, fleet vitest, and independent review.
5. Run pre-push queue guard, push branch, and open PR to `main` with Mosaic wrapper.
## Progress
- 2026-06-24: Branch `fix/fleet-pane-idle-activity` created from `origin/main` @ `ec8dd7c` after fetching.
- 2026-06-24: Session-start generated local `.mosaic/orchestrator/*` changes on the previous release branch; stashed as `coder1 session-start state before H1b` to keep this branch clean.
- 2026-06-24: Added TDD coverage for the tmux 3.4 production case (`pane_activity` empty, `window_activity` populated), exact new list-panes format, null/future/multiple-source behavior.
- 2026-06-24: Implemented parser fallback without changing readiness classifier thresholds or render shape.
## Verification Evidence
- `pnpm install --store-dir "$HOME/.pnpm-store"` — pass.
- Reproducer before implementation: `pnpm --filter @mosaicstack/mosaic exec vitest run src/commands/fleet.spec.ts` — failed as expected (old format, no fallback, negative future idle).
- `npx turbo build --filter=@mosaicstack/mosaic^...` — pass, 12/12 tasks successful.
- `pnpm typecheck` — pass, 41/41 tasks successful.
- `pnpm lint` — pass, 23/23 tasks successful.
- `pnpm format:check` — pass, all matched files use Prettier style.
- `pnpm --filter @mosaicstack/mosaic exec vitest run src/commands/fleet.spec.ts` — pass, 176 tests.
- `~/.config/mosaic/tools/codex/codex-code-review.sh --uncommitted` — approve, 0 findings (reviewed supplied diff; sandbox file-inspection limitation noted by tool).
## Risks / Blockers
- No current blocker.

@@ -0,0 +1,70 @@
# H2 — readiness semantics: available, not stuck
## Objective
Correct fleet readiness semantics so a healthy long-idle agent is reported as `available` (good/assignable) instead of `stuck` (fault). Reserve `stuck` in the type/JSON value space for future positive block evidence.
## Scope
- `packages/mosaic/src/commands/fleet.ts`
- replace `idle` readiness state with `available`
- keep `stuck` in the union but stop emitting it from idle-only heuristics
- remove stuck threshold helper/env handling
- remove IDLE/STUCK alarm flags from table rendering
- `packages/mosaic/src/commands/fleet.spec.ts`
- update classifier branch/boundary tests
- assert very long idle maps to `available`, not `stuck`
- update table/JSON assertions for available with no alarm flags
- remove stuck threshold helper tests
## Acceptance Criteria
- `classifyReadiness()` remains pure/total/never-throw and maps:
- dead/stale/unknown unchanged
- busy/null/undefined/non-finite idle to `working`
- idle >= activity threshold to `available`
- idle < activity threshold to `working`
- No idle-derived path emits `stuck`.
- `MOSAIC_HEARTBEAT_IDLE_THRESHOLD` remains backward compatible as the working→available activity threshold.
- `MOSAIC_HEARTBEAT_STUCK_THRESHOLD` and helper/default are removed.
- `fleet ps` keeps the idle-seconds column header `IDLE`, renders `available` in HB label, and does not add IDLE/STUCK warning flags.
- Local gates green: build precheck, typecheck, lint, format:check, fleet vitest.
- PR opened against `main`; no merge by worker.
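The idle-related mapping in the criteria above can be sketched as below. This is a hedged illustration rather than the real `classifyReadiness()`: the names `LiveAgentSignals`, `resolveActivityThreshold`, and `classifyLiveAgent` are hypothetical, and the sketch assumes the dead/stale/unknown branches were already handled before it runs.

```typescript
// Illustrative sketch only: all names here are hypothetical, and the
// dead/stale/unknown branches are assumed to be handled by the caller.
interface LiveAgentSignals {
  hbStatus: string;
  idleSeconds?: number | null;
}

// Positive integers are honored; anything else falls back to the default.
function resolveActivityThreshold(raw: string | undefined, fallbackSeconds: number): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackSeconds;
}

function classifyLiveAgent(
  signals: LiveAgentSignals,
  thresholdSeconds: number,
): 'working' | 'available' {
  // A busy heartbeat wins over any idle duration.
  if (signals.hbStatus === 'busy') return 'working';
  const idle = signals.idleSeconds;
  // Missing or non-finite idle data is treated as working, never as a fault.
  if (idle === null || idle === undefined || !Number.isFinite(idle)) return 'working';
  // No idle duration, however long, maps to 'stuck'.
  return idle >= thresholdSeconds ? 'available' : 'working';
}

const threshold = resolveActivityThreshold('0', 300); // invalid value, falls back to 300
console.log(classifyLiveAgent({ hbStatus: 'ok', idleSeconds: 100_000 }, threshold)); // available
```

Narrowing the return type to `'working' | 'available'` makes the "no idle-derived path emits `stuck`" criterion a compile-time property of this sketch.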
## Constraints / Assumptions
- Source branch: `origin/main` @ `1020cfa`.
- `docs/TASKS.md` is orchestrator-owned; worker will not modify it.
- Documentation impact is captured in this scratchpad and the PR description; no user/admin guide changes are needed beyond the CLI readiness label semantics.
## Plan
1. Install dependencies with requested PNPM environment.
2. Inspect current H1/H1b readiness implementation and tests.
3. Update classifier types/helpers/rendering.
4. Update focused tests.
5. Run build precheck + required gates.
6. Run automated code review, remediate any findings.
7. Queue guard, push, open PR.
## Progress
- 2026-06-24: Branch created from `origin/main` @ `1020cfa`.
- 2026-06-24: Replaced idle-derived `idle`/`stuck` outputs with `available`; retained `stuck` in type union for future positive block evidence.
- 2026-06-24: Removed stuck threshold env/helper plumbing and IDLE/STUCK alarm flags.
- 2026-06-24: Updated classifier and table-render tests for available semantics.
## Verification Evidence
- `pnpm install --store-dir "$HOME/.pnpm-store"` — pass.
- `npx turbo build --filter=@mosaicstack/mosaic^...` — pass, 12/12 tasks successful.
- `pnpm typecheck` — pass, 41/41 tasks successful.
- `pnpm lint` — pass, 23/23 tasks successful.
- `pnpm format:check` — pass, all matched files use Prettier style.
- `pnpm --filter @mosaicstack/mosaic exec vitest run src/commands/fleet.spec.ts` — pass, 177 tests.
- `~/.config/mosaic/tools/codex/codex-code-review.sh --uncommitted` — approve, 0 findings (reviewed supplied diff; sandbox file-inspection limitation noted by tool).
## Risks / Blockers
- No current blocker.
- The review tool could not inspect repo files directly due to a sandbox wrapper limitation, but it reviewed the supplied diff and approved with no findings.

@@ -1,6 +1,6 @@
 {
   "name": "@mosaicstack/mosaic",
-  "version": "0.0.43",
+  "version": "0.0.45",
   "repository": {
     "type": "git",
     "url": "https://git.mosaicstack.dev/mosaicstack/stack.git",

@@ -27,7 +27,6 @@ import {
   enableFleetUnits,
   FLEET_PROFILES,
   HEARTBEAT_IDLE_THRESHOLD_SECONDS,
-  HEARTBEAT_STUCK_THRESHOLD_SECONDS,
   generateAgentEnv,
   getDefaultOperatorSourceLabel,
   getDefaultTenantAndHost,
@@ -48,7 +47,6 @@ import {
   resolvePresetFilename,
   RUNTIME_ACCEPTABLE_COMMANDS,
   serializeRosterToYaml,
-  stuckThresholdSeconds,
   VERIFY_DEFAULT_TIMEOUT_MS,
   VERIFY_POLL_INTERVAL_MS,
   type AgentPsRow,
@@ -855,7 +853,7 @@ describe('fleet ps — command construction', () => {
       '-t',
       '=canary-pi:0.0',
       '-F',
-      '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity}',
+      '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity} #{window_activity} #{session_activity}',
     ]);
   });
@@ -940,42 +938,33 @@ describe('fleet ps — heartbeat parsing', () => {
 describe('fleet ps — readiness thresholds', () => {
   const savedIdle = process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
-  const savedStuck = process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
   afterEach(() => {
     if (savedIdle === undefined) delete process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
     else process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = savedIdle;
-    if (savedStuck === undefined) delete process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
-    else process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = savedStuck;
   });
-  it('uses default readiness thresholds when env is unset', () => {
+  it('uses the default activity threshold when env is unset', () => {
     delete process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
-    delete process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
     expect(idleThresholdSeconds()).toBe(HEARTBEAT_IDLE_THRESHOLD_SECONDS);
-    expect(stuckThresholdSeconds()).toBe(HEARTBEAT_STUCK_THRESHOLD_SECONDS);
   });
-  it('honors positive integer readiness thresholds from env', () => {
+  it('honors a positive integer activity threshold from env', () => {
     process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = '120';
-    process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = '480';
     expect(idleThresholdSeconds()).toBe(120);
-    expect(stuckThresholdSeconds()).toBe(480);
   });
-  it('falls back to defaults for invalid readiness thresholds', () => {
+  it('falls back to the default for invalid activity thresholds', () => {
     process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = '0';
-    process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = 'not-a-number';
     expect(idleThresholdSeconds()).toBe(HEARTBEAT_IDLE_THRESHOLD_SECONDS);
-    expect(stuckThresholdSeconds()).toBe(HEARTBEAT_STUCK_THRESHOLD_SECONDS);
   });
 });
 describe('fleet ps — readiness classification', () => {
-  const thresholds = { idleThresholdSeconds: 300, stuckThresholdSeconds: 900 };
+  const thresholds = { idleThresholdSeconds: 300 };
   it('reports dead when the pane is not alive', () => {
     expect(
@@ -1004,7 +993,7 @@ describe('fleet ps — readiness classification', () => {
     ).toBe('stale');
   });
-  it('reports working when heartbeat status is busy, even past stuck threshold', () => {
+  it('reports working when heartbeat status is busy, even after the activity threshold', () => {
     expect(
       classifyReadiness(
         { paneAlive: true, hbHealth: 'healthy', hbStatus: 'busy', idleSeconds: 2_000 },
@@ -1013,7 +1002,7 @@ describe('fleet ps — readiness classification', () => {
     ).toBe('working');
   });
-  it('reports working when pane idle seconds are unavailable', () => {
+  it('reports working when pane idle seconds are null', () => {
     expect(
       classifyReadiness(
         { paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: null },
@@ -1022,25 +1011,31 @@ describe('fleet ps — readiness classification', () => {
     ).toBe('working');
   });
-  it('reports stuck at the stuck threshold boundary', () => {
+  it('reports working when pane idle seconds are undefined', () => {
     expect(
-      classifyReadiness(
-        { paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 900 },
-        thresholds,
-      ),
-    ).toBe('stuck');
+      classifyReadiness({ paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok' }, thresholds),
+    ).toBe('working');
   });
-  it('reports idle at the idle threshold boundary', () => {
+  it('reports working when pane idle seconds are non-finite', () => {
+    expect(
+      classifyReadiness(
+        { paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: Number.NaN },
+        thresholds,
+      ),
+    ).toBe('working');
+  });
+  it('reports available at the activity threshold boundary', () => {
     expect(
       classifyReadiness(
         { paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 300 },
         thresholds,
       ),
-    ).toBe('idle');
+    ).toBe('available');
   });
-  it('reports working below the idle threshold', () => {
+  it('reports working below the activity threshold', () => {
     expect(
       classifyReadiness(
         { paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 299 },
@@ -1049,13 +1044,14 @@ describe('fleet ps — readiness classification', () => {
     ).toBe('working');
   });
-  it('checks stuck before idle when thresholds are inverted', () => {
-    expect(
-      classifyReadiness(
-        { paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 350 },
-        { idleThresholdSeconds: 900, stuckThresholdSeconds: 300 },
-      ),
-    ).toBe('stuck');
+  it('reports very long idle as available, not stuck', () => {
+    const readiness = classifyReadiness(
+      { paneAlive: true, hbHealth: 'healthy', hbStatus: 'ok', idleSeconds: 100_000 },
+      thresholds,
+    );
+    expect(readiness).toBe('available');
+    expect(readiness).not.toBe('stuck');
   });
 });
@@ -1079,9 +1075,11 @@ describe('fleet ps — systemd show parsing', () => {
 describe('fleet ps — tmux list-panes parsing', () => {
   const NOW_MS = 1_700_000_000_000;
-  it('parses alive pane with pid, command, and idle time', () => {
-    const activityEpoch = Math.floor((NOW_MS - 30_000) / 1000); // 30s ago
-    const output = `12345 claude 0 ${activityEpoch}\n`;
+  it('uses pane_activity when present', () => {
+    const paneActivityEpoch = Math.floor((NOW_MS - 30_000) / 1000); // 30s ago
+    const windowActivityEpoch = Math.floor((NOW_MS - 60_000) / 1000); // 60s ago
+    const sessionActivityEpoch = Math.floor((NOW_MS - 90_000) / 1000); // 90s ago
+    const output = `12345 claude 0 ${paneActivityEpoch} ${windowActivityEpoch} ${sessionActivityEpoch}\n`;
     const result = parseTmuxListPanes(output, NOW_MS);
     expect(result.pid).toBe(12345);
     expect(result.command).toBe('claude');
@@ -1089,8 +1087,45 @@ describe('fleet ps — tmux list-panes parsing', () => {
     expect(result.idleSeconds).toBe(30);
   });
+  it('uses window_activity when pane_activity is empty', () => {
+    const windowActivityEpoch = Math.floor((NOW_MS - 45_000) / 1000); // 45s ago
+    const sessionActivityEpoch = Math.floor((NOW_MS - 90_000) / 1000); // 90s ago
+    const output = `12345 node 0  ${windowActivityEpoch} ${sessionActivityEpoch}\n`;
+    expect(output).toContain('0  '); // empty pane_activity preserves index alignment
+    const result = parseTmuxListPanes(output, NOW_MS);
+    expect(result.pid).toBe(12345);
+    expect(result.command).toBe('node');
+    expect(result.dead).toBe(false);
+    expect(result.idleSeconds).toBe(45);
+  });
+  it('uses session_activity when pane_activity and window_activity are empty', () => {
+    const sessionActivityEpoch = Math.floor((NOW_MS - 75_000) / 1000); // 75s ago
+    const output = `12345 node 0   ${sessionActivityEpoch}\n`;
+    const result = parseTmuxListPanes(output, NOW_MS);
+    expect(result.idleSeconds).toBe(75);
+  });
+  it('reports null idleSeconds when all activity sources are empty', () => {
+    const output = '12345 node 0   \n';
+    const result = parseTmuxListPanes(output, NOW_MS);
+    expect(result.idleSeconds).toBeNull();
+  });
+  it('computes exact idle seconds from now minus epoch seconds', () => {
+    const activityEpoch = 1_699_999_877;
+    const result = parseTmuxListPanes(`12345 claude 0 ${activityEpoch} 0 0\n`, NOW_MS);
+    expect(result.idleSeconds).toBe(123);
+  });
+  it('clamps future activity epochs to 0 idle seconds', () => {
+    const futureActivityEpoch = Math.floor((NOW_MS + 30_000) / 1000);
+    const result = parseTmuxListPanes(`12345 claude 0 ${futureActivityEpoch} 0 0\n`, NOW_MS);
+    expect(result.idleSeconds).toBe(0);
+  });
   it('reports dead pane when pane_dead=1', () => {
-    const output = `0 bash 1 0\n`;
+    const output = `0 bash 1 0 0 0\n`;
     const result = parseTmuxListPanes(output, NOW_MS);
     expect(result.dead).toBe(true);
   });
@@ -1515,7 +1550,7 @@ describe('fleet ps — command sequences issued', () => {
 });
 describe('fleet ps — readiness table output', () => {
-  it('renders readiness in HB column and flags idle/stuck rows', async () => {
+  it('renders available in HB column without idle/stuck alarm flags', async () => {
     const home = await mkdtemp(join(tmpdir(), 'mosaic-fleet-'));
     const rosterPath = join(home, 'fleet', 'roster.yaml');
     const runDir = join(home, 'fleet', 'run');
@@ -1526,36 +1561,34 @@ describe('fleet ps — readiness table output', () => {
         'version: 1',
         'transport: tmux',
         'agents:',
-        '  - name: idle-agent',
+        '  - name: working-agent',
         '    runtime: pi',
-        '  - name: stuck-agent',
+        '  - name: available-agent',
         '    runtime: pi',
       ].join('\n'),
     );
     const nowMs = 1_700_000_000_000;
-    const idleActivityEpoch = Math.floor((nowMs - 10_000) / 1000);
-    const stuckActivityEpoch = Math.floor((nowMs - 40_000) / 1000);
+    const workingActivityEpoch = Math.floor((nowMs - 2_000) / 1000);
+    const availableActivityEpoch = Math.floor((nowMs - 40_000) / 1000);
     const hbTs = new Date(nowMs - 1_000).toISOString();
-    await writeFile(join(runDir, 'idle-agent.hb'), `ts=${hbTs}\npid=111\nstatus=ok\n`);
-    await writeFile(join(runDir, 'stuck-agent.hb'), `ts=${hbTs}\npid=222\nstatus=ok\n`);
+    await writeFile(join(runDir, 'working-agent.hb'), `ts=${hbTs}\npid=111\nstatus=ok\n`);
+    await writeFile(join(runDir, 'available-agent.hb'), `ts=${hbTs}\npid=222\nstatus=ok\n`);
     const savedIdle = process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
-    const savedStuck = process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
     process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = '5';
-    process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = '30';
     const dateNow = vi.spyOn(Date, 'now').mockReturnValue(nowMs);
     const runner: CommandRunner = async (command, args) => {
       const full = [command, ...args].join(' ');
       if (full.includes('list-sessions')) {
-        return { stdout: 'idle-agent\nstuck-agent\n', stderr: '', exitCode: 0 };
+        return { stdout: 'working-agent\navailable-agent\n', stderr: '', exitCode: 0 };
       }
-      if (full.includes('=idle-agent:0.0')) {
-        return { stdout: `111 pi 0 ${idleActivityEpoch}\n`, stderr: '', exitCode: 0 };
+      if (full.includes('=working-agent:0.0')) {
+        return { stdout: `111 pi 0 ${workingActivityEpoch}\n`, stderr: '', exitCode: 0 };
       }
-      if (full.includes('=stuck-agent:0.0')) {
-        return { stdout: `222 pi 0 ${stuckActivityEpoch}\n`, stderr: '', exitCode: 0 };
+      if (full.includes('=available-agent:0.0')) {
+        return { stdout: `222 pi 0 ${availableActivityEpoch}\n`, stderr: '', exitCode: 0 };
       }
       if (full.includes('systemctl') && full.includes('show')) {
         return {
@@ -1584,19 +1617,17 @@ describe('fleet ps — readiness table output', () => {
       dateNow.mockRestore();
       if (savedIdle === undefined) delete process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD;
       else process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD = savedIdle;
-      if (savedStuck === undefined) delete process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD;
-      else process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD = savedStuck;
       await rm(home, { recursive: true, force: true });
     }
-    const idleLine = lines.find((line) => line.includes('idle-agent'));
-    const stuckLine = lines.find((line) => line.includes('stuck-agent'));
-    expect(idleLine).toBeDefined();
-    expect(idleLine).toContain('1s/idle');
-    expect(idleLine).toMatch(/\bIDLE\b/);
-    expect(stuckLine).toBeDefined();
-    expect(stuckLine).toContain('1s/stuck');
-    expect(stuckLine).toMatch(/\bSTUCK\b/);
+    const workingLine = lines.find((line) => line.includes('working-agent'));
+    const availableLine = lines.find((line) => line.includes('available-agent'));
+    expect(workingLine).toBeDefined();
+    expect(workingLine).toContain('1s/working');
+    expect(availableLine).toBeDefined();
+    expect(availableLine).toContain('1s/available');
+    expect(availableLine).not.toMatch(/\bIDLE\b/);
+    expect(availableLine).not.toMatch(/\bSTUCK\b/);
   });
 });

@@ -395,7 +395,6 @@ export function buildAgentTailCommand(agentName: string, lines: number, socketNa
 export const HEARTBEAT_INTERVAL_MS = 15_000;
 export const HEARTBEAT_IDLE_THRESHOLD_SECONDS = 300;
-export const HEARTBEAT_STUCK_THRESHOLD_SECONDS = 900;
 /**
  * Heartbeat interval in ms, honoring MOSAIC_HEARTBEAT_INTERVAL (seconds) so the
@@ -407,20 +406,14 @@ export function heartbeatIntervalMs(): number {
   return Number.isFinite(sec) && sec > 0 ? sec * 1000 : HEARTBEAT_INTERVAL_MS;
 }
-/** Idle threshold in seconds, honoring MOSAIC_HEARTBEAT_IDLE_THRESHOLD. */
+/** Activity threshold in seconds, honoring MOSAIC_HEARTBEAT_IDLE_THRESHOLD. */
 export function idleThresholdSeconds(): number {
   const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_IDLE_THRESHOLD ?? '', 10);
   return Number.isFinite(sec) && sec > 0 ? sec : HEARTBEAT_IDLE_THRESHOLD_SECONDS;
 }
-/** Stuck threshold in seconds, honoring MOSAIC_HEARTBEAT_STUCK_THRESHOLD. */
-export function stuckThresholdSeconds(): number {
-  const sec = Number.parseInt(process.env.MOSAIC_HEARTBEAT_STUCK_THRESHOLD ?? '', 10);
-  return Number.isFinite(sec) && sec > 0 ? sec : HEARTBEAT_STUCK_THRESHOLD_SECONDS;
-}
 export const HEARTBEAT_HEALTHY_MULTIPLIER = 3;
-export type ReadinessState = 'working' | 'idle' | 'stuck' | 'stale' | 'dead' | 'unknown';
+export type ReadinessState = 'working' | 'available' | 'stuck' | 'stale' | 'dead' | 'unknown';
 export interface ReadinessSignals {
   paneAlive: boolean;
@@ -431,7 +424,6 @@ export interface ReadinessSignals {
 export interface ReadinessThresholds {
   idleThresholdSeconds: number;
-  stuckThresholdSeconds: number;
 }
 /**
@@ -456,12 +448,8 @@ export function classifyReadiness(
     const idleThreshold = Number.isFinite(thresholds?.idleThresholdSeconds)
       ? Number(thresholds?.idleThresholdSeconds)
       : idleThresholdSeconds();
-    const stuckThreshold = Number.isFinite(thresholds?.stuckThresholdSeconds)
-      ? Number(thresholds?.stuckThresholdSeconds)
-      : stuckThresholdSeconds();
-    if (idleSeconds >= stuckThreshold) return 'stuck';
-    if (idleSeconds >= idleThreshold) return 'idle';
+    // Follow-up: stuck pending per-agent assignment awareness: assigned task + idle past threshold => stuck.
+    if (idleSeconds >= idleThreshold) return 'available';
     return 'working';
   } catch {
     return 'unknown';
@@ -524,7 +512,7 @@ export function buildSystemdShowCommand(agentName: string): string[] {
 /**
  * Returns the tmux list-panes command for an agent pane.
- * Format: `#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity}`
+ * Format: `#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity} #{window_activity} #{session_activity}`
  */
 export function buildTmuxListPanesCommand(agentName: string, socketName = ''): string[] {
   return [
@@ -534,7 +522,7 @@ export function buildTmuxListPanesCommand(agentName: string, socketName = ''): s
     '-t',
     `=${agentName}:0.0`,
     '-F',
-    '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity}',
+    '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity} #{window_activity} #{session_activity}',
   ];
 }
@@ -634,8 +622,8 @@ export function parseSystemdShow(output: string): {
 }
 /**
- * Parse the output of `tmux list-panes -F '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity}'`
- * pane_activity is a Unix epoch timestamp (seconds).
+ * Parse the output of `tmux list-panes -F '#{pane_pid} #{pane_current_command} #{pane_dead} #{pane_activity} #{window_activity} #{session_activity}'`
+ * Activity fields are Unix epoch timestamps (seconds), ordered most precise to coarsest.
  */
 export function parseTmuxListPanes(
   output: string,
@@ -645,16 +633,18 @@ export function parseTmuxListPanes(
   if (!line) {
     return { pid: null, command: null, dead: true, idleSeconds: null };
   }
-  // format: <pid> <command> <dead(0|1)> <activity_epoch>
+  // format: <pid> <command> <dead(0|1)> <pane_activity> <window_activity> <session_activity>
   const parts = line.split(' ');
   const pid = parts[0] ? (Number.isFinite(Number(parts[0])) ? Number(parts[0]) : null) : null;
   const command = parts[1] ?? null;
   const dead = parts[2] === '1';
-  const activityEpoch = parts[3] ? Number(parts[3]) : NaN;
-  const idleSeconds =
-    Number.isFinite(activityEpoch) && activityEpoch > 0
-      ? Math.floor((nowMs - activityEpoch * 1000) / 1000)
-      : null;
+  const activityEpoch = parts
+    .slice(3, 6)
+    .map((part) => (part ? Number(part) : NaN))
+    .find((epoch) => Number.isFinite(epoch) && epoch > 0);
+  const idleSeconds = activityEpoch
+    ? Math.max(0, Math.floor((nowMs - activityEpoch * 1000) / 1000))
+    : null;
   return { pid, command, dead, idleSeconds };
 }
@@ -1087,7 +1077,6 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
       const rows: AgentPsRow[] = [];
       const readinessThresholds = {
         idleThresholdSeconds: idleThresholdSeconds(),
-        stuckThresholdSeconds: stuckThresholdSeconds(),
       };
       // Build the set of roster agent names for quick lookup when filtering socket sessions.
@@ -1262,8 +1251,6 @@ export function registerFleetCommand(program: Command, deps: FleetCommandDeps =
         if (!row.managed) flags.push('UNMANAGED');
         if (row.driftFlag) flags.push('DRIFT');
         if (row.bootEnableWarning) flags.push('BOOT-ENABLE');
-        if (row.readiness === 'idle') flags.push('IDLE');
-        if (row.readiness === 'stuck') flags.push('STUCK');
         console.log(
           [