Commit Graph

59 Commits

Author SHA1 Message Date
Jarvis
786762587d fix(fleet): serialize restart-lock transitions to close concurrent-breaker race (review #680)
All checks were successful
ci/woodpecker/pr/ci Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
Stale/max-wait takeover was not safe against concurrent breakers: two
breakers could both judge the lock stale and both proceed, re-introducing
the tight-loop. POSIX/Node has no content- or inode-conditional unlink or
rename, so "judge stale, then replace" can never be atomic with pure path
ops.

Serialize ALL lock transitions (acquire, release, takeover) under one
short-lived registry mutex held only across a few fs ops, never across the
restart itself. This makes check-then-mutate atomic, so exactly one breaker
can take over a stale lock while the others wait and re-evaluate.

The mutex itself uses mtime-based staleness (open('wx') creates an empty
inode before the token is written; a content check would reap a lock that is
still being acquired). The mutex populates-or-cleans-up on write failure so a
half-created mutex never leaks.

Regression coverage at two widths: a 2-breaker barrier test (exactly one
takes over, the other waits) and the existing 3-breaker test (maxActive===1,
distinct tokens, final lock released).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 20:26:39 -05:00
Jarvis
43ad813e0d fix(fleet): make restart-lock release/break ownership-safe (review #680)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was canceled
Addresses the reviewer's blocker (comment 15915): release() unconditionally
unlinked restart.lock, so after a stale/max-wait break an OLD owner could
delete a NEWER owner's lock, letting a third restart interleave and defeating
the guard.

- Each acquire writes a unique owner token (randomUUID) into the lock file.
- release() only unlinks while that token is still on disk; once another caller
  has broken and re-owned the lock, the timed-out original owner's release() is
  a no-op and leaves the new owner's lock intact.
- Breaking a stale/timed-out lock now takes ownership atomically via
  write-temp + rename (atomic replace) instead of a blind unlink-then-recreate;
  a breaker that loses a concurrent takeover reads back a foreign token and
  keeps waiting rather than assuming ownership.

Regression test (does not let a timed-out owner drop a lock another restart
broke and re-owned) reproduces the three-restart interleave: R1 hangs (stale),
R2 breaks + re-owns, R1.release() must NOT drop R2's lock. Fails on the old
blind-unlink path (ENOENT), passes now. Also adds explicit single-agent
restart-path guard coverage (review should-fix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 19:18:38 -05:00
Jarvis
9c2e4f0b2d fix(fleet): guard mosaic fleet restart against tight-loop re-entry race
`mosaic fleet restart` runs as a fresh process each invocation, issuing
`systemctl --user restart` for the tmux holder and then each agent. The
agent sessions live inside the holder's tmux, so restarting the holder
tears them down. With no mutual exclusion, a second restart entering
while the first is still mid-teardown (upgrade `--relaunch`, a watchdog,
or a hurried operator) interleaves: agents relaunch against a
half-torn-down holder, fail, and tight-loop.

Add a cross-process teardown-settle guard: a lock file under
`<mosaicHome>/fleet/run/restart.lock` acquired with O_CREAT|O_EXCL.
A re-entrant restart waits (bounded, injectable sleep) for the in-flight
restart to release the lock before relaunching, breaks a stale lock left
by a crashed owner, and after a max wait breaks the lock to avoid a
permanent deadlock. Both full-fleet and single-agent restart paths are
guarded; start/stop/status are unchanged.

Regression test reproduces the race: with an in-flight lock held, the
restart must wait before issuing any systemctl command — fails on the
unguarded code path, passes with the guard. Adds stale-lock-break and
lock-release coverage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 19:18:38 -05:00
d7eaa19380 feat(fleet): provision roster from system-type profile (H3) (#665)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 19:48:54 +00:00
0d17a29ebe feat(fleet): export MOSAIC_AGENT_CLASS into agent env (A3a) (#663)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 17:19:59 +00:00
28cfecda94 feat(fleet): inject persona contract at launch (A3b) (#664)
Some checks failed
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline failed
2026-06-24 17:06:51 +00:00
6c84ccd0b1 feat(fleet): dedicated orchestrator persona, split from planner (#662)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 16:42:23 +00:00
84d2757817 feat(fleet): update-surviving persona customization (H4) (#661)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 16:21:01 +00:00
a738ac1410 feat(fleet): system-type profiles (H2) (#660)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 16:02:25 +00:00
a094c86eea feat(fleet): North Star scope — general-purpose system, personas & system profiles (workstream H) (#658)
Some checks failed
ci/woodpecker/push/publish Pipeline was canceled
ci/woodpecker/push/ci Pipeline was canceled
2026-06-24 15:25:57 +00:00
f852250419 feat(fleet): native Mosaic backlog on @mosaicstack/db (atomic claim + TTL) (#657)
Some checks failed
ci/woodpecker/push/ci-image Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was canceled
2026-06-24 14:55:10 +00:00
61b1bdac2a feat(fleet): add machine-readable NORTH_STAR.yaml + Markdown projection (#656)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 14:40:09 +00:00
937077f6be fix(fleet): report idle agents as available, reserve stuck for genuine blocks (#653)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 13:58:22 +00:00
70661e3fab fix(fleet): derive pane idle from window activity fallback (#651)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 06:37:45 +00:00
d887555852 feat(fleet): classify agent readiness in fleet ps (#649)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
2026-06-24 05:55:47 +00:00
4e84f8e850 feat(fleet): comms-block emitter + FLEET-LAUNCH runbook (#633) (#638)
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/push/publish Pipeline was canceled
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 22:23:50 +00:00
d539d61e0e refactor(fleet): rename tmux socket mosaic-factory → mosaic-fleet (#630)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 21:08:43 +00:00
7342415a32 fix(fleet): consume model_hint + fix socket-default trap (stand-up fixes) (#627)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 19:18:01 +00:00
095e19443b feat(fleet): onboarding-injection — comms cheat-sheet + peer roster per agent (#621)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 17:54:54 +00:00
2bf66136e4 feat(fleet): enhancer role + two-agent floor (orchestrator + enhancer) (#615)
All checks were successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/push/ci Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 13:15:59 +00:00
d46ac40890 fix(fleet): boot-survival symmetry — disable-on-remove + add-enable + init-R5 (#612)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 08:12:58 +00:00
23343bb7f0 feat(mosaic): P5 — overlay composer (compose-contract + *.local overlays) (#605)
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/publish Pipeline is pending
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 02:16:05 +00:00
6dbe452a9f fix(fleet): watch viewer-session leak + workdir test settle-race (#601)
Some checks failed
ci/woodpecker/push/publish Pipeline was canceled
ci/woodpecker/push/ci Pipeline was canceled
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 01:43:21 +00:00
59c755067e feat(fleet): F3-m2 — native Pi heartbeat + model surface + mosaic_mission_status tool (#602)
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/publish Pipeline is pending
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 01:43:18 +00:00
6ffb27787e fix(fleet): complete HB reader/writer consistency + sidecar hardening (#599)
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/push/publish Pipeline was canceled
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-22 01:22:35 +00:00
67df06f1c4 feat(fleet): orchestrator-mutable fleet — fleet add/remove (F5/R9) (#596)
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/publish Pipeline is pending
2026-06-21 23:26:21 +00:00
60a309d5a4 fix(fleet): heartbeat consistency — MOSAIC_HOME path + configurable interval (#595)
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/publish Pipeline is pending
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-21 23:25:53 +00:00
ca19d57bba feat(fleet): config-type presets + AI-free init wizard (F1) (#591)
Some checks are pending
ci/woodpecker/push/ci Pipeline is pending
ci/woodpecker/push/publish Pipeline is pending
2026-06-21 23:07:41 +00:00
5bef2c35eb feat(fleet): fleet ps surfaces unmanaged socket sessions (#586)
Some checks failed
ci/woodpecker/push/ci Pipeline was canceled
ci/woodpecker/push/publish Pipeline was canceled
2026-06-21 22:37:34 +00:00
afcbbb302f feat(fleet): auto-enable units on install + drift recognizes wrapped runtimes (#583)
Some checks failed
ci/woodpecker/push/ci Pipeline failed
ci/woodpecker/push/publish Pipeline was successful
2026-06-21 20:02:19 +00:00
af2eede7a9 feat(fleet): Phase-2 observability — fleet ps + watch + send verify (#579)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-21 04:23:51 +00:00
5118be74cb feat(framework): P3 — extract Constitution (L0) + gut AGENTS dispatcher (#575)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-06-21 03:20:32 +00:00
e834bbb83c fix(fleet): install executable tmux helpers (#568)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 22:27:46 +00:00
7498fcb20d fix(fleet): preserve agent env overrides on install (#567)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 21:50:46 +00:00
b5c1381e45 fix(fleet): harden operator sends for release (#565)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 20:41:11 +00:00
6dfd78f643 feat(fleet): add local canary CLI (#563)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-20 17:49:01 +00:00
87f561c1f8 fix(launch): include Pi native skill roots in 'all' mode; dedup 'discover' force-loads (#556)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-19 19:58:09 +00:00
8c45857859 feat(launch): force-load fleet-critical Pi skills + reconcile skill docs (#555)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-19 18:31:02 +00:00
98a771c8f8 Fix Gitea wrapper login resolution (#538)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-12 02:34:18 +00:00
dde95a59b3 fix(pi): reduce startup skill-token overhead (#527)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-06-05 18:36:42 +00:00
74fe60d8d6 feat(federation): admin controller + CLI federation commands (FED-M2-08) (#498)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-22 04:39:46 +00:00
1a4b1ebbf1 feat(gateway,storage): mosaic gateway doctor with tier health JSON (FED-M1-06) (#475)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-20 01:00:39 +00:00
b2cec8c6ba fix(mosaic): stop yolo runtime from leaking runtime name as first user message (#455)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
Fixes mosaicstack/stack#454

Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-11 16:57:43 +00:00
026382325c feat(framework): superpowers enforcement, typecheck hook, file-ownership rules (#451)
All checks were successful
ci/woodpecker/manual/ci Pipeline was successful
ci/woodpecker/manual/publish Pipeline was successful
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-07 00:44:22 +00:00
732f8a49cf feat: unified first-run flow — merge wizard + gateway install (IUH-M03) (#433)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline failed
2026-04-05 19:13:02 +00:00
cd8b1f666d feat: wizard remediation — password mask, hooks preview, headless (IUH-M02) (#431)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-05 17:47:53 +00:00
25cada7735 feat: mosaic uninstall (IUH-M01) (#429)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-05 17:06:21 +00:00
872c124581 feat(mosaic): unified first-run UX wizard -> gateway install -> verify (#418)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-05 07:29:17 +00:00
a531029c5b feat(mosaic): mosaic telemetry command (M6 CU-06-01..05) (#417)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-05 07:06:42 +00:00
119ff0eb1b fix(mosaic): gateway token recovery review remediations (#414)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
2026-04-05 06:13:29 +00:00