Tests 3, 4, 5 previously returned synthetic pane PIDs (99999/99998/99997)
from their fake list-panes shims but did not set MOSAIC_HEARTBEAT_RUN_DIR,
causing the launcher to fall back to the real ~/.config/mosaic/fleet/run
and potentially spawn a background sidecar against an arbitrary host PID.
Fix:
- list-panes in tests 3/4/5 now returns an empty string → PANE_PID stays
unset → no sidecar is spawned for tests where heartbeat is not under test.
- MOSAIC_HEARTBEAT_RUN_DIR is exported to a per-test mktemp dir in each
fake-tmux test (3, 4, 5) as defence-in-depth so even if the sidecar
code path changes, it can never write to the real fleet run dir.
- New temp dirs are registered in CLEANUP_DIRS so they are removed by the
existing EXIT trap.
- Tests 6 and 7 (the dedicated heartbeat tests) are unchanged: test 6 uses
a real tmux pane PID + its own HB_RUN_DIR, test 7 intercepts via a fake
setsid shim that captures args and exits immediately.
- All 7 tests pass; verify-sanitized.sh passes; no stray sidecar processes
are left running and no unexpected .hb files are written to
~/.config/mosaic/fleet/run.
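
The isolation pattern described in the bullets above might look roughly
like the sketch below. `MOSAIC_HEARTBEAT_RUN_DIR` and `CLEANUP_DIRS` are the
names used in this change; the helper names `use_isolated_run_dir` and
`make_fake_tmux` are hypothetical, and the real test script may be
structured differently.

```shell
# Directories removed by the EXIT trap when the test script finishes.
CLEANUP_DIRS=()

cleanup() {
  local d
  for d in "${CLEANUP_DIRS[@]}"; do
    [ -n "$d" ] && rm -rf -- "$d"
  done
}
trap cleanup EXIT

# Hypothetical helper: point the launcher at a throwaway run dir so it can
# never fall back to the real ~/.config/mosaic/fleet/run.
use_isolated_run_dir() {
  local dir
  dir="$(mktemp -d)"
  CLEANUP_DIRS+=("$dir")
  export MOSAIC_HEARTBEAT_RUN_DIR="$dir"
}

# Hypothetical helper: install a fake tmux whose list-panes prints nothing,
# so the caller ends up with an empty pane PID and spawns no sidecar.
make_fake_tmux() {
  local bin_dir="$1"
  cat > "$bin_dir/tmux" <<'EOF'
#!/usr/bin/env bash
case "${1:-}" in
  list-panes) exit 0 ;;   # no output: no pane PID to resolve
  *)          exit 0 ;;
esac
EOF
  chmod +x "$bin_dir/tmux"
}

# Example setup for one fake-tmux test.
use_isolated_run_dir
fake_bin="$(mktemp -d)"
CLEANUP_DIRS+=("$fake_bin")
make_fake_tmux "$fake_bin"
```

A test would then prepend `$fake_bin` to `PATH` before invoking the
launcher, so the shim is found ahead of any real tmux.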
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
Replace the terminal `exec tmux` with a plain `tmux new-session -d` so the
launcher continues running after creating the pane. The script then resolves
the pane PID via `tmux list-panes -F '#{pane_pid}'` (with a brief retry loop)
and spawns a detached, runtime-agnostic heartbeat sidecar via `setsid bash -c
... &` + `disown`. The sidecar loops while `kill -0 <pane_pid>` succeeds,
writing ~/.config/mosaic/fleet/run/<AGENT>.hb atomically (tmp + mv) every
MOSAIC_HEARTBEAT_INTERVAL seconds (default 15), then exits naturally when the
runtime process dies, so `mosaic fleet ps` shows the agent as stale, then dead.
HB_RUN_DIR and interval are configurable via env; sidecar startup is
best-effort (failures warn but do not abort the launch). Two new shell tests
cover pane-PID resolution (test 6, real tmux) and sidecar invocation
correctness (test 7, fake-tmux + fake-setsid shims).
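
As an illustration of the mechanism described above, here is a minimal
sketch of pane-PID resolution and the heartbeat loop. The function names,
the retry count and delay, the temp-file naming, and the timestamp payload
written to the .hb file are all assumptions; the message does not specify
them, and the actual launcher code may differ.

```shell
# resolve_pane_pid <session> [attempts]
# Ask tmux for the pane PID, retrying briefly in case the pane is not yet
# visible right after `tmux new-session -d`.
resolve_pane_pid() {
  local session="$1" attempts="${2:-5}" pid="" i
  for ((i = 0; i < attempts; i++)); do
    pid="$(tmux list-panes -t "$session" -F '#{pane_pid}' 2>/dev/null | head -n 1)"
    if [ -n "$pid" ]; then
      printf '%s\n' "$pid"
      return 0
    fi
    sleep 0.2   # assumed retry delay
  done
  return 1
}

# heartbeat_loop <pid> <agent> <run_dir> <interval_seconds>
# Refresh <run_dir>/<agent>.hb for as long as <pid> is alive.
heartbeat_loop() {
  local pid="$1" agent="$2" run_dir="$3" interval="$4"
  local hb_file="$run_dir/$agent.hb"
  local tmp_file="$hb_file.tmp.$$"
  mkdir -p "$run_dir"
  # kill -0 delivers no signal; it only checks that the PID can be signalled.
  while kill -0 "$pid" 2>/dev/null; do
    date +%s > "$tmp_file"        # assumed payload: epoch seconds
    # Renaming within one directory is atomic, so readers never see a
    # half-written heartbeat file.
    mv -f "$tmp_file" "$hb_file"
    sleep "$interval"
  done
}
```

To detach the loop as described above, it would be launched under
`setsid bash -c ... &` followed by `disown`, with the run dir taken from
`MOSAIC_HEARTBEAT_RUN_DIR` and the interval from `MOSAIC_HEARTBEAT_INTERVAL`
(default 15).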
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RMoEx7hfdFGjUiCHuN1RRi
Fresh `mosaic gateway install` (npm) left the gateway DB schema empty —
sign-in 500'd with `relation "users" does not exist`, and every entry
point (auth, bootstrap setup) failed because they all query the users
table first. Five stacked bugs on the local (PGlite) tier:
1. `packages/db/package.json` `files: ["dist"]` excluded the `drizzle/`
SQL migrations from the published tarball.
2. `runMigrations()` only supported postgres-js, so it was unusable for
embedded PGlite.
3. `apps/gateway/src/database/database.module.ts` never invoked
migrations at startup.
4. `createPgliteDb` didn't load pgvector, so migration 0001's
`CREATE EXTENSION vector` failed.
5. Drizzle's PG migrator wraps every migration in one outer
transaction, which trips Postgres' `check_safe_enum_use` on
migration 0009 (`ALTER TYPE ADD VALUE 'pending'` → `SET DEFAULT
'pending'` in the same tx).
Changes:
- Ship `drizzle/` in the published tarball.
- `createPgliteDb` loads `@electric-sql/pglite/vector`.
- New `runPgliteMigrations(handle)` walks the Drizzle journal and
runs each statement-breakpoint chunk through PGlite's `client.exec()`
(autocommit per statement). Records into `drizzle.__drizzle_migrations`
for interop with the postgres-js path. Per-statement try/catch
surfaces which statement of which migration failed.
- `DatabaseModule` runs migrations in `OnModuleInit` before
`app.listen()`. Local tier: explicit `runPgliteMigrations` then
`storageAdapter.migrate()`. Postgres tier: just `storageAdapter.migrate()`,
which already calls `runMigrations(url)` internally — no double-call.
- Removed `packages/storage/src/test-utils/pglite-with-vector.ts`. The
"intentionally not exported" rationale is moot now that migration
0001 forces pgvector load anyway. The integration test uses
`createPgliteDb` + `runPgliteMigrations` from `@mosaicstack/db`.
Tests: BetterAuth tables exist after migrate; idempotent (re-runs 0009);
partial-failure surfaces statement-level context and leaves no ledger row.
QA on a fresh PGlite install:
- `Applying PGlite schema migrations...` then `Initializing storage
adapter (pglite)...` in startup log.
- `GET /api/bootstrap/status` → `{"needsSetup":true}` HTTP 200 (was 500).
- `POST /api/bootstrap/setup` reaches Zod validator (was 500).
Scope: this PR fixes the local (PGlite) tier. Postgres-tier first
install still has the outer-transaction problem and a journal ordering
bug (0009's `when` < 0008's). Documented inline as TODO and in the
scratchpad — needs a separate change with real-Postgres validation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>