Fresh `mosaic gateway install` (npm) left the gateway DB schema empty —
sign-in 500'd with `relation "users" does not exist`, and every entry
point (auth, bootstrap setup) failed because they all query the users
table first. Five stacked bugs on the local (PGlite) tier:
1. `packages/db/package.json` `files: ["dist"]` excluded the `drizzle/`
SQL migrations from the published tarball.
2. `runMigrations()` only supports postgres-js — unusable for embedded
PGlite.
3. `apps/gateway/src/database/database.module.ts` never invoked
migrations at startup.
4. `createPgliteDb` didn't load pgvector, so migration 0001's
`CREATE EXTENSION vector` failed.
5. Drizzle's PG migrator wraps every migration in one outer
transaction, which trips Postgres' `check_safe_enum_use` on
migration 0009 (`ALTER TYPE ADD VALUE 'pending'` → `SET DEFAULT
'pending'` in the same tx).
Changes:
- Ship `drizzle/` in the published tarball.
- `createPgliteDb` loads `@electric-sql/pglite/vector`.
- New `runPgliteMigrations(handle)` walks the Drizzle journal and
runs each statement-breakpoint chunk through PGlite's `client.exec()`
(autocommit per statement). Records into `drizzle.__drizzle_migrations`
for interop with the postgres-js path. Per-statement try/catch
surfaces which statement of which migration failed.
- `DatabaseModule` runs migrations in `OnModuleInit` before
`app.listen()`. Local tier: explicit `runPgliteMigrations` then
`storageAdapter.migrate()`. Postgres tier: just `storageAdapter.migrate()`,
which already calls `runMigrations(url)` internally — no double-call.
- Removed `packages/storage/src/test-utils/pglite-with-vector.ts`. The
"intentionally not exported" rationale is moot now that migration
0001 forces pgvector load anyway. The integration test uses
`createPgliteDb` + `runPgliteMigrations` from `@mosaicstack/db`.
Tests: BetterAuth tables exist after migrate; idempotent (re-runs 0009);
partial-failure surfaces statement-level context and leaves no ledger row.
QA on a fresh PGlite install:
- `Applying PGlite schema migrations...` then `Initializing storage
adapter (pglite)...` in startup log.
- `GET /api/bootstrap/status` → `{"needsSetup":true}` HTTP 200 (was 500).
- `POST /api/bootstrap/setup` reaches Zod validator (was 500).
Scope: this PR fixes the local (PGlite) tier. Postgres-tier first
install still has the outer-transaction problem and a journal ordering
bug (0009's `when` < 0008's). Documented inline as TODO and in the
scratchpad — needs a separate change with real-Postgres validation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
126 lines
6.2 KiB
Markdown
126 lines
6.2 KiB
Markdown
# fix(db): bootstrap migrations on local-tier gateway startup
|
|
|
|
## Problem
|
|
|
|
Fresh `mosaic gateway install` (npm-installed) leaves the gateway DB schema empty:
|
|
|
|
```
|
|
relation "users" does not exist
|
|
```
|
|
|
|
Sign-in 500s, `auth users create` says "Not signed in", `admin/bootstrap setup`
|
|
also fails — every entry point queries `users` before doing anything else.
|
|
|
|
## Scope
|
|
|
|
This PR fixes the **local (PGlite) tier** end-to-end. The postgres-tier path
|
|
has additional pre-existing bugs (see "Known issues, out of scope" below) and
|
|
needs a separate change with real Postgres validation.
|
|
|
|
## Root causes addressed (5 stacked bugs on the local-tier path)
|
|
|
|
1. **`packages/db/package.json` `files: ["dist"]`** — the `drizzle/` SQL
|
|
migrations folder is excluded from the published tarball. Even if a
|
|
migrate runner existed, it would have nothing to apply.
|
|
|
|
2. **`packages/db/src/migrate.ts`** only supports `postgres-js`. Local-tier
|
|
gateways use embedded PGlite, which can't be reached over a postgres wire
|
|
protocol — so `runMigrations()` is unusable for the local tier.
|
|
|
|
3. **`apps/gateway/src/database/database.module.ts`** never invokes
|
|
migrations at startup. The module creates the DB handle and storage
|
|
adapter, but no consumer calls `.migrate()` on either. `mosaic storage
|
|
migrate` CLI even claims "pglite runs schema setup automatically on first
|
|
connection via `adapter.migrate()`" — but `adapter.migrate()` is only
|
|
called by tests, never at runtime.
|
|
|
|
4. **`createPgliteDb` does not load the pgvector extension.** Migration 0001
|
|
declares `CREATE EXTENSION IF NOT EXISTS vector;` for the
|
|
`insights.embedding` column. Bare PGlite has no pgvector — the migration
|
|
fails on extension control file lookup.
|
|
|
|
5. **Drizzle's PG migrator wraps every migration in one outer transaction.**
|
|
Migration 0009 does `ALTER TYPE grant_status ADD VALUE 'pending'` and then
|
|
`ALTER TABLE federation_grants ALTER COLUMN status SET DEFAULT 'pending'`.
|
|
Postgres' `check_safe_enum_use` rejects the second statement because the
|
|
new enum value isn't committed yet. Splitting the migration into two
|
|
files doesn't help — drizzle batches all migrations into one outer tx.
|
|
|
|
## Fix
|
|
|
|
- `packages/db/package.json` — ship `drizzle/` in `files`.
|
|
- `packages/db/src/client-pglite.ts` — load `@electric-sql/pglite/vector`.
|
|
- `packages/db/src/migrate.ts` — add `runPgliteMigrations(handle)`. Walks the
|
|
Drizzle journal and runs each statement-breakpoint chunk through PGlite's
|
|
`client.exec()` (Simple Query protocol → autocommit per statement). Writes
|
|
to the standard `drizzle.__drizzle_migrations` ledger so the result is
|
|
interoperable with `runMigrations()` on a postgres-backed deployment.
|
|
Per-statement try/catch surfaces which statement of which migration failed
|
|
and the ledger row is only written on full success.
|
|
- `packages/db/src/index.ts` — re-export.
|
|
- `apps/gateway/src/database/database.module.ts` — implement `OnModuleInit`:
|
|
- Local tier → `runPgliteMigrations(handle)`, then `storageAdapter.migrate()`
|
|
(the local storage adapter has its own kv tables in a separate PGlite dir).
|
|
- Postgres tier → `storageAdapter.migrate()` only, since
|
|
`PostgresAdapter.migrate()` already calls `runMigrations(url)` against
|
|
the same DATABASE_URL — we deliberately don't double-call.
|
|
|
|
NestJS awaits `onModuleInit` before `app.listen()`, so DB-dependent modules
|
|
see a populated schema before any HTTP traffic is accepted.
|
|
|
|
- `packages/storage/src/test-utils/pglite-with-vector.ts` — **deleted**.
|
|
The "intentionally not exported" rationale is moot now that migration 0001
|
|
forces pgvector load anyway. `migrate-tier.integration.test.ts` switched
|
|
to `createPgliteDb` + `runPgliteMigrations` from `@mosaicstack/db`.
|
|
|
|
## Tests
|
|
|
|
`packages/db/src/migrate.test.ts`:
|
|
|
|
- Verifies `runPgliteMigrations` creates the BetterAuth tables (the original
|
|
failure mode).
|
|
- Idempotence (transitively re-runs migration 0009).
|
|
- Partial-failure: pre-creates a conflicting `users` table, asserts the
|
|
thrown error includes statement context (`hash=… statement #N failed`)
|
|
and that no ledger row was written.
|
|
|
|
## QA evidence
|
|
|
|
End-to-end on a fresh PGlite install:
|
|
|
|
- `[DatabaseModule] Applying PGlite schema migrations...` then
|
|
`Initializing storage adapter (pglite)...` in startup log.
|
|
- `GET /api/bootstrap/status` → `{"needsSetup":true}` HTTP 200 (was 500
|
|
with `relation "users" does not exist`).
|
|
- `POST /api/bootstrap/setup` with empty body → HTTP 400 with Zod
|
|
validation error (was 500), confirming the request reached the
|
|
validator past the table-existence check.
|
|
|
|
## Known issues, out of scope (file separately)
|
|
|
|
- **Postgres-tier first install is still broken.** `runMigrations()` uses
|
|
Drizzle's `migratePostgres`, which has the same outer-transaction problem
|
|
as PGlite's migrator. A fresh standalone-tier install would also fail at
|
|
migration 0009. Inline TODO in `migrate.ts:31-35` flags this. Fixing it
|
|
needs either (a) a shared per-statement loop reused for both drivers, or
|
|
(b) splitting migration 0009.
|
|
- **`drizzle/meta/_journal.json` has 0009 ordered before 0008** (`when`
|
|
values `1745280000000` < `1776822435828`). `migratePostgres` skips by
|
|
`created_at < folderMillis`, so on a postgres deployment that already
|
|
applied 0008, 0009 would be skipped forever. Our hash-based skip in the
|
|
PGlite path sidesteps this.
|
|
- **No advisory lock around the migration loop.** Two gateway processes
|
|
pointed at the same DATABASE_URL would race. PGlite is single-process by
|
|
file lock so the local tier is fine; postgres-tier deployments should add
|
|
`pg_advisory_lock(<deterministic-id>)` around the loop in a follow-up.
|
|
- **`mosaic storage migrate` CLI message is misleading** — it claims
|
|
"automatic on first connection via adapter.migrate()" but the adapter
|
|
doesn't self-migrate. With this PR the gateway invokes it explicitly, but
|
|
the CLI message could still be tightened.
|
|
- **Crash mid-migration leaves a partial-state PGlite DB without a ledger
|
|
row.** Detected loudly on next boot (the replay errors on "already
|
|
exists"), but recovery is manual (drop the partially-applied objects or
|
|
insert the migration hash into `drizzle.__drizzle_migrations`). A robust
|
|
fix would add a "started_at" column to a sidecar table to detect
|
|
half-applied state and refuse to start with actionable guidance.
|