fix(db): bootstrap migrations on local-tier gateway startup
Fresh `mosaic gateway install` (npm) left the gateway DB schema empty —
sign-in 500'd with `relation "users" does not exist`, and every entry
point (auth, bootstrap setup) failed because they all query the users
table first. Five stacked bugs on the local (PGlite) tier:
1. `packages/db/package.json` `files: ["dist"]` excluded the `drizzle/`
SQL migrations from the published tarball.
2. `runMigrations()` only supports postgres-js — unusable for embedded
PGlite.
3. `apps/gateway/src/database/database.module.ts` never invoked
migrations at startup.
4. `createPgliteDb` didn't load pgvector, so migration 0001's
`CREATE EXTENSION vector` failed.
5. Drizzle's PG migrator wraps every migration in one outer
transaction, which trips Postgres' `check_safe_enum_use` on
migration 0009 (`ALTER TYPE ADD VALUE 'pending'` → `SET DEFAULT
'pending'` in the same tx).
Changes:
- Ship `drizzle/` in the published tarball.
- `createPgliteDb` loads `@electric-sql/pglite/vector`.
- New `runPgliteMigrations(handle)` walks the Drizzle journal and
runs each statement-breakpoint chunk through PGlite's `client.exec()`
(autocommit per statement). Records into `drizzle.__drizzle_migrations`
for interop with the postgres-js path. Per-statement try/catch
surfaces which statement of which migration failed.
- `DatabaseModule` runs migrations in `OnModuleInit` before
`app.listen()`. Local tier: explicit `runPgliteMigrations` then
`storageAdapter.migrate()`. Postgres tier: just `storageAdapter.migrate()`,
which already calls `runMigrations(url)` internally — no double-call.
- Removed `packages/storage/src/test-utils/pglite-with-vector.ts`. The
"intentionally not exported" rationale is moot now that migration
0001 forces pgvector load anyway. The integration test uses
`createPgliteDb` + `runPgliteMigrations` from `@mosaicstack/db`.
Tests: BetterAuth tables exist after migrate; idempotent (re-runs 0009);
partial-failure surfaces statement-level context and leaves no ledger row.
QA on a fresh PGlite install:
- `Applying PGlite schema migrations...` then `Initializing storage
adapter (pglite)...` in startup log.
- `GET /api/bootstrap/status` → `{"needsSetup":true}` HTTP 200 (was 500).
- `POST /api/bootstrap/setup` reaches Zod validator (was 500).
Scope: this PR fixes the local (PGlite) tier. Postgres-tier first
install still has the outer-transaction problem and a journal ordering
bug (0009's `when` < 0008's). Documented inline as TODO and in the
scratchpad — needs a separate change with real-Postgres validation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
125
scratchpads/fix-db-bootstrap-migrations.md
Normal file
125
scratchpads/fix-db-bootstrap-migrations.md
Normal file
@@ -0,0 +1,125 @@
|
||||
# fix(db): bootstrap migrations on local-tier gateway startup
|
||||
|
||||
## Problem
|
||||
|
||||
Fresh `mosaic gateway install` (npm-installed) leaves the gateway DB schema empty:
|
||||
|
||||
```
|
||||
relation "users" does not exist
|
||||
```
|
||||
|
||||
Sign-in 500s, `auth users create` says "Not signed in", `admin/bootstrap setup`
|
||||
also fails — every entry point queries `users` before doing anything else.
|
||||
|
||||
## Scope
|
||||
|
||||
This PR fixes the **local (PGlite) tier** end-to-end. The postgres-tier path
|
||||
has additional pre-existing bugs (see "Known issues, out of scope" below) and
|
||||
needs a separate change with real Postgres validation.
|
||||
|
||||
## Root causes addressed (5 stacked bugs on the local-tier path)
|
||||
|
||||
1. **`packages/db/package.json` `files: ["dist"]`** — the `drizzle/` SQL
|
||||
migrations folder is excluded from the published tarball. Even if a
|
||||
migrate runner existed, it would have nothing to apply.
|
||||
|
||||
2. **`packages/db/src/migrate.ts`** only supports `postgres-js`. Local-tier
|
||||
gateways use embedded PGlite, which can't be reached over a postgres wire
|
||||
protocol — so `runMigrations()` is unusable for the local tier.
|
||||
|
||||
3. **`apps/gateway/src/database/database.module.ts`** never invokes
|
||||
migrations at startup. The module creates the DB handle and storage
|
||||
adapter, but no consumer calls `.migrate()` on either. `mosaic storage
|
||||
migrate` CLI even claims "pglite runs schema setup automatically on first
|
||||
connection via `adapter.migrate()`" — but `adapter.migrate()` is only
|
||||
called by tests, never at runtime.
|
||||
|
||||
4. **`createPgliteDb` does not load the pgvector extension.** Migration 0001
|
||||
declares `CREATE EXTENSION IF NOT EXISTS vector;` for the
|
||||
`insights.embedding` column. Bare PGlite has no pgvector — the migration
|
||||
fails on extension control file lookup.
|
||||
|
||||
5. **Drizzle's PG migrator wraps every migration in one outer transaction.**
|
||||
Migration 0009 does `ALTER TYPE grant_status ADD VALUE 'pending'` and then
|
||||
`ALTER TABLE federation_grants ALTER COLUMN status SET DEFAULT 'pending'`.
|
||||
Postgres' `check_safe_enum_use` rejects the second statement because the
|
||||
new enum value isn't committed yet. Splitting the migration into two
|
||||
files doesn't help — drizzle batches all migrations into one outer tx.
|
||||
|
||||
## Fix
|
||||
|
||||
- `packages/db/package.json` — ship `drizzle/` in `files`.
|
||||
- `packages/db/src/client-pglite.ts` — load `@electric-sql/pglite/vector`.
|
||||
- `packages/db/src/migrate.ts` — add `runPgliteMigrations(handle)`. Walks the
|
||||
Drizzle journal and runs each statement-breakpoint chunk through PGlite's
|
||||
`client.exec()` (Simple Query protocol → autocommit per statement). Writes
|
||||
to the standard `drizzle.__drizzle_migrations` ledger so the result is
|
||||
interoperable with `runMigrations()` on a postgres-backed deployment.
|
||||
Per-statement try/catch surfaces which statement of which migration failed
|
||||
and the ledger row is only written on full success.
|
||||
- `packages/db/src/index.ts` — re-export.
|
||||
- `apps/gateway/src/database/database.module.ts` — implement `OnModuleInit`:
|
||||
- Local tier → `runPgliteMigrations(handle)`, then `storageAdapter.migrate()`
|
||||
(the local storage adapter has its own kv tables in a separate PGlite dir).
|
||||
- Postgres tier → `storageAdapter.migrate()` only, since
|
||||
`PostgresAdapter.migrate()` already calls `runMigrations(url)` against
|
||||
the same DATABASE_URL — we deliberately don't double-call.
|
||||
|
||||
NestJS awaits `onModuleInit` before `app.listen()`, so DB-dependent modules
|
||||
see a populated schema before any HTTP traffic is accepted.
|
||||
|
||||
- `packages/storage/src/test-utils/pglite-with-vector.ts` — **deleted**.
|
||||
The "intentionally not exported" rationale is moot now that migration 0001
|
||||
forces pgvector load anyway. `migrate-tier.integration.test.ts` switched
|
||||
to `createPgliteDb` + `runPgliteMigrations` from `@mosaicstack/db`.
|
||||
|
||||
## Tests
|
||||
|
||||
`packages/db/src/migrate.test.ts`:
|
||||
|
||||
- Verifies `runPgliteMigrations` creates the BetterAuth tables (the original
|
||||
failure mode).
|
||||
- Idempotence (transitively re-runs migration 0009).
|
||||
- Partial-failure: pre-creates a conflicting `users` table, asserts the
|
||||
thrown error includes statement context (`hash=… statement #N failed`)
|
||||
and that no ledger row was written.
|
||||
|
||||
## QA evidence
|
||||
|
||||
End-to-end on a fresh PGlite install:
|
||||
|
||||
- `[DatabaseModule] Applying PGlite schema migrations...` then
|
||||
`Initializing storage adapter (pglite)...` in startup log.
|
||||
- `GET /api/bootstrap/status` → `{"needsSetup":true}` HTTP 200 (was 500
|
||||
with `relation "users" does not exist`).
|
||||
- `POST /api/bootstrap/setup` with empty body → HTTP 400 with Zod
|
||||
validation error (was 500), confirming the request reached the
|
||||
validator past the table-existence check.
|
||||
|
||||
## Known issues, out of scope (file separately)
|
||||
|
||||
- **Postgres-tier first install is still broken.** `runMigrations()` uses
|
||||
Drizzle's `migratePostgres`, which has the same outer-transaction problem
|
||||
as PGlite's migrator. A fresh standalone-tier install would also fail at
|
||||
migration 0009. Inline TODO in `migrate.ts:31-35` flags this. Fixing it
|
||||
needs either (a) a shared per-statement loop reused for both drivers, or
|
||||
(b) splitting migration 0009.
|
||||
- **`drizzle/meta/_journal.json` has 0009 ordered before 0008** (`when`
|
||||
values `1745280000000` < `1776822435828`). `migratePostgres` skips by
|
||||
`created_at < folderMillis`, so on a postgres deployment that already
|
||||
applied 0008, 0009 would be skipped forever. Our hash-based skip in the
|
||||
PGlite path sidesteps this.
|
||||
- **No advisory lock around the migration loop.** Two gateway processes
|
||||
pointed at the same DATABASE_URL would race. PGlite is single-process by
|
||||
file lock so the local tier is fine; postgres-tier deployments should add
|
||||
`pg_advisory_lock(<deterministic-id>)` around the loop in a follow-up.
|
||||
- **`mosaic storage migrate` CLI message is misleading** — it claims
|
||||
"automatic on first connection via adapter.migrate()" but the adapter
|
||||
doesn't self-migrate. With this PR the gateway invokes it explicitly, but
|
||||
the CLI message could still be tightened.
|
||||
- **Crash mid-migration leaves a partial-state PGlite DB without a ledger
|
||||
row.** Detected loudly on next boot (the replay errors on "already
|
||||
exists"), but recovery is manual (drop the partially-applied objects or
|
||||
insert the migration hash into `drizzle.__drizzle_migrations`). A robust
|
||||
fix would add a "started_at" column to a sidecar table to detect
|
||||
half-applied state and refuse to start with actionable guidance.
|
||||
Reference in New Issue
Block a user