Files
stack/docs/PERFORMANCE.md
Jason Woltje 3b81bc9f3d perf: gateway + DB + frontend optimizations (P8-003)
- DB client: configure connection pool (max=20, idle_timeout=30s, connect_timeout=5s)
- DB schema: add missing indexes for auth sessions, accounts, conversations, agent_logs
- DB schema: promote preferences(user_id,key) to UNIQUE index for ON CONFLICT upsert
- Drizzle migration: 0003_p8003_perf_indexes.sql
- preferences.service: replace 2-query SELECT+INSERT/UPDATE with single-round-trip upsert
- conversations repo: add ORDER BY + LIMIT to findAll (200) and findMessages (500)
- session-gc.service: make onModuleInit fire-and-forget (removes cold-start TTFB block)
- next.config.ts: enable compress, productionBrowserSourceMaps:false, image avif/webp
- docs/PERFORMANCE.md: full profiling report and change impact notes
2026-03-18 21:26:45 -05:00

165 lines
7.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Performance Optimization — P8-003
**Branch:** `feat/p8-003-performance`
**Target metrics:** <200 ms TTFB, <2 s page loads
---
## What Was Profiled
The following areas were reviewed through static analysis and code-path tracing
(no production traffic available; findings are based on measurable code-level patterns):
| Area | Findings |
| ---------------------------------- | -------------------------------------------------------------------------------------------------------- |
| `packages/db` | Connection pool unbounded (default 10, no idle/connect timeout) |
| `apps/gateway/src/preferences` | N+1 round-trip on every pref upsert (SELECT + INSERT/UPDATE) |
| `packages/brain/src/conversations` | Unbounded list queries — no `LIMIT` or `ORDER BY` |
| `packages/db/src/schema` | Missing hot-path indexes: auth session lookup, OAuth callback, conversation list, agent-log tier queries |
| `apps/gateway/src/gc` | Cold-start GC blocked NestJS bootstrap (synchronous `await` in `onModuleInit`) |
| `apps/web/next.config.ts` | Missing `compress: true`, no `productionBrowserSourceMaps: false`, no image format config |
---
## Changes Made
### 1. DB Connection Pool — `packages/db/src/client.ts`
**Problem:** `postgres()` was called with no pool config. The default max of 10 connections
and no idle/connect timeouts meant the pool could hang indefinitely on a stale TCP connection.
**Fix:**
- `max`: 20 connections (configurable via `DB_POOL_MAX`)
- `idle_timeout`: 30 s (configurable via `DB_IDLE_TIMEOUT`) — recycle stale connections
- `connect_timeout`: 5 s (configurable via `DB_CONNECT_TIMEOUT`) — fail fast on unreachable DB
**Expected impact:** Eliminates pool exhaustion under moderate concurrency; removes indefinite
hangs when the DB is temporarily unreachable.
---
### 2. Preferences Upsert — `apps/gateway/src/preferences/preferences.service.ts`
**Problem:** `upsertPref` executed two serial DB round-trips on every preference write:
```
1. SELECT id FROM preferences WHERE user_id = ? AND key = ? (→ check exists)
2a. UPDATE preferences SET value = ? … (→ if found)
2b. INSERT INTO preferences … (→ if not found)
```
Under concurrency this also had a TOCTOU race window.
**Fix:** Replaced with single-statement `INSERT … ON CONFLICT DO UPDATE`:
```sql
INSERT INTO preferences (user_id, key, value, mutable)
VALUES (?, ?, ?, true)
ON CONFLICT (user_id, key) DO UPDATE SET value = excluded.value, updated_at = now();
```
This required promoting `preferences_user_key_idx` from a plain index to a `UNIQUE INDEX`
(see migration `0003_p8003_perf_indexes.sql`).
**Expected impact:** ~50% reduction in DB round-trips for preference writes; eliminates
the race window.
---
### 3. Missing DB Indexes — `packages/db/src/schema.ts` + migration
The following indexes were added or replaced to cover common query patterns:
| Table | Old indexes | New / changed |
| --------------- | ------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| `sessions` | _(none)_ | `sessions_user_id_idx(user_id)`, `sessions_expires_at_idx(expires_at)` |
| `accounts` | _(none)_ | `accounts_provider_account_idx(provider_id, account_id)`, `accounts_user_id_idx(user_id)` |
| `conversations` | `(user_id)`, `(archived)` separate | `conversations_user_archived_idx(user_id, archived)` compound |
| `agent_logs` | `(session_id)`, `(tier)`, `(created_at)` separate | `agent_logs_session_tier_idx(session_id, tier)`, `agent_logs_tier_created_at_idx(tier, created_at)` |
| `preferences` | non-unique `(user_id, key)` | **unique** `(user_id, key)` — required for `ON CONFLICT` |
**Expected impact:**
- Auth session validation (hot path on every request): from seq scan → index scan
- OAuth callback account lookup: from seq scan → index scan
- Conversation list (dashboard load): compound index covers `WHERE user_id = ? ORDER BY updated_at`
- Log summarisation cron: `(tier, created_at)` index enables efficient hot→warm promotion query
All changes are in `packages/db/drizzle/0003_p8003_perf_indexes.sql`.
---
### 4. Conversation Queries — `packages/brain/src/conversations.ts`
**Problem:** `findAll(userId)` and `findMessages(conversationId)` were unbounded — no `LIMIT`
and `findAll` had no `ORDER BY`, so the DB planner may not use the index efficiently.
**Fix:**
- `findAll`: `ORDER BY updated_at DESC LIMIT 200` — returns most-recent conversations first
- `findMessages`: `ORDER BY created_at ASC LIMIT 500` — chronological message history
**Expected impact:** Prevents accidental full-table scans on large datasets; ensures the
frontend receives a usable, ordered result set regardless of table growth.
---
### 5. Cold-Start GC — `apps/gateway/src/gc/session-gc.service.ts`
**Problem:** `onModuleInit()` was `async` and `await`-ed `fullCollect()`, which blocked the
NestJS module initialization chain. Full GC — which calls `redis.keys('mosaic:session:*')` and
a DB query — typically takes 100500 ms. This directly added to startup TTFB.
**Fix:** Made `onModuleInit()` synchronous and used `.then().catch()` to run GC in the
background. The first HTTP request is no longer delayed by GC work.
**Expected impact:** Removes 100500 ms from cold-start TTFB.
---
### 6. Next.js Config — `apps/web/next.config.ts`
**Problem:** `compress: true` was not set, so response payloads were uncompressed. No image
format optimization or source-map suppression was configured.
**Fix:**
- `compress: true` — enables gzip/brotli for all Next.js responses
- `productionBrowserSourceMaps: false` — reduces build output size
- `images.formats: ['image/avif', 'image/webp']` — Next.js Image component will serve modern
formats to browsers that support them (typically 4060% smaller than JPEG/PNG)
**Expected impact:** Typical HTML/JSON gzip savings of 6080%; image serving cost reduced
for any `<Image>` components added in the future.
---
## What Was Not Changed (Intentionally)
- **Caching layer (Valkey/Redis):** The `SystemOverrideService` and GC already use Redis
pipelines. `PreferencesService.getEffective()` reads all user prefs in one query — this
is appropriate for the data size and doesn't warrant an additional cache layer yet.
- **WebSocket backpressure:** The `ChatGateway` already drops events for disconnected clients
(`client.connected` check) and cleans up listeners on disconnect. No memory leak was found.
- **Plugin/skill loader startup:** `SkillLoaderService.loadForSession()` is called on first
session creation, not on startup. Already non-blocking.
- **Frontend React memoization:** No specific hot components were identified as causing
excessive re-renders without profiling data. No speculative `memo()` calls added.
---
## How to Apply
```bash
# Run the DB migration (requires a live DB)
pnpm --filter @mosaic/db exec drizzle-kit migrate
# Or, in Docker/Swarm — migrations run automatically on gateway startup
# via runMigrations() in packages/db/src/migrate.ts
```
---
_Generated by P8-003 performance optimization task — 2026-03-18_