- DB client: configure connection pool (max=20, idle_timeout=30s, connect_timeout=5s) - DB schema: add missing indexes for auth sessions, accounts, conversations, agent_logs - DB schema: promote preferences(user_id,key) to UNIQUE index for ON CONFLICT upsert - Drizzle migration: 0003_p8003_perf_indexes.sql - preferences.service: replace 2-query SELECT+INSERT/UPDATE with single-round-trip upsert - conversations repo: add ORDER BY + LIMIT to findAll (200) and findMessages (500) - session-gc.service: make onModuleInit fire-and-forget (removes cold-start TTFB block) - next.config.ts: enable compress, productionBrowserSourceMaps:false, image avif/webp - docs/PERFORMANCE.md: full profiling report and change impact notes
165 lines
7.8 KiB
Markdown
165 lines
7.8 KiB
Markdown
# Performance Optimization — P8-003
|
||
|
||
**Branch:** `feat/p8-003-performance`
|
||
**Target metrics:** <200 ms TTFB, <2 s page loads
|
||
|
||
---
|
||
|
||
## What Was Profiled
|
||
|
||
The following areas were reviewed through static analysis and code-path tracing
|
||
(no production traffic available; findings are based on measurable code-level patterns):
|
||
|
||
| Area | Findings |
|
||
| ---------------------------------- | -------------------------------------------------------------------------------------------------------- |
|
||
| `packages/db` | Connection pool unbounded (default 10, no idle/connect timeout) |
|
||
| `apps/gateway/src/preferences` | N+1 round-trip on every pref upsert (SELECT + INSERT/UPDATE) |
|
||
| `packages/brain/src/conversations` | Unbounded list queries — no `LIMIT` or `ORDER BY` |
|
||
| `packages/db/src/schema` | Missing hot-path indexes: auth session lookup, OAuth callback, conversation list, agent-log tier queries |
|
||
| `apps/gateway/src/gc` | Cold-start GC blocked NestJS bootstrap (synchronous `await` in `onModuleInit`) |
|
||
| `apps/web/next.config.ts` | Missing `compress: true`, no `productionBrowserSourceMaps: false`, no image format config |
|
||
|
||
---
|
||
|
||
## Changes Made
|
||
|
||
### 1. DB Connection Pool — `packages/db/src/client.ts`
|
||
|
||
**Problem:** `postgres()` was called with no pool config. The default max of 10 connections
|
||
and no idle/connect timeouts meant the pool could hang indefinitely on a stale TCP connection.
|
||
|
||
**Fix:**
|
||
|
||
- `max`: 20 connections (configurable via `DB_POOL_MAX`)
|
||
- `idle_timeout`: 30 s (configurable via `DB_IDLE_TIMEOUT`) — recycle stale connections
|
||
- `connect_timeout`: 5 s (configurable via `DB_CONNECT_TIMEOUT`) — fail fast on unreachable DB
|
||
|
||
**Expected impact:** Eliminates pool exhaustion under moderate concurrency; removes indefinite
|
||
hangs when the DB is temporarily unreachable.
|
||
|
||
---
|
||
|
||
### 2. Preferences Upsert — `apps/gateway/src/preferences/preferences.service.ts`
|
||
|
||
**Problem:** `upsertPref` executed two serial DB round-trips on every preference write:
|
||
|
||
```
|
||
1. SELECT id FROM preferences WHERE user_id = ? AND key = ? (→ check exists)
|
||
2a. UPDATE preferences SET value = ? … (→ if found)
|
||
2b. INSERT INTO preferences … (→ if not found)
|
||
```
|
||
|
||
Under concurrency this also had a TOCTOU race window.
|
||
|
||
**Fix:** Replaced with single-statement `INSERT … ON CONFLICT DO UPDATE`:
|
||
|
||
```sql
|
||
INSERT INTO preferences (user_id, key, value, mutable)
|
||
VALUES (?, ?, ?, true)
|
||
ON CONFLICT (user_id, key) DO UPDATE SET value = excluded.value, updated_at = now();
|
||
```
|
||
|
||
This required promoting `preferences_user_key_idx` from a plain index to a `UNIQUE INDEX`
|
||
(see migration `0003_p8003_perf_indexes.sql`).
|
||
|
||
**Expected impact:** ~50% reduction in DB round-trips for preference writes; eliminates
|
||
the race window.
|
||
|
||
---
|
||
|
||
### 3. Missing DB Indexes — `packages/db/src/schema.ts` + migration
|
||
|
||
The following indexes were added or replaced to cover common query patterns:
|
||
|
||
| Table | Old indexes | New / changed |
|
||
| --------------- | ------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
|
||
| `sessions` | _(none)_ | `sessions_user_id_idx(user_id)`, `sessions_expires_at_idx(expires_at)` |
|
||
| `accounts` | _(none)_ | `accounts_provider_account_idx(provider_id, account_id)`, `accounts_user_id_idx(user_id)` |
|
||
| `conversations` | `(user_id)`, `(archived)` separate | `conversations_user_archived_idx(user_id, archived)` compound |
|
||
| `agent_logs` | `(session_id)`, `(tier)`, `(created_at)` separate | `agent_logs_session_tier_idx(session_id, tier)`, `agent_logs_tier_created_at_idx(tier, created_at)` |
|
||
| `preferences` | non-unique `(user_id, key)` | **unique** `(user_id, key)` — required for `ON CONFLICT` |
|
||
|
||
**Expected impact:**
|
||
|
||
- Auth session validation (hot path on every request): from seq scan → index scan
|
||
- OAuth callback account lookup: from seq scan → index scan
|
||
- Conversation list (dashboard load): compound index covers `WHERE user_id = ? ORDER BY updated_at`
|
||
- Log summarisation cron: `(tier, created_at)` index enables efficient hot→warm promotion query
|
||
|
||
All changes are in `packages/db/drizzle/0003_p8003_perf_indexes.sql`.
|
||
|
||
---
|
||
|
||
### 4. Conversation Queries — `packages/brain/src/conversations.ts`
|
||
|
||
**Problem:** `findAll(userId)` and `findMessages(conversationId)` were unbounded — no `LIMIT`
|
||
and `findAll` had no `ORDER BY`, so the DB planner may not use the index efficiently.
|
||
|
||
**Fix:**
|
||
|
||
- `findAll`: `ORDER BY updated_at DESC LIMIT 200` — returns most-recent conversations first
|
||
- `findMessages`: `ORDER BY created_at ASC LIMIT 500` — chronological message history
|
||
|
||
**Expected impact:** Prevents accidental full-table scans on large datasets; ensures the
|
||
frontend receives a usable, ordered result set regardless of table growth.
|
||
|
||
---
|
||
|
||
### 5. Cold-Start GC — `apps/gateway/src/gc/session-gc.service.ts`
|
||
|
||
**Problem:** `onModuleInit()` was `async` and `await`-ed `fullCollect()`, which blocked the
|
||
NestJS module initialization chain. Full GC — which calls `redis.keys('mosaic:session:*')` and
|
||
a DB query — typically takes 100–500 ms. This directly added to startup TTFB.
|
||
|
||
**Fix:** Made `onModuleInit()` synchronous and used `.then().catch()` to run GC in the
|
||
background. The first HTTP request is no longer delayed by GC work.
|
||
|
||
**Expected impact:** Removes 100–500 ms from cold-start TTFB.
|
||
|
||
---
|
||
|
||
### 6. Next.js Config — `apps/web/next.config.ts`
|
||
|
||
**Problem:** `compress: true` was not set, so response payloads were uncompressed. No image
|
||
format optimization or source-map suppression was configured.
|
||
|
||
**Fix:**
|
||
|
||
- `compress: true` — enables gzip/brotli for all Next.js responses
|
||
- `productionBrowserSourceMaps: false` — reduces build output size
|
||
- `images.formats: ['image/avif', 'image/webp']` — Next.js Image component will serve modern
|
||
formats to browsers that support them (typically 40–60% smaller than JPEG/PNG)
|
||
|
||
**Expected impact:** Typical HTML/JSON gzip savings of 60–80%; image serving cost reduced
|
||
for any `<Image>` components added in the future.
|
||
|
||
---
|
||
|
||
## What Was Not Changed (Intentionally)
|
||
|
||
- **Caching layer (Valkey/Redis):** The `SystemOverrideService` and GC already use Redis
|
||
pipelines. `PreferencesService.getEffective()` reads all user prefs in one query — this
|
||
is appropriate for the data size and doesn't warrant an additional cache layer yet.
|
||
- **WebSocket backpressure:** The `ChatGateway` already drops events for disconnected clients
|
||
(`client.connected` check) and cleans up listeners on disconnect. No memory leak was found.
|
||
- **Plugin/skill loader startup:** `SkillLoaderService.loadForSession()` is called on first
|
||
session creation, not on startup. Already non-blocking.
|
||
- **Frontend React memoization:** No specific hot components were identified as causing
|
||
excessive re-renders without profiling data. No speculative `memo()` calls added.
|
||
|
||
---
|
||
|
||
## How to Apply
|
||
|
||
```bash
|
||
# Run the DB migration (requires a live DB)
|
||
pnpm --filter @mosaic/db exec drizzle-kit migrate
|
||
|
||
# Or, in Docker/Swarm — migrations run automatically on gateway startup
|
||
# via runMigrations() in packages/db/src/migrate.ts
|
||
```
|
||
|
||
---
|
||
|
||
_Generated by P8-003 performance optimization task — 2026-03-18_
|