- DB client: configure connection pool (max=20, idle_timeout=30s, connect_timeout=5s) - DB schema: add missing indexes for auth sessions, accounts, conversations, agent_logs - DB schema: promote preferences(user_id,key) to UNIQUE index for ON CONFLICT upsert - Drizzle migration: 0003_p8003_perf_indexes.sql - preferences.service: replace 2-query SELECT+INSERT/UPDATE with single-round-trip upsert - conversations repo: add ORDER BY + LIMIT to findAll (200) and findMessages (500) - session-gc.service: make onModuleInit fire-and-forget (removes cold-start TTFB block) - next.config.ts: enable compress, productionBrowserSourceMaps:false, image avif/webp - docs/PERFORMANCE.md: full profiling report and change impact notes
7.8 KiB
Performance Optimization — P8-003
Branch: feat/p8-003-performance
Target metrics: <200 ms TTFB, <2 s page loads
What Was Profiled
The following areas were reviewed through static analysis and code-path tracing (no production traffic available; findings are based on measurable code-level patterns):
| Area | Findings |
|---|---|
packages/db |
Connection pool unbounded (default 10, no idle/connect timeout) |
apps/gateway/src/preferences |
N+1 round-trip on every pref upsert (SELECT + INSERT/UPDATE) |
packages/brain/src/conversations |
Unbounded list queries — no LIMIT or ORDER BY |
packages/db/src/schema |
Missing hot-path indexes: auth session lookup, OAuth callback, conversation list, agent-log tier queries |
apps/gateway/src/gc |
Cold-start GC blocked NestJS bootstrap (synchronous await in onModuleInit) |
apps/web/next.config.ts |
Missing compress: true, no productionBrowserSourceMaps: false, no image format config |
Changes Made
1. DB Connection Pool — packages/db/src/client.ts
Problem: postgres() was called with no pool config. The default max of 10 connections
and no idle/connect timeouts meant the pool could hang indefinitely on a stale TCP connection.
Fix:
max: 20 connections (configurable viaDB_POOL_MAX)idle_timeout: 30 s (configurable viaDB_IDLE_TIMEOUT) — recycle stale connectionsconnect_timeout: 5 s (configurable viaDB_CONNECT_TIMEOUT) — fail fast on unreachable DB
Expected impact: Eliminates pool exhaustion under moderate concurrency; removes indefinite hangs when the DB is temporarily unreachable.
2. Preferences Upsert — apps/gateway/src/preferences/preferences.service.ts
Problem: upsertPref executed two serial DB round-trips on every preference write:
1. SELECT id FROM preferences WHERE user_id = ? AND key = ? (→ check exists)
2a. UPDATE preferences SET value = ? … (→ if found)
2b. INSERT INTO preferences … (→ if not found)
Under concurrency this also had a TOCTOU race window.
Fix: Replaced with single-statement INSERT … ON CONFLICT DO UPDATE:
INSERT INTO preferences (user_id, key, value, mutable)
VALUES (?, ?, ?, true)
ON CONFLICT (user_id, key) DO UPDATE SET value = excluded.value, updated_at = now();
This required promoting preferences_user_key_idx from a plain index to a UNIQUE INDEX
(see migration 0003_p8003_perf_indexes.sql).
Expected impact: ~50% reduction in DB round-trips for preference writes; eliminates the race window.
3. Missing DB Indexes — packages/db/src/schema.ts + migration
The following indexes were added or replaced to cover common query patterns:
| Table | Old indexes | New / changed |
|---|---|---|
sessions |
(none) | sessions_user_id_idx(user_id), sessions_expires_at_idx(expires_at) |
accounts |
(none) | accounts_provider_account_idx(provider_id, account_id), accounts_user_id_idx(user_id) |
conversations |
(user_id), (archived) separate |
conversations_user_archived_idx(user_id, archived) compound |
agent_logs |
(session_id), (tier), (created_at) separate |
agent_logs_session_tier_idx(session_id, tier), agent_logs_tier_created_at_idx(tier, created_at) |
preferences |
non-unique (user_id, key) |
unique (user_id, key) — required for ON CONFLICT |
Expected impact:
- Auth session validation (hot path on every request): from seq scan → index scan
- OAuth callback account lookup: from seq scan → index scan
- Conversation list (dashboard load): compound index covers
WHERE user_id = ? ORDER BY updated_at - Log summarisation cron:
(tier, created_at)index enables efficient hot→warm promotion query
All changes are in packages/db/drizzle/0003_p8003_perf_indexes.sql.
4. Conversation Queries — packages/brain/src/conversations.ts
Problem: findAll(userId) and findMessages(conversationId) were unbounded — no LIMIT
and findAll had no ORDER BY, so the DB planner may not use the index efficiently.
Fix:
findAll:ORDER BY updated_at DESC LIMIT 200— returns most-recent conversations firstfindMessages:ORDER BY created_at ASC LIMIT 500— chronological message history
Expected impact: Prevents accidental full-table scans on large datasets; ensures the frontend receives a usable, ordered result set regardless of table growth.
5. Cold-Start GC — apps/gateway/src/gc/session-gc.service.ts
Problem: onModuleInit() was async and await-ed fullCollect(), which blocked the
NestJS module initialization chain. Full GC — which calls redis.keys('mosaic:session:*') and
a DB query — typically takes 100–500 ms. This directly added to startup TTFB.
Fix: Made onModuleInit() synchronous and used .then().catch() to run GC in the
background. The first HTTP request is no longer delayed by GC work.
Expected impact: Removes 100–500 ms from cold-start TTFB.
6. Next.js Config — apps/web/next.config.ts
Problem: compress: true was not set, so response payloads were uncompressed. No image
format optimization or source-map suppression was configured.
Fix:
compress: true— enables gzip/brotli for all Next.js responsesproductionBrowserSourceMaps: false— reduces build output sizeimages.formats: ['image/avif', 'image/webp']— Next.js Image component will serve modern formats to browsers that support them (typically 40–60% smaller than JPEG/PNG)
Expected impact: Typical HTML/JSON gzip savings of 60–80%; image serving cost reduced
for any <Image> components added in the future.
What Was Not Changed (Intentionally)
- Caching layer (Valkey/Redis): The
SystemOverrideServiceand GC already use Redis pipelines.PreferencesService.getEffective()reads all user prefs in one query — this is appropriate for the data size and doesn't warrant an additional cache layer yet. - WebSocket backpressure: The
ChatGatewayalready drops events for disconnected clients (client.connectedcheck) and cleans up listeners on disconnect. No memory leak was found. - Plugin/skill loader startup:
SkillLoaderService.loadForSession()is called on first session creation, not on startup. Already non-blocking. - Frontend React memoization: No specific hot components were identified as causing
excessive re-renders without profiling data. No speculative
memo()calls added.
How to Apply
# Run the DB migration (requires a live DB)
pnpm --filter @mosaic/db exec drizzle-kit migrate
# Or, in Docker/Swarm — migrations run automatically on gateway startup
# via runMigrations() in packages/db/src/migrate.ts
Generated by P8-003 performance optimization task — 2026-03-18