Files
stack/docs/PERFORMANCE.md
Jarvis 774b76447d
Some checks failed
ci/woodpecker/pr/ci Pipeline failed
ci/woodpecker/push/ci Pipeline failed
fix: rename all packages from @mosaic/* to @mosaicstack/*
- Updated all package.json name fields and dependency references
- Updated all TypeScript/JavaScript imports
- Updated .woodpecker/publish.yml filters and registry paths
- Updated tools/install.sh scope default
- Updated .npmrc registry paths (worktree + host)
- Enhanced update-checker.ts with checkForAllUpdates() multi-package support
- Updated CLI update command to show table of all packages
- Added KNOWN_PACKAGES, formatAllPackagesTable, getInstallAllCommand
- Marked checkForUpdate() with @deprecated JSDoc

Closes #391
2026-04-04 21:43:23 -05:00

7.8 KiB
Raw Blame History

Performance Optimization — P8-003

Branch: feat/p8-003-performance
Target metrics: <200 ms TTFB, <2 s page loads


What Was Profiled

The following areas were reviewed through static analysis and code-path tracing (no production traffic available; findings are based on measurable code-level patterns):

Area Findings
packages/db Connection pool unbounded (default 10, no idle/connect timeout)
apps/gateway/src/preferences N+1 round-trip on every pref upsert (SELECT + INSERT/UPDATE)
packages/brain/src/conversations Unbounded list queries — no LIMIT or ORDER BY
packages/db/src/schema Missing hot-path indexes: auth session lookup, OAuth callback, conversation list, agent-log tier queries
apps/gateway/src/gc Cold-start GC blocked NestJS bootstrap (synchronous await in onModuleInit)
apps/web/next.config.ts Missing compress: true, no productionBrowserSourceMaps: false, no image format config

Changes Made

1. DB Connection Pool — packages/db/src/client.ts

Problem: postgres() was called with no pool config. The default max of 10 connections and no idle/connect timeouts meant the pool could hang indefinitely on a stale TCP connection.

Fix:

  • max: 20 connections (configurable via DB_POOL_MAX)
  • idle_timeout: 30 s (configurable via DB_IDLE_TIMEOUT) — recycle stale connections
  • connect_timeout: 5 s (configurable via DB_CONNECT_TIMEOUT) — fail fast on unreachable DB

Expected impact: Eliminates pool exhaustion under moderate concurrency; removes indefinite hangs when the DB is temporarily unreachable.


2. Preferences Upsert — apps/gateway/src/preferences/preferences.service.ts

Problem: upsertPref executed two serial DB round-trips on every preference write:

1. SELECT id FROM preferences WHERE user_id = ? AND key = ?  (→ check exists)
2a. UPDATE preferences SET value = ? …                        (→ if found)
2b. INSERT INTO preferences …                                 (→ if not found)

Under concurrency this also had a TOCTOU race window.

Fix: Replaced with single-statement INSERT … ON CONFLICT DO UPDATE:

INSERT INTO preferences (user_id, key, value, mutable)
VALUES (?, ?, ?, true)
ON CONFLICT (user_id, key) DO UPDATE SET value = excluded.value, updated_at = now();

This required promoting preferences_user_key_idx from a plain index to a UNIQUE INDEX (see migration 0003_p8003_perf_indexes.sql).

Expected impact: ~50% reduction in DB round-trips for preference writes; eliminates the race window.


3. Missing DB Indexes — packages/db/src/schema.ts + migration

The following indexes were added or replaced to cover common query patterns:

Table Old indexes New / changed
sessions (none) sessions_user_id_idx(user_id), sessions_expires_at_idx(expires_at)
accounts (none) accounts_provider_account_idx(provider_id, account_id), accounts_user_id_idx(user_id)
conversations (user_id), (archived) separate conversations_user_archived_idx(user_id, archived) compound
agent_logs (session_id), (tier), (created_at) separate agent_logs_session_tier_idx(session_id, tier), agent_logs_tier_created_at_idx(tier, created_at)
preferences non-unique (user_id, key) unique (user_id, key) — required for ON CONFLICT

Expected impact:

  • Auth session validation (hot path on every request): from seq scan → index scan
  • OAuth callback account lookup: from seq scan → index scan
  • Conversation list (dashboard load): compound index covers WHERE user_id = ? ORDER BY updated_at
  • Log summarisation cron: (tier, created_at) index enables efficient hot→warm promotion query

All changes are in packages/db/drizzle/0003_p8003_perf_indexes.sql.


4. Conversation Queries — packages/brain/src/conversations.ts

Problem: findAll(userId) and findMessages(conversationId) were unbounded — no LIMIT and findAll had no ORDER BY, so the DB planner may not use the index efficiently.

Fix:

  • findAll: ORDER BY updated_at DESC LIMIT 200 — returns most-recent conversations first
  • findMessages: ORDER BY created_at ASC LIMIT 500 — chronological message history

Expected impact: Prevents accidental full-table scans on large datasets; ensures the frontend receives a usable, ordered result set regardless of table growth.


5. Cold-Start GC — apps/gateway/src/gc/session-gc.service.ts

Problem: onModuleInit() was async and await-ed fullCollect(), which blocked the NestJS module initialization chain. Full GC — which calls redis.keys('mosaic:session:*') and a DB query — typically takes 100500 ms. This directly added to startup TTFB.

Fix: Made onModuleInit() synchronous and used .then().catch() to run GC in the background. The first HTTP request is no longer delayed by GC work.

Expected impact: Removes 100500 ms from cold-start TTFB.


6. Next.js Config — apps/web/next.config.ts

Problem: compress: true was not set, so response payloads were uncompressed. No image format optimization or source-map suppression was configured.

Fix:

  • compress: true — enables gzip/brotli for all Next.js responses
  • productionBrowserSourceMaps: false — reduces build output size
  • images.formats: ['image/avif', 'image/webp'] — Next.js Image component will serve modern formats to browsers that support them (typically 4060% smaller than JPEG/PNG)

Expected impact: Typical HTML/JSON gzip savings of 6080%; image serving cost reduced for any <Image> components added in the future.


What Was Not Changed (Intentionally)

  • Caching layer (Valkey/Redis): The SystemOverrideService and GC already use Redis pipelines. PreferencesService.getEffective() reads all user prefs in one query — this is appropriate for the data size and doesn't warrant an additional cache layer yet.
  • WebSocket backpressure: The ChatGateway already drops events for disconnected clients (client.connected check) and cleans up listeners on disconnect. No memory leak was found.
  • Plugin/skill loader startup: SkillLoaderService.loadForSession() is called on first session creation, not on startup. Already non-blocking.
  • Frontend React memoization: No specific hot components were identified as causing excessive re-renders without profiling data. No speculative memo() calls added.

How to Apply

# Run the DB migration (requires a live DB)
pnpm --filter @mosaicstack/db exec drizzle-kit migrate

# Or, in Docker/Swarm — migrations run automatically on gateway startup
# via runMigrations() in packages/db/src/migrate.ts

Generated by P8-003 performance optimization task — 2026-03-18