feat(gateway): tier-detector with fail-fast PG/Valkey/pgvector probes (FED-M1-04)

Implements `apps/gateway/src/bootstrap/tier-detector.ts` invoked from
`main.ts` before NestJS bootstraps. For each tier:

- `local`: no-op (PGlite is in-process)
- `standalone`: probe Postgres + Valkey
- `federated`: probe Postgres + Valkey + pgvector extension; reject
  config upfront if `queue.type !== 'bullmq'`

Each probe has a 5-second hard cap and emits a structured
`TierDetectionError` with service / host / port / remediation. The
remediation field discriminates pgvector failure modes ("library not
available" vs "permission denied") so operators get actionable hints
without leaking credentials.

Adds `postgres` and `ioredis` as direct gateway deps; previously only
transitive. 12 unit tests cover happy paths and each fail-fast branch.

Refs #460
This commit is contained in:
Jarvis
2026-04-19 19:02:12 -05:00
parent 58169f9979
commit 7383380f64
7 changed files with 626 additions and 2 deletions

View File

@@ -343,3 +343,39 @@ Affected files (storage-tier semantics only — Team/workspace usages unaffected
- `MVP-T04` (sync `.mosaic/orchestrator/mission.json`) still deferred.
- `team` tier rename touches install wizard headless env vars (`MOSAIC_STORAGE_TIER=team`); will need 0.0.x deprecation note in scratchpad if release notes are written this milestone.
---
## Session 17 — 2026-04-19 — claude
**Mode:** Delivery (W1 / FED-M1 execution; resumed after compaction)
**Branches landed this run:** `feat/federation-m1-tier-config` (PR #470), `feat/federation-m1-compose` (PR #471), `feat/federation-m1-pgvector` (PR #472)
**Branch active at end:** `feat/federation-m1-detector` (FED-M1-04, ready to push)
**Tasks closed:** FED-M1-01, FED-M1-02, FED-M1-03 (all merged to `main` via squash, CI green, issue #460 still open as milestone).
**FED-M1-04 — tier-detector:** Worker delivered `apps/gateway/src/bootstrap/tier-detector.ts` (~210 lines) + `tier-detector.spec.ts` (12 tests). Independent code review (sonnet) returned `changes-required` with 3 issues:
1. CRITICAL: `probeValkey` missing `connectTimeout: 5000` on the ioredis Redis client (defaulted to 10s, violated fail-fast spec).
2. IMPORTANT: `probePgvector` catch block did not discriminate "library not installed" (use `pgvector/pgvector:pg17`) from permission errors.
3. IMPORTANT: Federated tier silently skipped Valkey probe when `queue.type !== 'bullmq'` (computed Valkey URL conditionally).
Worker fix-up round addressed all three:
- L147: `connectTimeout: 5000` added to Redis options
- L113-117: catch block branches on `extension "vector" is not available` substring → distinct remediation per failure mode
- L206-215: federated branch fails fast with `service: 'config'` if `queue.type !== 'bullmq'`, then probes Valkey unconditionally
- 4 new tests (8 → 12 total) cover each fix specifically
Independent verifier (haiku) confirmed all 6 verification claims (line numbers, test presence, suite green: 12/12 PASS).
**Process note — review pipeline working as designed:**
Initial verifier (haiku) on the first delivery returned "OK to ship" but missed the 3 deeper issues that the sonnet code-reviewer caught. This validates the user's "always verify subagent claims independently with another subagent" rule — but specifically with the **right tier** for the task: code review needs sonnet-level reasoning, while haiku is fine for verifying surface claims (line counts, file existence) once review issues are known. Going forward: code review uses sonnet (`feature-dev:code-reviewer`), claim verification uses haiku.
**Followup tasks tracked but deferred:**
- #7: `tier=local` hardcoded in gateway-config resume branches (~262, ~317) — pre-existing bug, fix during M1-06 (doctor) or M1-09 (regression).
- #8: confirm `packages/config/dist` not git-tracked.
**Next:** PR for FED-M1-04 → CI wait → merge. Then FED-M1-05 (migration script, codex/sonnet, 10K).