docs(federation): close FED-M1 milestone

- TASKS.md: mark FED-M1-12 done with PR/issue/tag references
- MISSION-MANIFEST.md: phase=M1 complete, progress 1/7, M1 row done with PR range #470-#481, session log appended
- scratchpad: Session 19 entry covering M1-09 → M1-12 with PR ledger and M1 retrospective learnings

Refs #460
Jarvis
2026-04-19 21:12:52 -05:00
parent 78841f228a
commit b9fb8aab57
3 changed files with 128 additions and 31 deletions


@@ -429,3 +429,97 @@ Side change: `packages/storage/package.json` gained `"type": "module"` (codebase
- #8: confirm `packages/config/dist` not git-tracked
**Next:** FED-M1-09 — standalone regression e2e (haiku canary, ~4K). Verifies that the existing `standalone` tier behavior still works end-to-end on the federation-touched build, since M1 changes touched shared paths (storage, config, gateway boot).
---
## Session 19 — 2026-04-19 — FED-M1-09 → FED-M1-12 (M1 close)
**Branches landed this run:** `feat/federation-m1-regression` (PR #478, M1-09), `feat/federation-m1-security-review` (PR #479, M1-10), `feat/federation-m1-docs` (PR #480, M1-11), `feat/federation-m1-close` (PR #481, M1-12)
**Branch active at end:** none — M1 closed, all branches deleted, issue #460 closed, release tag `fed-v0.1.0-m1` published
**M1 progress:** 12 of 12 tasks done. **Milestone complete.**
### FED-M1-09 — Standalone regression canary
Verification-only task. Re-ran the existing standalone/local test suites against current `main` (with M1-01 → M1-08 merged):
- 4 target gateway test files: 148/148 pass (conversation-persistence, cross-user-isolation, resource-ownership, session-hardening)
- Full gateway suite: 351 pass, 4 skipped (FEDERATED_INTEGRATION-gated only)
- Storage unit tests: 85 pass, 1 skipped (integration-gated)
- Top-level `pnpm test`: all green; only env-gated skips
No regression in standalone or local tier. Federation M1 changes are non-disruptive.
### FED-M1-10 — Security review (two rounds, 7 findings)
Independent security review surfaced seven findings across two rounds (two HIGH, three MEDIUM, one LOW-MEDIUM, one ADVISORY); all were fixed in the same PR.
**Round 1 (4 findings):**
- MEDIUM: Credential leak via `postgres`/`ioredis` driver error messages (DSN strings) re-thrown by `migrate-tier.ts` → caller; `cli.ts:402` outer catch
- MEDIUM: Same leak in `tier-detection.ts` `probePostgresMeasured` / `probePgvectorMeasured` → emitted as JSON by `mosaic gateway doctor --json`
- LOW-MEDIUM: No advisory lock on `migrate-tier`; two concurrent invocations could both pass `checkTargetPreconditions` (non-atomic) and race
- ADVISORY: `SKIP_TABLES` lacked rationale comment
**Fixes:**
- New internal helper `packages/storage/src/redact-error.ts`: regex `(postgres(?:ql)?|rediss?):\/\/[^@\s]*@` → `<scheme>://***@` (final form, after the Round 2 scheme extension). NOT exported from package public surface. 10 unit tests covering all schemes, multi-URL, no-creds, case-insensitive.
- `redactErrMsg` applied at all 5 leak sites
- `PostgresMigrationTarget.tryAcquireAdvisoryLock()` / `releaseAdvisoryLock()` using session-scoped `pg_try_advisory_lock(hashtext('mosaic-migrate-tier'))`. Acquired before preflight, released in `finally`. Dry-run skips. Non-blocking.
- `SKIP_TABLES` comment expanded with rationale for skipped tables (TTL'd / one-time / env-bound) AND why `accounts` (OAuth) and `provider_credentials` (AI keys) are intentionally migrated (durable user-bound, not deployment-bound).
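
The advisory-lock fix above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `withMigrationLock`, `QueryFn`, and the error message are hypothetical names, and the real `tryAcquireAdvisoryLock()` / `releaseAdvisoryLock()` methods may be structured differently. The sketch assumes a single database session is used for both the lock and the unlock, since the lock is session-scoped.

```typescript
// Hypothetical minimal query surface so the sketch does not depend on a driver.
// Every call must go through the SAME database session, because
// pg_try_advisory_lock takes a session-scoped lock.
type QueryFn = (sql: string) => Promise<Array<Record<string, unknown>>>;

// Lock key expression taken from the session notes.
const LOCK_KEY_SQL = "hashtext('mosaic-migrate-tier')";

// Hypothetical wrapper: acquire before preflight, release in `finally`,
// skip entirely on dry-run, and fail fast instead of waiting.
async function withMigrationLock<T>(
  query: QueryFn,
  dryRun: boolean,
  run: () => Promise<T>,
): Promise<T> {
  if (dryRun) {
    // A dry run writes nothing, so it does not need the lock.
    return run();
  }

  const rows = await query(
    `SELECT pg_try_advisory_lock(${LOCK_KEY_SQL}) AS acquired`,
  );
  if (rows[0]?.acquired !== true) {
    // Non-blocking: report the conflict rather than queueing behind it.
    throw new Error("another migrate-tier run holds the advisory lock");
  }

  try {
    // Preflight checks and the migration itself run while the lock is held.
    return await run();
  } finally {
    await query(`SELECT pg_advisory_unlock(${LOCK_KEY_SQL})`);
  }
}
```

Taking the lock before the precondition check is what closes the race: a second invocation fails at the lock step instead of passing a non-atomic check at the same time as the first.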
**Round 2 (3 findings missed by first round):**
- HIGH: Round 1 regex only covered `postgres` scheme, not `redis`/`rediss` — extended to `(postgres(?:ql)?|rediss?)`
- HIGH: `probeValkeyMeasured` was missed in Round 1 → applied `redactErrMsg`
- MEDIUM: `cli.ts:402` migrate-tier outer catch was missed in Round 1 → applied `redactErrMsg`
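
To make the Round 2 regex finding concrete, here is a minimal sketch of the redaction under stated assumptions. The pattern is the one quoted in these notes; the function bodies, the postgres-only variant used for comparison, and the sample error strings are hypothetical, and the real `redact-error.ts` may differ.

```typescript
// Pattern as quoted in the session notes (final, Round 2 form).
const CREDENTIAL_URL = /(postgres(?:ql)?|rediss?):\/\/[^@\s]*@/gi;

// Hypothetical Round 1 shape, shown only to illustrate the gap.
const POSTGRES_ONLY = /(postgres(?:ql)?):\/\/[^@\s]*@/gi;

// Replace the userinfo part of any matching URL, keeping the scheme.
function redactErrMsg(message: string): string {
  return message.replace(CREDENTIAL_URL, "$1://***@");
}

function redactPostgresOnly(message: string): string {
  return message.replace(POSTGRES_ONLY, "$1://***@");
}

// Made-up driver error messages carrying DSN strings.
const pgError = "connect ECONNREFUSED postgres://app:s3cret@db:5432/mosaic";
const valkeyError = "connect ETIMEDOUT rediss://default:hunter2@cache:6380";

console.log(redactErrMsg(pgError));
console.log(redactErrMsg(valkeyError));
// The postgres-only pattern leaves the Valkey credentials in place.
console.log(redactPostgresOnly(valkeyError));
```

URLs without credentials contain no `@` after the scheme, so they pass through unchanged.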
**Process validation:** the two-round review pattern proved load-bearing for security work. A single review-then-fix cycle would have shipped the Valkey credential leak.
### FED-M1-11 — Docs (haiku)
- `docs/federation/SETUP.md` (119 lines): federated tier setup — what it is, prerequisites, docker compose start, mosaic.config.json snippet, doctor health check, troubleshooting
- `docs/guides/migrate-tier.md` (147 lines): when to migrate, dry-run first, what migrates/skips with rationale, idempotency + advisory-lock semantics, no in-place rollback
- `README.md` Configuration blurb linking to both
- Runbook deferred to FED-M7 per TASKS.md scope rule
### FED-M1-12 — Aggregate close (this PR)
- Marked M1-12 done in TASKS.md
- MISSION-MANIFEST.md: phase → "M1 complete", progress 1/7, M1 row done with PR range #470-#481, session log appended
- This Session 19 entry added
- Issue #460 closed via `~/.config/mosaic/tools/git/issue-close.sh -i 460`
- Release tag `fed-v0.1.0-m1` created and pushed to gitea
### M1 PR ledger
| PR | Task | Branch |
| ---- | ----------------------------------------- | ---------------------------------- |
| #470 | M1-01 (tier config schema) | feat/federation-m1-tier-config |
| #471 | M1-02 (compose overlay) | feat/federation-m1-compose |
| #472 | M1-03 (pgvector adapter) | feat/federation-m1-pgvector |
| #473 | M1-04 (tier-detector) | feat/federation-m1-detector |
| #474 | M1-05 (migrate-tier script) | feat/federation-m1-migrate |
| #475 | M1-06 (gateway doctor) | feat/federation-m1-doctor |
| #476 | M1-07 (boot integration tests) | feat/federation-m1-integration |
| #477 | M1-08 (migrate integration test + P0 fix) | feat/federation-m1-migrate-test |
| #478 | M1-09 (standalone regression) | feat/federation-m1-regression |
| #479 | M1-10 (security review fixes) | feat/federation-m1-security-review |
| #480 | M1-11 (docs) | feat/federation-m1-docs |
| #481 | M1-12 (aggregate close) | feat/federation-m1-close |
### Process learnings (M1 retrospective)
1. **Two-round security review is non-negotiable for security work.** The first round caught postgres credential leaks; the second round caught the equivalent Valkey leaks (missing `redis`/`rediss` scheme coverage in the regex and the unredacted `probeValkeyMeasured`) that the first fix pass missed. A single round would have shipped HIGH-severity issues.
2. **Real-services integration tests catch what mocked unit tests cannot.** M1-08 caught a P0 in M1-05 (camelCase column names) that 32 mocked unit tests missed because both source and target were mocked. Going forward: at least one real-services test per code-mutating PR where feasible.
3. **Test-utils for live services co-locate with consumer, not in shared library.** M1-08 reviewer caught `createPgliteDbWithVector` initially being added to `@mosaicstack/db` public exports — would have polluted prod consumers with WASM bundle. Moved to `packages/storage/src/test-utils/`.
4. **Per-task budgets that include tests/review/docs are more accurate than the PRD's implementation-only estimates.** M1 PRD estimated 20K; actual ~74K. Future milestones should budget the full delivery cycle.
5. **TASKS.md status updates ride feature branches, never direct-to-main.** Caught one violation early in M1; pattern held for all 12 tasks.
6. **Subagent tier matters.** Code review needs sonnet-level reasoning (haiku missed deep issues in M1-04); claim verification (line counts, file existence) is fine on haiku.
**Followup tasks still deferred (carry forward to M2):**
- #7: `tier=local` hardcoded in gateway-config resume branches (~262, ~317)
- #8: confirm `packages/config/dist` not git-tracked
**Next mission step:** FED-M2 (Step-CA + grant schema + admin CLI). Per TASKS.md scope rule, M2 will be decomposed when it enters active planning. Issue #461 tracks scope.