diff --git a/docs/federation/MILESTONES.md b/docs/federation/MILESTONES.md new file mode 100644 index 0000000..e9d9ba3 --- /dev/null +++ b/docs/federation/MILESTONES.md @@ -0,0 +1,368 @@ +# Mosaic Stack — Federation Implementation Milestones + +**Companion to:** `PRD.md` +**Approach:** Each milestone is a verifiable slice. A milestone is "done" only when its acceptance tests pass in CI against a real (not mocked) dependency stack. + +--- + +## Milestone Dependency Graph + +``` +M1 (federated tier infra) + └── M2 (Step-CA + grant schema + CLI) + └── M3 (mTLS handshake + list/get + scope enforcement) + ├── M4 (search + audit + rate limit) + │ └── M5 (cache + offline degradation + OTEL) + ├── M6 (revocation + auto-renewal) ◄── can start after M3 + └── M7 (multi-user hardening + e2e suite) ◄── depends on M4+M5+M6 +``` + +M5 and M6 can run in parallel once M4 is merged. + +--- + +## Test Strategy (applies to all milestones) + +Three layers, all required before a milestone ships: + +| Layer | Scope | Runtime | +| ------------------ | --------------------------------------------- | ------------------------------------------------------------------------ | +| **Unit** | Per-module logic, pure functions, adapters | Vitest, no I/O | +| **Integration** | Single gateway against real PG/Valkey/Step-CA | Vitest + Docker Compose test profile | +| **Federation E2E** | Two gateways on a Docker network, real mTLS | Playwright/custom harness (`tools/federation-harness/`) introduced in M3 | + +Every milestone adds tests to these layers. A milestone cannot be claimed complete if the federation E2E harness fails (applies from M3 onward). 
+ +**Quality gates per milestone** (same as stack-wide): + +- `pnpm typecheck` green +- `pnpm lint` green +- `pnpm test` green (unit + integration) +- `pnpm test:federation` green (M3+) +- Independent code review passed +- Docs updated (`docs/federation/`) +- Merged PR on `main`, CI terminal green, linked issue closed + +--- + +## M1 — Federated Tier Infrastructure + +**Goal:** A gateway can run in `federated` tier with containerized Postgres + Valkey + pgvector, with no federation logic active yet. + +**Scope:** + +- Add `"tier": "federated"` to `mosaic.config.json` schema and validators +- Docker Compose `federated` profile (`docker-compose.federated.yml`) adds: Postgres+pgvector (5433), Valkey (6380), dedicated volumes +- Tier detector in gateway bootstrap: reads config, asserts required services reachable, refuses to start otherwise +- `pgvector` extension installed + verified on startup +- Migration logic: safe upgrade path from `local`/`standalone` → `federated` (data export/import script, one-way) +- `mosaic doctor` reports tier + service health +- Gateway continues to serve as a normal standalone instance (no federation yet) + +**Deliverables:** + +- `mosaic.config.json` schema v2 (tier enum includes `federated`) +- `apps/gateway/src/bootstrap/tier-detector.ts` +- `docker-compose.federated.yml` +- `scripts/migrate-to-federated.ts` +- Updated `mosaic doctor` output +- Updated `packages/storage/src/adapters/postgres.ts` with pgvector support + +**Acceptance tests:** +| # | Test | Layer | +| - | ---------------------------------------------------------------------------------------- | ----------- | +| 1 | Gateway boots in `federated` tier with all services present | Integration | +| 2 | Gateway refuses to boot in `federated` tier when Postgres unreachable (fail-fast, clear) | Integration | +| 3 | `pgvector` extension available in target DB (`SELECT * FROM pg_extension WHERE extname='vector'`) | Integration | +| 4 | Migration script moves a populated `local` 
(PGlite) instance to `federated` (Postgres) with no data loss | Integration | +| 5 | `mosaic doctor` reports correct tier and all services green | Unit | +| 6 | Existing standalone behavior regression: agent session works end-to-end, no federation references | E2E (single-gateway) | + +**Estimated budget:** ~20K tokens (infra + config + migration script) +**Risk notes:** Pgvector install on existing PG installs is occasionally finicky; test the migration path on a realistic DB snapshot. + +--- + +## M2 — Step-CA + Grant Schema + Admin CLI + +**Goal:** An admin can create a federation grant and its counterparty can enroll. No runtime traffic flows yet. + +**Scope:** + +- Embed Step-CA as a Docker Compose sidecar with a persistent CA volume +- Gateway exposes a short-lived enrollment endpoint (single-use token from the grant) +- DB schema: `federation_grants`, `federation_peers`, `federation_audit_log` (table only, not yet written to) +- Sealed storage for `client_key_pem` using the existing credential sealing key +- Admin CLI: + - `mosaic federation grant create --user --peer --scope ` + - `mosaic federation grant list` + - `mosaic federation grant show ` + - `mosaic federation peer add ` + - `mosaic federation peer list` +- Step-CA signs the cert with SAN OIDs for `grantId` + `subjectUserId` +- Grant status transitions: `pending` → `active` on successful enrollment + +**Deliverables:** + +- `packages/db` migration: three federation tables + enum types +- `apps/gateway/src/federation/ca.service.ts` (Step-CA client) +- `apps/gateway/src/federation/grants.service.ts` +- `apps/gateway/src/federation/enrollment.controller.ts` +- `packages/mosaic/src/commands/federation/` (grant + peer subcommands) +- `docker-compose.federated.yml` adds Step-CA service +- Scope JSON schema + validator + +**Acceptance tests:** +| # | Test | Layer | +| - | ---------------------------------------------------------------------------------------- | ----------- | +| 1 | `grant create` writes a 
`pending` row with a scoped bundle | Integration | +| 2 | Enrollment endpoint signs a CSR and returns a cert with expected SAN OIDs | Integration | +| 3 | Enrollment token is single-use; second attempt returns 410 | Integration | +| 4 | Cert `subjectUserId` OID matches the grant's `subject_user_id` | Unit | +| 5 | `client_key_pem` is at-rest encrypted; raw DB read shows ciphertext, not PEM | Integration | +| 6 | `peer add ` on Server A yields an `active` peer record with a valid cert + key | E2E (two gateways, no traffic) | +| 7 | Scope JSON with unknown resource type rejected at `grant create` | Unit | +| 8 | `grant list` and `peer list` render active / pending / revoked accurately | Unit | + +**Estimated budget:** ~30K tokens (schema + CA integration + CLI + sealing) +**Risk notes:** Step-CA's API surface is well-documented but the sealing integration with existing provider-credential encryption is a cross-module concern — walk that seam deliberately. + +--- + +## M3 — mTLS Handshake + `list` + `get` with Scope Enforcement + +**Goal:** Two federated gateways exchange real data over mTLS with scope intersecting native RBAC. + +**Scope:** + +- `FederationClient` (outbound): picks cert from `federation_peers`, does mTLS call +- `FederationServer` (inbound): NestJS guard validates client cert, extracts `grantId` + `subjectUserId`, loads grant +- Scope enforcement pipeline: + 1. Resource allowlist / excluded-list check + 2. Native RBAC evaluation as the `subjectUserId` + 3. Scope filter intersection (`include_teams`, `include_personal`) + 4. 
`max_rows_per_query` cap +- Verbs: `list`, `get`, `capabilities` +- Gateway query layer accepts `source: "local" | "federated:" | "all"`; fan-out for `"all"` +- **Federation E2E harness** (`tools/federation-harness/`): docker-compose.two-gateways.yml, seed script, assertion helpers — this is its own deliverable + +**Deliverables:** + +- `apps/gateway/src/federation/client/federation-client.service.ts` +- `apps/gateway/src/federation/server/federation-auth.guard.ts` +- `apps/gateway/src/federation/server/scope.service.ts` +- `apps/gateway/src/federation/server/verbs/{list,get,capabilities}.controller.ts` +- `apps/gateway/src/federation/client/query-source.service.ts` (fan-out/merge) +- `tools/federation-harness/` (compose + seed + test helpers) +- `packages/types` — federation request/response DTOs in `federation.dto.ts` + +**Acceptance tests:** +| # | Test | Layer | +| -- | -------------------------------------------------------------------------------------------------------- | ----- | +| 1 | A→B `list tasks` returns subjectUser's tasks intersected with scope | E2E | +| 2 | A→B `list tasks` with `include_teams: [T1]` excludes T2 tasks the user owns | E2E | +| 3 | A→B `get credential ` returns 403 when `credentials` is in `excluded_resources` | E2E | +| 4 | Client presenting cert for grant X cannot query subjectUser of grant Y (cross-user isolation) | E2E | +| 5 | Cert signed by untrusted CA rejected at TLS layer (no NestJS handler reached) | E2E | +| 6 | Malformed SAN OIDs → 401; cert valid but grant revoked in DB → 403 | Integration | +| 7 | `max_rows_per_query` caps response; request for more paginated | Integration | +| 8 | `source: "all"` fan-out merges local + federated results, each tagged with `_source` | Integration | +| 9 | Federation responses never persist: verify DB row count unchanged after `list` round-trip | E2E | +| 10 | Scope cannot grant more than native RBAC: user without access to team T still gets [] even if scope allows T | E2E | + 
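The four-step enforcement pipeline above can be expressed as pure logic. The following is a hedged sketch, not the actual `scope.service.ts`: the `Scope` shape follows the schema in PRD §8.1, and `nativeRbacRows` stands in for step 2, assumed to have already produced the rows the subject user could see when logged into the serving gateway directly.

```typescript
// Hedged sketch of the scope-enforcement pipeline — not the real scope.service.ts.
// Shapes mirror the PRD §8.1 scope schema.

interface ScopeFilter {
  include_teams?: string[];
  include_personal?: boolean;
}

interface Scope {
  resources: string[];
  excluded_resources: string[];
  filters: Record<string, ScopeFilter>;
  max_rows_per_query: number;
}

interface Row {
  id: string;
  owner_user_id: string;
  team_id: string | null; // null = personal resource
}

// Step 1: resource allowlist / excluded-list check.
function assertResourceAllowed(scope: Scope, resource: string): void {
  if (!scope.resources.includes(resource) || scope.excluded_resources.includes(resource)) {
    throw new Error(`403: resource '${resource}' is outside this grant's scope`);
  }
}

// Steps 2–4: intersect RBAC-visible rows with the scope filter, then cap.
function applyScope(
  scope: Scope,
  resource: string,
  nativeRbacRows: Row[], // output of step 2 (native RBAC evaluated as subjectUserId)
  subjectUserId: string,
): Row[] {
  assertResourceAllowed(scope, resource);
  const f = scope.filters[resource] ?? {};
  const intersected = nativeRbacRows.filter((row) =>
    row.team_id !== null
      ? (f.include_teams ?? []).includes(row.team_id) // step 3: team filter
      : f.include_personal === true && row.owner_user_id === subjectUserId,
  );
  return intersected.slice(0, scope.max_rows_per_query); // step 4: row cap
}
```

Because step 3 only ever filters the output of the native RBAC evaluation, this shape preserves the invariant in acceptance test 10: scope can narrow what RBAC returns but never widen it.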
+**Estimated budget:** ~40K tokens (largest milestone — core federation logic + harness) +**Risk notes:** This is the critical trust boundary. Code review should focus on scope enforcement bypass and cert-SAN-spoofing paths. Every 403/401 path needs a test. + +--- + +## M4 — `search` Verb + Audit Log + Rate Limit + +**Goal:** Keyword search over allowed resources with full audit and per-grant rate limiting. + +**Scope:** + +- `search` verb across `resources` allowlist (intersection of scope + native RBAC) +- Keyword search (reuse existing `packages/memory/src/adapters/keyword.ts`); pgvector search stays out of v1 search verb +- Every federated request (all verbs) writes to `federation_audit_log`: `grant_id`, `verb`, `resource`, `query_hash`, `outcome`, `bytes_out`, `latency_ms` +- No request body captured; `query_hash` is SHA-256 of normalized query params +- Token-bucket rate limit per grant (default 60/min, override per grant) +- 429 response with `Retry-After` header and structured body +- 90-day hot retention for audit log; cold-tier rollover deferred to M7 + +**Deliverables:** + +- `apps/gateway/src/federation/server/verbs/search.controller.ts` +- `apps/gateway/src/federation/server/audit.service.ts` (async write, no blocking) +- `apps/gateway/src/federation/server/rate-limit.guard.ts` +- Tests in harness + +**Acceptance tests:** +| # | Test | Layer | +| - | ------------------------------------------------------------------------------------------------- | ----------- | +| 1 | `search` returns ranked hits only from allowed resources | E2E | +| 2 | `search` excluding `credentials` does not return a match even when keyword matches a credential name | E2E | +| 3 | Every successful request appears in `federation_audit_log` within 1s | Integration | +| 4 | Denied request (403) is also audited with `outcome='denied'` | Integration | +| 5 | Audit row stores query hash but NOT query body | Unit | +| 6 | 61st request in 60s window returns 429 with `Retry-After` | E2E | 
+| 7 | Per-grant override (e.g., 600/min) takes effect without restart | Integration | +| 8 | Audit writes are async: request latency unchanged when audit write slow (simulated) | Integration | + +**Estimated budget:** ~20K tokens +**Risk notes:** Ensure audit writes can't block or error-out the request path; use a bounded queue and drop-with-counter pattern rather than in-line writes. + +--- + +## M5 — Cache + Offline Degradation + Observability + +**Goal:** Sessions feel fast and stay useful when the peer is slow or down. + +**Scope:** + +- In-memory response cache keyed by `(grant_id, verb, resource, query_hash)`, TTL 30s default +- Cache NOT used for `search`; only `list` and `get` +- Cache flushed on cert rotation and grant revocation +- Circuit breaker per peer: after N failures, fast-fail for cooldown window +- `_source` tagging extended with `_cached: true` when served from cache +- Agent-visible "federation offline for ``" signal emitted once per session per peer +- OTEL spans: `federation.request` with attrs `grant_id`, `peer`, `verb`, `resource`, `outcome`, `latency_ms`, `cached` +- W3C `traceparent` propagated across the mTLS boundary (both directions) +- `mosaic federation status` CLI subcommand + +**Deliverables:** + +- `apps/gateway/src/federation/client/response-cache.service.ts` +- `apps/gateway/src/federation/client/circuit-breaker.service.ts` +- `apps/gateway/src/federation/observability/` (span helpers) +- `packages/mosaic/src/commands/federation/status.ts` + +**Acceptance tests:** +| # | Test | Layer | +| - | --------------------------------------------------------------------------------------------- | ----- | +| 1 | Two identical `list` calls within 30s: second served from cache, flagged `_cached` | Integration | +| 2 | `search` is never cached: two identical searches both hit the peer | Integration | +| 3 | After grant revocation, peer's cache is flushed immediately | Integration | +| 4 | After N consecutive failures, circuit opens; 
subsequent requests fail-fast without network call | E2E | +| 5 | Circuit closes after cooldown and next success | E2E | +| 6 | With peer offline, session completes using local data, one "federation offline" signal surfaced | E2E | +| 7 | OTEL traces show spans on both gateways correlated by `traceparent` | E2E | +| 8 | `mosaic federation status` prints peer state, cert expiry, last success/failure, circuit state | Unit | + +**Estimated budget:** ~20K tokens +**Risk notes:** Caching correctness under revocation must be provable — write tests that intentionally race revocation against cached hits. + +--- + +## M6 — Revocation, Auto-Renewal, CRL + +**Goal:** Grant lifecycle works end-to-end: admin revoke, revoke-on-delete, automatic cert renewal, CRL distribution. + +**Scope:** + +- `mosaic federation grant revoke ` → status `revoked`, CRL updated, audit entry +- DB hook: deleting a user cascades `revoke-on-delete` on all grants where that user is subject +- Step-CA CRL endpoint exposed; serving gateway enforces CRL check on every handshake (cached CRL, refresh interval 60s) +- Client-side cert renewal job: at T-7 days, submit renewal CSR; rotate cert atomically; flush cache +- On renewal failure, peer marked `degraded` and admin-visible alert emitted +- Server A detects revocation on next request (TLS handshake fails with specific error) → peer marked `revoked`, user notified + +**Deliverables:** + +- `apps/gateway/src/federation/server/crl.service.ts` + endpoint +- `apps/gateway/src/federation/server/revocation.service.ts` +- DB cascade trigger or ORM hook for user deletion → grant revocation +- `apps/gateway/src/federation/client/renewal.job.ts` (scheduled) +- `packages/mosaic/src/commands/federation/grant.ts` gains `revoke` subcommand + +**Acceptance tests:** +| # | Test | Layer | +| - | ----------------------------------------------------------------------------------------- | ----- | +| 1 | Admin `grant revoke` → A's next request fails with TLS-level error | 
E2E | +| 2 | Deleting subject user on B auto-revokes all grants where that user was the subject | Integration | +| 3 | CRL endpoint serves correct list; revoked cert present | Integration | +| 4 | Server rejects cert listed in CRL even if cert itself is still time-valid | E2E | +| 5 | Cert at T-7 days triggers renewal job; new cert issued and installed without dropped requests | E2E | +| 6 | Renewal failure marks peer `degraded` and surfaces alert | Integration | +| 7 | A marks peer `revoked` after a revocation-caused handshake failure (not on transient network errors) | E2E | + +**Estimated budget:** ~20K tokens +**Risk notes:** The atomic cert swap during renewal is the sharpest edge here — any in-flight request mid-swap must either complete on old or retry on new, never fail mid-call. + +--- + +## M7 — Multi-User RBAC Hardening + Team-Scoped Grants + Acceptance Suite + +**Goal:** The full multi-tenant scenario from §4 user stories works end-to-end, with no cross-user leakage under any circumstance. 
+ +**Scope:** + +- Three-user scenario on Server B (E1, E2, E3) each with their own Server A +- Team-scoped grants exercised: each employee's team-data visible on their own A, but E1's personal data never visible on E2's A +- User-facing UI surfaces on both gateways for: peer list, grant list, audit log viewer, scope editor +- Negative-path test matrix (every denial path from PRD §8) +- All PRD §15 acceptance criteria mapped to automated tests in the harness +- Security review: cert-spoofing, scope-bypass, audit-bypass paths explicitly tested +- Cold-storage rollover for audit log >90 days +- Docs: operator runbook, onboarding guide, troubleshooting guide + +**Deliverables:** + +- Full federation acceptance suite in `tools/federation-harness/acceptance/` +- `apps/web` surfaces for peer/grant/audit management +- `docs/federation/RUNBOOK.md`, `docs/federation/ONBOARDING.md`, `docs/federation/TROUBLESHOOTING.md` +- Audit cold-tier job (daily cron, moves rows >90d to separate table or object storage) + +**Acceptance tests:** +Every PRD §15 criterion must be automated and green. 
Additionally: + +| # | Test | Layer | +| --- | ----------------------------------------------------------------------------------------------------- | ---------------- | +| 1 | 3-employee scenario: each A sees only its user's data from B | E2E | +| 2 | Grant with team scope returns team data; same grant denied access to another employee's personal data | E2E | +| 3 | Concurrent sessions from E1's and E2's Server A to B interleave without any leakage | E2E | +| 4 | Audit log across 3-user test shows per-grant trails with no mis-attributed rows | E2E | +| 5 | Scope editor UI round-trip: edit → save → next request uses new scope | E2E | +| 6 | Attempt to use a revoked grant's cert against a different grant's endpoint: rejected | E2E | +| 7 | 90-day-old audit rows moved to cold tier; queryable via explicit historical query | Integration | +| 8 | Runbook steps validated: an operator following the runbook can onboard, rotate, and revoke | Manual checklist | + +**Estimated budget:** ~25K tokens +**Risk notes:** This is the security-critical milestone. Budget review time here is non-negotiable — plan for two independent code reviews (internal + security-focused) before merge. + +--- + +## Total Budget & Timeline Sketch + +| Milestone | Tokens (est.) | Can parallelize? | +| --------- | ------------- | ---------------------- | +| M1 | 20K | No (foundation) | +| M2 | 30K | No (needs M1) | +| M3 | 40K | No (needs M2) | +| M4 | 20K | No (needs M3) | +| M5 | 20K | Yes (with M6 after M4) | +| M6 | 20K | Yes (with M5 after M3) | +| M7 | 25K | No (needs all) | +| **Total** | **~175K** | | + +Parallelization of M5 and M6 after M4 saves one milestone's worth of serial time. 
+ +--- + +## Exit Criteria (federation feature complete) + +All of the following must be green on `main`: + +- Every PRD §15 acceptance criterion automated and passing +- Every milestone's acceptance table green +- Security review sign-off on M7 +- Runbook walk-through completed by operator (not author) +- `mosaic doctor` recognizes federated tier and reports peer health accurately +- Two-gateway production deployment (woltje.com ↔ uscllc.com) operational for ≥7 days without incident + +--- + +## Next Step After This Doc Is Approved + +1. File tracking issues on `git.mosaicstack.dev/mosaicstack/stack` — one per milestone, labeled `epic:federation` +2. Populate `docs/TASKS.md` with M1's task breakdown (per-task agent assignment, budget, dependencies) +3. Begin M1 implementation diff --git a/docs/federation/MISSION-MANIFEST.md b/docs/federation/MISSION-MANIFEST.md new file mode 100644 index 0000000..9ab7869 --- /dev/null +++ b/docs/federation/MISSION-MANIFEST.md @@ -0,0 +1,85 @@ +# Mission Manifest — Federation v1 + +> Persistent document tracking full mission scope, status, and session history. +> Updated by the orchestrator at each phase transition and milestone completion. + +## Mission + +**ID:** federation-v1-20260419 +**Statement:** Jarvis operates across 3–4 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session; a prior OpenBrain attempt caused cache, latency, and opacity pain. This mission builds asymmetric federation between Mosaic Stack gateways so that a session on a user's home gateway can query their work gateway in real time without data ever persisting across the boundary, with full multi-tenant isolation and standard-PKI (X.509 / Step-CA) trust management. 
+**Phase:** Planning complete — M1 implementation not started +**Current Milestone:** FED-M1 +**Progress:** 0 / 7 milestones +**Status:** active +**Last Updated:** 2026-04-19 (PRD + MILESTONES + tracking issues filed) +**Parent Mission:** None — new mission + +## Context + +Federation is the solution to what originally drove OpenBrain. The prior attempt coupled every agent session to a remote service, introduced cache/latency/opacity pain, and created a hard dependency that punished offline use. This redesign: + +1. Makes federation **gateway-to-gateway**, not agent-to-service +2. Keeps each user's home instance as source of truth for their data +3. Exposes scoped, read-only data on demand without persisting across the boundary +4. Uses X.509 mTLS via Step-CA so rotation/revocation/CRL/OCSP are standard +5. Supports multi-tenant serving sides (employees on uscllc.com each federating back to their own home gateway) with no cross-user leakage +6. Requires federation-tier instances on both sides (PG + pgvector + Valkey) — local/standalone tiers cannot federate +7. 
Works over public HTTPS (no VPN required); Tailscale is an optional overlay + +Key design references: + +- `docs/federation/PRD.md` — 16-section product requirements +- `docs/federation/MILESTONES.md` — 7-milestone decomposition with per-milestone acceptance tests +- `docs/federation/TASKS.md` — per-task breakdown (M1 populated; M2-M7 deferred to mission planning) +- `docs/research/mempalace-evaluation/` (in jarvis-brain) — why we didn't adopt MemPalace + +## Success Criteria + +- [ ] AC-1: Two Mosaic Stack gateways on different hosts can establish a federation grant via CLI-driven onboarding +- [ ] AC-2: Server A can query Server B for `tasks`, `notes`, `memory` respecting scope filters +- [ ] AC-3: User on B with no grant cannot be queried by A, even if A has a valid grant for another user (cross-user isolation) +- [ ] AC-4: Revoking a grant on B causes A's next request to fail with a clear error within one request cycle +- [ ] AC-5: Cert rotation happens automatically at T-7 days; in-progress session survives rotation without user action +- [ ] AC-6: Rate-limit enforcement returns 429 with `Retry-After`; client backs off +- [ ] AC-7: With B unreachable, a session on A completes using local data and surfaces "federation offline for ``" once per session +- [ ] AC-8: Every federated request appears in B's `federation_audit_log` within 1 second +- [ ] AC-9: Scope excluding `credentials` means credentials are never returned — even via `search` with matching keywords +- [ ] AC-10: `mosaic federation status` shows cert expiry, grant status, last success/failure per peer +- [ ] AC-11: Full 3-employee multi-tenant scenario passes with no cross-user leakage +- [ ] AC-12: Two-gateway production deployment (woltje.com ↔ uscllc.com) operational ≥7 days without incident +- [ ] AC-13: All 7 milestones ship as merged PRs with green CI and closed issues + +## Milestones + +| # | ID | Name | Status | Branch | Issue | Started | Completed | +| --- | ------ | 
--------------------------------------------- | ----------- | ------ | ----- | ------- | --------- | +| 1 | FED-M1 | Federated tier infrastructure | not-started | — | #460 | — | — | +| 2 | FED-M2 | Step-CA + grant schema + admin CLI | not-started | — | #461 | — | — | +| 3 | FED-M3 | mTLS handshake + list/get + scope enforcement | not-started | — | #462 | — | — | +| 4 | FED-M4 | search verb + audit log + rate limit | not-started | — | #463 | — | — | +| 5 | FED-M5 | Cache + offline degradation + OTEL | not-started | — | #464 | — | — | +| 6 | FED-M6 | Revocation + auto-renewal + CRL | not-started | — | #465 | — | — | +| 7 | FED-M7 | Multi-user RBAC hardening + acceptance suite | not-started | — | #466 | — | — | + +## Budget + +| Milestone | Est. tokens | Parallelizable? | +| --------- | ----------- | ---------------------- | +| FED-M1 | 20K | No (foundation) | +| FED-M2 | 30K | No (needs M1) | +| FED-M3 | 40K | No (needs M2) | +| FED-M4 | 20K | No (needs M3) | +| FED-M5 | 20K | Yes (with M6 after M4) | +| FED-M6 | 20K | Yes (with M5 after M3) | +| FED-M7 | 25K | No (needs all) | +| **Total** | **~175K** | | + +## Session History + +| Session | Date | Runtime | Outcome | +| ------- | ---------- | ------- | --------------------------------------------------- | +| S1 | 2026-04-19 | claude | PRD authored, MILESTONES decomposed, 7 issues filed | + +## Next Step + +Begin FED-M1 implementation: federated tier infrastructure. Breakdown in `docs/federation/TASKS.md`. diff --git a/docs/federation/PRD.md b/docs/federation/PRD.md new file mode 100644 index 0000000..5643c0c --- /dev/null +++ b/docs/federation/PRD.md @@ -0,0 +1,330 @@ +# Mosaic Stack — Federation PRD + +**Status:** Draft v1 (locked for implementation) +**Owner:** Jason +**Date:** 2026-04-19 +**Scope:** Enables cross-instance data federation between Mosaic Stack gateways with asymmetric trust, multi-tenant scoping, and no cross-boundary data persistence. + +--- + +## 1. 
Problem Statement + +Jarvis operates across 3–4 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session, and has tried OpenBrain to solve cross-session state — with poor results (cache invalidation, latency, opacity, hard dependency on a remote service). + +The goal is a federation model where each user's **home instance** remains the source of truth for their personal data, and **work/shared instances** expose scoped data to that user's home instance on demand — without persisting anything across the boundary. + +## 2. Goals + +1. A user logged into their **home gateway** (Server A) can query their **work gateway** (Server B) in real time during a session. +2. Data returned from Server B is used in-session only; never written to Server A storage. +3. Server B has multiple users, each with their own Server A. No user's data leaks to another user. +4. Federation works over public HTTPS (no VPN required). Tailscale is a supported optional overlay. +5. Sync latency target: seconds, or at the next data need of the agent. +6. Graceful degradation: if the remote instance is unreachable, the local session continues with local data and a clear "federation offline" signal. +7. Teams exist on both sides. A federation grant can share **team-owned** data without exposing other team members' personal data. +8. Auth and revocation use standard PKI (X.509) so that certificate tooling (Step-CA, rotation, OCSP, CRL) is available out of the box. + +## 3. Non-Goals (v1) + +- Mesh federation (N-to-N). v1 is strictly A↔B pairs. +- Cross-instance writes. All federation is **read-only** on the remote side. +- Shared agent sessions across instances. Sessions live on one instance; federation is data-plane only. +- Cross-instance SSO. Each instance owns its own BetterAuth identity store; federation is service-to-service, not user-to-user. +- Realtime push from B→A. 
v1 is pull-only (A pulls from B during a session). +- Global search index. Federation is query-by-query, not index replication. + +## 4. User Stories + +- **US-1 (Solo user at home):** As the sole user on Server A, I want my agent session on workstation-1 to see the same data it saw on workstation-2, without running OpenBrain. +- **US-2 (Cross-location):** As a user with a home server and a work server, I want a session on my home laptop to transparently pull my USC-owned tasks/notes when I ask for them. +- **US-3 (Work admin):** As the admin of mosaic.uscllc.com, I want to grant each employee's home gateway scoped read access to only their own data plus explicitly-shared team data. +- **US-4 (Privacy boundary):** As employee A on mosaic.uscllc.com, my data must never appear in a session on employee B's home gateway — even if both are federated with uscllc.com. +- **US-5 (Revocation):** As a work admin, when I delete an employee, their home gateway loses access within one request cycle. +- **US-6 (Offline):** As a user in a hotel with flaky wifi, my local session keeps working; federation calls fail fast and are reported as "offline," not hung. + +## 5. 
Architecture Overview + +``` +┌─────────────────────────────────────┐ mTLS / X.509 ┌─────────────────────────────────────┐ +│ Server A — mosaic.woltje.com │ ───────────────────────► │ Server B — mosaic.uscllc.com │ +│ (home, master for Jason) │ ◄── JSON over HTTPS │ (work, multi-tenant) │ +│ │ │ │ +│ ┌──────────────┐ ┌──────────────┐ │ │ ┌──────────────┐ ┌──────────────┐ │ +│ │ Gateway │ │ Postgres │ │ │ │ Gateway │ │ Postgres │ │ +│ │ (NestJS) │──│ (local SSOT)│ │ │ │ (NestJS) │──│ (tenant SSOT)│ │ +│ └──────┬───────┘ └──────────────┘ │ │ └──────┬───────┘ └──────────────┘ │ +│ │ │ │ │ │ +│ │ FederationClient │ │ │ FederationServer │ +│ │ (outbound, scoped query) │ │ │ (inbound, RBAC-gated) │ +│ └───────────────────────────┼──────────────────────────┼────────┘ │ +│ │ │ │ +│ Step-CA (issues A's client cert) │ │ Step-CA (issues B's server cert, │ +│ │ │ trusts A's CA root on grant)│ +└─────────────────────────────────────┘ └──────────────────────────────────────┘ +``` + +- Federation is a **transport-layer** concern between two gateways, implemented as a new internal module on each gateway. +- Both sides run the same code. Direction (client vs. server role) is per-request. +- Nothing in the agent runtime changes — agents query the gateway; the gateway decides local vs. remote. + +## 6. Transport & Authentication + +**Transport:** HTTPS with mutual TLS (mTLS). + +**Identity:** X.509 client certificates issued by Step-CA. Each federation grant materializes as a client cert on the requesting side and a trust-anchor entry (CA root or explicit cert) on the serving side. + +**Why mTLS over HMAC bearer tokens:** + +- Standard rotation/revocation semantics (renew, CRL, OCSP). +- The cert subject carries identity claims (user, grant_id) that don't need a separate DB lookup to verify authenticity. +- Client certs never transit request bodies, so they can't be logged by accident. +- Transport is pinned at the TLS layer, not re-validated per-handler. 
+ +**Cert contents (SAN + subject):** + +- `CN=grant-` +- `O=` (e.g., `mosaic.woltje.com`) +- Custom OIDs embedded in SAN otherName: + - `mosaic.federation.grantId` (UUID) + - `mosaic.federation.subjectUserId` (user on the **serving** side that this grant acts-as) +- Default lifetime: **30 days**, with auto-renewal at T-7 days if the grant is still active. + +**Step-CA topology (v1):** Each server runs its own Step-CA instance. During onboarding, the serving side imports the requesting side's CA root. A central/shared Step-CA is out of scope for v1. + +**Handshake:** + +1. Client (A) opens HTTPS to B with its grant cert. +2. B validates cert chain against trusted CA roots for that grant. +3. B extracts `grantId` and `subjectUserId` from the cert. +4. B loads the grant record, checks it is `active`, not revoked, and not expired. +5. B enforces scope and rate-limit for this grant. +6. Request proceeds; response returned. + +## 7. Data Model + +All tables live on **each instance's own Postgres**. Federation grants are bilateral — each side has a record of the grant. 
+ +### 7.1 `federation_grants` (on serving side, Server B) + +| Field | Type | Notes | +| --------------------------- | ----------- | ------------------------------------------------- | +| `id` | uuid PK | | +| `subject_user_id` | uuid FK | Which local user this grant acts-as | +| `requesting_server` | text | Hostname of requesting gateway (e.g., woltje.com) | +| `requesting_ca_fingerprint` | text | SHA-256 of trusted CA root | +| `active_cert_fingerprint` | text | SHA-256 of currently valid client cert | +| `scope` | jsonb | See §8 | +| `rate_limit_rpm` | int | Default 60 | +| `status` | enum | `pending`, `active`, `suspended`, `revoked` | +| `created_at` | timestamptz | | +| `activated_at` | timestamptz | | +| `revoked_at` | timestamptz | | +| `last_used_at` | timestamptz | | +| `notes` | text | Admin-visible description | + +### 7.2 `federation_peers` (on requesting side, Server A) + +| Field | Type | Notes | +| --------------------- | ----------- | ------------------------------------------------ | +| `id` | uuid PK | | +| `peer_hostname` | text | e.g., `mosaic.uscllc.com` | +| `peer_ca_fingerprint` | text | SHA-256 of peer's CA root | +| `grant_id` | uuid | The grant ID assigned by the peer | +| `local_user_id` | uuid FK | Who on Server A this federation belongs to | +| `client_cert_pem` | text (enc) | Current client cert (PEM); rotated automatically | +| `client_key_pem` | text (enc) | Private key (encrypted at rest) | +| `cert_expires_at` | timestamptz | | +| `status` | enum | `pending`, `active`, `degraded`, `revoked` | +| `last_success_at` | timestamptz | | +| `last_failure_at` | timestamptz | | +| `notes` | text | | + +### 7.3 `federation_audit_log` (on serving side, Server B) + +| Field | Type | Notes | +| ------------- | ----------- | ------------------------------------------------ | +| `id` | uuid PK | | +| `grant_id` | uuid FK | | +| `occurred_at` | timestamptz | indexed | +| `verb` | text | `query`, `handshake`, `rejected`, `rate_limited` | +| 
`resource`  | text        | e.g., `tasks`, `notes`, `credentials`            |
+| `query_hash`  | text        | SHA-256 of normalized query (no payload stored)  |
+| `outcome`     | text        | `ok`, `denied`, `error`                          |
+| `bytes_out`   | int         |                                                  |
+| `latency_ms`  | int         |                                                  |
+
+**Audit policy:** Every federation request is logged on the serving side. Read-only requests only — no body capture. Retention: 90 days hot, then roll to cold storage.
+
+## 8. RBAC & Scope
+
+Every federation grant has a scope object that answers three questions for every inbound request:
+
+1. **Who is acting?** — `subject_user_id` from the cert.
+2. **What resources?** — an allowlist of resource types (`tasks`, `notes`, `credentials`, `memory`, `teams/:id/tasks`, …).
+3. **What filters?** — predicates applied on top of the subject's normal RBAC (see below).
+
+### 8.1 Scope schema
+
+```json
+{
+  "resources": ["tasks", "notes", "memory"],
+  "filters": {
+    "tasks": { "include_teams": ["team_uuid_1", "team_uuid_2"], "include_personal": true },
+    "notes": { "include_personal": true, "include_teams": [] },
+    "memory": { "include_personal": true }
+  },
+  "excluded_resources": ["credentials", "api_keys"],
+  "max_rows_per_query": 500
+}
+```
+
+### 8.2 Access rule (enforced on serving side)
+
+For every inbound federated query on resource R:
+
+1. Resolve effective identity → `subject_user_id`.
+2. Check R is in `scope.resources` and NOT in `scope.excluded_resources`. Otherwise 403.
+3. Evaluate the user's **normal RBAC** (what would they see if they logged into Server B directly?).
+4. Intersect with the scope filter (e.g., only team X, only personal).
+5. Apply `max_rows_per_query`.
+6. Return; log to audit.
+
+### 8.3 Team boundary guarantees
+
+- Scope filters only narrow the native RBAC; they never widen it. A grant cannot confer access the user would not have had themselves.
+- `include_teams` means "only these teams," not "these teams in addition to all teams."
+- `include_personal: false` hides the user's personal data entirely from federation, even if they own it — useful for work-only accounts.
+
+### 8.4 No cross-user leakage
+
+When Server B has multiple users (employees) all federating back to their own Server A:
+
+- Each employee has their own grant with their own `subject_user_id`.
+- The cert is bound to a specific grant; there is no mechanism by which one grant's cert can be used to impersonate another.
+- The audit log is per-grant.
+
+## 9. Query Model
+
+Federation exposes a **narrow read API**, not arbitrary SQL.
+
+### 9.1 Supported verbs (v1)
+
+| Verb           | Purpose                                    | Returns                         |
+| -------------- | ------------------------------------------ | ------------------------------- |
+| `list`         | Paginated list of a resource type          | Array of resources              |
+| `get`          | Fetch a single resource by id              | One resource or 404             |
+| `search`       | Keyword search within allowed resources    | Ranked list of hits             |
+| `capabilities` | What this grant is allowed to do right now | Scope object + rate-limit state |
+
+### 9.2 Not in v1
+
+- Write verbs.
+- Aggregations / analytics.
+- Streaming / subscriptions (future: see §13).
+
+### 9.3 Agent-facing integration
+
+Agents never call federation directly. Instead:
+
+- The gateway query layer accepts `source: "local" | "federated:<peer>" | "all"`.
+- `"all"` fans out in parallel, merges results, and tags each result with `_source`.
+- Federation results are in-memory only; the gateway does not persist them.
+
+## 10. Caching
+
+- **In-memory response cache** with a short TTL (default 30s) for `list` and `get`. `search` is not cached.
+- Cache is keyed by `(grant_id, verb, resource, query_hash)`.
+- Cache is flushed on cert rotation and on grant revocation.
+- No disk cache. No cross-session cache.
+
+## 11. Bootstrap & Onboarding
+
+### 11.1 Instance capability tiers
+
+| Tier         | Storage  | Queue   | Memory   | Can federate?         |
+| ------------ | -------- | ------- | -------- | --------------------- |
+| `local`      | PGlite   | in-proc | keyword  | No                    |
+| `standalone` | Postgres | Valkey  | keyword  | No (can be client)    |
+| `federated`  | Postgres | Valkey  | pgvector | Yes (server + client) |
+
+Federation requires the `federated` tier on **both** sides.
+
+### 11.2 Onboarding flow (admin-driven)
+
+1. Admin on Server B runs `mosaic federation grant create --user <subject-user> --peer <requesting-hostname> --scope-file scope.json`.
+2. Server B generates a `grant_id` and prints a one-time enrollment URL containing the grant ID + B's CA root fingerprint.
+3. Admin on Server A (or the user themselves, if allowed) runs `mosaic federation peer add <enrollment-url>`.
+4. Server A's Step-CA generates a CSR for the new grant. A submits the CSR to B over a short-lived enrollment endpoint (single-use token in the enrollment URL).
+5. B's Step-CA signs the cert (with the grant ID embedded in SAN OIDs) and returns it.
+6. A stores the signed cert + private key (encrypted) in `federation_peers`.
+7. Grant status flips from `pending` to `active` on both sides.
+8. The cert auto-renews at T-7 days using the standard Step-CA renewal flow as long as the grant remains active.
+
+### 11.3 Revocation
+
+- **Admin-initiated:** `mosaic federation grant revoke <grant-id>` on B flips status to `revoked`, adds the cert to B's CRL, and writes an audit entry.
+- **Revoke-on-delete:** Deleting a user on B automatically revokes all grants where that user is the subject.
+- Server A learns of revocation on the next request (TLS handshake fails) and flips the peer to `revoked`.
+
+### 11.4 Rate limit
+
+Default `60 req/min` per grant. Configurable per grant. Enforced on the serving side. A rate-limited request returns `429` with `Retry-After`.
+
+## 12. Operational Concerns
+
+- **Observability:** Each federation request emits an OTEL span with `grant_id`, `peer`, `verb`, `resource`, `outcome`, `latency_ms`. Traces correlate across both servers via W3C traceparent.
+- **Health check:** `mosaic federation status` on each side shows active grants, last-success times, cert expirations, and any CRL mismatches. +- **Backpressure:** If the serving side is overloaded, it returns `503` with a structured body; the client marks the peer `degraded` and falls back to local-only until the next successful handshake. +- **Secrets:** `client_key_pem` in `federation_peers` is encrypted with the gateway's key (sealed with the instance's master key — same mechanism as `provider_credentials`). +- **Credentials never cross:** The `credentials` resource type is in the default excluded list. It must be explicitly added to scope (admin action, logged) and even then is per-grant and per-user. + +## 13. Future (post-v1) + +- B→A push (e.g., "notify A when a task assigned to subject changes") via Socket.IO over mTLS. +- Mesh (N-to-N) federation. +- Write verbs with conflict resolution. +- Shared Step-CA (a "root of roots") so that onboarding doesn't require exchanging CA roots. +- Federated memory search over vector indexes with homomorphic filtering. + +## 14. Locked Decisions (was "Open Questions") + +| # | Question | Decision | +| --- | ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | +| 1 | What happens to a grant when its subject user is deleted? | **Revoke-on-delete.** All grants where the user is subject are auto-revoked and CRL'd. | +| 2 | Do we audit read-only requests? | **Yes.** All federated reads are audited on the serving side. Bodies are not captured; query hash + metadata only. | +| 3 | Default rate limit? | **60 requests per minute per grant,** override-able per grant. | +| 4 | How do we verify the requesting-server's identity beyond the grant token? | **X.509 client cert tied to the user,** issued by Step-CA (per-server) or locally generated. 
Cert subject carries `grantId` + `subjectUserId`. |
+
+### M1 decisions
+
+- **Postgres deployment:** **Containerized** alongside the gateway in M1 (Docker Compose profile). Moving to a dedicated host is an M5+ operational concern, not a v1 feature.
+- **Instance signing key:** **Separate** from the Step-CA key. Step-CA signs federation certs; the instance master key seals at-rest secrets (client keys, provider credentials). Different blast radius, different rotation cadences.
+
+## 15. Acceptance Criteria
+
+- [ ] Two Mosaic Stack gateways on different hosts can establish a federation grant via the CLI-driven onboarding flow.
+- [ ] Server A can query Server B for `tasks`, `notes`, `memory` respecting scope filters.
+- [ ] A user on B with no grant cannot be queried by A, even if A has a valid grant for another user.
+- [ ] Revoking a grant on B causes A's next request to fail with a clear error within one request cycle.
+- [ ] Cert rotation happens automatically at T-7 days; an in-progress session survives rotation without user action.
+- [ ] Rate-limit enforcement returns 429 with `Retry-After`; the client backs off.
+- [ ] With B unreachable, a session on A completes using local data and surfaces a "federation offline for `<peer>`" signal once.
+- [ ] Every federated request appears in B's `federation_audit_log` within 1 second.
+- [ ] A scope excluding `credentials` means credentials are not returnable even via `search` with matching keywords.
+- [ ] `mosaic federation status` shows cert expiry, grant status, and last success/failure per peer.
+
+## 16. Implementation Milestones (reference)
+
+Milestones live in `docs/federation/MILESTONES.md` (to be authored next). High-level:
+
+- **M1:** Server A runs `federated` tier standalone (Postgres + Valkey + pgvector, containerized). No peer yet.
+- **M2:** Step-CA embedded; `federation_grants` / `federation_peers` schema + admin CLI.
+- **M3:** Handshake + `list`/`get` verbs with scope enforcement.
+- **M4:** `search` verb, audit log, rate limits. +- **M5:** Cache layer, offline-degradation UX, observability surfaces. +- **M6:** Revocation flows (admin + revoke-on-delete), cert auto-renewal. +- **M7:** Multi-user RBAC hardening on B, team-scoped grants end-to-end, acceptance suite green. + +--- + +**Next step after PRD sign-off:** author `docs/federation/MILESTONES.md` with per-milestone acceptance tests and estimated token budget, then file tracking issues on `git.mosaicstack.dev/mosaicstack/stack`. diff --git a/docs/federation/TASKS.md b/docs/federation/TASKS.md new file mode 100644 index 0000000..d004b66 --- /dev/null +++ b/docs/federation/TASKS.md @@ -0,0 +1,76 @@ +# Tasks — Federation v1 + +> Single-writer: orchestrator only. Workers read but never modify. +> +> **Mission:** federation-v1-20260419 +> **Schema:** `| id | status | description | issue | agent | branch | depends_on | estimate | notes |` +> **Status values:** `not-started` | `in-progress` | `done` | `blocked` | `failed` | `needs-qa` +> **Agent values:** `codex` | `glm-5.1` | `haiku` | `sonnet` | `opus` | `—` (auto) +> +> **Scope of this file:** M1 is fully decomposed below. M2–M7 are placeholders pending each milestone's entry into active planning — the orchestrator expands them one milestone at a time to avoid speculative decomposition of work whose shape will depend on what M1 surfaces. + +--- + +## Milestone 1 — Federated tier infrastructure (FED-M1) + +Goal: Gateway runs in `federated` tier with containerized PG+pgvector+Valkey. No federation logic yet. Existing standalone behavior does not regress. 
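The fail-fast boot behavior implied by this goal (the gateway refuses to start in `federated` tier unless Postgres, Valkey, and pgvector are reachable) can be sketched as follows. Names here are assumptions for illustration; the real check lands in `apps/gateway/src/bootstrap/tier-detector.ts`.

```typescript
// Illustrative only: assumed shapes for the federated-tier boot check.
type Tier = "local" | "standalone" | "federated";

interface ServiceProbe {
  name: string;     // e.g. "postgres", "valkey", "pgvector"
  reachable: boolean;
  endpoint: string; // host:port attempted, surfaced in the error message
}

// Structured error naming every failed service and the endpoint tried,
// so the operator gets an actionable message instead of a stack trace.
class TierBootError extends Error {
  constructor(public failures: ServiceProbe[]) {
    super(
      "cannot start in federated tier; unreachable: " +
        failures.map((f) => `${f.name} (tried ${f.endpoint})`).join(", "),
    );
  }
}

// Only the `federated` tier requires external services; other tiers skip
// the probe entirely, so standalone behavior cannot regress.
function assertTierReady(tier: Tier, probes: ServiceProbe[]): void {
  if (tier !== "federated") return;
  const failures = probes.filter((p) => !p.reachable);
  if (failures.length > 0) throw new TierBootError(failures);
}
```

Keeping the check as a pure function over probe results makes each failure mode unit-testable without Docker, which is what FED-M1-04's unit-test requirement calls for.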
+ +| id | status | description | issue | agent | branch | depends_on | estimate | notes | +| --------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | ------ | ------------------------------- | ---------- | -------- | ----------------------------------------------------------------------------------------------------------------- | +| FED-M1-01 | not-started | Extend `mosaic.config.json` schema: add `"federated"` to `tier` enum in validator + TS types. Keep `local` and `standalone` working. Update schema docs/README where referenced. | #460 | codex | feat/federation-m1-tier-config | — | 4K | Schema lives in `packages/types`; validator in gateway bootstrap. No behavior change yet — enum only. | +| FED-M1-02 | not-started | Author `docker-compose.federated.yml` as an overlay profile: Postgres 16 + pgvector extension (port 5433), Valkey (6380), named volumes, healthchecks. Compose-up should boot cleanly on a clean machine. | #460 | codex | feat/federation-m1-compose | FED-M1-01 | 5K | Overlay on existing `docker-compose.yml`; no changes to base file. Add `profile: federated` gating. | +| FED-M1-03 | not-started | Add pgvector support to `packages/storage/src/adapters/postgres.ts`: create extension on init (idempotent), expose vector column type in schema helpers. No adapter changes for non-federated tiers. | #460 | codex | feat/federation-m1-pgvector | FED-M1-02 | 8K | Extension create is idempotent `CREATE EXTENSION IF NOT EXISTS vector`. Gate on tier = federated. | +| FED-M1-04 | not-started | Implement `apps/gateway/src/bootstrap/tier-detector.ts`: reads config, asserts PG/Valkey/pgvector reachable for `federated`, fail-fast with actionable error message on failure. Unit tests for each failure mode. 
| #460 | codex | feat/federation-m1-detector | FED-M1-03 | 8K | Structured error type with remediation hints. Logs which service failed, with host:port attempted. | +| FED-M1-05 | not-started | Write `scripts/migrate-to-federated.ts`: one-way migration from `local` (PGlite) / `standalone` (PG without pgvector) → `federated`. Dumps, transforms, loads; dry-run + confirm UX. Idempotent on re-run. | #460 | codex | feat/federation-m1-migrate | FED-M1-04 | 10K | Do NOT run automatically. CLI subcommand `mosaic migrate tier --to federated --dry-run`. Safety rails. | +| FED-M1-06 | not-started | Update `mosaic doctor`: report current tier, required services, actual health per service, pgvector presence, overall green/yellow/red. Machine-readable JSON output flag for CI use. | #460 | sonnet | feat/federation-m1-doctor | FED-M1-04 | 6K | Existing doctor output evolves; add `--json` flag. Green/yellow/red + remediation suggestions per issue. | +| FED-M1-07 | not-started | Integration test: gateway boots in `federated` tier with docker-compose `federated` profile; refuses to boot when PG unreachable (asserts fail-fast); pgvector extension query succeeds. | #460 | sonnet | feat/federation-m1-integration | FED-M1-04 | 8K | Vitest + docker-compose test profile. One test file per assertion; real services, no mocks. | +| FED-M1-08 | not-started | Integration test for migration script: seed a local PGlite with representative data (tasks, notes, users, teams), run migration, assert row counts + key samples equal on federated PG. | #460 | sonnet | feat/federation-m1-migrate-test | FED-M1-05 | 6K | Runs against docker-compose federated profile; uses temp PGlite file; deterministic seed. | +| FED-M1-09 | not-started | Standalone regression: full agent-session E2E on existing `standalone` tier with a gateway built from this branch. Must pass without referencing any federation module. 
| #460 | haiku | feat/federation-m1-regression | FED-M1-07 | 4K | Reuse existing e2e harness; just re-point at the federation branch build. Canary that we didn't break it. | +| FED-M1-10 | not-started | Code review pass: security-focused on the migration script (data-at-rest during migration) + tier detector (error-message sensitivity leakage). Independent reviewer, not authors of tasks 01-09. | #460 | sonnet | — | FED-M1-09 | 8K | Use `feature-dev:code-reviewer` agent. Specifically: no secrets in error messages; no partial-migration footguns. | +| FED-M1-11 | not-started | Docs update: `docs/federation/` operator notes for tier setup; README blurb on federated tier; `docs/guides/` entry for migration. Do NOT touch runbook yet (deferred to FED-M7). | #460 | haiku | feat/federation-m1-docs | FED-M1-10 | 4K | Short, actionable. Link from MISSION-MANIFEST. No decisions captured here — those belong in PRD. | +| FED-M1-12 | not-started | PR, CI green, merge to main, close #460. | #460 | — | (aggregate) | FED-M1-11 | 3K | Queue-guard before push; wait for green; merge squashed; tea `issue-close` #460. | + +**M1 total estimate:** ~74K tokens (over-budget vs 20K PRD estimate — explanation below) + +**Why over-budget:** PRD's 20K estimate reflected implementation complexity only. The per-task breakdown includes tests, review, and docs as separate tasks per the delivery cycle, which catches the real cost. The final per-milestone budgets in MISSION-MANIFEST will be updated after M1 completes with actuals. + +--- + +## Milestone 2 — Step-CA + grant schema + admin CLI (FED-M2) + +_Deferred to mission planning when M1 is complete. Issue #461 tracks scope._ + +## Milestone 3 — mTLS handshake + list/get + scope enforcement (FED-M3) + +_Deferred. Issue #462._ + +## Milestone 4 — search + audit + rate limit (FED-M4) + +_Deferred. Issue #463._ + +## Milestone 5 — cache + offline + OTEL (FED-M5) + +_Deferred. 
Issue #464._ + +## Milestone 6 — revocation + auto-renewal + CRL (FED-M6) + +_Deferred. Issue #465._ + +## Milestone 7 — multi-user hardening + acceptance suite (FED-M7) + +_Deferred. Issue #466._ + +--- + +## Execution Notes + +**Agent assignment rationale:** + +- `codex` for most implementation tasks (OpenAI credit pool preferred for feature code) +- `sonnet` for tests (pattern-based, moderate complexity), `doctor` work (cross-cutting), and independent code review +- `haiku` for docs and the standalone regression canary (cheapest tier for mechanical/verification work) +- No `opus` in M1 — save for cross-cutting architecture decisions if they surface later + +**Branch strategy:** Each task gets its own feature branch off `main`. Tasks within a milestone merge in dependency order. Final aggregate PR (FED-M1-12) isn't a branch of its own — it's the merge of the last upstream task that closes the issue. + +**Queue guard:** Every push and every merge in this mission must run `~/.config/mosaic/tools/git/ci-queue-wait.sh --purpose push|merge` per Mosaic hard gate #6.