Compare commits

..

2 Commits

Author SHA1 Message Date
Jarvis
47aac682f5 docs(federation): PRD, milestones, mission manifest, and M1 task breakdown
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
Plans the Federation v1 mission: cross-instance data federation between
Mosaic Stack gateways with asymmetric trust (home gateway sees blended
A+B at session time; work gateway sees only its own tenants), mTLS via
X.509 / Step-CA for auth, multi-tenant RBAC with no cross-user leakage,
and no data persistence across the boundary.

- docs/federation/PRD.md — 16-section product requirements (v1 locked)
- docs/federation/MILESTONES.md — 7-milestone decomposition with
  per-milestone acceptance test tables across unit/integration/E2E layers
- docs/federation/MISSION-MANIFEST.md — mission scope, success criteria,
  milestone table linked to issues #460-#466
- docs/federation/TASKS.md — FED-M1 decomposed into 12 tasks; M2-M7
  deferred to per-milestone planning to avoid speculative decomposition

Refs: #460 #461 #462 #463 #464 #465 #466

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-19 17:04:39 -05:00
81c1775a03 chore(release): @mosaicstack/mosaic 0.0.29 (#453)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline was successful
ci/woodpecker/tag/publish Pipeline failed
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-04-08 00:42:54 +00:00
4 changed files with 859 additions and 0 deletions

View File

@@ -0,0 +1,368 @@
# Mosaic Stack — Federation Implementation Milestones
**Companion to:** `PRD.md`
**Approach:** Each milestone is a verifiable slice. A milestone is "done" only when its acceptance tests pass in CI against a real (not mocked) dependency stack.
---
## Milestone Dependency Graph
```
M1 (federated tier infra)
└── M2 (Step-CA + grant schema + CLI)
└── M3 (mTLS handshake + list/get + scope enforcement)
├── M4 (search + audit + rate limit)
│ └── M5 (cache + offline degradation + OTEL)
├── M6 (revocation + auto-renewal) ◄── can start after M3
└── M7 (multi-user hardening + e2e suite) ◄── depends on M4+M5+M6
```
M5 and M6 can run in parallel once M4 is merged.
---
## Test Strategy (applies to all milestones)
Three layers, all required before a milestone ships:
| Layer | Scope | Runtime |
| ------------------ | --------------------------------------------- | ------------------------------------------------------------------------ |
| **Unit** | Per-module logic, pure functions, adapters | Vitest, no I/O |
| **Integration** | Single gateway against real PG/Valkey/Step-CA | Vitest + Docker Compose test profile |
| **Federation E2E** | Two gateways on a Docker network, real mTLS | Playwright/custom harness (`tools/federation-harness/`) introduced in M3 |
Every milestone adds tests to these layers. A milestone cannot be claimed complete if the federation E2E harness fails (applies from M3 onward).
**Quality gates per milestone** (same as stack-wide):
- `pnpm typecheck` green
- `pnpm lint` green
- `pnpm test` green (unit + integration)
- `pnpm test:federation` green (M3+)
- Independent code review passed
- Docs updated (`docs/federation/`)
- Merged PR on `main`, CI terminal green, linked issue closed
---
## M1 — Federated Tier Infrastructure
**Goal:** A gateway can run in `federated` tier with containerized Postgres + Valkey + pgvector, with no federation logic active yet.
**Scope:**
- Add `"tier": "federated"` to `mosaic.config.json` schema and validators
- Docker Compose `federated` profile (`docker-compose.federated.yml`) adds: Postgres+pgvector (5433), Valkey (6380), dedicated volumes
- Tier detector in gateway bootstrap: reads config, asserts required services reachable, refuses to start otherwise
- `pgvector` extension installed + verified on startup
- Migration logic: safe upgrade path from `local`/`standalone``federated` (data export/import script, one-way)
- `mosaic doctor` reports tier + service health
- Gateway continues to serve as a normal standalone instance (no federation yet)
**Deliverables:**
- `mosaic.config.json` schema v2 (tier enum includes `federated`)
- `apps/gateway/src/bootstrap/tier-detector.ts`
- `docker-compose.federated.yml`
- `scripts/migrate-to-federated.ts`
- Updated `mosaic doctor` output
- Updated `packages/storage/src/adapters/postgres.ts` with pgvector support
**Acceptance tests:**
| # | Test | Layer |
| - | ---------------------------------------------------------------------------------------- | ----------- |
| 1 | Gateway boots in `federated` tier with all services present | Integration |
| 2 | Gateway refuses to boot in `federated` tier when Postgres unreachable (fail-fast, clear) | Integration |
| 3 | `pgvector` extension available in target DB (`SELECT * FROM pg_extension WHERE extname='vector'`) | Integration |
| 4 | Migration script moves a populated `local` (PGlite) instance to `federated` (Postgres) with no data loss | Integration |
| 5 | `mosaic doctor` reports correct tier and all services green | Unit |
| 6 | Existing standalone behavior regression: agent session works end-to-end, no federation references | E2E (single-gateway) |
**Estimated budget:** ~20K tokens (infra + config + migration script)
**Risk notes:** Pgvector install on existing PG installs is occasionally finicky; test the migration path on a realistic DB snapshot.
---
## M2 — Step-CA + Grant Schema + Admin CLI
**Goal:** An admin can create a federation grant and its counterparty can enroll. No runtime traffic flows yet.
**Scope:**
- Embed Step-CA as a Docker Compose sidecar with a persistent CA volume
- Gateway exposes a short-lived enrollment endpoint (single-use token from the grant)
- DB schema: `federation_grants`, `federation_peers`, `federation_audit_log` (table only, not yet written to)
- Sealed storage for `client_key_pem` using the existing credential sealing key
- Admin CLI:
- `mosaic federation grant create --user <id> --peer <host> --scope <file>`
- `mosaic federation grant list`
- `mosaic federation grant show <id>`
- `mosaic federation peer add <enrollment-url>`
- `mosaic federation peer list`
- Step-CA signs the cert with SAN OIDs for `grantId` + `subjectUserId`
- Grant status transitions: `pending``active` on successful enrollment
**Deliverables:**
- `packages/db` migration: three federation tables + enum types
- `apps/gateway/src/federation/ca.service.ts` (Step-CA client)
- `apps/gateway/src/federation/grants.service.ts`
- `apps/gateway/src/federation/enrollment.controller.ts`
- `packages/mosaic/src/commands/federation/` (grant + peer subcommands)
- `docker-compose.federated.yml` adds Step-CA service
- Scope JSON schema + validator
**Acceptance tests:**
| # | Test | Layer |
| - | ---------------------------------------------------------------------------------------- | ----------- |
| 1 | `grant create` writes a `pending` row with a scoped bundle | Integration |
| 2 | Enrollment endpoint signs a CSR and returns a cert with expected SAN OIDs | Integration |
| 3 | Enrollment token is single-use; second attempt returns 410 | Integration |
| 4 | Cert `subjectUserId` OID matches the grant's `subject_user_id` | Unit |
| 5 | `client_key_pem` is at-rest encrypted; raw DB read shows ciphertext, not PEM | Integration |
| 6 | `peer add <url>` on Server A yields an `active` peer record with a valid cert + key | E2E (two gateways, no traffic) |
| 7 | Scope JSON with unknown resource type rejected at `grant create` | Unit |
| 8 | `grant list` and `peer list` render active / pending / revoked accurately | Unit |
**Estimated budget:** ~30K tokens (schema + CA integration + CLI + sealing)
**Risk notes:** Step-CA's API surface is well-documented but the sealing integration with existing provider-credential encryption is a cross-module concern — walk that seam deliberately.
---
## M3 — mTLS Handshake + `list` + `get` with Scope Enforcement
**Goal:** Two federated gateways exchange real data over mTLS with scope intersecting native RBAC.
**Scope:**
- `FederationClient` (outbound): picks cert from `federation_peers`, does mTLS call
- `FederationServer` (inbound): NestJS guard validates client cert, extracts `grantId` + `subjectUserId`, loads grant
- Scope enforcement pipeline:
1. Resource allowlist / excluded-list check
2. Native RBAC evaluation as the `subjectUserId`
3. Scope filter intersection (`include_teams`, `include_personal`)
4. `max_rows_per_query` cap
- Verbs: `list`, `get`, `capabilities`
- Gateway query layer accepts `source: "local" | "federated:<host>" | "all"`; fan-out for `"all"`
- **Federation E2E harness** (`tools/federation-harness/`): docker-compose.two-gateways.yml, seed script, assertion helpers — this is its own deliverable
**Deliverables:**
- `apps/gateway/src/federation/client/federation-client.service.ts`
- `apps/gateway/src/federation/server/federation-auth.guard.ts`
- `apps/gateway/src/federation/server/scope.service.ts`
- `apps/gateway/src/federation/server/verbs/{list,get,capabilities}.controller.ts`
- `apps/gateway/src/federation/client/query-source.service.ts` (fan-out/merge)
- `tools/federation-harness/` (compose + seed + test helpers)
- `packages/types` — federation request/response DTOs in `federation.dto.ts`
**Acceptance tests:**
| # | Test | Layer |
| -- | -------------------------------------------------------------------------------------------------------- | ----- |
| 1 | A→B `list tasks` returns subjectUser's tasks intersected with scope | E2E |
| 2 | A→B `list tasks` with `include_teams: [T1]` excludes T2 tasks the user owns | E2E |
| 3 | A→B `get credential <id>` returns 403 when `credentials` is in `excluded_resources` | E2E |
| 4 | Client presenting cert for grant X cannot query subjectUser of grant Y (cross-user isolation) | E2E |
| 5 | Cert signed by untrusted CA rejected at TLS layer (no NestJS handler reached) | E2E |
| 6 | Malformed SAN OIDs → 401; cert valid but grant revoked in DB → 403 | Integration |
| 7 | `max_rows_per_query` caps response; request for more paginated | Integration |
| 8 | `source: "all"` fan-out merges local + federated results, each tagged with `_source` | Integration |
| 9 | Federation responses never persist: verify DB row count unchanged after `list` round-trip | E2E |
| 10 | Scope cannot grant more than native RBAC: user without access to team T still gets [] even if scope allows T | E2E |
**Estimated budget:** ~40K tokens (largest milestone — core federation logic + harness)
**Risk notes:** This is the critical trust boundary. Code review should focus on scope enforcement bypass and cert-SAN-spoofing paths. Every 403/401 path needs a test.
---
## M4 — `search` Verb + Audit Log + Rate Limit
**Goal:** Keyword search over allowed resources with full audit and per-grant rate limiting.
**Scope:**
- `search` verb across `resources` allowlist (intersection of scope + native RBAC)
- Keyword search (reuse existing `packages/memory/src/adapters/keyword.ts`); pgvector search stays out of v1 search verb
- Every federated request (all verbs) writes to `federation_audit_log`: `grant_id`, `verb`, `resource`, `query_hash`, `outcome`, `bytes_out`, `latency_ms`
- No request body captured; `query_hash` is SHA-256 of normalized query params
- Token-bucket rate limit per grant (default 60/min, override per grant)
- 429 response with `Retry-After` header and structured body
- 90-day hot retention for audit log; cold-tier rollover deferred to M7
**Deliverables:**
- `apps/gateway/src/federation/server/verbs/search.controller.ts`
- `apps/gateway/src/federation/server/audit.service.ts` (async write, no blocking)
- `apps/gateway/src/federation/server/rate-limit.guard.ts`
- Tests in harness
**Acceptance tests:**
| # | Test | Layer |
| - | ------------------------------------------------------------------------------------------------- | ----------- |
| 1 | `search` returns ranked hits only from allowed resources | E2E |
| 2 | `search` excluding `credentials` does not return a match even when keyword matches a credential name | E2E |
| 3 | Every successful request appears in `federation_audit_log` within 1s | Integration |
| 4 | Denied request (403) is also audited with `outcome='denied'` | Integration |
| 5 | Audit row stores query hash but NOT query body | Unit |
| 6 | 61st request in 60s window returns 429 with `Retry-After` | E2E |
| 7 | Per-grant override (e.g., 600/min) takes effect without restart | Integration |
| 8 | Audit writes are async: request latency unchanged when audit write slow (simulated) | Integration |
**Estimated budget:** ~20K tokens
**Risk notes:** Ensure audit writes can't block or error-out the request path; use a bounded queue and drop-with-counter pattern rather than in-line writes.
---
## M5 — Cache + Offline Degradation + Observability
**Goal:** Sessions feel fast and stay useful when the peer is slow or down.
**Scope:**
- In-memory response cache keyed by `(grant_id, verb, resource, query_hash)`, TTL 30s default
- Cache NOT used for `search`; only `list` and `get`
- Cache flushed on cert rotation and grant revocation
- Circuit breaker per peer: after N failures, fast-fail for cooldown window
- `_source` tagging extended with `_cached: true` when served from cache
- Agent-visible "federation offline for `<peer>`" signal emitted once per session per peer
- OTEL spans: `federation.request` with attrs `grant_id`, `peer`, `verb`, `resource`, `outcome`, `latency_ms`, `cached`
- W3C `traceparent` propagated across the mTLS boundary (both directions)
- `mosaic federation status` CLI subcommand
**Deliverables:**
- `apps/gateway/src/federation/client/response-cache.service.ts`
- `apps/gateway/src/federation/client/circuit-breaker.service.ts`
- `apps/gateway/src/federation/observability/` (span helpers)
- `packages/mosaic/src/commands/federation/status.ts`
**Acceptance tests:**
| # | Test | Layer |
| - | --------------------------------------------------------------------------------------------- | ----- |
| 1 | Two identical `list` calls within 30s: second served from cache, flagged `_cached` | Integration |
| 2 | `search` is never cached: two identical searches both hit the peer | Integration |
| 3 | After grant revocation, peer's cache is flushed immediately | Integration |
| 4 | After N consecutive failures, circuit opens; subsequent requests fail-fast without network call | E2E |
| 5 | Circuit closes after cooldown and next success | E2E |
| 6 | With peer offline, session completes using local data, one "federation offline" signal surfaced | E2E |
| 7 | OTEL traces show spans on both gateways correlated by `traceparent` | E2E |
| 8 | `mosaic federation status` prints peer state, cert expiry, last success/failure, circuit state | Unit |
**Estimated budget:** ~20K tokens
**Risk notes:** Caching correctness under revocation must be provable — write tests that intentionally race revocation against cached hits.
---
## M6 — Revocation, Auto-Renewal, CRL
**Goal:** Grant lifecycle works end-to-end: admin revoke, revoke-on-delete, automatic cert renewal, CRL distribution.
**Scope:**
- `mosaic federation grant revoke <id>` → status `revoked`, CRL updated, audit entry
- DB hook: deleting a user cascades `revoke-on-delete` on all grants where that user is subject
- Step-CA CRL endpoint exposed; serving gateway enforces CRL check on every handshake (cached CRL, refresh interval 60s)
- Client-side cert renewal job: at T-7 days, submit renewal CSR; rotate cert atomically; flush cache
- On renewal failure, peer marked `degraded` and admin-visible alert emitted
- Server A detects revocation on next request (TLS handshake fails with specific error) → peer marked `revoked`, user notified
**Deliverables:**
- `apps/gateway/src/federation/server/crl.service.ts` + endpoint
- `apps/gateway/src/federation/server/revocation.service.ts`
- DB cascade trigger or ORM hook for user deletion → grant revocation
- `apps/gateway/src/federation/client/renewal.job.ts` (scheduled)
- `packages/mosaic/src/commands/federation/grant.ts` gains `revoke` subcommand
**Acceptance tests:**
| # | Test | Layer |
| - | ----------------------------------------------------------------------------------------- | ----- |
| 1 | Admin `grant revoke` → A's next request fails with TLS-level error | E2E |
| 2 | Deleting subject user on B auto-revokes all grants where that user was the subject | Integration |
| 3 | CRL endpoint serves correct list; revoked cert present | Integration |
| 4 | Server rejects cert listed in CRL even if cert itself is still time-valid | E2E |
| 5 | Cert at T-7 days triggers renewal job; new cert issued and installed without dropped requests | E2E |
| 6 | Renewal failure marks peer `degraded` and surfaces alert | Integration |
| 7 | A marks peer `revoked` after a revocation-caused handshake failure (not on transient network errors) | E2E |
**Estimated budget:** ~20K tokens
**Risk notes:** The atomic cert swap during renewal is the sharpest edge here — any in-flight request mid-swap must either complete on old or retry on new, never fail mid-call.
---
## M7 — Multi-User RBAC Hardening + Team-Scoped Grants + Acceptance Suite
**Goal:** The full multi-tenant scenario from §4 user stories works end-to-end, with no cross-user leakage under any circumstance.
**Scope:**
- Three-user scenario on Server B (E1, E2, E3) each with their own Server A
- Team-scoped grants exercised: each employee's team-data visible on their own A, but E1's personal data never visible on E2's A
- User-facing UI surfaces on both gateways for: peer list, grant list, audit log viewer, scope editor
- Negative-path test matrix (every denial path from PRD §8)
- All PRD §15 acceptance criteria mapped to automated tests in the harness
- Security review: cert-spoofing, scope-bypass, audit-bypass paths explicitly tested
- Cold-storage rollover for audit log >90 days
- Docs: operator runbook, onboarding guide, troubleshooting guide
**Deliverables:**
- Full federation acceptance suite in `tools/federation-harness/acceptance/`
- `apps/web` surfaces for peer/grant/audit management
- `docs/federation/RUNBOOK.md`, `docs/federation/ONBOARDING.md`, `docs/federation/TROUBLESHOOTING.md`
- Audit cold-tier job (daily cron, moves rows >90d to separate table or object storage)
**Acceptance tests:**
Every PRD §15 criterion must be automated and green. Additionally:
| # | Test | Layer |
| --- | ----------------------------------------------------------------------------------------------------- | ---------------- |
| 1 | 3-employee scenario: each A sees only its user's data from B | E2E |
| 2 | Grant with team scope returns team data; same grant denied access to another employee's personal data | E2E |
| 3 | Concurrent sessions from E1's and E2's Server A to B interleave without any leakage | E2E |
| 4 | Audit log across 3-user test shows per-grant trails with no mis-attributed rows | E2E |
| 5 | Scope editor UI round-trip: edit → save → next request uses new scope | E2E |
| 6 | Attempt to use a revoked grant's cert against a different grant's endpoint: rejected | E2E |
| 7 | 90-day-old audit rows moved to cold tier; queryable via explicit historical query | Integration |
| 8 | Runbook steps validated: an operator following the runbook can onboard, rotate, and revoke | Manual checklist |
**Estimated budget:** ~25K tokens
**Risk notes:** This is the security-critical milestone. Budget review time here is non-negotiable — plan for two independent code reviews (internal + security-focused) before merge.
---
## Total Budget & Timeline Sketch
| Milestone | Tokens (est.) | Can parallelize? |
| --------- | ------------- | ---------------------- |
| M1 | 20K | No (foundation) |
| M2 | 30K | No (needs M1) |
| M3 | 40K | No (needs M2) |
| M4 | 20K | No (needs M3) |
| M5 | 20K | Yes (with M6 after M4) |
| M6 | 20K | Yes (with M5 after M3) |
| M7 | 25K | No (needs all) |
| **Total** | **~175K** | |
Parallelization of M5 and M6 after M4 saves one milestone's worth of serial time.
---
## Exit Criteria (federation feature complete)
All of the following must be green on `main`:
- Every PRD §15 acceptance criterion automated and passing
- Every milestone's acceptance table green
- Security review sign-off on M7
- Runbook walk-through completed by operator (not author)
- `mosaic doctor` recognizes federated tier and reports peer health accurately
- Two-gateway production deployment (woltje.com ↔ uscllc.com) operational for ≥7 days without incident
---
## Next Step After This Doc Is Approved
1. File tracking issues on `git.mosaicstack.dev/mosaicstack/stack` — one per milestone, labeled `epic:federation`
2. Populate `docs/TASKS.md` with M1's task breakdown (per-task agent assignment, budget, dependencies)
3. Begin M1 implementation

View File

@@ -0,0 +1,85 @@
# Mission Manifest — Federation v1
> Persistent document tracking full mission scope, status, and session history.
> Updated by the orchestrator at each phase transition and milestone completion.
## Mission
**ID:** federation-v1-20260419
**Statement:** Jarvis operates across 34 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session; a prior OpenBrain attempt caused cache, latency, and opacity pain. This mission builds asymmetric federation between Mosaic Stack gateways so that a session on a user's home gateway can query their work gateway in real time without data ever persisting across the boundary, with full multi-tenant isolation and standard-PKI (X.509 / Step-CA) trust management.
**Phase:** Planning complete — M1 implementation not started
**Current Milestone:** FED-M1
**Progress:** 0 / 7 milestones
**Status:** active
**Last Updated:** 2026-04-19 (PRD + MILESTONES + tracking issues filed)
**Parent Mission:** None — new mission
## Context
Federation is the solution to what originally drove OpenBrain. The prior attempt coupled every agent session to a remote service, introduced cache/latency/opacity pain, and created a hard dependency that punished offline use. This redesign:
1. Makes federation **gateway-to-gateway**, not agent-to-service
2. Keeps each user's home instance as source of truth for their data
3. Exposes scoped, read-only data on demand without persisting across the boundary
4. Uses X.509 mTLS via Step-CA so rotation/revocation/CRL/OCSP are standard
5. Supports multi-tenant serving sides (employees on uscllc.com each federating back to their own home gateway) with no cross-user leakage
6. Requires federation-tier instances on both sides (PG + pgvector + Valkey) — local/standalone tiers cannot federate
7. Works over public HTTPS (no VPN required); Tailscale is an optional overlay
Key design references:
- `docs/federation/PRD.md` — 16-section product requirements
- `docs/federation/MILESTONES.md` — 7-milestone decomposition with per-milestone acceptance tests
- `docs/federation/TASKS.md` — per-task breakdown (M1 populated; M2-M7 deferred to mission planning)
- `docs/research/mempalace-evaluation/` (in jarvis-brain) — why we didn't adopt MemPalace
## Success Criteria
- [ ] AC-1: Two Mosaic Stack gateways on different hosts can establish a federation grant via CLI-driven onboarding
- [ ] AC-2: Server A can query Server B for `tasks`, `notes`, `memory` respecting scope filters
- [ ] AC-3: User on B with no grant cannot be queried by A, even if A has a valid grant for another user (cross-user isolation)
- [ ] AC-4: Revoking a grant on B causes A's next request to fail with a clear error within one request cycle
- [ ] AC-5: Cert rotation happens automatically at T-7 days; in-progress session survives rotation without user action
- [ ] AC-6: Rate-limit enforcement returns 429 with `Retry-After`; client backs off
- [ ] AC-7: With B unreachable, a session on A completes using local data and surfaces "federation offline for `<peer>`" once per session
- [ ] AC-8: Every federated request appears in B's `federation_audit_log` within 1 second
- [ ] AC-9: Scope excluding `credentials` means credentials are never returned — even via `search` with matching keywords
- [ ] AC-10: `mosaic federation status` shows cert expiry, grant status, last success/failure per peer
- [ ] AC-11: Full 3-employee multi-tenant scenario passes with no cross-user leakage
- [ ] AC-12: Two-gateway production deployment (woltje.com ↔ uscllc.com) operational ≥7 days without incident
- [ ] AC-13: All 7 milestones ship as merged PRs with green CI and closed issues
## Milestones
| # | ID | Name | Status | Branch | Issue | Started | Completed |
| --- | ------ | --------------------------------------------- | ----------- | ------ | ----- | ------- | --------- |
| 1 | FED-M1 | Federated tier infrastructure | not-started | — | #460 | — | — |
| 2 | FED-M2 | Step-CA + grant schema + admin CLI | not-started | — | #461 | — | — |
| 3 | FED-M3 | mTLS handshake + list/get + scope enforcement | not-started | — | #462 | — | — |
| 4 | FED-M4 | search verb + audit log + rate limit | not-started | — | #463 | — | — |
| 5 | FED-M5 | Cache + offline degradation + OTEL | not-started | — | #464 | — | — |
| 6 | FED-M6 | Revocation + auto-renewal + CRL | not-started | — | #465 | — | — |
| 7 | FED-M7 | Multi-user RBAC hardening + acceptance suite | not-started | — | #466 | — | — |
## Budget
| Milestone | Est. tokens | Parallelizable? |
| --------- | ----------- | ---------------------- |
| FED-M1 | 20K | No (foundation) |
| FED-M2 | 30K | No (needs M1) |
| FED-M3 | 40K | No (needs M2) |
| FED-M4 | 20K | No (needs M3) |
| FED-M5 | 20K | Yes (with M6 after M4) |
| FED-M6 | 20K | Yes (with M5 after M3) |
| FED-M7 | 25K | No (needs all) |
| **Total** | **~175K** | |
## Session History
| Session | Date | Runtime | Outcome |
| ------- | ---------- | ------- | --------------------------------------------------- |
| S1 | 2026-04-19 | claude | PRD authored, MILESTONES decomposed, 7 issues filed |
## Next Step
Begin FED-M1 implementation: federated tier infrastructure. Breakdown in `docs/federation/TASKS.md`.

330
docs/federation/PRD.md Normal file
View File

@@ -0,0 +1,330 @@
# Mosaic Stack — Federation PRD
**Status:** Draft v1 (locked for implementation)
**Owner:** Jason
**Date:** 2026-04-19
**Scope:** Enables cross-instance data federation between Mosaic Stack gateways with asymmetric trust, multi-tenant scoping, and no cross-boundary data persistence.
---
## 1. Problem Statement
Jarvis operates across 34 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session, and has tried OpenBrain to solve cross-session state — with poor results (cache invalidation, latency, opacity, hard dependency on a remote service).
The goal is a federation model where each user's **home instance** remains the source of truth for their personal data, and **work/shared instances** expose scoped data to that user's home instance on demand — without persisting anything across the boundary.
## 2. Goals
1. A user logged into their **home gateway** (Server A) can query their **work gateway** (Server B) in real time during a session.
2. Data returned from Server B is used in-session only; never written to Server A storage.
3. Server B has multiple users, each with their own Server A. No user's data leaks to another user.
4. Federation works over public HTTPS (no VPN required). Tailscale is a supported optional overlay.
5. Sync latency target: seconds, or at the next data need of the agent.
6. Graceful degradation: if the remote instance is unreachable, the local session continues with local data and a clear "federation offline" signal.
7. Teams exist on both sides. A federation grant can share **team-owned** data without exposing other team members' personal data.
8. Auth and revocation use standard PKI (X.509) so that certificate tooling (Step-CA, rotation, OCSP, CRL) is available out of the box.
## 3. Non-Goals (v1)
- Mesh federation (N-to-N). v1 is strictly A↔B pairs.
- Cross-instance writes. All federation is **read-only** on the remote side.
- Shared agent sessions across instances. Sessions live on one instance; federation is data-plane only.
- Cross-instance SSO. Each instance owns its own BetterAuth identity store; federation is service-to-service, not user-to-user.
- Realtime push from B→A. v1 is pull-only (A pulls from B during a session).
- Global search index. Federation is query-by-query, not index replication.
## 4. User Stories
- **US-1 (Solo user at home):** As the sole user on Server A, I want my agent session on workstation-1 to see the same data it saw on workstation-2, without running OpenBrain.
- **US-2 (Cross-location):** As a user with a home server and a work server, I want a session on my home laptop to transparently pull my USC-owned tasks/notes when I ask for them.
- **US-3 (Work admin):** As the admin of mosaic.uscllc.com, I want to grant each employee's home gateway scoped read access to only their own data plus explicitly-shared team data.
- **US-4 (Privacy boundary):** As employee A on mosaic.uscllc.com, my data must never appear in a session on employee B's home gateway — even if both are federated with uscllc.com.
- **US-5 (Revocation):** As a work admin, when I delete an employee, their home gateway loses access within one request cycle.
- **US-6 (Offline):** As a user in a hotel with flaky wifi, my local session keeps working; federation calls fail fast and are reported as "offline," not hung.
## 5. Architecture Overview
```
┌─────────────────────────────────────┐ mTLS / X.509 ┌─────────────────────────────────────┐
│ Server A — mosaic.woltje.com │ ───────────────────────► │ Server B — mosaic.uscllc.com │
│ (home, master for Jason) │ ◄── JSON over HTTPS │ (work, multi-tenant) │
│ │ │ │
│ ┌──────────────┐ ┌──────────────┐ │ │ ┌──────────────┐ ┌──────────────┐ │
│ │ Gateway │ │ Postgres │ │ │ │ Gateway │ │ Postgres │ │
│ │ (NestJS) │──│ (local SSOT)│ │ │ │ (NestJS) │──│ (tenant SSOT)│ │
│ └──────┬───────┘ └──────────────┘ │ │ └──────┬───────┘ └──────────────┘ │
│ │ │ │ │ │
│ │ FederationClient │ │ │ FederationServer │
│ │ (outbound, scoped query) │ │ │ (inbound, RBAC-gated) │
│ └───────────────────────────┼──────────────────────────┼────────┘ │
│ │ │ │
│ Step-CA (issues A's client cert) │ │ Step-CA (issues B's server cert, │
│ │ │ trusts A's CA root on grant)│
└─────────────────────────────────────┘ └──────────────────────────────────────┘
```
- Federation is a **transport-layer** concern between two gateways, implemented as a new internal module on each gateway.
- Both sides run the same code. Direction (client vs. server role) is per-request.
- Nothing in the agent runtime changes — agents query the gateway; the gateway decides local vs. remote.
## 6. Transport & Authentication
**Transport:** HTTPS with mutual TLS (mTLS).
**Identity:** X.509 client certificates issued by Step-CA. Each federation grant materializes as a client cert on the requesting side and a trust-anchor entry (CA root or explicit cert) on the serving side.
**Why mTLS over HMAC bearer tokens:**
- Standard rotation/revocation semantics (renew, CRL, OCSP).
- The cert subject carries identity claims (user, grant_id) that don't need a separate DB lookup to verify authenticity.
- Client certs never transit request bodies, so they can't be logged by accident.
- Transport is pinned at the TLS layer, not re-validated per-handler.
**Cert contents (SAN + subject):**
- `CN=grant-<uuid>`
- `O=<requesting-server-hostname>` (e.g., `mosaic.woltje.com`)
- Custom OIDs embedded in SAN otherName:
- `mosaic.federation.grantId` (UUID)
- `mosaic.federation.subjectUserId` (user on the **serving** side that this grant acts-as)
- Default lifetime: **30 days**, with auto-renewal at T-7 days if the grant is still active.
**Step-CA topology (v1):** Each server runs its own Step-CA instance. During onboarding, the serving side imports the requesting side's CA root. A central/shared Step-CA is out of scope for v1.
**Handshake:**
1. Client (A) opens HTTPS to B with its grant cert.
2. B validates cert chain against trusted CA roots for that grant.
3. B extracts `grantId` and `subjectUserId` from the cert.
4. B loads the grant record, checks it is `active`, not revoked, and not expired.
5. B enforces scope and rate-limit for this grant.
6. Request proceeds; response returned.
## 7. Data Model
All tables live on **each instance's own Postgres**. Federation grants are bilateral — each side has a record of the grant.
### 7.1 `federation_grants` (on serving side, Server B)
| Field | Type | Notes |
| --------------------------- | ----------- | ------------------------------------------------- |
| `id` | uuid PK | |
| `subject_user_id` | uuid FK | Which local user this grant acts-as |
| `requesting_server` | text | Hostname of requesting gateway (e.g., woltje.com) |
| `requesting_ca_fingerprint` | text | SHA-256 of trusted CA root |
| `active_cert_fingerprint` | text | SHA-256 of currently valid client cert |
| `scope` | jsonb | See §8 |
| `rate_limit_rpm` | int | Default 60 |
| `status` | enum | `pending`, `active`, `suspended`, `revoked` |
| `created_at` | timestamptz | |
| `activated_at` | timestamptz | |
| `revoked_at` | timestamptz | |
| `last_used_at` | timestamptz | |
| `notes` | text | Admin-visible description |
### 7.2 `federation_peers` (on requesting side, Server A)
| Field | Type | Notes |
| --------------------- | ----------- | ------------------------------------------------ |
| `id` | uuid PK | |
| `peer_hostname` | text | e.g., `mosaic.uscllc.com` |
| `peer_ca_fingerprint` | text | SHA-256 of peer's CA root |
| `grant_id` | uuid | The grant ID assigned by the peer |
| `local_user_id` | uuid FK | Who on Server A this federation belongs to |
| `client_cert_pem` | text (enc) | Current client cert (PEM); rotated automatically |
| `client_key_pem` | text (enc) | Private key (encrypted at rest) |
| `cert_expires_at` | timestamptz | |
| `status` | enum | `pending`, `active`, `degraded`, `revoked` |
| `last_success_at` | timestamptz | |
| `last_failure_at` | timestamptz | |
| `notes` | text | |
### 7.3 `federation_audit_log` (on serving side, Server B)
| Field | Type | Notes |
| ------------- | ----------- | ------------------------------------------------ |
| `id` | uuid PK | |
| `grant_id` | uuid FK | |
| `occurred_at` | timestamptz | indexed |
| `verb` | text | `query`, `handshake`, `rejected`, `rate_limited` |
| `resource` | text | e.g., `tasks`, `notes`, `credentials` |
| `query_hash` | text | SHA-256 of normalized query (no payload stored) |
| `outcome` | text | `ok`, `denied`, `error` |
| `bytes_out` | int | |
| `latency_ms` | int | |
**Audit policy:** Every federation request is logged on the serving side. Read-only requests only — no body capture. Retention: 90 days hot, then roll to cold storage.
## 8. RBAC & Scope
Every federation grant has a scope object that answers three questions for every inbound request:
1. **Who is acting?**`subject_user_id` from the cert.
2. **What resources?** — an allowlist of resource types (`tasks`, `notes`, `credentials`, `memory`, `teams/:id/tasks`, …).
3. **Filter expression** — predicates applied on top of the subject's normal RBAC (see below).
### 8.1 Scope schema
```json
{
"resources": ["tasks", "notes", "memory"],
"filters": {
"tasks": { "include_teams": ["team_uuid_1", "team_uuid_2"], "include_personal": true },
"notes": { "include_personal": true, "include_teams": [] },
"memory": { "include_personal": true }
},
"excluded_resources": ["credentials", "api_keys"],
"max_rows_per_query": 500
}
```
### 8.2 Access rule (enforced on serving side)
For every inbound federated query on resource R:
1. Resolve effective identity → `subject_user_id`.
2. Check R is in `scope.resources` and NOT in `scope.excluded_resources`. Otherwise 403.
3. Evaluate the user's **normal RBAC** (what would they see if they logged into Server B directly)?
4. Intersect with the scope filter (e.g., only team X, only personal).
5. Apply `max_rows_per_query`.
6. Return; log to audit.
### 8.3 Team boundary guarantees
- Scope filters are additive, never subtractive of the native RBAC. A grant cannot grant access the user would not have had themselves.
- `include_teams` means "only these teams," not "these teams in addition to all teams."
- `include_personal: false` hides the user's personal data entirely from federation, even if they own it — useful for work-only accounts.
### 8.4 No cross-user leakage
When Server B has multiple users (employees) all federating back to their own Server A:
- Each employee has their own grant with their own `subject_user_id`.
- The cert is bound to a specific grant; there is no mechanism by which one grant's cert can be used to impersonate another.
- Audit log is per-grant.
## 9. Query Model
Federation exposes a **narrow read API**, not arbitrary SQL.
### 9.1 Supported verbs (v1)
| Verb | Purpose | Returns |
| -------------- | ------------------------------------------ | ------------------------------- |
| `list` | Paginated list of a resource type | Array of resources |
| `get` | Fetch a single resource by id | One resource or 404 |
| `search` | Keyword search within allowed resources | Ranked list of hits |
| `capabilities` | What this grant is allowed to do right now | Scope object + rate-limit state |
### 9.2 Not in v1
- Write verbs.
- Aggregations / analytics.
- Streaming / subscriptions (future: see §13).
### 9.3 Agent-facing integration
Agents never call federation directly. Instead:
- The gateway query layer accepts `source: "local" | "federated:<peer_hostname>" | "all"`.
- `"all"` fans out in parallel, merges results, tags each with `_source`.
- Federation results are in-memory only; the gateway does not persist them.
## 10. Caching
- **In-memory response cache** with short TTL (default 30s) for `list` and `get`. `search` is not cached.
- Cache is keyed by `(grant_id, verb, resource, query_hash)`.
- Cache is flushed on cert rotation and on grant revocation.
- No disk cache. No cross-session cache.
## 11. Bootstrap & Onboarding
### 11.1 Instance capability tiers
| Tier | Storage | Queue | Memory | Can federate? |
| ------------ | -------- | ------- | -------- | --------------------- |
| `local` | PGlite | in-proc | keyword | No |
| `standalone` | Postgres | Valkey | keyword | No (can be client) |
| `federated` | Postgres | Valkey | pgvector | Yes (server + client) |
Federation requires `federated` tier on **both** sides.
### 11.2 Onboarding flow (admin-driven)
1. Admin on Server B runs `mosaic federation grant create --user <user-id> --peer <peer-hostname> --scope-file scope.json`.
2. Server B generates a `grant_id`, prints a one-time enrollment URL containing the grant ID + B's CA root fingerprint.
3. Admin on Server A (or the user themselves, if allowed) runs `mosaic federation peer add <enrollment-url>`.
4. Server A's Step-CA generates a CSR for the new grant. A submits the CSR to B over a short-lived enrollment endpoint (single-use token in the enrollment URL).
5. B's Step-CA signs the cert (with grant ID embedded in SAN OIDs), returns it.
6. A stores the signed cert + private key (encrypted) in `federation_peers`.
7. Grant status flips from `pending` to `active` on both sides.
8. Cert auto-renews at T-7 days using the standard Step-CA renewal flow as long as the grant remains active.
### 11.3 Revocation
- **Admin-initiated:** `mosaic federation grant revoke <grant-id>` on B flips status to `revoked`, adds the cert to B's CRL, and writes an audit entry.
- **Revoke-on-delete:** Deleting a user on B automatically revokes all grants where that user is the subject.
- Server A learns of revocation on the next request (TLS handshake fails) and flips the peer to `revoked`.
### 11.4 Rate limit
Default `60 req/min` per grant. Configurable per grant. Enforced at the serving side. A rate-limited request returns `429` with `Retry-After`.
## 12. Operational Concerns
- **Observability:** Each federation request emits an OTEL span with `grant_id`, `peer`, `verb`, `resource`, `outcome`, `latency_ms`. Traces correlate across both servers via W3C traceparent.
- **Health check:** `mosaic federation status` on each side shows active grants, last-success times, cert expirations, and any CRL mismatches.
- **Backpressure:** If the serving side is overloaded, it returns `503` with a structured body; the client marks the peer `degraded` and falls back to local-only until the next successful handshake.
- **Secrets:** `client_key_pem` in `federation_peers` is encrypted with the gateway's key (sealed with the instance's master key — same mechanism as `provider_credentials`).
- **Credentials never cross:** The `credentials` resource type is in the default excluded list. It must be explicitly added to scope (admin action, logged) and even then is per-grant and per-user.
## 13. Future (post-v1)
- B→A push (e.g., "notify A when a task assigned to subject changes") via Socket.IO over mTLS.
- Mesh (N-to-N) federation.
- Write verbs with conflict resolution.
- Shared Step-CA (a "root of roots") so that onboarding doesn't require exchanging CA roots.
- Federated memory search over vector indexes with homomorphic filtering.
## 14. Locked Decisions (was "Open Questions")
| # | Question | Decision |
| --- | ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | What happens to a grant when its subject user is deleted? | **Revoke-on-delete.** All grants where the user is subject are auto-revoked and CRL'd. |
| 2 | Do we audit read-only requests? | **Yes.** All federated reads are audited on the serving side. Bodies are not captured; query hash + metadata only. |
| 3 | Default rate limit? | **60 requests per minute per grant,** override-able per grant. |
| 4 | How do we verify the requesting-server's identity beyond the grant token? | **X.509 client cert tied to the user,** issued by Step-CA (per-server) or locally generated. Cert subject carries `grantId` + `subjectUserId`. |
### M1 decisions
- **Postgres deployment:** **Containerized** alongside the gateway in M1 (Docker Compose profile). Moving to a dedicated host is a M5+ operational concern, not a v1 feature.
- **Instance signing key:** **Separate** from the Step-CA key. Step-CA signs federation certs; the instance master key seals at-rest secrets (client keys, provider credentials). Different blast-radius, different rotation cadences.
## 15. Acceptance Criteria
- [ ] Two Mosaic Stack gateways on different hosts can establish a federation grant via the CLI-driven onboarding flow.
- [ ] Server A can query Server B for `tasks`, `notes`, `memory` respecting scope filters.
- [ ] A user on B with no grant cannot be queried by A, even if A has a valid grant for another user.
- [ ] Revoking a grant on B causes A's next request to fail with a clear error within one request cycle.
- [ ] Cert rotation happens automatically at T-7 days; an in-progress session survives rotation without user action.
- [ ] Rate-limit enforcement returns 429 with `Retry-After`; client backs off.
- [ ] With B unreachable, a session on A completes using local data and surfaces a "federation offline for `<peer>`" signal once.
- [ ] Every federated request appears in B's `federation_audit_log` within 1 second.
- [ ] A scope excluding `credentials` means credentials are not returnable even via `search` with matching keywords.
- [ ] `mosaic federation status` shows cert expiry, grant status, and last success/failure per peer.
## 16. Implementation Milestones (reference)
Milestones live in `docs/federation/MILESTONES.md` (to be authored next). High-level:
- **M1:** Server A runs `federated` tier standalone (Postgres + Valkey + pgvector, containerized). No peer yet.
- **M2:** Step-CA embedded; `federation_grants` / `federation_peers` schema + admin CLI.
- **M3:** Handshake + `list`/`get` verbs with scope enforcement.
- **M4:** `search` verb, audit log, rate limits.
- **M5:** Cache layer, offline-degradation UX, observability surfaces.
- **M6:** Revocation flows (admin + revoke-on-delete), cert auto-renewal.
- **M7:** Multi-user RBAC hardening on B, team-scoped grants end-to-end, acceptance suite green.
---
**Next step after PRD sign-off:** author `docs/federation/MILESTONES.md` with per-milestone acceptance tests and estimated token budget, then file tracking issues on `git.mosaicstack.dev/mosaicstack/stack`.

76
docs/federation/TASKS.md Normal file
View File

@@ -0,0 +1,76 @@
# Tasks — Federation v1
> Single-writer: orchestrator only. Workers read but never modify.
>
> **Mission:** federation-v1-20260419
> **Schema:** `| id | status | description | issue | agent | branch | depends_on | estimate | notes |`
> **Status values:** `not-started` | `in-progress` | `done` | `blocked` | `failed` | `needs-qa`
> **Agent values:** `codex` | `glm-5.1` | `haiku` | `sonnet` | `opus` | `—` (auto)
>
> **Scope of this file:** M1 is fully decomposed below. M2M7 are placeholders pending each milestone's entry into active planning — the orchestrator expands them one milestone at a time to avoid speculative decomposition of work whose shape will depend on what M1 surfaces.
---
## Milestone 1 — Federated tier infrastructure (FED-M1)
Goal: Gateway runs in `federated` tier with containerized PG+pgvector+Valkey. No federation logic yet. Existing standalone behavior does not regress.
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | ------ | ------------------------------- | ---------- | -------- | ----------------------------------------------------------------------------------------------------------------- |
| FED-M1-01 | not-started | Extend `mosaic.config.json` schema: add `"federated"` to `tier` enum in validator + TS types. Keep `local` and `standalone` working. Update schema docs/README where referenced. | #460 | codex | feat/federation-m1-tier-config | — | 4K | Schema lives in `packages/types`; validator in gateway bootstrap. No behavior change yet — enum only. |
| FED-M1-02 | not-started | Author `docker-compose.federated.yml` as an overlay profile: Postgres 16 + pgvector extension (port 5433), Valkey (6380), named volumes, healthchecks. Compose-up should boot cleanly on a clean machine. | #460 | codex | feat/federation-m1-compose | FED-M1-01 | 5K | Overlay on existing `docker-compose.yml`; no changes to base file. Add `profile: federated` gating. |
| FED-M1-03 | not-started | Add pgvector support to `packages/storage/src/adapters/postgres.ts`: create extension on init (idempotent), expose vector column type in schema helpers. No adapter changes for non-federated tiers. | #460 | codex | feat/federation-m1-pgvector | FED-M1-02 | 8K | Extension create is idempotent `CREATE EXTENSION IF NOT EXISTS vector`. Gate on tier = federated. |
| FED-M1-04 | not-started | Implement `apps/gateway/src/bootstrap/tier-detector.ts`: reads config, asserts PG/Valkey/pgvector reachable for `federated`, fail-fast with actionable error message on failure. Unit tests for each failure mode. | #460 | codex | feat/federation-m1-detector | FED-M1-03 | 8K | Structured error type with remediation hints. Logs which service failed, with host:port attempted. |
| FED-M1-05 | not-started | Write `scripts/migrate-to-federated.ts`: one-way migration from `local` (PGlite) / `standalone` (PG without pgvector) → `federated`. Dumps, transforms, loads; dry-run + confirm UX. Idempotent on re-run. | #460 | codex | feat/federation-m1-migrate | FED-M1-04 | 10K | Do NOT run automatically. CLI subcommand `mosaic migrate tier --to federated --dry-run`. Safety rails. |
| FED-M1-06 | not-started | Update `mosaic doctor`: report current tier, required services, actual health per service, pgvector presence, overall green/yellow/red. Machine-readable JSON output flag for CI use. | #460 | sonnet | feat/federation-m1-doctor | FED-M1-04 | 6K | Existing doctor output evolves; add `--json` flag. Green/yellow/red + remediation suggestions per issue. |
| FED-M1-07 | not-started | Integration test: gateway boots in `federated` tier with docker-compose `federated` profile; refuses to boot when PG unreachable (asserts fail-fast); pgvector extension query succeeds. | #460 | sonnet | feat/federation-m1-integration | FED-M1-04 | 8K | Vitest + docker-compose test profile. One test file per assertion; real services, no mocks. |
| FED-M1-08 | not-started | Integration test for migration script: seed a local PGlite with representative data (tasks, notes, users, teams), run migration, assert row counts + key samples equal on federated PG. | #460 | sonnet | feat/federation-m1-migrate-test | FED-M1-05 | 6K | Runs against docker-compose federated profile; uses temp PGlite file; deterministic seed. |
| FED-M1-09 | not-started | Standalone regression: full agent-session E2E on existing `standalone` tier with a gateway built from this branch. Must pass without referencing any federation module. | #460 | haiku | feat/federation-m1-regression | FED-M1-07 | 4K | Reuse existing e2e harness; just re-point at the federation branch build. Canary that we didn't break it. |
| FED-M1-10 | not-started | Code review pass: security-focused on the migration script (data-at-rest during migration) + tier detector (error-message sensitivity leakage). Independent reviewer, not authors of tasks 01-09. | #460 | sonnet | — | FED-M1-09 | 8K | Use `feature-dev:code-reviewer` agent. Specifically: no secrets in error messages; no partial-migration footguns. |
| FED-M1-11 | not-started | Docs update: `docs/federation/` operator notes for tier setup; README blurb on federated tier; `docs/guides/` entry for migration. Do NOT touch runbook yet (deferred to FED-M7). | #460 | haiku | feat/federation-m1-docs | FED-M1-10 | 4K | Short, actionable. Link from MISSION-MANIFEST. No decisions captured here — those belong in PRD. |
| FED-M1-12 | not-started | PR, CI green, merge to main, close #460. | #460 | — | (aggregate) | FED-M1-11 | 3K | Queue-guard before push; wait for green; merge squashed; tea `issue-close` #460. |
**M1 total estimate:** ~74K tokens (over-budget vs 20K PRD estimate — explanation below)
**Why over-budget:** PRD's 20K estimate reflected implementation complexity only. The per-task breakdown includes tests, review, and docs as separate tasks per the delivery cycle, which catches the real cost. The final per-milestone budgets in MISSION-MANIFEST will be updated after M1 completes with actuals.
---
## Milestone 2 — Step-CA + grant schema + admin CLI (FED-M2)
_Deferred to mission planning when M1 is complete. Issue #461 tracks scope._
## Milestone 3 — mTLS handshake + list/get + scope enforcement (FED-M3)
_Deferred. Issue #462._
## Milestone 4 — search + audit + rate limit (FED-M4)
_Deferred. Issue #463._
## Milestone 5 — cache + offline + OTEL (FED-M5)
_Deferred. Issue #464._
## Milestone 6 — revocation + auto-renewal + CRL (FED-M6)
_Deferred. Issue #465._
## Milestone 7 — multi-user hardening + acceptance suite (FED-M7)
_Deferred. Issue #466._
---
## Execution Notes
**Agent assignment rationale:**
- `codex` for most implementation tasks (OpenAI credit pool preferred for feature code)
- `sonnet` for tests (pattern-based, moderate complexity), `doctor` work (cross-cutting), and independent code review
- `haiku` for docs and the standalone regression canary (cheapest tier for mechanical/verification work)
- No `opus` in M1 — save for cross-cutting architecture decisions if they surface later
**Branch strategy:** Each task gets its own feature branch off `main`. Tasks within a milestone merge in dependency order. Final aggregate PR (FED-M1-12) isn't a branch of its own — it's the merge of the last upstream task that closes the issue.
**Queue guard:** Every push and every merge in this mission must run `~/.config/mosaic/tools/git/ci-queue-wait.sh --purpose push|merge` per Mosaic hard gate #6.