diff --git a/docs/federation/MISSION-MANIFEST.md b/docs/federation/MISSION-MANIFEST.md index c6c7918..861f17e 100644 --- a/docs/federation/MISSION-MANIFEST.md +++ b/docs/federation/MISSION-MANIFEST.md @@ -7,11 +7,11 @@ **ID:** federation-v1-20260419 **Statement:** Jarvis operates across 3–4 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session; a prior OpenBrain attempt caused cache, latency, and opacity pain. This mission builds asymmetric federation between Mosaic Stack gateways so that a session on a user's home gateway can query their work gateway in real time without data ever persisting across the boundary, with full multi-tenant isolation and standard-PKI (X.509 / Step-CA) trust management. -**Phase:** M2 active — Step-CA + grant schema + admin CLI; parallel test-deploy workstream stood up -**Current Milestone:** FED-M2 -**Progress:** 1 / 7 milestones +**Phase:** M3 active — mTLS handshake + list/get/capabilities verbs + scope enforcement +**Current Milestone:** FED-M3 +**Progress:** 2 / 7 milestones **Status:** active -**Last Updated:** 2026-04-21 (M2 decomposed; mos-test-1/-2 designated as federation E2E test hosts) +**Last Updated:** 2026-04-21 (M2 closed via PR #503, tag `fed-v0.2.0-m2`, issue #461 closed; M3 decomposed into 14 tasks) **Parent Mission:** None — new mission ## Test Infrastructure @@ -63,8 +63,8 @@ Key design references: | # | ID | Name | Status | Branch | Issue | Started | Completed | | --- | ------ | --------------------------------------------- | ----------- | ------------------ | ----- | ---------- | ---------- | | 1 | FED-M1 | Federated tier infrastructure | done | (12 PRs #470-#481) | #460 | 2026-04-19 | 2026-04-19 | -| 2 | FED-M2 | Step-CA + grant schema + admin CLI | in-progress | (decomposition) | #461 | 2026-04-21 | — | -| 3 | FED-M3 | mTLS handshake + list/get + scope enforcement | not-started | — | #462 | — | — | +| 2 | FED-M2 | Step-CA + grant schema + admin CLI | done | (PRs #483-#503) | #461 | 2026-04-21 | 2026-04-21 | +| 3 | FED-M3 | mTLS handshake + list/get + scope enforcement | in-progress | (decomposition) | #462 | 2026-04-21 | — | | 4 | FED-M4 | search verb + audit log + rate limit | not-started | — | #463 | — | — | | 5 | FED-M5 | Cache + offline degradation + OTEL | not-started | — | #464 | — | — | | 6 | FED-M6 | Revocation + auto-renewal + CRL | not-started | — | #465 | — | — | @@ -85,17 +85,24 @@ Key design references: ## Session History -| Session | Date | Runtime | Outcome | -| ------- | ---------- | ------- | --------------------------------------------------------------------- | -| S1 | 2026-04-19 | claude | PRD authored, MILESTONES decomposed, 7 issues filed | -| S2-S4 | 2026-04-19 | claude | FED-M1 complete: 12 tasks (PRs #470-#481) merged; tag `fed-v0.1.0-m1` | +| Session | Date | Runtime | Outcome | +| ------- | ----------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------- | +| S1 | 2026-04-19 | claude | PRD authored, MILESTONES decomposed, 7 issues filed | +| S2-S4 | 2026-04-19 | claude | FED-M1 complete: 12 tasks (PRs #470-#481) merged; tag `fed-v0.1.0-m1` | +| S5-S22 | 2026-04-19 → 2026-04-21 | claude | FED-M2 complete: 13 tasks (PRs #483-#503) merged; tag `fed-v0.2.0-m2`; issue #461 closed. Step-CA + grant schema + admin CLI shipped. | +| S23 | 2026-04-21 | claude | M3 decomposed into 14 tasks in `docs/federation/TASKS.md`. Manifest M3 row → in-progress. Next: kickoff M3-01. | ## Next Step -FED-M2 active. Decomposition landed in `docs/federation/TASKS.md` (M2-01..M2-13 code workstream + DEPLOY-01..DEPLOY-05 parallel test-deploy workstream, ~88K total). Tracking issue #482. +FED-M3 active. Decomposition landed in `docs/federation/TASKS.md` (M3-01..M3-14, ~100K estimate). Tracking issue #462. -Parallel execution plan: +Execution plan (parallel where possible): -- **CODE workstream**: M2-01 (DB migration) starts immediately — sonnet subagent on `feat/federation-m2-schema`. Then M2-02 → M2-09 sequentially with M2-04/M2-05/M2-06/M2-07 having interleaved CA/storage/grant dependencies. -- **DEPLOY workstream**: DEPLOY-01 (image verify) → DEPLOY-02 (stack template) → DEPLOY-03/04 (mos-test-1/-2 deploy) → DEPLOY-05 (TEST-INFRA.md). Gated on Portainer wrapper PR (`PORTAINER_INSECURE` flag) merging first. -- **Re-converge** at M2-10 (E2E test) once both workstreams ready. +- **Foundation**: M3-01 (DTOs in `packages/types/src/federation/`) starts immediately — sonnet subagent on `feat/federation-m3-types`. Blocks all server + client work. +- **Server stream** (after M3-01): M3-03 (AuthGuard) + M3-04 (ScopeService) in series, then M3-05 / M3-06 / M3-07 (verbs) in parallel. +- **Client stream** (after M3-01, parallel with server): M3-08 (FederationClient) → M3-09 (QuerySourceService). +- **Harness** (parallel with everything): M3-02 (`tools/federation-harness/`) — needed for M3-11. +- **Test gates**: M3-10 (Integration) → M3-11 (E2E with harness) → M3-12 (Independent security review, two rounds budgeted). +- **Close**: M3-13 (Docs) → M3-14 (release tag `fed-v0.3.0-m3`, close #462). + +**Test-bed fallback:** `mos-test-1/-2` deploy is still blocked on `FED-M2-DEPLOY-IMG-FIX`. The harness in M3-02 ships a local two-gateway docker-compose so M3-11 is not blocked. Production-host validation is M7's responsibility (PRD AC-12). diff --git a/docs/federation/TASKS.md b/docs/federation/TASKS.md index 953e578..fef40bc 100644 --- a/docs/federation/TASKS.md +++ b/docs/federation/TASKS.md @@ -85,7 +85,38 @@ Goal: An admin can create a federation grant; counterparty enrolls; cert is sign ## Milestone 3 — mTLS handshake + list/get + scope enforcement (FED-M3) -_Deferred. Issue #462._ +Goal: Two federated gateways exchange real data over mTLS. Inbound requests pass through cert validation → grant lookup → scope enforcement → native RBAC → response. `list`, `get`, and `capabilities` verbs land. The federation E2E harness (`tools/federation-harness/`) is the new permanent test bed for M3+ and is gated on every milestone going forward. + +> **Critical trust boundary.** Every 401/403 path needs a test. Code review is non-negotiable; M3-12 budgets two review rounds. +> +> **Tracking issue:** #462. + +| id | status | description | issue | agent | branch | depends_on | estimate | notes | +| --------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | ------ | ------------------------------------ | ---------------- | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | +| FED-M3-01 | not-started | `packages/types/src/federation/` — request/response DTOs for `list`, `get`, `capabilities` verbs. Wire-format zod schemas + inferred TS types. Includes `FederationRequest`, `FederationListResponse`, `FederationGetResponse`, `FederationCapabilitiesResponse`, error envelope, `_source` tag. | #462 | sonnet | feat/federation-m3-types | — | 4K | Reusable from gateway server + client + harness. Pure types — no I/O, no NestJS. | +| FED-M3-02 | not-started | `tools/federation-harness/` scaffold: `docker-compose.two-gateways.yml` (Server A + Server B + step-CA), `seed.ts` (provisions grants, peers, sample tasks/notes/credentials per scope variant), `harness.ts` helper (boots stack, returns typed clients). README documents harness use. | #462 | sonnet | feat/federation-m3-harness | DEPLOY-04 (soft) | 8K | Falls back to local docker-compose if `mos-test-1/-2` not yet redeployed (DEPLOY chain blocked on IMG-FIX). Permanent test infra used by M3+. | +| FED-M3-03 | not-started | `apps/gateway/src/federation/server/federation-auth.guard.ts` (NestJS guard). Validates inbound client cert from Fastify TLS context, extracts `grantId` + `subjectUserId` from custom OIDs, loads grant from DB, asserts `status='active'`, attaches `FederationContext` to request. | #462 | sonnet | feat/federation-m3-auth-guard | M3-01 | 8K | Reuses OID parsing logic mirrored from `ca.service.ts` post-issuance verification. 401 on malformed/missing OIDs; 403 on revoked/expired/missing grant. | +| FED-M3-04 | not-started | `apps/gateway/src/federation/server/scope.service.ts`. Pipeline: (1) resource allowlist + excluded check, (2) native RBAC eval as `subjectUserId`, (3) scope filter intersection (`include_teams`, `include_personal`), (4) `max_rows_per_query` cap. Pure service — DB calls injected. | #462 | sonnet | feat/federation-m3-scope-service | M3-01 | 10K | Hardest correctness target in M3. Reuses `parseFederationScope` (M2-03). Returns either `{ allowed: true, filter }` or structured deny reason for audit. | +| FED-M3-05 | not-started | `apps/gateway/src/federation/server/verbs/list.controller.ts`. Wires AuthGuard → ScopeService → tasks/notes/memory query layer; applies row cap; tags rows with `_source`. Resource selector via path param. | #462 | sonnet | feat/federation-m3-verb-list | M3-03, M3-04 | 6K | Routes: `POST /api/federation/v1/list/:resource`. No body persistence. Audit write deferred to M4. | +| FED-M3-06 | not-started | `apps/gateway/src/federation/server/verbs/get.controller.ts`. Single-resource fetch by id; same pipeline as list. 404 on not-found, 403 on RBAC/scope deny — both audited the same way. | #462 | sonnet | feat/federation-m3-verb-get | M3-03, M3-04 | 6K | `POST /api/federation/v1/get/:resource/:id`. Mirrors list controller patterns. | +| FED-M3-07 | not-started | `apps/gateway/src/federation/server/verbs/capabilities.controller.ts`. Read-only enumeration: returns `{ resources, excluded_resources, max_rows_per_query, supported_verbs }` derived from grant scope. Always allowed for an active grant — no RBAC eval. | #462 | sonnet | feat/federation-m3-verb-capabilities | M3-03 | 4K | `GET /api/federation/v1/capabilities`. Smallest verb; useful sanity check that mTLS + auth guard work end-to-end. | +| FED-M3-08 | not-started | `apps/gateway/src/federation/client/federation-client.service.ts`. Outbound mTLS dialer: picks `(certPem, sealed clientKey)` from `federation_peers`, unwraps key, builds undici Agent with mTLS, calls peer verb, parses typed response, wraps non-2xx into `FederationClientError`. | #462 | sonnet | feat/federation-m3-client | M3-01 | 8K | Independent of server stream — can land in parallel with M3-03/04. Cert/key cached per-peer; flushed by future M5/M6 logic. | +| FED-M3-09 | not-started | `apps/gateway/src/federation/client/query-source.service.ts`. Accepts `source: "local" \| "federated:" \| "all"` from gateway query layer; for `"all"` fans out to local + each peer in parallel; merges results; tags every row with `_source`. | #462 | sonnet | feat/federation-m3-query-source | M3-08 | 8K | Per-peer failure surfaces as `_partial: true` in response, not hard failure (sets up M5 offline UX). M5 adds caching + circuit breaker on top. | +| FED-M3-10 | not-started | Integration tests for MILESTONES.md M3 acceptance #6 (malformed OIDs → 401; valid cert + revoked grant → 403) and #7 (`max_rows_per_query` cap). Real PG, mocked TLS context (Fastify req shim). | #462 | sonnet | feat/federation-m3-integration | M3-05, M3-06 | 8K | Vitest profile gated by `FEDERATED_INTEGRATION=1`. Single-gateway suite; no harness required. | +| FED-M3-11 | not-started | E2E tests for MILESTONES.md M3 acceptance #1, #2, #3, #4, #5, #8, #9, #10 (8 cases). Uses harness from M3-02; two real gateways, real Step-CA, real mTLS. Each test asserts both happy-path response and audit/no-persist invariants. | #462 | sonnet | feat/federation-m3-e2e | M3-02, M3-09 | 12K | Largest single task. Each acceptance gets its own `it(...)` for clear failure attribution. | +| FED-M3-12 | not-started | Independent security review (sonnet, not author of M3-03/04/05/06/07/08/09): focus on cert-SAN spoofing, OID extraction edge cases, scope-bypass via filter manipulation, RBAC-bypass via subjectUser swap, response leakage when scope deny. | #462 | sonnet | feat/federation-m3-security-review | M3-11 | 10K | Two review rounds budgeted. PRD requires explicit test for every 401/403 path — review verifies coverage. | +| FED-M3-13 | not-started | Docs update: `docs/federation/SETUP.md` mTLS handshake section, new `docs/federation/HARNESS.md` for federation-harness usage, OID reference table in SETUP.md, scope enforcement pipeline diagram. Runbook still M7-deferred. | #462 | haiku | feat/federation-m3-docs | M3-12 | 5K | One ASCII diagram for the auth-guard → scope → RBAC pipeline; helps future reviewers reason about denial paths. | +| FED-M3-14 | not-started | PR aggregate close, CI green, merge to main, close #462. Release tag `fed-v0.3.0-m3`. Update mission manifest M3 row → done; M4 row → in-progress when work begins. | #462 | sonnet | chore/federation-m3-close | M3-13 | 3K | Same close pattern as M1-12 / M2-13. | + +**M3 estimate:** ~100K tokens (vs MILESTONES.md 40K — same per-task breakdown pattern as M1/M2: tests, review, and docs split out from implementation cost). Largest milestone in the federation mission. + +**Parallelization opportunities:** + +- M3-08 (client) can land in parallel with M3-03/M3-04 (server pipeline) — they only share DTOs from M3-01. +- M3-02 (harness) can land in parallel with everything except M3-11. +- M3-05/M3-06/M3-07 (verbs) are independent of each other once M3-03/M3-04 land. + +**Test bed fallback:** If `mos-test-1.woltje.com` / `mos-test-2.woltje.com` are still blocked on `FED-M2-DEPLOY-IMG-FIX` when M3-11 is ready to run, the harness's local `docker-compose.two-gateways.yml` is a sufficient stand-in. Production-host validation moves to M7 acceptance suite (PRD AC-12). ## Milestone 4 — search + audit + rate limit (FED-M4) diff --git a/docs/scratchpads/mvp-20260312.md b/docs/scratchpads/mvp-20260312.md index e2a1c03..9e87550 100644 --- a/docs/scratchpads/mvp-20260312.md +++ b/docs/scratchpads/mvp-20260312.md @@ -612,3 +612,44 @@ Independent security review surfaced three high-impact and four medium findings; 7. DEPLOY-03/04 acceptance probes (`mosaic gateway doctor --json`, pgvector `vector(3)` round-trip) 8. DEPLOY-05: author `docs/federation/TEST-INFRA.md` 9. M2-02 (Step-CA sidecar) kicks off after image health is green + +### Session 23 — 2026-04-21 — M2 close + M3 decomposition + +**Closed at compaction boundary:** all 13 M2 tasks done, PRs #494–#503 merged to `main`, tag `fed-v0.2.0-m2` published, Gitea release notes posted, issue #461 closed. Main at `4ece6dc6`. + +**M2 hardening landed in PR #501** (security review remediation): + +- CRIT-1: post-issuance OID verification in `ca.service.ts` (rejects cert if `mosaic_grant_id` / `mosaic_subject_user_id` extensions missing or mismatched) +- CRIT-2: atomic activation guard `WHERE status='pending'` on grant + `WHERE state='pending'` on peer; throws `ConflictException` if lost race +- HIGH-2: removed try/catch fallback in `extractCertNotAfter` — parse failures propagate as 500 (no silent 90-day default) +- HIGH-4: token slice for logging (`${token.slice(0, 8)}...`) — no full token in stdout +- HIGH-5: `redeem()` wrapped in try/catch with best-effort failure audit; uses `null` (not `'unknown'`) for nullable UUID FK fallback +- MED-3: `createToken` validates `grant.peerId === dto.peerId`; `BadRequestException` on mismatch + +**Remaining M2 security findings deferred to M3+:** + +- HIGH-1: peerId/subjectUserId tenancy validation on `createGrant` (M3 ScopeService work surfaces this) +- HIGH-3: Step-CA cert SHA-256 fingerprint pinning (M5 cert handling) +- MED-1: token entropy already 32 bytes — wontfix +- MED-2: per-route rate limit on enrollment endpoint (M4 rate limit work) +- MED-4: CSR CN binding to peer's commonName (M3 AuthGuard work) + +**M3 decomposition landed in this session:** + +- 14 tasks (M3-01..M3-14), ~100K estimate +- Structure mirrors M1/M2 pattern: foundation → server stream + client stream + harness in parallel → integration → E2E → security review → docs → close +- M3-02 ships local two-gateway docker-compose (`tools/federation-harness/`) so M3-11 E2E is not blocked on the Portainer test bed (which is still blocked on `FED-M2-DEPLOY-IMG-FIX`) + +**Subagent doctrine retained from M2:** + +- All worker subagents use `isolation: "worktree"` to prevent branch-race incidents +- Code review is independent (different subagent, no overlap with author of work) +- `tea pr create --repo mosaicstack/stack --login mosaicstack` is the working PR-create path; `pr-create.sh` has shell-quoting bugs (followup #45 if not already filed) +- Cost tier: foundational implementation = sonnet, docs = haiku, complex multi-file architecture (security review, scope service) = sonnet with two review rounds + +**Next concrete step:** + +1. PR for the M3 planning artifact (this commit) — branch `docs/federation-m3-planning` +2. After merge, kickoff M3-01 (DTOs) on `feat/federation-m3-types` with sonnet subagent in worktree +3. Once M3-01 lands, fan out: M3-02 (harness) || M3-03 (AuthGuard) → M3-04 (ScopeService) || M3-08 (FederationClient) +4. Re-converge at M3-10 (Integration) → M3-11 (E2E)