docs(federation): M2 mission planning — TASKS decomposition + manifest update (#483)
Some checks failed
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/push/publish Pipeline failed

This commit was merged in pull request #483.
This commit is contained in:
2026-04-22 01:24:00 +00:00
parent 45e8f02c91
commit b985d7bfe2
2 changed files with 62 additions and 7 deletions

View File

@@ -7,13 +7,22 @@
**ID:** federation-v1-20260419 **ID:** federation-v1-20260419
**Statement:** Jarvis operates across 34 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session; a prior OpenBrain attempt caused cache, latency, and opacity pain. This mission builds asymmetric federation between Mosaic Stack gateways so that a session on a user's home gateway can query their work gateway in real time without data ever persisting across the boundary, with full multi-tenant isolation and standard-PKI (X.509 / Step-CA) trust management. **Statement:** Jarvis operates across 34 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session; a prior OpenBrain attempt caused cache, latency, and opacity pain. This mission builds asymmetric federation between Mosaic Stack gateways so that a session on a user's home gateway can query their work gateway in real time without data ever persisting across the boundary, with full multi-tenant isolation and standard-PKI (X.509 / Step-CA) trust management.
**Phase:** M1 complete — federated tier infrastructure ready for testing **Phase:** M2 active — Step-CA + grant schema + admin CLI; parallel test-deploy workstream stood up
**Current Milestone:** FED-M2 (next; deferred to mission planning) **Current Milestone:** FED-M2
**Progress:** 1 / 7 milestones **Progress:** 1 / 7 milestones
**Status:** active **Status:** active
**Last Updated:** 2026-04-19 (M1 complete; tag `fed-v0.1.0-m1`) **Last Updated:** 2026-04-21 (M2 decomposed; mos-test-1/-2 designated as federation E2E test hosts)
**Parent Mission:** None — new mission **Parent Mission:** None — new mission
## Test Infrastructure
| Host | Role | Image | Tier |
| ----------------------- | ----------------------------------- | ------------------------------------- | --------- |
| `mos-test-1.woltje.com` | Federation Server A (querying side) | `gateway:fed-v0.1.0-m1` (M1 baseline) | federated |
| `mos-test-2.woltje.com` | Federation Server B (serving side) | `gateway:fed-v0.1.0-m1` (M1 baseline) | federated |
These are TEST hosts for federation E2E (M3+). Distinct from PRD AC-12 production targets (`woltje.com``uscllc.com`). Deployment workstream tracked in `docs/federation/TASKS.md` under FED-M2-DEPLOY-\*.
## Context ## Context
Federation is the solution to what originally drove OpenBrain. The prior attempt coupled every agent session to a remote service, introduced cache/latency/opacity pain, and created a hard dependency that punished offline use. This redesign: Federation is the solution to what originally drove OpenBrain. The prior attempt coupled every agent session to a remote service, introduced cache/latency/opacity pain, and created a hard dependency that punished offline use. This redesign:
@@ -54,7 +63,7 @@ Key design references:
| # | ID | Name | Status | Branch | Issue | Started | Completed | | # | ID | Name | Status | Branch | Issue | Started | Completed |
| --- | ------ | --------------------------------------------- | ----------- | ------------------ | ----- | ---------- | ---------- | | --- | ------ | --------------------------------------------- | ----------- | ------------------ | ----- | ---------- | ---------- |
| 1 | FED-M1 | Federated tier infrastructure | done | (12 PRs #470-#481) | #460 | 2026-04-19 | 2026-04-19 | | 1 | FED-M1 | Federated tier infrastructure | done | (12 PRs #470-#481) | #460 | 2026-04-19 | 2026-04-19 |
| 2 | FED-M2 | Step-CA + grant schema + admin CLI | not-started | — | #461 | | — | | 2 | FED-M2 | Step-CA + grant schema + admin CLI | in-progress | (decomposition) | #461 | 2026-04-21 | — |
| 3 | FED-M3 | mTLS handshake + list/get + scope enforcement | not-started | — | #462 | — | — | | 3 | FED-M3 | mTLS handshake + list/get + scope enforcement | not-started | — | #462 | — | — |
| 4 | FED-M4 | search verb + audit log + rate limit | not-started | — | #463 | — | — | | 4 | FED-M4 | search verb + audit log + rate limit | not-started | — | #463 | — | — |
| 5 | FED-M5 | Cache + offline degradation + OTEL | not-started | — | #464 | — | — | | 5 | FED-M5 | Cache + offline degradation + OTEL | not-started | — | #464 | — | — |
@@ -83,6 +92,10 @@ Key design references:
## Next Step ## Next Step
FED-M1 complete (12 PRs #470-#481, tag `fed-v0.1.0-m1`). Federated tier infrastructure is testable end-to-end: see `docs/federation/SETUP.md` and `docs/guides/migrate-tier.md`. FED-M2 active. Decomposition landed in `docs/federation/TASKS.md` (M2-01..M2-13 code workstream + DEPLOY-01..DEPLOY-05 parallel test-deploy workstream, ~88K total). Tracking issue #482.
Begin FED-M2 (Step-CA + grant schema + admin CLI) when planning is greenlit. Issue #461 tracks scope; orchestrator decomposes M2 into per-task rows in `docs/federation/TASKS.md` at the start of M2. Parallel execution plan:
- **CODE workstream**: M2-01 (DB migration) starts immediately — sonnet subagent on `feat/federation-m2-schema`. Then M2-02 → M2-09 sequentially with M2-04/M2-05/M2-06/M2-07 having interleaved CA/storage/grant dependencies.
- **DEPLOY workstream**: DEPLOY-01 (image verify) → DEPLOY-02 (stack template) → DEPLOY-03/04 (mos-test-1/-2 deploy) → DEPLOY-05 (TEST-INFRA.md). Gated on Portainer wrapper PR (`PORTAINER_INSECURE` flag) merging first.
- **Re-converge** at M2-10 (E2E test) once both workstreams ready.

View File

@@ -36,9 +36,51 @@ Goal: Gateway runs in `federated` tier with containerized PG+pgvector+Valkey. No
--- ---
## Pre-M2 — Test deployment infrastructure (FED-M2-DEPLOY)
Goal: Two federated-tier gateways stood up on Portainer at `mos-test-1.woltje.com` and `mos-test-2.woltje.com` running the M1 release (`gateway:fed-v0.1.0-m1`). This is the test bed for M2 enrollment work and the M3 federation E2E harness. No federation logic exercised yet — pure infrastructure validation.
> **Why now:** M2 enrollment requires a real second gateway to test peer-add flows; standing the test hosts up before M2 code lands gives both code and deployment streams a fast feedback loop.
> **Parallelizable:** This workstream runs in parallel with the M2 code workstream (M2-01 → M2-13). They re-converge at M2-10 (E2E test).
> **Tracking issue:** #482.
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| ---------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | ------ | ------------------------------------- | ------------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| FED-M2-DEPLOY-01 | not-started | Verify `gateway:fed-v0.1.0-m1` image was published by `.woodpecker/publish.yml` on tag push; if not, investigate and remediate. Document image URI in deployment artifact. | #482 | sonnet | feat/federation-deploy-image-verify | — | 2K | publish.yml registers `gateway:$CI_COMMIT_TAG` destination; should already exist at `git.mosaicstack.dev/mosaicstack/stack/gateway:fed-v0.1.0-m1`. |
| FED-M2-DEPLOY-02 | not-started | Author Portainer git-stack compose file `deploy/portainer/federated-test.stack.yml` (gateway + PG-pgvector + Valkey, env-driven). Use immutable tag, not `latest`. | #482 | sonnet | feat/federation-deploy-stack-template | DEPLOY-01 | 5K | Stack must be parameterizable via env (`STACK_DOMAIN`, `BETTERAUTH_SECRET`, etc.) so one template serves both hosts. |
| FED-M2-DEPLOY-03 | not-started | Deploy stack to mos-test-1.woltje.com via `~/.config/mosaic/tools/portainer/`. Verify M1 acceptance: federated-tier boot succeeds; `mosaic gateway doctor --json` returns green; pgvector `vector(3)` round-trip works. | #482 | sonnet | feat/federation-deploy-test-1 | DEPLOY-02 | 3K | Requires `PORTAINER_URL` + `PORTAINER_API_KEY` env (vault-loaded). DNS for mos-test-1 must resolve before deploy. |
| FED-M2-DEPLOY-04 | not-started | Deploy stack to mos-test-2.woltje.com via Portainer wrapper. Same M1 acceptance probes as DEPLOY-03. | #482 | sonnet | feat/federation-deploy-test-2 | DEPLOY-02 | 3K | Independent of DEPLOY-03 (parallelizable). Same secret material with distinct domain + secrets per host. |
| FED-M2-DEPLOY-05 | not-started | Document deployment in `docs/federation/TEST-INFRA.md`: hosts, image tags, secrets sourcing, redeploy procedure, teardown. Update MISSION-MANIFEST with deployment status. | #482 | haiku | feat/federation-deploy-docs | DEPLOY-03,04 | 3K | Operator-facing doc; mentions but does not duplicate `tools/portainer/README.md`. |
**Deploy workstream estimate:** ~16K tokens
---
## Milestone 2 — Step-CA + grant schema + admin CLI (FED-M2) ## Milestone 2 — Step-CA + grant schema + admin CLI (FED-M2)
_Deferred to mission planning when M1 is complete. Issue #461 tracks scope._ Goal: An admin can create a federation grant; counterparty enrolls; cert is signed by Step-CA with SAN OIDs for `grantId` + `subjectUserId`. No runtime federation traffic flows yet (that's M3).
| id | status | description | issue | agent | branch | depends_on | estimate | notes |
| --------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | ------ | ---------------------------------- | ---------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| FED-M2-01 | not-started | DB migration: `federation_grants`, `federation_peers`, `federation_audit_log` tables + enum types (`grant_status`, `peer_state`). Drizzle schema + migration generation; migration tests. | #461 | sonnet | feat/federation-m2-schema | — | 5K | `federation_audit_log` is created but not yet written to (audit logic is M4). Reserve `query_hash`, `outcome`, `bytes_out` columns. |
| FED-M2-02 | not-started | Add Step-CA sidecar to `docker-compose.federated.yml`: official `smallstep/step-ca` image, persistent CA volume, JWK provisioner config baked into init script. | #461 | sonnet | feat/federation-m2-stepca | DEPLOY-02 | 4K | Profile-gated under `federated`. CA password from secret; dev compose uses dev-only password file. |
| FED-M2-03 | not-started | Scope JSON schema + validator: `resources` allowlist, `excluded_resources`, `include_teams`, `include_personal`, `max_rows_per_query`. Vitest unit tests for valid + invalid scopes. | #461 | sonnet | feat/federation-m2-scope-schema | — | 4K | Validator independent of CA — reusable from grant CRUD + (later) M3 scope enforcement. |
| FED-M2-04 | not-started | `apps/gateway/src/federation/ca.service.ts`: Step-CA client (CSR submission, OID-bearing cert retrieval). Mocked + integration tests against real Step-CA container. | #461 | sonnet | feat/federation-m2-ca-service | M2-02 | 6K | SAN OIDs: `grantId` (custom OID 1.3.6.1.4.1.99999.1) + `subjectUserId` (1.3.6.1.4.1.99999.2). Document OID assignments in PRD/SETUP. |
| FED-M2-05 | not-started | Sealed storage for `client_key_pem` reusing existing `provider_credentials` sealing key. Tests prove DB-at-rest is ciphertext, not PEM. Key rotation path documented (deferred impl). | #461 | sonnet | feat/federation-m2-key-sealing | M2-01 | 5K | Separate from M2-06 to keep crypto seam isolated; reviewer focus is sealing only. |
| FED-M2-06 | not-started | `grants.service.ts`: CRUD + status transitions (`pending``active``revoked`); integrates M2-03 (scope) + M2-05 (sealing). Unit tests cover all transitions including invalid ones. | #461 | sonnet | feat/federation-m2-grants-service | M2-03, M2-05 | 6K | Business logic only — CSR + cert work delegated to M2-04. Revocation handler is M6. |
| FED-M2-07 | not-started | `enrollment.controller.ts`: short-lived single-use token endpoint; CSR signing; updates grant `pending``active`; emits enrollment audit (table-only write, M4 tightens). | #461 | sonnet | feat/federation-m2-enrollment | M2-04, M2-06 | 6K | Tokens single-use with 410 on replay; tokens TTL'd at 15min; rate-limited at request layer (M4 introduces guard, M2 uses simple lock). |
| FED-M2-08 | not-started | Admin CLI: `mosaic federation grant create/list/show` + `peer add/list`. Integration with grants.service (no API duplication). Help output + machine-readable JSON option. | #461 | sonnet | feat/federation-m2-cli | M2-06, M2-07 | 7K | `peer add <enrollment-url>` is the client-side flow; resolves enrollment URL → CSR → store sealed key + cert. |
| FED-M2-09 | not-started | Integration tests covering MILESTONES.md M2 acceptance tests #1, #2, #3, #5, #7, #8 (single-gateway suite). Real Step-CA container; vitest profile gated by `FEDERATED_INTEGRATION=1`. | #461 | sonnet | feat/federation-m2-integration | M2-08 | 8K | Tests #4 (cert OID match) + #6 (two-gateway peer-add) handled separately by M2-10 (E2E). |
| FED-M2-10 | not-started | E2E test against deployed mos-test-1 + mos-test-2 (or local two-gateway docker-compose if Portainer not ready): MILESTONES test #6 `peer add` yields `active` peer record with valid cert + key. | #461 | sonnet | feat/federation-m2-e2e | M2-08, DEPLOY-04 | 6K | Falls back to local docker-compose-two-gateways if remote test hosts not yet available. Documents both paths. |
| FED-M2-11 | not-started | Independent security review (sonnet, not author of M2-04/05/06/07): focus on single-use token replay, sealing leak surfaces, OID match enforcement, scope schema bypass paths. | #461 | sonnet | feat/federation-m2-security-review | M2-10 | 8K | Apply M1 two-round pattern. Reviewer should explicitly attempt enrollment-token replay, OID-spoofing CSR, and key leak in error messages. |
| FED-M2-12 | not-started | Docs update: `docs/federation/SETUP.md` Step-CA section; new `docs/federation/ADMIN-CLI.md` with grant/peer commands; scope schema reference; OID registration note. Runbook still M7-deferred. | #461 | haiku | feat/federation-m2-docs | M2-11 | 4K | Adds CA bootstrap section to SETUP.md with `docker compose --profile federated up step-ca` example. |
| FED-M2-13 | not-started | PR aggregate close, CI green, merge to main, close #461. Release tag `fed-v0.2.0-m2`. Mark deploy stream complete. Update mission manifest M2 row. | #461 | sonnet | feat/federation-m2-close | M2-12 | 3K | Same close pattern as M1-12; queue-guard before merge; tea release-create with notes including deploy-stream PRs. |
**M2 code workstream estimate:** ~72K tokens (vs MILESTONES.md 30K — same over-budget pattern as M1, where per-task breakdown including tests/review/docs catches the real cost).
**Deploy + code combined:** ~88K tokens.
## Milestone 3 — mTLS handshake + list/get + scope enforcement (FED-M3) ## Milestone 3 — mTLS handshake + list/get + scope enforcement (FED-M3)