stack/docs/federation/MISSION-MANIFEST.md
jason.woltje b985d7bfe2
docs(federation): M2 mission planning — TASKS decomposition + manifest update (#483)
2026-04-22 01:24:00 +00:00


Mission Manifest — Federation v1

Persistent document tracking full mission scope, status, and session history. Updated by the orchestrator at each phase transition and milestone completion.

Mission

ID: federation-v1-20260419
Statement: Jarvis operates across 34 workstations in two physical locations (home, USC). The user currently reaches back to a single jarvis-brain checkout from every session; a prior OpenBrain attempt caused cache, latency, and opacity pain. This mission builds asymmetric federation between Mosaic Stack gateways so that a session on a user's home gateway can query their work gateway in real time without data ever persisting across the boundary, with full multi-tenant isolation and standard-PKI (X.509 / Step-CA) trust management.
Phase: M2 active (Step-CA + grant schema + admin CLI); parallel test-deploy workstream stood up
Current Milestone: FED-M2
Progress: 1 / 7 milestones
Status: active
Last Updated: 2026-04-21 (M2 decomposed; mos-test-1/-2 designated as federation E2E test hosts)
Parent Mission: None (new mission)

Test Infrastructure

| Host | Role | Image | Tier |
| --- | --- | --- | --- |
| mos-test-1.woltje.com | Federation Server A (querying side) | gateway:fed-v0.1.0-m1 (M1 baseline) | federated |
| mos-test-2.woltje.com | Federation Server B (serving side) | gateway:fed-v0.1.0-m1 (M1 baseline) | federated |

These are TEST hosts for federation E2E (M3+). They are distinct from the PRD AC-12 production targets (woltje.com ↔ uscllc.com). The deployment workstream is tracked in docs/federation/TASKS.md under FED-M2-DEPLOY-*.

Context

Federation addresses the problem that originally drove OpenBrain. The prior attempt coupled every agent session to a remote service, introduced cache/latency/opacity pain, and created a hard dependency that punished offline use. This redesign:

  1. Makes federation gateway-to-gateway, not agent-to-service
  2. Keeps each user's home instance as source of truth for their data
  3. Exposes scoped, read-only data on demand without persisting across the boundary
  4. Uses X.509 mTLS via Step-CA so rotation/revocation/CRL/OCSP are standard
  5. Supports multi-tenant serving sides (employees on uscllc.com each federating back to their own home gateway) with no cross-user leakage
  6. Requires federation-tier instances on both sides (PG + pgvector + Valkey) — local/standalone tiers cannot federate
  7. Works over public HTTPS (no VPN required); Tailscale is an optional overlay
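Points 3 and 5 above (scoped, read-only exposure with no cross-user leakage) can be illustrated with a minimal sketch. It is not the gateway's implementation: the `Grant` fields, the `is_allowed` and `federated_read` helpers, and the record layout are all hypothetical names chosen for illustration, and the real grant schema is defined in FED-M2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant record; field names are illustrative only."""
    peer: str              # identity of the querying gateway (assumed to come from its client cert)
    user: str              # the one serving-side user this grant covers
    scopes: frozenset      # resource types the peer may read
    revoked: bool = False


def is_allowed(grants, peer, user, resource_type):
    """True only if a live grant covers this exact peer, user, and resource type."""
    return any(
        g.peer == peer and g.user == user and not g.revoked and resource_type in g.scopes
        for g in grants
    )


def federated_read(records, grants, peer, user, resource_type):
    """Return the user's records of one type, or refuse. Nothing is written or cached."""
    if not is_allowed(grants, peer, user, resource_type):
        raise PermissionError(f"no grant for {peer} -> {user}/{resource_type}")
    return [r for r in records if r["owner"] == user and r["type"] == resource_type]


# Example data: gateway-a holds a grant for alice's tasks and notes only.
grants = [Grant("gateway-a", "alice", frozenset({"tasks", "notes"}))]
records = [
    {"owner": "alice", "type": "tasks", "body": "renew cert"},
    {"owner": "alice", "type": "credentials", "body": "secret"},
    {"owner": "bob", "type": "tasks", "body": "bob's task"},
]

alice_tasks = federated_read(records, grants, "gateway-a", "alice", "tasks")
```

In this sketch a request for bob's tasks or for alice's credentials raises `PermissionError`, because the check matches on peer, user, and resource type together rather than on the peer alone.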

Key design references:

  • docs/federation/PRD.md — 16-section product requirements
  • docs/federation/MILESTONES.md — 7-milestone decomposition with per-milestone acceptance tests
  • docs/federation/TASKS.md — per-task breakdown (M1 populated; M2-M7 deferred to mission planning)
  • docs/research/mempalace-evaluation/ (in jarvis-brain) — why we didn't adopt MemPalace

Success Criteria

  • AC-1: Two Mosaic Stack gateways on different hosts can establish a federation grant via CLI-driven onboarding
  • AC-2: Server A can query Server B for tasks, notes, memory respecting scope filters
  • AC-3: User on B with no grant cannot be queried by A, even if A has a valid grant for another user (cross-user isolation)
  • AC-4: Revoking a grant on B causes A's next request to fail with a clear error within one request cycle
  • AC-5: Cert rotation happens automatically at T-7 days; in-progress session survives rotation without user action
  • AC-6: Rate-limit enforcement returns 429 with Retry-After; client backs off
  • AC-7: With B unreachable, a session on A completes using local data and surfaces "federation offline for <peer>" once per session
  • AC-8: Every federated request appears in B's federation_audit_log within 1 second
  • AC-9: Scope excluding credentials means credentials are never returned — even via search with matching keywords
  • AC-10: mosaic federation status shows cert expiry, grant status, last success/failure per peer
  • AC-11: Full 3-employee multi-tenant scenario passes with no cross-user leakage
  • AC-12: Two-gateway production deployment (woltje.com ↔ uscllc.com) operational ≥7 days without incident
  • AC-13: All 7 milestones ship as merged PRs with green CI and closed issues
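Two of the criteria above reduce to small, checkable rules: AC-5 (rotate at T-7 days) and AC-6 (honor `Retry-After` on a 429). The sketch below shows one possible reading of each. The function names, the one-second default delay, and the plain-dict header lookup are assumptions for illustration, and only the delta-seconds form of `Retry-After` is parsed.

```python
from datetime import datetime, timedelta, timezone

ROTATION_WINDOW = timedelta(days=7)  # "T-7 days" from AC-5


def needs_rotation(not_after, now, window=ROTATION_WINDOW):
    """True once the current time is within `window` of the cert's expiry."""
    return now >= not_after - window


def retry_delay(status, headers, default=1.0):
    """Seconds to wait before retrying, or None if the response is not a 429.

    Assumes `headers` is a plain dict keyed by the exact name "Retry-After".
    An HTTP-date value or a missing header falls back to `default`.
    """
    if status != 429:
        return None
    try:
        return max(0.0, float(int(headers.get("Retry-After", ""))))
    except ValueError:
        return default


expiry = datetime(2026, 5, 1, tzinfo=timezone.utc)
print(needs_rotation(expiry, datetime(2026, 4, 23, tzinfo=timezone.utc)))  # False: 8 days left
print(needs_rotation(expiry, datetime(2026, 4, 24, tzinfo=timezone.utc)))  # True: exactly 7 days left
print(retry_delay(429, {"Retry-After": "30"}))                             # 30.0
```

Keeping both rules as pure functions of their inputs makes them easy to cover in a per-milestone acceptance test without standing up a CA or a second gateway.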

Milestones

| # | ID | Name | Status | Branch | Issue | Started | Completed |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | FED-M1 | Federated tier infrastructure | done | (12 PRs #470-#481) | #460 | 2026-04-19 | 2026-04-19 |
| 2 | FED-M2 | Step-CA + grant schema + admin CLI | in-progress | (decomposition) | #461 | 2026-04-21 | |
| 3 | FED-M3 | mTLS handshake + list/get + scope enforcement | not-started | | #462 | | |
| 4 | FED-M4 | search verb + audit log + rate limit | not-started | | #463 | | |
| 5 | FED-M5 | Cache + offline degradation + OTEL | not-started | | #464 | | |
| 6 | FED-M6 | Revocation + auto-renewal + CRL | not-started | | #465 | | |
| 7 | FED-M7 | Multi-user RBAC hardening + acceptance suite | not-started | | #466 | | |

Budget

| Milestone | Est. tokens | Parallelizable? |
| --- | --- | --- |
| FED-M1 | 20K | No (foundation) |
| FED-M2 | 30K | No (needs M1) |
| FED-M3 | 40K | No (needs M2) |
| FED-M4 | 20K | No (needs M3) |
| FED-M5 | 20K | Yes (with M6 after M4) |
| FED-M6 | 20K | Yes (with M5 after M3) |
| FED-M7 | 25K | No (needs all) |
| Total | ~175K | |
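As a quick arithmetic check on the budget, the per-milestone estimates can be summed directly. The dictionary below simply restates the table (in thousands of tokens); the variable names are illustrative.

```python
# Per-milestone estimates from the budget table, in thousands of tokens.
budget_k = {
    "FED-M1": 20,
    "FED-M2": 30,
    "FED-M3": 40,
    "FED-M4": 20,
    "FED-M5": 20,
    "FED-M6": 20,
    "FED-M7": 25,
}

total_k = sum(budget_k.values())
remaining_k = total_k - budget_k["FED-M1"]  # FED-M1 is the only milestone marked done

print(total_k)      # 175
print(remaining_k)  # 155
```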

Session History

Session Date Runtime Outcome
S1 2026-04-19 claude PRD authored, MILESTONES decomposed, 7 issues filed
S2-S4 2026-04-19 claude FED-M1 complete: 12 tasks (PRs #470-#481) merged; tag fed-v0.1.0-m1

Next Step

FED-M2 active. Decomposition landed in docs/federation/TASKS.md (M2-01..M2-13 code workstream + DEPLOY-01..DEPLOY-05 parallel test-deploy workstream, ~88K total). Tracking issue #482.

Parallel execution plan:

  • CODE workstream: M2-01 (DB migration) starts immediately, assigned to a sonnet subagent on feat/federation-m2-schema. M2-02 → M2-09 then run sequentially; M2-04/M2-05/M2-06/M2-07 have interleaved CA/storage/grant dependencies.
  • DEPLOY workstream: DEPLOY-01 (image verify) → DEPLOY-02 (stack template) → DEPLOY-03/04 (mos-test-1/-2 deploy) → DEPLOY-05 (TEST-INFRA.md). This workstream is gated on the Portainer wrapper PR (PORTAINER_INSECURE flag) merging first.
  • Both workstreams re-converge at M2-10 (E2E test) once each is ready.
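The two-workstream plan above can be sketched as a dependency graph and grouped into waves of tasks that could run at the same time. This is a simplification under stated assumptions: M2-01 through M2-09 are modeled as a strict chain, M2-10 is assumed to wait on M2-09 and DEPLOY-05, and the Portainer gate and M2-11..M2-13 are left out. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on.
code_tasks = [f"M2-{i:02d}" for i in range(1, 10)]  # M2-01 .. M2-09
deps = {"M2-01": set()}
for prev, nxt in zip(code_tasks, code_tasks[1:]):
    deps[nxt] = {prev}  # assumption: strict chain, ignoring interleaving

deps.update({
    "DEPLOY-01": set(),
    "DEPLOY-02": {"DEPLOY-01"},
    "DEPLOY-03": {"DEPLOY-02"},
    "DEPLOY-04": {"DEPLOY-02"},
    "DEPLOY-05": {"DEPLOY-03", "DEPLOY-04"},
    "M2-10": {"M2-09", "DEPLOY-05"},  # assumed convergence point
})


def waves(graph):
    """Group tasks into successive batches whose dependencies are all done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


plan = waves(deps)
for number, batch in enumerate(plan, start=1):
    print(number, batch)
```

Under these assumptions the deploy chain finishes by the fourth wave, so the code workstream is the critical path to M2-10.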