stack/docs/federation/TASKS.md
jason.woltje e64ddd2c1c
docs(federation): M3 mission planning — 14-task decomposition (#504)
2026-04-24 01:13:40 +00:00


Tasks — Federation v1

Single-writer: orchestrator only. Workers read but never modify.

Mission: federation-v1-20260419
Schema: | id | status | description | issue | agent | branch | depends_on | estimate | notes |
Status values: not-started | in-progress | done | blocked | failed | needs-qa
Agent values: codex | glm-5.1 | haiku | sonnet | opus | (auto)
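
The row schema and status vocabulary above can be checked mechanically. The following is a minimal sketch, not part of the repository: `TaskRow`, `isTaskStatus`, and `parseTaskRow` are hypothetical names, and the parser assumes one pipe-delimited row per line with no escaped pipes inside cells.

```typescript
// Status vocabulary copied from the schema header of this file.
const STATUSES = ["not-started", "in-progress", "done", "blocked", "failed", "needs-qa"] as const;
type TaskStatus = (typeof STATUSES)[number];

// Hypothetical in-memory shape of one task row.
interface TaskRow {
  id: string;
  status: TaskStatus;
  description: string;
  issue: string;
  agent: string;
  branch: string;
  dependsOn: string[];
  estimate: string;
  notes: string;
}

function isTaskStatus(value: string): value is TaskStatus {
  return (STATUSES as readonly string[]).includes(value);
}

// Parses "| id | status | ... | notes |" into a TaskRow.
// Assumption: cells never contain an escaped pipe ("\|").
function parseTaskRow(line: string): TaskRow {
  const cells = line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((c) => c.trim());
  if (cells.length !== 9) {
    throw new Error(`expected 9 cells, got ${cells.length}`);
  }
  const cell = (i: number): string => cells[i] ?? "";
  const status = cell(1);
  if (!isTaskStatus(status)) {
    throw new Error(`unknown status "${status}"`);
  }
  const deps = cell(6);
  return {
    id: cell(0),
    status,
    description: cell(2),
    issue: cell(3),
    agent: cell(4),
    branch: cell(5),
    // An empty depends_on cell means the task has no upstream tasks.
    dependsOn: deps === "" ? [] : deps.split(",").map((d) => d.trim()),
    estimate: cell(7),
    notes: cell(8),
  };
}
```

A check like this could run before the orchestrator writes the file, so that a typo in a status value fails fast instead of silently creating a new state.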

Scope of this file: M1 is fully decomposed below. M2–M7 are placeholders pending each milestone's entry into active planning; the orchestrator expands them one milestone at a time to avoid speculative decomposition of work whose shape will depend on what M1 surfaces.


Milestone 1 — Federated tier infrastructure (FED-M1)

Goal: Gateway runs in federated tier with containerized PG+pgvector+Valkey. No federation logic yet. Existing standalone behavior does not regress.

| id | status | description | issue | agent | branch | depends_on | estimate | notes |
|---|---|---|---|---|---|---|---|---|
| FED-M1-01 | done | Extend mosaic.config.json schema: add "federated" to tier enum in validator + TS types. Keep local and standalone working. Update schema docs/README where referenced. | #460 | sonnet | feat/federation-m1-tier-config | | 4K | Shipped in PR #470. Renamed team → standalone; added team deprecation alias; added DEFAULT_FEDERATED_CONFIG. |
| FED-M1-02 | done | Author docker-compose.federated.yml as an overlay profile: Postgres 17 + pgvector extension (port 5433), Valkey (6380), named volumes, healthchecks. Compose-up should boot cleanly on a clean machine. | #460 | sonnet | feat/federation-m1-compose | FED-M1-01 | 5K | Shipped in PR #471. Overlay defines postgres-federated/valkey-federated, profile-gated, with pg-init for pgvector extension. |
| FED-M1-03 | done | Add pgvector support to packages/storage/src/adapters/postgres.ts: create extension on init (idempotent), expose vector column type in schema helpers. No adapter changes for non-federated tiers. | #460 | sonnet | feat/federation-m1-pgvector | FED-M1-02 | 8K | Shipped in PR #472. enableVector flag on postgres StorageConfig; idempotent CREATE EXTENSION before migrations. |
| FED-M1-04 | done | Implement apps/gateway/src/bootstrap/tier-detector.ts: reads config, asserts PG/Valkey/pgvector reachable for federated, fail-fast with actionable error message on failure. Unit tests for each failure mode. | #460 | sonnet | feat/federation-m1-detector | FED-M1-03 | 8K | Shipped in PR #473. 12 tests; 5s timeouts on probes; pgvector library/permission discrimination; rejects non-bullmq for federated. |
| FED-M1-05 | done | Write scripts/migrate-to-federated.ts: one-way migration from local (PGlite) / standalone (PG without pgvector) → federated. Dumps, transforms, loads; dry-run + confirm UX. Idempotent on re-run. | #460 | sonnet | feat/federation-m1-migrate | FED-M1-04 | 10K | Shipped in PR #474. mosaic storage migrate-tier; DrizzleMigrationSource (corrects P0 found in review); 32 tests; idempotent. |
| FED-M1-06 | done | Update mosaic doctor: report current tier, required services, actual health per service, pgvector presence, overall green/yellow/red. Machine-readable JSON output flag for CI use. | #460 | sonnet | feat/federation-m1-doctor | FED-M1-04 | 6K | Shipped in PR #475 as mosaic gateway doctor. Probes lifted to @mosaicstack/storage; structural TierConfig breaks dep cycle. |
| FED-M1-07 | done | Integration test: gateway boots in federated tier with docker-compose federated profile; refuses to boot when PG unreachable (asserts fail-fast); pgvector extension query succeeds. | #460 | sonnet | feat/federation-m1-integration | FED-M1-04 | 8K | Shipped in PR #476. 3 test files, 4 tests, gated by FEDERATED_INTEGRATION=1; reserved-port helper avoids host collisions. |
| FED-M1-08 | done | Integration test for migration script: seed a local PGlite with representative data (tasks, notes, users, teams), run migration, assert row counts + key samples equal on federated PG. | #460 | sonnet | feat/federation-m1-migrate-test | FED-M1-05 | 6K | Shipped in PR #477. Caught P0 in M1-05 (camelCase→snake_case) missed by mocked unit tests; fix in same PR. |
| FED-M1-09 | done | Standalone regression: full agent-session E2E on existing standalone tier with a gateway built from this branch. Must pass without referencing any federation module. | #460 | sonnet | feat/federation-m1-regression | FED-M1-07 | 4K | Clean canary. 351 gateway tests + 85 storage unit tests + full pnpm test all green; only FEDERATED_INTEGRATION-gated tests skip. |
| FED-M1-10 | done | Code review pass: security-focused on the migration script (data-at-rest during migration) + tier detector (error-message sensitivity leakage). Independent reviewer, not authors of tasks 01-09. | #460 | sonnet | feat/federation-m1-security-review | FED-M1-09 | 8K | 2 review rounds caught 7 issues: credential leak in pg/valkey/pgvector errors + redact-error util; missing advisory lock; SKIP_TABLES rationale. |
| FED-M1-11 | done | Docs update: docs/federation/ operator notes for tier setup; README blurb on federated tier; docs/guides/ entry for migration. Do NOT touch runbook yet (deferred to FED-M7). | #460 | haiku | feat/federation-m1-docs | FED-M1-10 | 4K | Shipped: docs/federation/SETUP.md (119 lines), docs/guides/migrate-tier.md (147 lines), README Configuration blurb. |
| FED-M1-12 | done | PR, CI green, merge to main, close #460. | #460 | sonnet | feat/federation-m1-close | FED-M1-11 | 3K | M1 closed. PRs #470-#480 merged across 11 tasks. Issue #460 closed; release tag fed-v0.1.0-m1 published. |
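
The tier enum change in FED-M1-01 (three tiers, with team kept as a deprecated alias for standalone) can be illustrated with a small normalizer. This is a sketch under stated assumptions: `normalizeTier` and its return shape are hypothetical and are not taken from the shipped validator.

```typescript
// Tier names from FED-M1-01; "team" is the deprecated alias noted there.
type Tier = "local" | "standalone" | "federated";

// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const DEPRECATED_ALIASES = new Map<string, Tier>([["team", "standalone"]]);

// Hypothetical helper: resolves a raw config value to a canonical tier and
// reports whether a deprecated alias was used, so callers can log a warning.
function normalizeTier(raw: string): { tier: Tier; deprecated: boolean } {
  if (raw === "local" || raw === "standalone" || raw === "federated") {
    return { tier: raw, deprecated: false };
  }
  const alias = DEPRECATED_ALIASES.get(raw);
  if (alias !== undefined) {
    return { tier: alias, deprecated: true };
  }
  throw new Error(`unknown tier "${raw}"; expected local, standalone, or federated`);
}
```

Returning the `deprecated` flag rather than logging inside the helper keeps it pure, which makes the "existing standalone behavior does not regress" goal easy to unit test.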

M1 total estimate: ~74K tokens (over-budget vs 20K PRD estimate — explanation below)

Why over-budget: PRD's 20K estimate reflected implementation complexity only. The per-task breakdown includes tests, review, and docs as separate tasks per the delivery cycle, which catches the real cost. The final per-milestone budgets in MISSION-MANIFEST will be updated after M1 completes with actuals.
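
As a worked check of the ~74K figure, the twelve M1 estimates from the table can be summed directly. The sketch below is illustrative only; `parseEstimate` is a hypothetical helper that assumes every estimate is written as a whole number followed by K.

```typescript
// Estimates for FED-M1-01 through FED-M1-12, in table order.
const m1Estimates = ["4K", "5K", "8K", "8K", "10K", "6K", "8K", "6K", "4K", "8K", "4K", "3K"];

// Converts "10K" to 10000. Assumes the "<integer>K" format used in this file.
function parseEstimate(text: string): number {
  const match = /^(\d+)K$/.exec(text);
  if (match === null) {
    throw new Error(`unrecognized estimate "${text}"`);
  }
  return Number(match[1]) * 1000;
}

const m1Total = m1Estimates.map(parseEstimate).reduce((sum, n) => sum + n, 0);
const prdEstimate = 20_000;

console.log(m1Total); // 74000
console.log(m1Total / prdEstimate); // 3.7
```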


Pre-M2 — Test deployment infrastructure (FED-M2-DEPLOY)

Goal: Two federated-tier gateways stood up on Portainer at mos-test-1.woltje.com and mos-test-2.woltje.com running the M1 release (gateway:fed-v0.1.0-m1). This is the test bed for M2 enrollment work and the M3 federation E2E harness. No federation logic exercised yet — pure infrastructure validation.

Why now: M2 enrollment requires a real second gateway to test peer-add flows; standing the test hosts up before M2 code lands gives both code and deployment streams a fast feedback loop.

Parallelizable: This workstream runs in parallel with the M2 code workstream (M2-01 → M2-13). They re-converge at M2-10 (E2E test).

Tracking issue: #482.

| id | status | description | issue | agent | branch | depends_on | estimate | notes |
|---|---|---|---|---|---|---|---|---|
| FED-M2-DEPLOY-01 | done | Verify gateway:fed-v0.1.0-m1 image was published by .woodpecker/publish.yml on tag push; if not, investigate and remediate. Document image URI in deployment artifact. | #482 | sonnet | (verified inline, no PR) | | 2K | Tag exists; digest sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec captured for digest-pinned deploys. |
| FED-M2-DEPLOY-02 | done | Author Portainer git-stack compose file deploy/portainer/federated-test.stack.yml (gateway + PG-pgvector + Valkey, env-driven). Use immutable tag, not latest. | #482 | sonnet | feat/federation-deploy-stack-template | DEPLOY-01 | 5K | Shipped in PR #485. Digest-pinned. Env: STACK_NAME, HOST_FQDN, POSTGRES_PASSWORD, BETTER_AUTH_SECRET, BETTER_AUTH_URL. |
| FED-M2-DEPLOY-IMG-FIX | in-progress | Gateway image runtime broken (ERR_MODULE_NOT_FOUND for dotenv); Dockerfile copies .pnpm/ store but not apps/gateway/node_modules symlinks. Switch to pnpm deploy for self-contained runtime. | #482 | sonnet | (subagent in flight) | DEPLOY-02 | 4K | Subagent a78a9ab0ddae91fbc in flight. Triggers Kaniko rebuild on merge; capture new digest; bump stack template in follow-up PR before redeploy. |
| FED-M2-DEPLOY-03 | blocked | Deploy stack to mos-test-1.woltje.com via ~/.config/mosaic/tools/portainer/. Verify M1 acceptance: federated-tier boot succeeds; mosaic gateway doctor --json returns green; pgvector vector(3) round-trip works. | #482 | sonnet | feat/federation-deploy-test-1 | IMG-FIX | 3K | Stack created on Portainer endpoint 3 (Swarm local), but blocked on image fix. Container fails on boot until IMG-FIX merges + redeploy. |
| FED-M2-DEPLOY-04 | blocked | Deploy stack to mos-test-2.woltje.com via Portainer wrapper. Same M1 acceptance probes as DEPLOY-03. | #482 | sonnet | feat/federation-deploy-test-2 | IMG-FIX | 3K | Same status as DEPLOY-03. Stack created; blocked on image fix. |
| FED-M2-DEPLOY-05 | not-started | Document deployment in docs/federation/TEST-INFRA.md: hosts, image tags, secrets sourcing, redeploy procedure, teardown. Update MISSION-MANIFEST with deployment status. | #482 | haiku | feat/federation-deploy-docs | DEPLOY-03,04 | 3K | Operator-facing doc; mentions but does not duplicate tools/portainer/README.md. |

Deploy workstream estimate: ~16K tokens (DEPLOY-01 through DEPLOY-05; the 4K FED-M2-DEPLOY-IMG-FIX task is not included in this figure).
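
The "immutable tag, not latest" rule in DEPLOY-02 amounts to referencing the image by digest. The sketch below shows one way to build and validate such a reference. The repository name in the usage lines is a placeholder, since the real registry path is not stated in this file, and `pinImage` is a hypothetical helper.

```typescript
// A sha256 digest is "sha256:" followed by exactly 64 lowercase hex characters.
const DIGEST_PATTERN = /^sha256:[0-9a-f]{64}$/;

// Hypothetical helper: returns a digest-pinned image reference, or throws
// when handed a mutable tag such as "latest" or a malformed digest.
function pinImage(repository: string, digest: string): string {
  if (!DIGEST_PATTERN.test(digest)) {
    throw new Error(`not a sha256 digest: "${digest}"`);
  }
  return `${repository}@${digest}`;
}

// Digest captured in FED-M2-DEPLOY-01; the repository path is an assumption.
const m1Digest = "sha256:9b72e202a9eecc27d31920b87b475b9e96e483c0323acc57856be4b1355db1ec";
const pinned = pinImage("registry.example.com/mosaic/gateway", m1Digest);
console.log(pinned);
```

When IMG-FIX lands and a new digest is captured, only the digest string needs to change; a validator of this kind rejects a stack template that slips back to a tag.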


Milestone 2 — Step-CA + grant schema + admin CLI (FED-M2)

Goal: An admin can create a federation grant; counterparty enrolls; cert is signed by Step-CA with SAN OIDs for grantId + subjectUserId. No runtime federation traffic flows yet (that's M3).

| id | status | description | issue | agent | branch | depends_on | estimate | notes |
|---|---|---|---|---|---|---|---|---|
| FED-M2-01 | done | DB migration: federation_grants, federation_peers, federation_audit_log tables + enum types (grant_status, peer_state). Drizzle schema + migration generation; migration tests. | #461 | sonnet | feat/federation-m2-schema | | 5K | Shipped in PR #486. DESC indexes + reserved cols added after first review; migration tests green. |
| FED-M2-02 | done | Add Step-CA sidecar to docker-compose.federated.yml: official smallstep/step-ca image, persistent CA volume, JWK provisioner config baked into init script. | #461 | sonnet | feat/federation-m2-stepca | DEPLOY-02 | 4K | Shipped in PR #494. Profile-gated under federated; CA password from secret; dev compose uses dev-only password file. |
| FED-M2-03 | done | Scope JSON schema + validator: resources allowlist, excluded_resources, include_teams, include_personal, max_rows_per_query. Vitest unit tests for valid + invalid scopes. | #461 | sonnet | feat/federation-m2-scope-schema | | 4K | Shipped in PR #496 (bundled with grants service). Validator independent of CA; reusable from grant CRUD + M3 scope enforcement. |
| FED-M2-04 | done | apps/gateway/src/federation/ca.service.ts: Step-CA client (CSR submission, OID-bearing cert retrieval). Mocked + integration tests against real Step-CA container. | #461 | sonnet | feat/federation-m2-ca-service | M2-02 | 6K | Shipped in PR #494. SAN OIDs 1.3.6.1.4.1.99999.1 (grantId) + 1.3.6.1.4.1.99999.2 (subjectUserId); integration test asserts both OIDs present in issued cert. |
| FED-M2-05 | done | Sealed storage for client_key_pem reusing existing provider_credentials sealing key. Tests prove DB-at-rest is ciphertext, not PEM. Key rotation path documented (deferred impl). | #461 | sonnet | feat/federation-m2-key-sealing | M2-01 | 5K | Shipped in PR #495. Crypto seam isolated; tests confirm ciphertext-at-rest; key rotation deferred to M6. |
| FED-M2-06 | done | grants.service.ts: CRUD + status transitions (pending → active → revoked); integrates M2-03 (scope) + M2-05 (sealing). Unit tests cover all transitions including invalid ones. | #461 | sonnet | feat/federation-m2-grants-service | M2-03, M2-05 | 6K | Shipped in PR #496. All status transitions covered; invalid transition tests green; revocation handler deferred to M6. |
| FED-M2-07 | done | enrollment.controller.ts: short-lived single-use token endpoint; CSR signing; updates grant pending → active; emits enrollment audit (table-only write, M4 tightens). | #461 | sonnet | feat/federation-m2-enrollment | M2-04, M2-06 | 6K | Shipped in PR #497. Tokens single-use with 410 on replay; TTL 15min; rate-limited at request layer. |
| FED-M2-08 | done | Admin CLI: mosaic federation grant create/list/show + peer add/list. Integration with grants.service (no API duplication). Help output + machine-readable JSON option. | #461 | sonnet | feat/federation-m2-cli | M2-06, M2-07 | 7K | Shipped in PR #498. peer add <enrollment-url> client-side flow; JSON output flag; admin REST controller co-shipped. |
| FED-M2-09 | done | Integration tests covering MILESTONES.md M2 acceptance tests #1, #2, #3, #5, #7, #8 (single-gateway suite). Real Step-CA container; vitest profile gated by FEDERATED_INTEGRATION=1. | #461 | sonnet | feat/federation-m2-integration | M2-08 | 8K | Shipped in PR #499. All 6 acceptance tests green; gated by FEDERATED_INTEGRATION=1. |
| FED-M2-10 | done | E2E test against deployed mos-test-1 + mos-test-2 (or local two-gateway docker-compose if Portainer not ready): MILESTONES test #6 peer add yields active peer record with valid cert + key. | #461 | sonnet | feat/federation-m2-e2e | M2-08, DEPLOY-04 | 6K | Shipped in PR #500. Local two-gateway docker-compose path used; peer add yields active peer with valid cert + sealed key. |
| FED-M2-11 | done | Independent security review (sonnet, not author of M2-04/05/06/07): focus on single-use token replay, sealing leak surfaces, OID match enforcement, scope schema bypass paths. | #461 | sonnet | feat/federation-m2-security-review | M2-10 | 8K | Shipped in PR #501. Two-round review; enrollment-token replay, OID-spoofing CSR, and key leak in error messages all verified and hardened. |
| FED-M2-12 | done | Docs update: docs/federation/SETUP.md Step-CA section; new docs/federation/ADMIN-CLI.md with grant/peer commands; scope schema reference; OID registration note. Runbook still M7-deferred. | #461 | haiku | feat/federation-m2-docs | M2-11 | 4K | Shipped in PR #502. SETUP.md CA bootstrap section added; ADMIN-CLI.md created; scope schema reference and OID note included. |
| FED-M2-13 | done | PR aggregate close, CI green, merge to main, close #461. Release tag fed-v0.2.0-m2. Mark deploy stream complete. Update mission manifest M2 row. | #461 | sonnet | chore/federation-m2-close | M2-12 | 3K | Release tag fed-v0.2.0-m2 created; issue #461 closed; all M2 PRs #494-#502 merged to main. |

M2 code workstream estimate: ~72K tokens (vs MILESTONES.md 30K — same over-budget pattern as M1, where per-task breakdown including tests/review/docs catches the real cost).
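
The grant lifecycle named in FED-M2-06 is a small state machine. The following sketch assumes that the only legal edges are the two the task names (pending to active, active to revoked); whether grants.service.ts permits anything else, such as revoking a pending grant, is not stated here. `transitionGrant` is a hypothetical name.

```typescript
type GrantStatus = "pending" | "active" | "revoked";

// Assumption: only the edges named in FED-M2-06 are legal.
const ALLOWED_TRANSITIONS: Record<GrantStatus, readonly GrantStatus[]> = {
  pending: ["active"],
  active: ["revoked"],
  revoked: [],
};

// Returns the new status, or throws on an invalid transition so the
// caller cannot persist an illegal state by accident.
function transitionGrant(from: GrantStatus, to: GrantStatus): GrantStatus {
  if (!ALLOWED_TRANSITIONS[from].includes(to)) {
    throw new Error(`invalid grant transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the edge table as data makes the "all transitions including invalid ones" test requirement a simple loop over every pair of statuses.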

Deploy + code combined: ~88K tokens.
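
The enrollment token behavior recorded for FED-M2-07 (single use, 15-minute TTL, 410 on replay) can be sketched with an in-memory store. This is illustrative only: the class name is hypothetical, the real controller presumably persists tokens in the database, and returning 410 for an expired token and 404 for an unknown one are assumptions not confirmed by this file.

```typescript
const TOKEN_TTL_MS = 15 * 60 * 1000; // TTL 15min, per the FED-M2-07 notes

type RedeemStatus = 200 | 404 | 410;

// Hypothetical in-memory token store with an injectable clock for tests.
class EnrollmentTokenStore {
  private readonly tokens = new Map<string, { expiresAt: number; used: boolean }>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  issue(token: string): void {
    this.tokens.set(token, { expiresAt: this.now() + TOKEN_TTL_MS, used: false });
  }

  redeem(token: string): RedeemStatus {
    const entry = this.tokens.get(token);
    if (entry === undefined) return 404; // assumption: unknown token is 404
    // Replay yields 410 (from the task notes); expiry also maps to 410 here (assumption).
    if (entry.used || this.now() > entry.expiresAt) return 410;
    entry.used = true;
    return 200;
  }
}
```

In a multi-instance gateway the mark-as-used step would need to be atomic (for example a conditional UPDATE), otherwise two concurrent redemptions could both succeed.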

Milestone 3 — mTLS handshake + list/get + scope enforcement (FED-M3)

Goal: Two federated gateways exchange real data over mTLS. Inbound requests pass through cert validation → grant lookup → scope enforcement → native RBAC → response. list, get, and capabilities verbs land. The federation E2E harness (tools/federation-harness/) is the new permanent test bed for M3+ and is gated on every milestone going forward.

Critical trust boundary. Every 401/403 path needs a test. Code review is non-negotiable; M3-12 budgets two review rounds.
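
The 401/403 split described for the auth guard (FED-M3-03: 401 on malformed or missing OIDs, 403 on a revoked, expired, or missing grant) can be expressed as a pure decision function, which makes every denial path directly testable. The sketch below is an assumption-laden illustration: the type names, the `expiresAt` field, and the subject-mismatch check are hypothetical and not taken from the planned guard.

```typescript
interface CertIdentity {
  grantId?: string;        // extracted from the grantId OID, if present
  subjectUserId?: string;  // extracted from the subjectUserId OID, if present
}

interface Grant {
  id: string;
  subjectUserId: string;
  status: "pending" | "active" | "revoked";
  expiresAt?: Date; // hypothetical field
}

type AuthDecision =
  | { ok: true; grant: Grant }
  | { ok: false; status: 401 | 403; reason: string };

// Pure decision: the grant lookup is injected so no DB is needed in tests.
function authorizeFederationRequest(
  identity: CertIdentity,
  findGrant: (grantId: string) => Grant | undefined,
  now: Date,
): AuthDecision {
  if (!identity.grantId || !identity.subjectUserId) {
    return { ok: false, status: 401, reason: "missing_or_malformed_oid" };
  }
  const grant = findGrant(identity.grantId);
  if (grant === undefined) {
    return { ok: false, status: 403, reason: "grant_not_found" };
  }
  if (grant.status !== "active") {
    return { ok: false, status: 403, reason: "grant_not_active" };
  }
  if (grant.expiresAt !== undefined && grant.expiresAt.getTime() <= now.getTime()) {
    return { ok: false, status: 403, reason: "grant_expired" };
  }
  // Assumption: a cert whose subject differs from the grant's subject is denied.
  if (grant.subjectUserId !== identity.subjectUserId) {
    return { ok: false, status: 403, reason: "subject_mismatch" };
  }
  return { ok: true, grant };
}
```

Each `reason` string corresponds to one denial branch, so a test suite can assert one case per branch and a reviewer can count branches against tests.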

Tracking issue: #462.

| id | status | description | issue | agent | branch | depends_on | estimate | notes |
|---|---|---|---|---|---|---|---|---|
| FED-M3-01 | not-started | packages/types/src/federation/: request/response DTOs for list, get, capabilities verbs. Wire-format zod schemas + inferred TS types. Includes FederationRequest, FederationListResponse<T>, FederationGetResponse<T>, FederationCapabilitiesResponse, error envelope, _source tag. | #462 | sonnet | feat/federation-m3-types | | 4K | Reusable from gateway server + client + harness. Pure types: no I/O, no NestJS. |
| FED-M3-02 | not-started | tools/federation-harness/ scaffold: docker-compose.two-gateways.yml (Server A + Server B + step-CA), seed.ts (provisions grants, peers, sample tasks/notes/credentials per scope variant), harness.ts helper (boots stack, returns typed clients). README documents harness use. | #462 | sonnet | feat/federation-m3-harness | DEPLOY-04 (soft) | 8K | Falls back to local docker-compose if mos-test-1/-2 not yet redeployed (DEPLOY chain blocked on IMG-FIX). Permanent test infra used by M3+. |
| FED-M3-03 | not-started | apps/gateway/src/federation/server/federation-auth.guard.ts (NestJS guard). Validates inbound client cert from Fastify TLS context, extracts grantId + subjectUserId from custom OIDs, loads grant from DB, asserts status='active', attaches FederationContext to request. | #462 | sonnet | feat/federation-m3-auth-guard | M3-01 | 8K | Reuses OID parsing logic mirrored from ca.service.ts post-issuance verification. 401 on malformed/missing OIDs; 403 on revoked/expired/missing grant. |
| FED-M3-04 | not-started | apps/gateway/src/federation/server/scope.service.ts. Pipeline: (1) resource allowlist + excluded check, (2) native RBAC eval as subjectUserId, (3) scope filter intersection (include_teams, include_personal), (4) max_rows_per_query cap. Pure service; DB calls injected. | #462 | sonnet | feat/federation-m3-scope-service | M3-01 | 10K | Hardest correctness target in M3. Reuses parseFederationScope (M2-03). Returns either { allowed: true, filter } or structured deny reason for audit. |
| FED-M3-05 | not-started | apps/gateway/src/federation/server/verbs/list.controller.ts. Wires AuthGuard → ScopeService → tasks/notes/memory query layer; applies row cap; tags rows with _source. Resource selector via path param. | #462 | sonnet | feat/federation-m3-verb-list | M3-03, M3-04 | 6K | Routes: POST /api/federation/v1/list/:resource. No body persistence. Audit write deferred to M4. |
| FED-M3-06 | not-started | apps/gateway/src/federation/server/verbs/get.controller.ts. Single-resource fetch by id; same pipeline as list. 404 on not-found, 403 on RBAC/scope deny; both audited the same way. | #462 | sonnet | feat/federation-m3-verb-get | M3-03, M3-04 | 6K | POST /api/federation/v1/get/:resource/:id. Mirrors list controller patterns. |
| FED-M3-07 | not-started | apps/gateway/src/federation/server/verbs/capabilities.controller.ts. Read-only enumeration: returns { resources, excluded_resources, max_rows_per_query, supported_verbs } derived from grant scope. Always allowed for an active grant; no RBAC eval. | #462 | sonnet | feat/federation-m3-verb-capabilities | M3-03 | 4K | GET /api/federation/v1/capabilities. Smallest verb; useful sanity check that mTLS + auth guard work end-to-end. |
| FED-M3-08 | not-started | apps/gateway/src/federation/client/federation-client.service.ts. Outbound mTLS dialer: picks (certPem, sealed clientKey) from federation_peers, unwraps key, builds undici Agent with mTLS, calls peer verb, parses typed response, wraps non-2xx into FederationClientError. | #462 | sonnet | feat/federation-m3-client | M3-01 | 8K | Independent of server stream; can land in parallel with M3-03/04. Cert/key cached per-peer; flushed by future M5/M6 logic. |
| FED-M3-09 | not-started | apps/gateway/src/federation/client/query-source.service.ts. Accepts source: "local" \| "federated:<host>" \| "all" from gateway query layer; for "all" fans out to local + each peer in parallel; merges results; tags every row with _source. | #462 | sonnet | feat/federation-m3-query-source | M3-08 | 8K | Per-peer failure surfaces as _partial: true in response, not hard failure (sets up M5 offline UX). M5 adds caching + circuit breaker on top. |
| FED-M3-10 | not-started | Integration tests for MILESTONES.md M3 acceptance #6 (malformed OIDs → 401; valid cert + revoked grant → 403) and #7 (max_rows_per_query cap). Real PG, mocked TLS context (Fastify req shim). | #462 | sonnet | feat/federation-m3-integration | M3-05, M3-06 | 8K | Vitest profile gated by FEDERATED_INTEGRATION=1. Single-gateway suite; no harness required. |
| FED-M3-11 | not-started | E2E tests for MILESTONES.md M3 acceptance #1, #2, #3, #4, #5, #8, #9, #10 (8 cases). Uses harness from M3-02; two real gateways, real Step-CA, real mTLS. Each test asserts both happy-path response and audit/no-persist invariants. | #462 | sonnet | feat/federation-m3-e2e | M3-02, M3-09 | 12K | Largest single task. Each acceptance gets its own it(...) for clear failure attribution. |
| FED-M3-12 | not-started | Independent security review (sonnet, not author of M3-03/04/05/06/07/08/09): focus on cert-SAN spoofing, OID extraction edge cases, scope-bypass via filter manipulation, RBAC-bypass via subjectUser swap, response leakage when scope deny. | #462 | sonnet | feat/federation-m3-security-review | M3-11 | 10K | Two review rounds budgeted. PRD requires explicit test for every 401/403 path; review verifies coverage. |
| FED-M3-13 | not-started | Docs update: docs/federation/SETUP.md mTLS handshake section, new docs/federation/HARNESS.md for federation-harness usage, OID reference table in SETUP.md, scope enforcement pipeline diagram. Runbook still M7-deferred. | #462 | haiku | feat/federation-m3-docs | M3-12 | 5K | One ASCII diagram for the auth-guard → scope → RBAC pipeline; helps future reviewers reason about denial paths. |
| FED-M3-14 | not-started | PR aggregate close, CI green, merge to main, close #462. Release tag fed-v0.3.0-m3. Update mission manifest M3 row → done; M4 row → in-progress when work begins. | #462 | sonnet | chore/federation-m3-close | M3-13 | 3K | Same close pattern as M1-12 / M2-13. |
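
The four-step scope pipeline in FED-M3-04 can be sketched as a pure function. The field names come from the M2-03 scope schema, but their types are assumptions (for example, `include_teams` is treated as a list of team ids). `RbacView`, `ScopeFilter`, and `evaluateScope` are hypothetical, and the RBAC lookup is injected as a plain callback rather than a real DB call.

```typescript
interface FederationScope {
  resources: string[];
  excluded_resources: string[];
  include_teams: string[];   // assumption: list of team ids
  include_personal: boolean;
  max_rows_per_query: number;
}

// What the subject user can see natively; undefined means RBAC denies access.
interface RbacView { teamIds: string[]; canReadPersonal: boolean }

interface ScopeFilter { teamIds: string[]; includePersonal: boolean; limit: number }

type ScopeDecision =
  | { allowed: true; filter: ScopeFilter }
  | { allowed: false; reason: "resource_not_allowed" | "resource_excluded" | "rbac_denied" };

function evaluateScope(
  scope: FederationScope,
  resource: string,
  rbacLookup: (resource: string) => RbacView | undefined,
  requestedLimit?: number,
): ScopeDecision {
  // (1) resource allowlist + excluded check
  if (!scope.resources.includes(resource)) return { allowed: false, reason: "resource_not_allowed" };
  if (scope.excluded_resources.includes(resource)) return { allowed: false, reason: "resource_excluded" };
  // (2) native RBAC evaluation as the subject user
  const rbac = rbacLookup(resource);
  if (rbac === undefined) return { allowed: false, reason: "rbac_denied" };
  // (3) intersection: the grant can only narrow what RBAC already permits
  const teamIds = rbac.teamIds.filter((id) => scope.include_teams.includes(id));
  const includePersonal = rbac.canReadPersonal && scope.include_personal;
  // (4) row cap: never exceed max_rows_per_query
  const limit = Math.min(requestedLimit ?? scope.max_rows_per_query, scope.max_rows_per_query);
  return { allowed: true, filter: { teamIds, includePersonal, limit } };
}
```

Because step 3 is an intersection, neither the grant scope nor the caller's request can widen access beyond native RBAC, which is the property the security review in M3-12 targets.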

M3 estimate: ~100K tokens (vs MILESTONES.md 40K — same per-task breakdown pattern as M1/M2: tests, review, and docs split out from implementation cost). Largest milestone in the federation mission.
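
The fan-out behavior planned for FED-M3-09 (query local plus every peer in parallel, tag rows with `_source`, report a peer failure as `_partial: true` rather than failing the whole query) can be sketched as follows. All names are hypothetical, the fetchers stand in for the real local query layer and mTLS client, and letting a local failure reject the whole call is an assumption.

```typescript
type Row = Record<string, unknown>;
type TaggedRow = Row & { _source: string };
type Fetcher = () => Promise<Row[]>;

interface MergedResult { rows: TaggedRow[]; _partial: boolean }

async function queryAllSources(
  local: Fetcher,
  peers: Array<{ host: string; fetch: Fetcher }>,
): Promise<MergedResult> {
  const tag = (rows: Row[], source: string): TaggedRow[] =>
    rows.map((row) => ({ ...row, _source: source }));

  // Start every request before awaiting any of them so they run in parallel.
  const localPromise = local().then((rows) => tag(rows, "local"));
  const peerPromises = peers.map(async (peer) => {
    try {
      const rows = await peer.fetch();
      return { ok: true, rows: tag(rows, `federated:${peer.host}`) };
    } catch {
      // A failing peer contributes no rows and marks the result partial.
      return { ok: false, rows: [] as TaggedRow[] };
    }
  });

  const localRows = await localPromise; // assumption: local failure is a hard failure
  const peerResults = await Promise.all(peerPromises);

  let rows = localRows;
  for (const result of peerResults) rows = rows.concat(result.rows);
  return { rows, _partial: peerResults.some((r) => !r.ok) };
}
```

Catching inside each peer promise keeps one slow or offline peer from discarding the rows that other sources already returned.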

Parallelization opportunities:

  • M3-08 (client) can land in parallel with M3-03/M3-04 (server pipeline) — they only share DTOs from M3-01.
  • M3-02 (harness) can land in parallel with everything except M3-11.
  • M3-05/M3-06/M3-07 (verbs) are independent of each other once M3-03/M3-04 land.
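
The parallelization notes above follow from the depends_on column. As a sketch, the M3 dependency graph can be grouped into waves of tasks whose dependencies are all complete. The edges are transcribed from the M3 table, with the soft DEPLOY-04 dependency of M3-02 omitted (an assumption, since the harness has a local fallback). `computeWaves` is a hypothetical helper.

```typescript
// depends_on edges transcribed from the M3 table (soft DEPLOY-04 edge omitted).
const m3Deps: Record<string, string[]> = {
  "M3-01": [],
  "M3-02": [],
  "M3-03": ["M3-01"],
  "M3-04": ["M3-01"],
  "M3-05": ["M3-03", "M3-04"],
  "M3-06": ["M3-03", "M3-04"],
  "M3-07": ["M3-03"],
  "M3-08": ["M3-01"],
  "M3-09": ["M3-08"],
  "M3-10": ["M3-05", "M3-06"],
  "M3-11": ["M3-02", "M3-09"],
  "M3-12": ["M3-11"],
  "M3-13": ["M3-12"],
  "M3-14": ["M3-13"],
};

// Groups tasks into waves: every task in a wave depends only on earlier waves.
function computeWaves(deps: Record<string, string[]>): string[][] {
  const done = new Set<string>();
  const remaining = new Set(Object.keys(deps));
  const waves: string[][] = [];
  while (remaining.size > 0) {
    const ready = Array.from(remaining).filter((id) =>
      (deps[id] ?? []).every((dep) => done.has(dep)),
    );
    if (ready.length === 0) throw new Error("dependency cycle or unknown dependency");
    for (const id of ready) {
      remaining.delete(id);
      done.add(id);
    }
    waves.push(ready);
  }
  return waves;
}

console.log(computeWaves(m3Deps));
// [["M3-01","M3-02"], ["M3-03","M3-04","M3-08"], ["M3-05","M3-06","M3-07","M3-09"],
//  ["M3-10","M3-11"], ["M3-12"], ["M3-13"], ["M3-14"]]
```

The second wave shows the client (M3-08) alongside the server pipeline (M3-03/M3-04), and the third wave shows the three verbs as mutually independent, matching the bullets above.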

Test bed fallback: If mos-test-1.woltje.com / mos-test-2.woltje.com are still blocked on FED-M2-DEPLOY-IMG-FIX when M3-11 is ready to run, the harness's local docker-compose.two-gateways.yml is a sufficient stand-in. Production-host validation moves to M7 acceptance suite (PRD AC-12).

Milestone 4 — search + audit + rate limit (FED-M4)

Deferred. Issue #463.

Milestone 5 — cache + offline + OTEL (FED-M5)

Deferred. Issue #464.

Milestone 6 — revocation + auto-renewal + CRL (FED-M6)

Deferred. Issue #465.

Milestone 7 — multi-user hardening + acceptance suite (FED-M7)

Deferred. Issue #466.


Execution Notes

Agent assignment rationale:

  • codex for most implementation tasks (OpenAI credit pool preferred for feature code)
  • sonnet for tests (pattern-based, moderate complexity), doctor work (cross-cutting), and independent code review
  • haiku for docs and the standalone regression canary (cheapest tier for mechanical/verification work)
  • No opus in M1 — save for cross-cutting architecture decisions if they surface later

Branch strategy: Each task gets its own feature branch off main. Tasks within a milestone merge in dependency order. Final aggregate PR (FED-M1-12) isn't a branch of its own — it's the merge of the last upstream task that closes the issue.

Queue guard: Every push and every merge in this mission must run ~/.config/mosaic/tools/git/ci-queue-wait.sh --purpose push|merge per Mosaic hard gate #6.