# Federation Test Harness

Local two-gateway federation test infrastructure for Mosaic Stack M3+.

This harness boots two real gateway instances (`gateway-a`, `gateway-b`) on a
shared Docker bridge network, each backed by its own Postgres (pgvector) +
Valkey, sharing a single Step-CA. It is the test bed for all M3+ federation E2E
tests.

## Prerequisites

- Docker with Compose v2 (`docker compose version` ≥ 2.20)
- pnpm (for running via repo scripts)
- `infra/step-ca/dev-password` must exist (copy from `infra/step-ca/dev-password.example`)

## Network Topology

```
Host machine
  ├── localhost:14001  →  gateway-a   (Server A: home / requesting)
  ├── localhost:14002  →  gateway-b   (Server B: work / serving)
  ├── localhost:15432  →  postgres-a
  ├── localhost:15433  →  postgres-b
  ├── localhost:16379  →  valkey-a
  ├── localhost:16380  →  valkey-b
  └── localhost:19000  →  step-ca     (shared CA)

Docker network: fed-test-net (bridge)
  gateway-a  ←──── mTLS ────→  gateway-b
        ↘                    ↗
               step-ca
```

Ports are chosen to avoid collision with the base dev stack (5433, 6380, 14242, 9000).

## Starting the Harness

```bash
# From repo root
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d

# Wait for all services to be healthy (~60-90s on first boot due to NestJS cold start)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml ps
```

## Seeding Test Data

The seed script provisions three grant scope variants (A, B, C) and walks the
full enrollment flow so Server A ends up with active peers pointing at Server B.

```bash
# Assumes stack is already running
pnpm tsx tools/federation-harness/seed.ts

# Or boot + seed in one step
pnpm tsx tools/federation-harness/seed.ts --boot
```

### Scope Variants

| Variant | Resources          | Filters                            | Excluded    | Purpose                         |
| ------- | ------------------ | ---------------------------------- | ----------- | ------------------------------- |
| A       | tasks, notes       | include_personal: true             | (none)      | Personal data federation        |
| B       | tasks              | include_teams: ['T1'], no personal | (none)      | Team-scoped, no personal        |
| C       | tasks, credentials | include_personal: true             | credentials | Sanity: excluded wins over list |

## Using from Vitest

```ts
// Explicit imports are needed unless `globals: true` is set in the Vitest config.
import { afterAll, beforeAll, expect, test } from 'vitest';
import {
  bootHarness,
  tearDownHarness,
  serverA,
  serverB,
  seed,
} from '../../tools/federation-harness/harness.js';
import type { HarnessHandle } from '../../tools/federation-harness/harness.js';

let handle: HarnessHandle;

beforeAll(async () => {
  handle = await bootHarness();
}, 180_000); // allow 3 min for Docker pull + NestJS cold start

afterAll(async () => {
  await tearDownHarness(handle);
});

test('variant A: list tasks returns personal tasks', async () => {
  // NOTE: Only 'all' is supported for now; per-variant narrowing is M3-11.
  const seedResult = await seed(handle, 'all');
  const a = serverA(handle);
  const res = await fetch(`${a.baseUrl}/api/federation/tasks`, {
    headers: { 'x-federation-grant': seedResult.grants.variantA.id },
  });
  expect(res.status).toBe(200);
});
```

> **Note:** `seed()` bootstraps a fresh admin user on each gateway via
> `POST /api/bootstrap/setup`. Both gateways must have zero users (pristine DB).
> If either gateway already has users, `seed()` throws with a clear error.
> Reset state with `docker compose down -v`.

The `bootHarness()` function is **idempotent**: if both gateways are already
healthy, it reuses the running stack and returns `ownedStack: false`. Tests
should not call `tearDownHarness` when `ownedStack` is false unless they
explicitly want to shut down a shared stack.

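One way to honour the `ownedStack` rule is to route teardown through a small guard. The following is a minimal sketch, not part of the harness: `HandleLike`, `shouldTearDown`, and `tearDownIfOwned` are hypothetical names, and the only thing assumed about the real `HarnessHandle` is the boolean `ownedStack` field described above.

```ts
// Hypothetical minimal shape; the real HarnessHandle may carry more fields.
interface HandleLike {
  ownedStack: boolean;
}

// True when this process booted the stack, or when the caller explicitly
// wants to shut down a shared stack (force).
function shouldTearDown(handle: HandleLike, force = false): boolean {
  return handle.ownedStack || force;
}

// Runs the supplied teardown function only when shouldTearDown() allows it.
// Resolves to true if teardown ran, false if it was skipped.
async function tearDownIfOwned<H extends HandleLike>(
  handle: H,
  tearDown: (h: H) => Promise<void>,
  force = false,
): Promise<boolean> {
  if (!shouldTearDown(handle, force)) return false;
  await tearDown(handle);
  return true;
}
```

In a test file this could be wired up as `afterAll(async () => { await tearDownIfOwned(handle, tearDownHarness); });`, so a stack that was already running before the tests started is left untouched.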
## Vitest Config (pnpm test:federation)

Add to `vitest.config.ts` at repo root (or a dedicated config):

```ts
// vitest.federation.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['**/*.federation.test.ts'],
    testTimeout: 60_000,
    hookTimeout: 180_000,
    reporters: ['verbose'],
  },
});
```

Then add to root `package.json`:

```json
"test:federation": "vitest run --config vitest.federation.config.ts"
```

## Nuking State

```bash
# Remove containers AND volumes (ephemeral state: CA keys, DBs, everything)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml down -v
```

On next `up`, Step-CA re-initialises from scratch and generates new CA keys.

## Step-CA Root Certificate

The CA root lives in the `fed-harness-step-ca` Docker volume at
`/home/step/certs/root_ca.crt`. To extract it to the host:

```bash
docker run --rm \
  -v fed-harness-step-ca:/home/step \
  alpine cat /home/step/certs/root_ca.crt > /tmp/fed-harness-root-ca.crt
```

## Troubleshooting

### Port conflicts

Default host ports: 14001, 14002, 15432, 15433, 16379, 16380, 19000.
Override via environment variables before `docker compose up`:

```bash
GATEWAY_A_HOST_PORT=14101 GATEWAY_B_HOST_PORT=14102 \
  docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d
```

### Image pull failures

The gateway image is digest-pinned to:

```
git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
```

(sha-9f1a081, post-#491 IMG-FIX)

If the registry is unreachable, Docker will use the locally cached image if
present. If no local image exists, the compose up will fail with a pull error.
In that case:

1. Ensure you can reach `git.mosaicstack.dev` (VPN, DNS, etc.).
2. Log in: `docker login git.mosaicstack.dev`
3. Pull manually: `docker pull git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02`

### NestJS cold start

Gateway containers take 40–60 seconds to become healthy on first boot (Node.js
module resolution + NestJS DI bootstrap). The `start_period: 60s` in the compose
healthcheck covers this. `bootHarness()` polls for up to 3 minutes.

### Step-CA startup

Step-CA initialises on first boot (generates CA keys). This takes ~5-10s. The
`start_period: 30s` in the healthcheck covers it. Both gateways wait for Step-CA
to be healthy before starting (`depends_on: step-ca: condition: service_healthy`).

### dev-password missing

The Step-CA container requires `infra/step-ca/dev-password` to be mounted.
Copy the example and set a local password:

```bash
cp infra/step-ca/dev-password.example infra/step-ca/dev-password
# Edit the file to set your preferred dev CA password
```

The file is `.gitignore`d; do not commit it.

## Image Digest Note

The gateway image is pinned to
`sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02`
(sha-9f1a081). This is the digest promoted by PR #491 (IMG-FIX). The `latest`
tag is forbidden per Mosaic image policy. When a new gateway build is promoted,
update the digest in `docker-compose.two-gateways.yml` and in this file.

## Known Limitations

### BETTER_AUTH_URL enrollment URL bug (upstream production code, not yet fixed)

`apps/gateway/src/federation/federation.controller.ts:145` constructs the
enrollment URL using `process.env['BETTER_AUTH_URL'] ?? 'http://localhost:14242'`.
This is an upstream bug: `BETTER_AUTH_URL` is the Better Auth origin (typically
the web app), not the gateway's own base URL. In non-harness deployments this
produces an enrollment URL pointing to the wrong host or port.

**How the harness handles this:**

1. **In-cluster calls (container-to-container):** The compose file sets
   `BETTER_AUTH_URL: 'http://gateway-b:3000'` so the enrollment URL returned by
   the gateway uses the Docker internal hostname. This lets other containers in
   the `fed-test-net` network resolve and reach Server B's enrollment endpoint.

2. **Host-side URL rewrite (seed script):** The `seed.ts` script runs on the
   host machine where `gateway-b` is not a resolvable hostname. Before calling
   `fetch(enrollmentUrl, ...)`, the seed script rewrites the URL: it extracts
   only the token path segment from `enrollmentUrl` and reassembles the URL
   using the host-accessible `serverBUrl` (default: `http://localhost:14002`).
   This lets the seed script redeem enrollment tokens from the host without
   being affected by the in-cluster hostname in the returned URL.

**TODO:** Fix `federation.controller.ts` to derive the enrollment URL from its
own listening address (e.g. `GATEWAY_BASE_URL` env var or a dedicated
`FEDERATION_ENROLLMENT_BASE_URL` env var) rather than reusing `BETTER_AUTH_URL`.
Tracked as a follow-up to PR #505; do not bundle with harness changes.

## Permanent Infrastructure

This harness is designed to outlive M3 and be reused by M4+ milestone tests.
It is not a throwaway scaffold; treat it as production test infrastructure:

- Keep it idempotent.
- Do not hardcode test assumptions in the harness layer (put them in tests).
- Update the seed script when new scope variants are needed.
- The README and harness should be kept in sync as the federation API evolves.
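
For anyone writing their own host-side tooling against the harness, the host-side URL rewrite described under Known Limitations can be approximated as below. This is a sketch under assumptions, not the code in `seed.ts`: `toHostUrl` is a hypothetical helper, and instead of extracting only the token segment it swaps the origin and keeps the returned path and query as-is. The enrollment path in the usage comment is also made up for illustration.

```ts
// Hypothetical helper: re-point a URL that carries an in-cluster hostname
// (e.g. http://gateway-b:3000/...) at a host-reachable base URL.
// The path and query string of the original URL are preserved; any path on
// hostBase itself is dropped because the rebuilt reference starts with '/'.
function toHostUrl(inClusterUrl: string, hostBase: string): string {
  const original = new URL(inClusterUrl);
  const rebuilt = new URL(original.pathname + original.search, hostBase);
  return rebuilt.toString();
}

// Usage (illustrative path only):
// toHostUrl('http://gateway-b:3000/some/enrollment/path/TOKEN', 'http://localhost:14002')
```

Swapping the origin rather than rebuilding the path keeps the helper independent of the enrollment route's exact shape, at the cost of trusting whatever path the gateway returned.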