# Federation Test Harness

Local two-gateway federation test infrastructure for Mosaic Stack M3+.

This harness boots two real gateway instances (`gateway-a`, `gateway-b`) on a
shared Docker bridge network, each backed by its own Postgres (pgvector) + Valkey,
sharing a single Step-CA. It is the test bed for all M3+ federation E2E tests.

## Prerequisites

- Docker with Compose v2 (`docker compose version` ≥ 2.20)
- pnpm (for running via repo scripts)
- `infra/step-ca/dev-password` must exist (copy from `infra/step-ca/dev-password.example`)

## Network Topology

```
Host machine
├── localhost:14001 → gateway-a   (Server A: home / requesting)
├── localhost:14002 → gateway-b   (Server B: work / serving)
├── localhost:15432 → postgres-a
├── localhost:15433 → postgres-b
├── localhost:16379 → valkey-a
├── localhost:16380 → valkey-b
└── localhost:19000 → step-ca     (shared CA)

Docker network: fed-test-net (bridge)
  gateway-a ←──── mTLS ────→ gateway-b
        ↘                  ↗
              step-ca
```

Ports are chosen to avoid collision with the base dev stack (5433, 6380, 14242, 9000).

## Starting the Harness

```bash
# From repo root
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d

# Wait for all services to be healthy (~60-90s on first boot due to NestJS cold start)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml ps
```

## Seeding Test Data

The seed script provisions three grant scope variants (A, B, C) and walks the full
enrollment flow so Server A ends up with active peers pointing at Server B.
```bash
# Assumes stack is already running
pnpm tsx tools/federation-harness/seed.ts

# Or boot + seed in one step
pnpm tsx tools/federation-harness/seed.ts --boot
```

### Scope Variants

| Variant | Resources          | Filters                            | Excluded    | Purpose                         |
| ------- | ------------------ | ---------------------------------- | ----------- | ------------------------------- |
| A       | tasks, notes       | include_personal: true             | (none)      | Personal data federation        |
| B       | tasks              | include_teams: ['T1'], no personal | (none)      | Team-scoped, no personal        |
| C       | tasks, credentials | include_personal: true             | credentials | Sanity: excluded wins over list |

## Using from Vitest

```ts
// Explicit imports so the example also works when Vitest globals are disabled.
import { afterAll, beforeAll, expect, test } from 'vitest';
import {
  bootHarness,
  tearDownHarness,
  serverA,
  serverB,
  seed,
} from '../../tools/federation-harness/harness.js';
import type { HarnessHandle } from '../../tools/federation-harness/harness.js';

let handle: HarnessHandle;

beforeAll(async () => {
  handle = await bootHarness();
}, 180_000); // allow 3 min for Docker pull + NestJS cold start

afterAll(async () => {
  await tearDownHarness(handle);
});

test('variant A: list tasks returns personal tasks', async () => {
  // NOTE: Only 'all' is supported for now; per-variant narrowing is M3-11.
  const seedResult = await seed(handle, 'all');
  const a = serverA(handle);

  const res = await fetch(`${a.baseUrl}/api/federation/tasks`, {
    headers: { 'x-federation-grant': seedResult.grants.variantA.id },
  });
  expect(res.status).toBe(200);
});
```

> **Note:** `seed()` bootstraps a fresh admin user on each gateway via
> `POST /api/bootstrap/setup`. Both gateways must have zero users (pristine DB).
> If either gateway already has users, `seed()` throws with a clear error.
> Reset state with `docker compose down -v`.

The `bootHarness()` function is **idempotent**: if both gateways are already
healthy, it reuses the running stack and returns `ownedStack: false`. Tests
should not call `tearDownHarness` when `ownedStack` is false unless they
explicitly want to shut down a shared stack.
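
The ownership rule above can be wrapped in a small guard. The sketch below is illustrative only: `HandleLike`, `shouldTearDown`, and `tearDownIfOwned` are hypothetical names that are not part of the harness, and the only thing assumed about the real `HarnessHandle` is the `ownedStack` boolean described above.

```ts
// Hypothetical minimal shape: only `ownedStack` is taken from this README;
// the real HarnessHandle may carry more fields.
interface HandleLike {
  ownedStack: boolean;
}

// Decide whether this test run should shut the stack down.
// `force` lets a caller explicitly opt in to stopping a shared stack.
function shouldTearDown(handle: HandleLike, force = false): boolean {
  return handle.ownedStack || force;
}

// Run the supplied teardown only when the guard allows it.
// Resolves to true if teardown ran, false if it was skipped.
async function tearDownIfOwned<H extends HandleLike>(
  handle: H,
  tearDown: (h: H) => Promise<void>,
  force = false,
): Promise<boolean> {
  if (!shouldTearDown(handle, force)) return false;
  await tearDown(handle);
  return true;
}
```

In an `afterAll` hook this might be called as `await tearDownIfOwned(handle, tearDownHarness)`, so a stack that was already running before the test file started is left up for other suites.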

## Vitest Config (pnpm test:federation)

Add to `vitest.config.ts` at repo root (or a dedicated config):

```ts
// vitest.federation.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['**/*.federation.test.ts'],
    testTimeout: 60_000,
    hookTimeout: 180_000,
    reporters: ['verbose'],
  },
});
```

Then add to root `package.json`:

```json
"test:federation": "vitest run --config vitest.federation.config.ts"
```

## Nuking State

```bash
# Remove containers AND volumes (ephemeral state: CA keys, DBs, everything)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml down -v
```

On next `up`, Step-CA re-initialises from scratch and generates new CA keys.

## Step-CA Root Certificate

The CA root lives in the `fed-harness-step-ca` Docker volume at
`/home/step/certs/root_ca.crt`. To extract it to the host:

```bash
docker run --rm \
  -v fed-harness-step-ca:/home/step \
  alpine cat /home/step/certs/root_ca.crt > /tmp/fed-harness-root-ca.crt
```

## Troubleshooting

### Port conflicts

Default host ports: 14001, 14002, 15432, 15433, 16379, 16380, 19000.
Override via environment variables before `docker compose up`:

```bash
GATEWAY_A_HOST_PORT=14101 GATEWAY_B_HOST_PORT=14102 \
  docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d
```

### Image pull failures

The gateway image is digest-pinned to:

```
git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
```

(sha-9f1a081, post-#491 IMG-FIX)

If the registry is unreachable, Docker will use the locally cached image if present.
If no local image exists, the compose up will fail with a pull error. In that case:

1. Ensure you can reach `git.mosaicstack.dev` (VPN, DNS, etc.).
2. Log in: `docker login git.mosaicstack.dev`
3. Pull manually: `docker pull git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02`

### NestJS cold start

Gateway containers take 40–60 seconds to become healthy on first boot (Node.js
module resolution + NestJS DI bootstrap). The `start_period: 60s` in the compose
healthcheck covers this. `bootHarness()` polls for up to 3 minutes.

### Step-CA startup

Step-CA initialises on first boot (generates CA keys). This takes ~5-10s.
The `start_period: 30s` in the healthcheck covers it. Both gateways wait for
Step-CA to be healthy before starting (`depends_on: step-ca: condition: service_healthy`).

### dev-password missing

The Step-CA container requires `infra/step-ca/dev-password` to be mounted.
Copy the example and set a local password:

```bash
cp infra/step-ca/dev-password.example infra/step-ca/dev-password
# Edit the file to set your preferred dev CA password
```

The file is `.gitignore`d; do not commit it.

## Image Digest Note

The gateway image is pinned to `sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02`
(sha-9f1a081). This is the digest promoted by PR #491 (IMG-FIX). The `latest`
tag is forbidden per Mosaic image policy. When a new gateway build is promoted,
update the digest in `docker-compose.two-gateways.yml` and in this file.

## Known Limitations

### BETTER_AUTH_URL enrollment URL bug (production code, not fixed here)

`apps/gateway/src/federation/federation.controller.ts:145` constructs the
enrollment URL using `process.env['BETTER_AUTH_URL'] ?? 'http://localhost:14242'`.
In non-harness deployments (where `BETTER_AUTH_URL` is not set or points to the
web origin rather than the gateway's own base URL) this produces an incorrect
enrollment URL that points to the wrong host or port.

The harness works around this by explicitly setting
`BETTER_AUTH_URL: 'http://gateway-b:3000'` in the compose file so the enrollment
URL correctly references gateway-b's internal Docker hostname.
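
To make the fallback behaviour concrete, here is a minimal sketch of a resolver with an explicit precedence order. It is not the current controller code: `resolveEnrollmentBaseUrl` and `firstNonEmpty` are hypothetical helpers, the precedence order is an assumption, and `FEDERATION_ENROLLMENT_BASE_URL` and `GATEWAY_BASE_URL` are only the candidate variable names mentioned in the TODO below, not settings the gateway reads today.

```ts
// Hypothetical sketch, not the code in federation.controller.ts.
type Env = Record<string, string | undefined>;

// Fallback value quoted in this README for the current implementation.
const DEFAULT_BASE_URL = 'http://localhost:14242';

// Return the first value that is defined and not blank.
function firstNonEmpty(...values: Array<string | undefined>): string | undefined {
  return values.find((v) => v !== undefined && v.trim() !== '');
}

// Assumed precedence: dedicated federation variable, then the gateway's own
// base URL, then the existing BETTER_AUTH_URL behaviour, then the default.
function resolveEnrollmentBaseUrl(env: Env): string {
  const raw =
    firstNonEmpty(
      env['FEDERATION_ENROLLMENT_BASE_URL'],
      env['GATEWAY_BASE_URL'],
      env['BETTER_AUTH_URL'],
    ) ?? DEFAULT_BASE_URL;
  // Trim whitespace and trailing slashes so paths can be appended predictably.
  return raw.trim().replace(/\/+$/, '');
}
```

With such a helper, the harness setting `BETTER_AUTH_URL: 'http://gateway-b:3000'` would keep resolving to `http://gateway-b:3000`, while a deployment that points `BETTER_AUTH_URL` at the web origin could set one of the more specific variables instead.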

**TODO:** Fix `federation.controller.ts` to derive the enrollment URL from its own
listening address (e.g. `GATEWAY_BASE_URL` env var or a dedicated
`FEDERATION_ENROLLMENT_BASE_URL` env var) rather than reusing `BETTER_AUTH_URL`.
Tracked as a follow-up to PR #505; do not bundle with harness changes.

## Permanent Infrastructure

This harness is designed to outlive M3 and be reused by M4+ milestone tests.
It is not a throwaway scaffold; treat it as production test infrastructure:

- Keep it idempotent.
- Do not hardcode test assumptions in the harness layer (put them in tests).
- Update the seed script when new scope variants are needed.
- The README and harness should be kept in sync as the federation API evolves.