stack/tools/federation-harness/README.md
Jarvis 4cf9362e75
fix(federation): harness round-2 — email validation + host-side URL rewrite
- Bug-1: replace whitespace in admin email local-part (was breaking @IsEmail)
- Bug-2: rewrite enrollment URL to use host-accessible base in seed.ts (in-cluster URL not resolvable from host)
- Bug-3: correct README Known Limitations section
- eslint.config.mjs: add tools/federation-harness/*.ts to allowDefaultProject so pre-commit hook can lint harness scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:54:46 -05:00


Federation Test Harness

Local two-gateway federation test infrastructure for Mosaic Stack M3+.

This harness boots two real gateway instances (gateway-a, gateway-b) on a shared Docker bridge network, each backed by its own Postgres (pgvector) + Valkey, sharing a single Step-CA. It is the test bed for all M3+ federation E2E tests.

Prerequisites

  • Docker with Compose v2 (docker compose version ≥ 2.20)
  • pnpm (for running via repo scripts)
  • infra/step-ca/dev-password must exist (copy from infra/step-ca/dev-password.example)

Network Topology

Host machine
├── localhost:14001  →  gateway-a   (Server A — home / requesting)
├── localhost:14002  →  gateway-b   (Server B — work / serving)
├── localhost:15432  →  postgres-a
├── localhost:15433  →  postgres-b
├── localhost:16379  →  valkey-a
├── localhost:16380  →  valkey-b
└── localhost:19000  →  step-ca     (shared CA)

Docker network: fed-test-net (bridge)
  gateway-a ←──── mTLS ────→ gateway-b
             ↘             ↗
               step-ca

Ports are chosen to avoid collision with the base dev stack (5433, 6380, 14242, 9000).

Starting the Harness

# From repo root
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d

# Wait for all services to be healthy (~60-90s on first boot due to NestJS cold start)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml ps

Seeding Test Data

The seed script provisions three grant scope variants (A, B, C) and walks the full enrollment flow so Server A ends up with active peers pointing at Server B.

# Assumes stack is already running
pnpm tsx tools/federation-harness/seed.ts

# Or boot + seed in one step
pnpm tsx tools/federation-harness/seed.ts --boot

Scope Variants

| Variant | Resources | Filters | Excluded | Purpose |
| --- | --- | --- | --- | --- |
| A | tasks, notes | include_personal: true | (none) | Personal data federation |
| B | tasks | include_teams: ['T1'], no personal | (none) | Team-scoped, no personal |
| C | tasks, credentials | include_personal: true | credentials | Sanity: excluded wins over list |
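
The "excluded wins over list" rule that variant C checks can be illustrated with a minimal sketch. The `GrantScope` shape and the `effectiveResources` helper are hypothetical names chosen for illustration, not the gateway's actual types; only the resource names and the rule itself come from the table above.

```typescript
// Hypothetical scope shape for illustration; the gateway's real types may differ.
interface GrantScope {
  resources: string[];
  excluded: string[];
}

// A resource is granted only if it is listed AND not excluded.
function effectiveResources(scope: GrantScope): string[] {
  const excluded = new Set(scope.excluded);
  return scope.resources.filter((r) => !excluded.has(r));
}

// Variant C from the table: credentials is both listed and excluded.
const variantC: GrantScope = {
  resources: ['tasks', 'credentials'],
  excluded: ['credentials'],
};

console.log(effectiveResources(variantC)); // [ 'tasks' ]
```

A federation test for variant C would then expect requests for credentials to be refused even though the grant lists them.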

Using from Vitest

import {
  bootHarness,
  tearDownHarness,
  serverA,
  serverB,
  seed,
} from '../../tools/federation-harness/harness.js';
import type { HarnessHandle } from '../../tools/federation-harness/harness.js';

let handle: HarnessHandle;

beforeAll(async () => {
  handle = await bootHarness();
}, 180_000); // allow 3 min for Docker pull + NestJS cold start

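afterAll(async () => {
  // Only tear down a stack this run booted; leave a reused (shared) stack running.
  if (handle?.ownedStack) {
    await tearDownHarness(handle);
  }
});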

test('variant A: list tasks returns personal tasks', async () => {
  // NOTE: Only 'all' is supported for now — per-variant narrowing is M3-11.
  const seedResult = await seed(handle, 'all');
  const a = serverA(handle);

  const res = await fetch(`${a.baseUrl}/api/federation/tasks`, {
    headers: { 'x-federation-grant': seedResult.grants.variantA.id },
  });
  expect(res.status).toBe(200);
});

Note: seed() bootstraps a fresh admin user on each gateway via POST /api/bootstrap/setup. Both gateways must have zero users (pristine DB). If either gateway already has users, seed() throws with a clear error. Reset state with docker compose down -v.

The bootHarness() function is idempotent: if both gateways are already healthy, it reuses the running stack and returns ownedStack: false. Tests should not call tearDownHarness when ownedStack is false unless they explicitly want to shut down a shared stack.

Vitest Config (pnpm test:federation)

Create a dedicated config at repo root (or merge the same settings into vitest.config.ts):

// vitest.federation.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['**/*.federation.test.ts'],
    testTimeout: 60_000,
    hookTimeout: 180_000,
    reporters: ['verbose'],
  },
});

Then add to root package.json:

"test:federation": "vitest run --config vitest.federation.config.ts"

Nuking State

# Remove containers AND volumes (ephemeral state — CA keys, DBs, everything)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml down -v

On next up, Step-CA re-initialises from scratch and generates new CA keys.

Step-CA Root Certificate

The CA root lives in the fed-harness-step-ca Docker volume at /home/step/certs/root_ca.crt. To extract it to the host:

docker run --rm \
  -v fed-harness-step-ca:/home/step \
  alpine cat /home/step/certs/root_ca.crt > /tmp/fed-harness-root-ca.crt

Troubleshooting

Port conflicts

Default host ports: 14001, 14002, 15432, 15433, 16379, 16380, 19000. Override via environment variables before docker compose up:

GATEWAY_A_HOST_PORT=14101 GATEWAY_B_HOST_PORT=14102 \
  docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d

Image pull failures

The gateway image is digest-pinned to:

git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02

(sha-9f1a081, post-#491 IMG-FIX)

If the registry is unreachable, Docker uses the locally cached image if one is present. If no local image exists, docker compose up fails with a pull error. In that case:

  1. Ensure you can reach git.mosaicstack.dev (VPN, DNS, etc.).
  2. Log in: docker login git.mosaicstack.dev
  3. Pull manually: docker pull git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02

NestJS cold start

Gateway containers take 40-60 seconds to become healthy on first boot (Node.js module resolution + NestJS DI bootstrap). The start_period: 60s in the compose healthcheck covers this. bootHarness() polls for up to 3 minutes.
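
The polling behaviour described above can be pictured roughly as follows. This is a sketch under assumptions, not the code in harness.ts: `waitUntilHealthy`, its parameters, and the injected `probe` callback are hypothetical, and a real probe would call whatever health endpoint the gateway exposes.

```typescript
// Hypothetical helper: retries an async probe until it succeeds or the deadline passes.
async function waitUntilHealthy(
  probe: () => Promise<boolean>,
  timeoutMs = 180_000, // 3 minutes, matching the bootHarness() budget described above
  intervalMs = 2_000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (await probe()) return true;
    } catch {
      // Connection refused while the container is still starting: keep polling.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

Passing the probe in as a callback keeps the retry loop independent of any particular endpoint, so the same loop could wait on both gateways and on Step-CA.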

Step-CA startup

Step-CA initialises on first boot (generates CA keys). This takes ~5-10s. The start_period: 30s in the healthcheck covers it. Both gateways wait for Step-CA to be healthy before starting (depends_on: step-ca: condition: service_healthy).

dev-password missing

The Step-CA container requires infra/step-ca/dev-password to be mounted. Copy the example and set a local password:

cp infra/step-ca/dev-password.example infra/step-ca/dev-password
# Edit the file to set your preferred dev CA password

The file is .gitignored — do not commit it.

Image Digest Note

The gateway image is pinned to sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02 (sha-9f1a081). This is the digest promoted by PR #491 (IMG-FIX). The latest tag is forbidden per Mosaic image policy. When a new gateway build is promoted, update the digest in docker-compose.two-gateways.yml and in this file.

Known Limitations

BETTER_AUTH_URL enrollment URL bug (upstream production code — not yet fixed)

apps/gateway/src/federation/federation.controller.ts:145 constructs the enrollment URL using process.env['BETTER_AUTH_URL'] ?? 'http://localhost:14242'. This is an upstream bug: BETTER_AUTH_URL is the Better Auth origin (typically the web app), not the gateway's own base URL. In non-harness deployments this produces an enrollment URL pointing to the wrong host or port.

How the harness handles this:

  1. In-cluster calls (container-to-container): The compose file sets BETTER_AUTH_URL: 'http://gateway-b:3000' so the enrollment URL returned by the gateway uses the Docker internal hostname. This lets other containers in the fed-test-net network resolve and reach Server B's enrollment endpoint.

  2. Host-side URL rewrite (seed script): The seed.ts script runs on the host machine where gateway-b is not a resolvable hostname. Before calling fetch(enrollmentUrl, ...), the seed script rewrites the URL: it extracts only the token path segment from enrollmentUrl and reassembles the URL using the host-accessible serverBUrl (default: http://localhost:14002). This lets the seed script redeem enrollment tokens from the host without being affected by the in-cluster hostname in the returned URL.
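
The rewrite in step 2 can be sketched as below. This is an illustration under assumptions rather than the code in seed.ts: the helper name `rewriteEnrollmentUrl` is hypothetical, it assumes the token is the last path segment of the enrollment URL, and the route prefix and example URL are invented for the demo.

```typescript
// Hypothetical helper; names and the route prefix are illustrative assumptions.
function rewriteEnrollmentUrl(
  enrollmentUrl: string, // as returned by the gateway, with the in-cluster host
  hostBaseUrl: string, // host-accessible base, e.g. http://localhost:14002
  routePrefix: string, // assumed enrollment route prefix; check the gateway for the real one
): string {
  // Assumption: the token is the last non-empty path segment.
  const segments = new URL(enrollmentUrl).pathname.split('/').filter(Boolean);
  const token = segments[segments.length - 1];
  if (!token) {
    throw new Error(`No token segment found in ${enrollmentUrl}`);
  }
  // Normalise slashes so the pieces join with exactly one separator.
  const base = hostBaseUrl.replace(/\/+$/, '');
  const prefix = routePrefix.replace(/^\/+|\/+$/g, '');
  return `${base}/${prefix}/${token}`;
}

const rewritten = rewriteEnrollmentUrl(
  'http://gateway-b:3000/api/federation/enrollment/abc123', // invented example URL
  'http://localhost:14002',
  '/api/federation/enrollment',
);
console.log(rewritten); // http://localhost:14002/api/federation/enrollment/abc123
```

Taking only the token from the returned URL, instead of swapping the hostname in place, means the host-side request does not depend on whatever origin the gateway put in the URL.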

TODO: Fix federation.controller.ts to derive the enrollment URL from its own listening address (e.g. GATEWAY_BASE_URL env var or a dedicated FEDERATION_ENROLLMENT_BASE_URL env var) rather than reusing BETTER_AUTH_URL. Tracked as a follow-up to PR #505 — do not bundle with harness changes.
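
One possible shape for that follow-up is sketched below. It is not the current controller code: `resolveEnrollmentBaseUrl` is a hypothetical helper, and both the precedence between the two variables named in the TODO and the decision to throw when neither is set are assumptions.

```typescript
// Hypothetical sketch of the TODO above; not the current controller code.
type Env = Record<string, string | undefined>;

function resolveEnrollmentBaseUrl(env: Env): string {
  // Assumed precedence: a dedicated federation setting, then the gateway's own base URL.
  // BETTER_AUTH_URL is deliberately not consulted.
  const base = env['FEDERATION_ENROLLMENT_BASE_URL'] ?? env['GATEWAY_BASE_URL'];
  if (!base) {
    // Failing loudly avoids silently pointing peers at the wrong host.
    throw new Error('Set FEDERATION_ENROLLMENT_BASE_URL or GATEWAY_BASE_URL');
  }
  // Drop trailing slashes so callers can append a path directly.
  return base.replace(/\/+$/, '');
}
```

Passing the environment in as an argument (for example `process.env` in the gateway) keeps the helper easy to unit test.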

Permanent Infrastructure

This harness is designed to outlive M3 and be reused by M4+ milestone tests. It is not a throwaway scaffold — treat it as production test infrastructure:

  • Keep it idempotent.
  • Do not hardcode test assumptions in the harness layer (put them in tests).
  • Update the seed script when new scope variants are needed.
  • Keep the README and harness in sync as the federation API evolves.