Jarvis cb118a53d9 fix(federation): harness CRIT bugs — admin bootstrap auth + peer FK + boot deadline (review remediation)
CRIT-1: Replace nonexistent x-admin-key header with Authorization: Bearer <token>;
add bootstrapAdmin() to call POST /api/bootstrap/setup on each pristine gateway
before any admin-guarded endpoint is used.

CRIT-2: Fix cross-gateway peer FK violation — peer keypair is now created on
Server B first (so the grant FK resolves against B's own federation_peers table),
then Server A creates its own keypair and redeems the enrollment token at B.

HIGH-3: waitForStack() now polls both gateways in parallel via Promise.all, each
with an independent deadline, so a slow gateway-a cannot starve gateway-b's budget.

MED-4: seed() throws immediately with a clear error if scenario !== 'all';
per-variant narrowing deferred to M3-11 with explicit JSDoc note.

Also: remove ADMIN_API_KEY (no such path in AdminGuard) from compose, replace
with ADMIN_BOOTSTRAP_PASSWORD; add BETTER_AUTH_URL production-code limitation
as a TODO in the README.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 21:54:46 -05:00

Federation Test Harness

Local two-gateway federation test infrastructure for Mosaic Stack M3+.

This harness boots two real gateway instances (gateway-a, gateway-b) on a shared Docker bridge network, each backed by its own Postgres (pgvector) + Valkey, sharing a single Step-CA. It is the test bed for all M3+ federation E2E tests.

Prerequisites

  • Docker with Compose v2 (docker compose version ≥ 2.20)
  • pnpm (for running via repo scripts)
  • infra/step-ca/dev-password must exist (copy from infra/step-ca/dev-password.example)

Network Topology

Host machine
├── localhost:14001  →  gateway-a   (Server A — home / requesting)
├── localhost:14002  →  gateway-b   (Server B — work / serving)
├── localhost:15432  →  postgres-a
├── localhost:15433  →  postgres-b
├── localhost:16379  →  valkey-a
├── localhost:16380  →  valkey-b
└── localhost:19000  →  step-ca     (shared CA)

Docker network: fed-test-net (bridge)
  gateway-a ←──── mTLS ────→ gateway-b
             ↘             ↗
               step-ca

Ports are chosen to avoid collision with the base dev stack (5433, 6380, 14242, 9000).

Starting the Harness

# From repo root
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d

# Check service status; re-run until every service reports healthy (~60-90s on first boot due to NestJS cold start)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml ps

Seeding Test Data

The seed script provisions three grant scope variants (A, B, C) and walks the full enrollment flow so Server A ends up with active peers pointing at Server B.

# Assumes stack is already running
pnpm tsx tools/federation-harness/seed.ts

# Or boot + seed in one step
pnpm tsx tools/federation-harness/seed.ts --boot

Scope Variants

Variant  Resources           Filters                             Excluded     Purpose
A        tasks, notes        include_personal: true              (none)       Personal data federation
B        tasks               include_teams: ['T1'], no personal  (none)       Team-scoped, no personal
C        tasks, credentials  include_personal: true              credentials  Sanity: excluded wins over list
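
Variant C checks a precedence rule: a resource that appears in both the resource list and the excluded list must end up excluded. The sketch below illustrates that rule only. The GrantScope shape and its field names are hypothetical and are not taken from the gateway's actual grant schema.

```typescript
// Hypothetical scope shape for illustration; the real grant schema may differ.
interface GrantScope {
  resources: string[];
  excluded: string[];
}

// Exclusion takes precedence: drop every resource that also appears in `excluded`.
function effectiveResources(scope: GrantScope): string[] {
  const excluded = new Set(scope.excluded);
  return scope.resources.filter((resource) => !excluded.has(resource));
}

// Variant C from the table above: credentials is listed but also excluded.
const variantC: GrantScope = {
  resources: ['tasks', 'credentials'],
  excluded: ['credentials'],
};

console.log(effectiveResources(variantC)); // [ 'tasks' ]
```

A federation test for variant C would be expected to assert the same outcome against the live gateway: tasks are reachable, credentials are not.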

Using from Vitest

import { afterAll, beforeAll, expect, test } from 'vitest';
import {
  bootHarness,
  tearDownHarness,
  serverA,
  serverB,
  seed,
} from '../../tools/federation-harness/harness.js';
import type { HarnessHandle } from '../../tools/federation-harness/harness.js';

let handle: HarnessHandle;

beforeAll(async () => {
  handle = await bootHarness();
}, 180_000); // allow 3 min for Docker pull + NestJS cold start

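afterAll(async () => {
  // Only tear down a stack this suite booted; leave a reused (shared) stack running.
  if (handle?.ownedStack) {
    await tearDownHarness(handle);
  }
});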

test('variant A: list tasks returns personal tasks', async () => {
  // NOTE: Only 'all' is supported for now — per-variant narrowing is M3-11.
  const seedResult = await seed(handle, 'all');
  const a = serverA(handle);

  const res = await fetch(`${a.baseUrl}/api/federation/tasks`, {
    headers: { 'x-federation-grant': seedResult.grants.variantA.id },
  });
  expect(res.status).toBe(200);
});

Note: seed() bootstraps a fresh admin user on each gateway via POST /api/bootstrap/setup. Both gateways must have zero users (pristine DB). If either gateway already has users, seed() throws with a clear error. Reset state with docker compose down -v.

The bootHarness() function is idempotent: if both gateways are already healthy, it reuses the running stack and returns ownedStack: false. Tests should not call tearDownHarness when ownedStack is false unless they explicitly want to shut down a shared stack.

Vitest Config (pnpm test:federation)

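Create a dedicated config, vitest.federation.config.ts, at the repo root (or merge these options into the existing vitest.config.ts and adjust the script below to match):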

// vitest.federation.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['**/*.federation.test.ts'],
    testTimeout: 60_000,
    hookTimeout: 180_000,
    reporters: ['verbose'],
  },
});

Then add this entry to the scripts section of the root package.json:

"test:federation": "vitest run --config vitest.federation.config.ts"

Nuking State

# Remove containers AND volumes (ephemeral state — CA keys, DBs, everything)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml down -v

On next up, Step-CA re-initialises from scratch and generates new CA keys.

Step-CA Root Certificate

The CA root lives in the fed-harness-step-ca Docker volume at /home/step/certs/root_ca.crt. To extract it to the host:

docker run --rm \
  -v fed-harness-step-ca:/home/step \
  alpine cat /home/step/certs/root_ca.crt > /tmp/fed-harness-root-ca.crt

Troubleshooting

Port conflicts

Default host ports: 14001, 14002, 15432, 15433, 16379, 16380, 19000. Override via environment variables before docker compose up:

GATEWAY_A_HOST_PORT=14101 GATEWAY_B_HOST_PORT=14102 \
  docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d
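
When the host ports are overridden, test code that talks to the gateways from the host needs the same values. The source does not say whether serverA(handle).baseUrl honours these overrides, so the following is only a sketch of a hypothetical helper (gatewayBaseUrls is not part of harness.ts). It reads the two environment variables shown above and falls back to the documented defaults.

```typescript
// Default host ports from this README; env var names match the compose overrides.
const DEFAULT_PORTS = { a: 14001, b: 14002 } as const;

type EnvVars = Record<string, string | undefined>;

// Parse a port override, falling back to the default when it is unset or empty.
function parsePort(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === '') return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port override: ${raw}`);
  }
  return port;
}

// Hypothetical helper, not part of harness.ts.
function gatewayBaseUrls(env: EnvVars): { a: string; b: string } {
  return {
    a: `http://localhost:${parsePort(env['GATEWAY_A_HOST_PORT'], DEFAULT_PORTS.a)}`,
    b: `http://localhost:${parsePort(env['GATEWAY_B_HOST_PORT'], DEFAULT_PORTS.b)}`,
  };
}
```

In a test file this would typically be called as gatewayBaseUrls(process.env). Rejecting a malformed override early makes a typo fail fast instead of surfacing later as a confusing connection error.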

Image pull failures

The gateway image is digest-pinned to:

git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02

(sha-9f1a081, post-#491 IMG-FIX)

If the registry is unreachable, Docker uses the locally cached image if present. If no local image exists, docker compose up fails with a pull error. In that case:

  1. Ensure you can reach git.mosaicstack.dev (VPN, DNS, etc.).
  2. Log in: docker login git.mosaicstack.dev
  3. Pull manually: docker pull git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02

NestJS cold start

Gateway containers take 40-60 seconds to become healthy on first boot (Node.js module resolution + NestJS DI bootstrap). The start_period: 60s in the compose healthcheck covers this. bootHarness() polls for up to 3 minutes.
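
The polling behaviour can be pictured as two independent loops run in parallel, each with its own deadline, so a slow gateway-a cannot use up gateway-b's budget. The sketch below illustrates that pattern and is not the harness's actual code. The function names are hypothetical, and the health probe is passed in as a function because the gateway health endpoint is not specified here.

```typescript
type HealthCheck = () => Promise<boolean>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll one service until it reports healthy or its own deadline has passed.
async function waitUntilHealthy(
  name: string,
  check: HealthCheck,
  timeoutMs: number,
  intervalMs = 1_000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // A failed probe (e.g. connection refused) counts as "not healthy yet".
    const healthy = await check().catch(() => false);
    if (healthy) return;
    if (Date.now() >= deadline) {
      throw new Error(`${name} did not become healthy within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}

// Each gateway gets its own deadline; Promise.all rejects as soon as either one times out.
async function waitForBoth(
  checkA: HealthCheck,
  checkB: HealthCheck,
  timeoutMs = 180_000,
): Promise<void> {
  await Promise.all([
    waitUntilHealthy('gateway-a', checkA, timeoutMs),
    waitUntilHealthy('gateway-b', checkB, timeoutMs),
  ]);
}
```

Polling sequentially with one shared deadline would let a slow first gateway leave the second with almost no time, which is why the deadline is computed inside each loop.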

Step-CA startup

Step-CA initialises on first boot (generates CA keys). This takes ~5-10s. The start_period: 30s in the healthcheck covers it. Both gateways wait for Step-CA to be healthy before starting (depends_on: step-ca: condition: service_healthy).

dev-password missing

The Step-CA container requires infra/step-ca/dev-password to be mounted. Copy the example and set a local password:

cp infra/step-ca/dev-password.example infra/step-ca/dev-password
# Edit the file to set your preferred dev CA password

The file is .gitignored — do not commit it.

Image Digest Note

The gateway image is pinned to sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02 (sha-9f1a081). This is the digest promoted by PR #491 (IMG-FIX). The latest tag is forbidden per Mosaic image policy. When a new gateway build is promoted, update the digest in docker-compose.two-gateways.yml and in this file.

Known Limitations

BETTER_AUTH_URL enrollment URL bug (production code — not fixed here)

apps/gateway/src/federation/federation.controller.ts:145 constructs the enrollment URL using process.env['BETTER_AUTH_URL'] ?? 'http://localhost:14242'. In non-harness deployments (where BETTER_AUTH_URL is not set or points to the web origin rather than the gateway's own base URL) this produces an incorrect enrollment URL that points to the wrong host or port.

The harness works around this by explicitly setting BETTER_AUTH_URL: 'http://gateway-b:3000' in the compose file so the enrollment URL correctly references gateway-b's internal Docker hostname.

TODO: Fix federation.controller.ts to derive the enrollment URL from its own listening address (e.g. GATEWAY_BASE_URL env var or a dedicated FEDERATION_ENROLLMENT_BASE_URL env var) rather than reusing BETTER_AUTH_URL. Tracked as a follow-up to PR #505 — do not bundle with harness changes.
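
One possible shape for that fix is sketched below. It is an illustration only and not the gateway's implementation: resolveEnrollmentBaseUrl is a hypothetical helper, the two new variable names are the candidates from the TODO, and the precedence order is an assumption.

```typescript
type EnvVars = Record<string, string | undefined>;

// Fallback currently hard-coded in federation.controller.ts.
const DEFAULT_BASE_URL = 'http://localhost:14242';

// Hypothetical helper: prefer a dedicated variable, then the gateway's own base URL,
// and only then the existing BETTER_AUTH_URL behaviour.
function resolveEnrollmentBaseUrl(env: EnvVars): string {
  const candidates = [
    env['FEDERATION_ENROLLMENT_BASE_URL'], // proposed in the TODO, assumed name
    env['GATEWAY_BASE_URL'], // proposed in the TODO, assumed name
    env['BETTER_AUTH_URL'], // current behaviour, kept as a legacy fallback
  ];
  const raw = candidates.find((value) => value !== undefined && value.trim() !== '') ?? DEFAULT_BASE_URL;
  const trimmed = raw.trim();
  // Throws a TypeError on a malformed URL, so misconfiguration is caught when resolving.
  new URL(trimmed);
  // Strip trailing slashes so later path joins stay predictable.
  return trimmed.replace(/\/+$/, '');
}
```

With this ordering, the harness could keep setting BETTER_AUTH_URL as it does today, while other deployments could point enrollment at the gateway without touching the auth configuration.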

Permanent Infrastructure

This harness is designed to outlive M3 and be reused by M4+ milestone tests. It is not a throwaway scaffold — treat it as production test infrastructure:

  • Keep it idempotent.
  • Do not hardcode test assumptions in the harness layer (put them in tests).
  • Update the seed script when new scope variants are needed.
  • Keep the README and harness in sync as the federation API evolves.