Federation Test Harness
Local two-gateway federation test infrastructure for Mosaic Stack M3+.
This harness boots two real gateway instances (gateway-a, gateway-b) on a
shared Docker bridge network, each backed by its own Postgres (pgvector) +
Valkey, sharing a single Step-CA. It is the test bed for all M3+ federation
E2E tests.
Prerequisites
- Docker with Compose v2 (docker compose version ≥ 2.20)
- pnpm (for running via repo scripts)
- infra/step-ca/dev-password must exist (copy from infra/step-ca/dev-password.example)
Network Topology
Host machine
├── localhost:14001 → gateway-a (Server A — home / requesting)
├── localhost:14002 → gateway-b (Server B — work / serving)
├── localhost:15432 → postgres-a
├── localhost:15433 → postgres-b
├── localhost:16379 → valkey-a
├── localhost:16380 → valkey-b
└── localhost:19000 → step-ca (shared CA)
Docker network: fed-test-net (bridge)
  gateway-a ←──── mTLS ────→ gateway-b
          ↘                 ↗
               step-ca
Ports are chosen to avoid collision with the base dev stack (5433, 6380, 14242, 9000).
Starting the Harness
# From repo root
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d
# Wait for all services to be healthy (~60-90s on first boot due to NestJS cold start)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml ps
Seeding Test Data
The seed script provisions three grant scope variants (A, B, C) and walks the full enrollment flow so Server A ends up with active peers pointing at Server B.
# Assumes stack is already running
pnpm tsx tools/federation-harness/seed.ts
# Or boot + seed in one step
pnpm tsx tools/federation-harness/seed.ts --boot
Scope Variants
| Variant | Resources | Filters | Excluded | Purpose |
|---|---|---|---|---|
| A | tasks, notes | include_personal: true | (none) | Personal data federation |
| B | tasks | include_teams: ['T1'], no personal | (none) | Team-scoped, no personal |
| C | tasks, credentials | include_personal: true | credentials | Sanity: excluded wins over list |
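
The table can also be read as data. The sketch below is a hypothetical TypeScript encoding of the three variants. The `ScopeVariant` type, the `resources` and `excluded` field names, and the `effectiveResources` helper are illustrative and are not taken from the harness or the real grant schema. It assumes that "excluded wins over list" means an excluded resource is dropped even when it also appears in the resource list.

```typescript
// Hypothetical encoding of the scope-variant table above. Only
// include_personal / include_teams appear in the table; the other
// field names are illustrative, not the real grant schema.
interface ScopeVariant {
  resources: string[];
  include_personal: boolean;
  include_teams: string[];
  excluded: string[];
}

const variants: Record<'A' | 'B' | 'C', ScopeVariant> = {
  A: { resources: ['tasks', 'notes'], include_personal: true, include_teams: [], excluded: [] },
  B: { resources: ['tasks'], include_personal: false, include_teams: ['T1'], excluded: [] },
  C: {
    resources: ['tasks', 'credentials'],
    include_personal: true,
    include_teams: [],
    excluded: ['credentials'],
  },
};

// Assumed semantics: exclusion takes precedence over the resource list.
function effectiveResources(v: ScopeVariant): string[] {
  return v.resources.filter((r) => !v.excluded.includes(r));
}
```

Under that assumption, variant C grants access to tasks only, which is the behaviour its sanity test is meant to confirm.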
Using from Vitest
import { afterAll, beforeAll, expect, test } from 'vitest';
import {
  bootHarness,
  tearDownHarness,
  serverA,
  serverB,
  seed,
} from '../../tools/federation-harness/harness.js';
import type { HarnessHandle } from '../../tools/federation-harness/harness.js';

let handle: HarnessHandle;

beforeAll(async () => {
  handle = await bootHarness();
}, 180_000); // allow 3 min for Docker pull + NestJS cold start

afterAll(async () => {
  // Only tear down a stack this run started; leave a shared stack running.
  if (handle?.ownedStack) {
    await tearDownHarness(handle);
  }
});

test('variant A: list tasks returns personal tasks', async () => {
  // NOTE: Only 'all' is supported for now; per-variant narrowing is M3-11.
  const seedResult = await seed(handle, 'all');
  const a = serverA(handle);
  const res = await fetch(`${a.baseUrl}/api/federation/tasks`, {
    headers: { 'x-federation-grant': seedResult.grants.variantA.id },
  });
  expect(res.status).toBe(200);
});
Note: seed() bootstraps a fresh admin user on each gateway via POST /api/bootstrap/setup. Both gateways must have zero users (pristine DB). If either gateway already has users, seed() throws with a clear error. Reset state with docker compose down -v.
The bootHarness() function is idempotent: if both gateways are already
healthy, it reuses the running stack and returns ownedStack: false. Tests
should not call tearDownHarness when ownedStack is false unless they
explicitly want to shut down a shared stack.
Vitest Config (pnpm test:federation)
Add to vitest.config.ts at repo root (or a dedicated config):
// vitest.federation.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['**/*.federation.test.ts'],
    testTimeout: 60_000,
    hookTimeout: 180_000,
    reporters: ['verbose'],
  },
});
Then add to root package.json:
"test:federation": "vitest run --config vitest.federation.config.ts"
Nuking State
# Remove containers AND volumes (ephemeral state — CA keys, DBs, everything)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml down -v
On next up, Step-CA re-initialises from scratch and generates new CA keys.
Step-CA Root Certificate
The CA root lives in the fed-harness-step-ca Docker volume at
/home/step/certs/root_ca.crt. To extract it to the host:
docker run --rm \
-v fed-harness-step-ca:/home/step \
alpine cat /home/step/certs/root_ca.crt > /tmp/fed-harness-root-ca.crt
Troubleshooting
Port conflicts
Default host ports: 14001, 14002, 15432, 15433, 16379, 16380, 19000.
Override via environment variables before docker compose up:
GATEWAY_A_HOST_PORT=14101 GATEWAY_B_HOST_PORT=14102 \
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d
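When the host ports are overridden, test code that talks to the gateways from the host needs the same values. The sketch below shows one way to derive the base URLs from the two variables used above, falling back to the documented defaults. `resolveGatewayUrls` and `hostPort` are hypothetical helpers; this README does not say whether the harness already reads these variables itself.

```typescript
// Hypothetical helper: derive host-side base URLs from the same variables
// the compose file accepts, falling back to the documented default ports.
type Env = Record<string, string | undefined>;

function hostPort(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`${name} must be a port number between 1 and 65535, got "${raw}"`);
  }
  return port;
}

function resolveGatewayUrls(env: Env): { a: string; b: string } {
  return {
    a: `http://localhost:${hostPort(env, 'GATEWAY_A_HOST_PORT', 14001)}`,
    b: `http://localhost:${hostPort(env, 'GATEWAY_B_HOST_PORT', 14002)}`,
  };
}
```

In a test process this would typically be called as resolveGatewayUrls(process.env), so the shell that ran docker compose up and the shell that runs the tests only need to share the same exports.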
Image pull failures
The gateway image is digest-pinned to:
git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
(sha-9f1a081, post-#491 IMG-FIX)
If the registry is unreachable, Docker will use the locally cached image if present. If no local image exists, the compose up will fail with a pull error. In that case:
- Ensure you can reach git.mosaicstack.dev (VPN, DNS, etc.).
- Log in: docker login git.mosaicstack.dev
- Pull manually: docker pull git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
NestJS cold start
Gateway containers take 40–60 seconds to become healthy on first boot (Node.js
module resolution + NestJS DI bootstrap). The start_period: 60s in the
compose healthcheck covers this. bootHarness() polls for up to 3 minutes.
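The wait can be pictured as a poll loop with a deadline. What follows is a minimal sketch of that idea, not the actual bootHarness() implementation. The `probe` callback is injected because this README does not name the gateway health endpoint, the 2-second interval is an assumption, and `waitUntilHealthy` and `waitForBoth` are hypothetical names.

```typescript
// Minimal sketch of deadline-based polling; not the harness's actual code.
// `probe` is caller-supplied because the health endpoint is not documented here.
type Probe = () => Promise<boolean>;

async function waitUntilHealthy(
  name: string,
  probe: Probe,
  timeoutMs = 180_000, // the 3-minute budget mentioned above
  intervalMs = 2_000, // assumed polling interval
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // A rejected probe (e.g. connection refused during startup) counts as "not yet".
    const healthy = await probe().catch(() => false);
    if (healthy) return;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`${name} did not become healthy within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Each gateway gets its own deadline, so a slow one cannot use up the other's budget.
async function waitForBoth(
  probeA: Probe,
  probeB: Probe,
  timeoutMs?: number,
  intervalMs?: number,
): Promise<void> {
  await Promise.all([
    waitUntilHealthy('gateway-a', probeA, timeoutMs, intervalMs),
    waitUntilHealthy('gateway-b', probeB, timeoutMs, intervalMs),
  ]);
}
```

Polling the two gateways concurrently rather than one after the other keeps the worst-case wait at one timeout instead of two.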
Step-CA startup
Step-CA initialises on first boot (generates CA keys). This takes ~5-10s.
The start_period: 30s in the healthcheck covers it. Both gateways wait for
Step-CA to be healthy before starting (depends_on: step-ca: condition: service_healthy).
dev-password missing
The Step-CA container requires infra/step-ca/dev-password to be mounted.
Copy the example and set a local password:
cp infra/step-ca/dev-password.example infra/step-ca/dev-password
# Edit the file to set your preferred dev CA password
The file is .gitignored — do not commit it.
Image Digest Note
The gateway image is pinned to sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
(sha-9f1a081). This is the digest promoted by PR #491 (IMG-FIX). The latest
tag is forbidden per Mosaic image policy. When a new gateway build is promoted,
update the digest in docker-compose.two-gateways.yml and in this file.
Known Limitations
BETTER_AUTH_URL enrollment URL bug (production code — not fixed here)
apps/gateway/src/federation/federation.controller.ts:145 constructs the
enrollment URL using process.env['BETTER_AUTH_URL'] ?? 'http://localhost:14242'.
In non-harness deployments (where BETTER_AUTH_URL is not set or points to the
web origin rather than the gateway's own base URL) this produces an incorrect
enrollment URL that points to the wrong host or port.
The harness works around this by explicitly setting
BETTER_AUTH_URL: 'http://gateway-b:3000' in the compose file so the enrollment
URL correctly references gateway-b's internal Docker hostname.
TODO: Fix federation.controller.ts to derive the enrollment URL from its own
listening address (e.g. GATEWAY_BASE_URL env var or a dedicated
FEDERATION_ENROLLMENT_BASE_URL env var) rather than reusing BETTER_AUTH_URL.
Tracked as a follow-up to PR #505 — do not bundle with harness changes.
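One possible shape for that fix is sketched below: resolve the enrollment base URL from a dedicated variable first and fall back to the current behaviour last. `resolveEnrollmentBaseUrl` is a hypothetical helper, and the precedence order is an assumption. Only the variable names and the localhost:14242 fallback come from the description above.

```typescript
// Hypothetical helper sketching the TODO; not the current controller code.
type EnvVars = Record<string, string | undefined>;

const LEGACY_DEFAULT = 'http://localhost:14242'; // fallback currently used by the controller

function resolveEnrollmentBaseUrl(env: EnvVars): string {
  const candidate =
    env['FEDERATION_ENROLLMENT_BASE_URL'] ??
    env['GATEWAY_BASE_URL'] ??
    env['BETTER_AUTH_URL'] ?? // assumed to stay last for backwards compatibility
    LEGACY_DEFAULT;
  // new URL() throws on malformed input, so a bad value fails loudly.
  const url = new URL(candidate);
  // Drop any trailing slash so callers can append a path safely.
  return url.origin + url.pathname.replace(/\/+$/, '');
}
```

With this ordering the harness's existing BETTER_AUTH_URL setting would keep working unchanged, while other deployments could set the dedicated variable without touching auth configuration.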
Permanent Infrastructure
This harness is designed to outlive M3 and be reused by M4+ milestone tests. It is not a throwaway scaffold — treat it as production test infrastructure:
- Keep it idempotent.
- Do not hardcode test assumptions in the harness layer (put them in tests).
- Update the seed script when new scope variants are needed.
- The README and harness should be kept in sync as the federation API evolves.