Federation Test Harness
Local two-gateway federation test infrastructure for Mosaic Stack M3+.
This harness boots two real gateway instances (gateway-a, gateway-b) on a
shared Docker bridge network, each backed by its own Postgres (pgvector) +
Valkey, sharing a single Step-CA. It is the test bed for all M3+ federation
E2E tests.
Prerequisites
- Docker with Compose v2 (docker compose version ≥ 2.20)
- pnpm (for running via repo scripts)
- infra/step-ca/dev-password must exist (copy from infra/step-ca/dev-password.example)
Network Topology
Host machine
├── localhost:14001 → gateway-a (Server A — home / requesting)
├── localhost:14002 → gateway-b (Server B — work / serving)
├── localhost:15432 → postgres-a
├── localhost:15433 → postgres-b
├── localhost:16379 → valkey-a
├── localhost:16380 → valkey-b
└── localhost:19000 → step-ca (shared CA)
Docker network: fed-test-net (bridge)
gateway-a ←──── mTLS ────→ gateway-b
↘ ↗
step-ca
Ports are chosen to avoid collision with the base dev stack (5433, 6380, 14242, 9000).
Starting the Harness
# From repo root
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d
# Wait for all services to be healthy (~60-90s on first boot due to NestJS cold start)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml ps
Seeding Test Data
The seed script provisions three grant scope variants (A, B, C) and walks the full enrollment flow so Server A ends up with active peers pointing at Server B.
# Assumes stack is already running
pnpm tsx tools/federation-harness/seed.ts
# Or boot + seed in one step
pnpm tsx tools/federation-harness/seed.ts --boot
Scope Variants
| Variant | Resources | Filters | Excluded | Purpose |
|---|---|---|---|---|
| A | tasks, notes | include_personal: true | (none) | Personal data federation |
| B | tasks | include_teams: ['T1'], no personal | (none) | Team-scoped, no personal |
| C | tasks, credentials | include_personal: true | credentials | Sanity: excluded wins over list |
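The "excluded wins over list" rule that Variant C checks can be illustrated with a minimal sketch. The `GrantScope` shape, the `excluded_resources` field name, and the `effectiveResources` helper are assumptions made for illustration; they mirror the table columns and are not taken from the gateway's actual schema.

```typescript
// Hypothetical scope shape: field names mirror the table columns above,
// not a confirmed gateway API.
interface GrantScope {
  resources: string[];
  include_personal?: boolean;
  include_teams?: string[];
  excluded_resources: string[];
}

const variantA: GrantScope = {
  resources: ['tasks', 'notes'],
  include_personal: true,
  excluded_resources: [],
};

const variantB: GrantScope = {
  resources: ['tasks'],
  include_personal: false,
  include_teams: ['T1'],
  excluded_resources: [],
};

const variantC: GrantScope = {
  resources: ['tasks', 'credentials'],
  include_personal: true,
  excluded_resources: ['credentials'],
};

// Exclusion is applied after the resource list, so a resource that is both
// listed and excluded is dropped.
function effectiveResources(scope: GrantScope): string[] {
  const excluded = new Set(scope.excluded_resources);
  return scope.resources.filter((resource) => !excluded.has(resource));
}
```

Under this reading, Variant C grants access to tasks only, even though credentials appears in its resource list.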
Using from Vitest
import {
bootHarness,
tearDownHarness,
serverA,
serverB,
seed,
} from '../../tools/federation-harness/harness.js';
import type { HarnessHandle } from '../../tools/federation-harness/harness.js';
let handle: HarnessHandle;
beforeAll(async () => {
handle = await bootHarness();
}, 180_000); // allow 3 min for Docker pull + NestJS cold start
afterAll(async () => {
await tearDownHarness(handle);
});
test('variant A: list tasks returns personal tasks', async () => {
// NOTE: Only 'all' is supported for now — per-variant narrowing is M3-11.
const seedResult = await seed(handle, 'all');
const a = serverA(handle);
const res = await fetch(`${a.baseUrl}/api/federation/tasks`, {
headers: { 'x-federation-grant': seedResult.grants.variantA.id },
});
expect(res.status).toBe(200);
});
Note: seed() bootstraps a fresh admin user on each gateway via POST /api/bootstrap/setup. Both gateways must have zero users (pristine DB). If either gateway already has users, seed() throws with a clear error. Reset state with docker compose down -v.
The bootHarness() function is idempotent: if both gateways are already
healthy, it reuses the running stack and returns ownedStack: false. Tests
should not call tearDownHarness when ownedStack is false unless they
explicitly want to shut down a shared stack.
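One way to respect the ownedStack rule in a test file is a small guard around teardown. This is a sketch only: `tearDownIfOwned` and `OwnedHandle` are hypothetical names, and the only field assumed on the handle is the `ownedStack` flag described above.

```typescript
// Minimal structural type: only ownedStack is taken from the text above.
interface OwnedHandle {
  ownedStack: boolean;
}

// Calls tearDown only when this test run booted the stack itself.
// Resolves to true when teardown ran, false when a shared stack was left up.
async function tearDownIfOwned<H extends OwnedHandle>(
  handle: H,
  tearDown: (h: H) => Promise<void>,
): Promise<boolean> {
  if (!handle.ownedStack) return false;
  await tearDown(handle);
  return true;
}
```

In a test file this could be wired up as afterAll(() => tearDownIfOwned(handle, tearDownHarness)), assuming tearDownHarness accepts the handle as shown in the example above.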
Vitest Config (pnpm test:federation)
Add to vitest.config.ts at repo root (or a dedicated config):
// vitest.federation.config.ts
import { defineConfig } from 'vitest/config';
export default defineConfig({
test: {
include: ['**/*.federation.test.ts'],
testTimeout: 60_000,
hookTimeout: 180_000,
reporters: ['verbose'],
},
});
Then add to root package.json:
"test:federation": "vitest run --config vitest.federation.config.ts"
Nuking State
# Remove containers AND volumes (ephemeral state — CA keys, DBs, everything)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml down -v
On next up, Step-CA re-initialises from scratch and generates new CA keys.
Step-CA Root Certificate
The CA root lives in the fed-harness-step-ca Docker volume at
/home/step/certs/root_ca.crt. To extract it to the host:
docker run --rm \
-v fed-harness-step-ca:/home/step \
alpine cat /home/step/certs/root_ca.crt > /tmp/fed-harness-root-ca.crt
Troubleshooting
Port conflicts
Default host ports: 14001, 14002, 15432, 15433, 16379, 16380, 19000.
Override via environment variables before docker compose up:
GATEWAY_A_HOST_PORT=14101 GATEWAY_B_HOST_PORT=14102 \
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d
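If the host ports are overridden, tests that build gateway URLs by hand need to follow the same variables. The sketch below shows one way to do that. `gatewayBaseUrls` is a hypothetical helper, and whether harness.ts itself reads these variables is not stated here, so treat this as an assumption for ad hoc scripts.

```typescript
type Env = Record<string, string | undefined>;

// Resolves host-side base URLs for both gateways, falling back to the
// default harness ports (14001 and 14002) when no override is set.
function gatewayBaseUrls(env: Env): { a: string; b: string } {
  const portA = env['GATEWAY_A_HOST_PORT'] ?? '14001';
  const portB = env['GATEWAY_B_HOST_PORT'] ?? '14002';
  return {
    a: `http://localhost:${portA}`,
    b: `http://localhost:${portB}`,
  };
}

// Typical call from a Node script: gatewayBaseUrls(process.env)
```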
Image pull failures
The gateway image is digest-pinned to:
git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
(sha-9f1a081, post-#491 IMG-FIX)
If the registry is unreachable, Docker will use the locally cached image if present. If no local image exists, the compose up will fail with a pull error. In that case:
- Ensure you can reach git.mosaicstack.dev (VPN, DNS, etc.).
- Log in: docker login git.mosaicstack.dev
- Pull manually: docker pull git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
NestJS cold start
Gateway containers take 40–60 seconds to become healthy on first boot (Node.js
module resolution + NestJS DI bootstrap). The start_period: 60s in the
compose healthcheck covers this. bootHarness() polls for up to 3 minutes.
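For scripts that do not go through bootHarness(), a comparable wait loop can be sketched as follows. This is not the harness's own implementation: `waitUntilHealthy` is a hypothetical helper, the probe is supplied by the caller, and the 2-second interval is an assumption. Only the 3-minute ceiling comes from the text above.

```typescript
// Polls a caller-supplied probe until it reports healthy or the deadline passes.
async function waitUntilHealthy(
  probe: () => Promise<boolean>,
  timeoutMs = 180_000, // 3 minutes, matching the ceiling described above
  intervalMs = 2_000, // assumed polling interval
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // A rejected probe (connection refused, etc.) counts as "not healthy yet".
    const ok = await probe().catch(() => false);
    if (ok) return;
    // Give up if the next sleep would overrun the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`not healthy after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The probe would typically wrap an HTTP request to each gateway. The exact health endpoint is not specified in this document, so it is left to the caller.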
Step-CA startup
Step-CA initialises on first boot (generates CA keys). This takes ~5-10s.
The start_period: 30s in the healthcheck covers it. Both gateways wait for
Step-CA to be healthy before starting (depends_on: step-ca: condition: service_healthy).
dev-password missing
The Step-CA container requires infra/step-ca/dev-password to be mounted.
Copy the example and set a local password:
cp infra/step-ca/dev-password.example infra/step-ca/dev-password
# Edit the file to set your preferred dev CA password
The file is .gitignored — do not commit it.
Image Digest Note
The gateway image is pinned to sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
(sha-9f1a081). This is the digest promoted by PR #491 (IMG-FIX). The latest
tag is forbidden per Mosaic image policy. When a new gateway build is promoted,
update the digest in docker-compose.two-gateways.yml and in this file.
Known Limitations
BETTER_AUTH_URL enrollment URL bug (upstream production code — not yet fixed)
apps/gateway/src/federation/federation.controller.ts:145 constructs the
enrollment URL using process.env['BETTER_AUTH_URL'] ?? 'http://localhost:14242'.
This is an upstream bug: BETTER_AUTH_URL is the Better Auth origin (typically
the web app), not the gateway's own base URL. In non-harness deployments this
produces an enrollment URL pointing to the wrong host or port.
How the harness handles this:
- In-cluster calls (container-to-container): The compose file sets BETTER_AUTH_URL: 'http://gateway-b:3000' so the enrollment URL returned by the gateway uses the Docker internal hostname. This lets other containers in the fed-test-net network resolve and reach Server B's enrollment endpoint.
- Host-side URL rewrite (seed script): The seed.ts script runs on the host machine where gateway-b is not a resolvable hostname. Before calling fetch(enrollmentUrl, ...), the seed script rewrites the URL: it extracts only the token path segment from enrollmentUrl and reassembles the URL using the host-accessible serverBUrl (default: http://localhost:14002). This lets the seed script redeem enrollment tokens from the host without being affected by the in-cluster hostname in the returned URL.
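The host-side rewrite can be approximated with the standard URL class. Note that this sketch uses a simplified variant: it keeps the returned path and query and swaps only the origin, whereas seed.ts, as described above, extracts the token segment and reassembles the URL. `rewriteForHost` is a hypothetical name, and the enrollment path in the usage comment is a placeholder, not the gateway's real route.

```typescript
// Replaces the origin of an enrollment URL with a host-reachable one,
// keeping the original path and query string.
function rewriteForHost(enrollmentUrl: string, serverBUrl: string): string {
  const original = new URL(enrollmentUrl);
  return new URL(original.pathname + original.search, serverBUrl).toString();
}

// Usage (the path is a placeholder for illustration only):
// rewriteForHost('http://gateway-b:3000/placeholder/enroll/tok123',
//                'http://localhost:14002')
```

Swapping only the origin avoids hardcoding the enrollment route in the sketch. It would stop working if the gateway ever returned a path that differs between in-cluster and host access.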
TODO: Fix federation.controller.ts to derive the enrollment URL from its own
listening address (e.g. GATEWAY_BASE_URL env var or a dedicated
FEDERATION_ENROLLMENT_BASE_URL env var) rather than reusing BETTER_AUTH_URL.
Tracked as a follow-up to PR #505 — do not bundle with harness changes.
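The direction of the proposed fix can be sketched as a small resolver that never consults BETTER_AUTH_URL. This is an illustration of the TODO, not the planned implementation: `enrollmentBaseUrl` is a hypothetical helper, the two variable names are the candidates suggested above, and the precedence between them and the fallback to the current default are assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Picks the base URL for enrollment links from gateway-specific variables.
// BETTER_AUTH_URL is deliberately ignored because it names the Better Auth
// origin, not the gateway.
function enrollmentBaseUrl(env: Env): string {
  return (
    env['FEDERATION_ENROLLMENT_BASE_URL'] ??
    env['GATEWAY_BASE_URL'] ??
    'http://localhost:14242' // current default in federation.controller.ts
  );
}
```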
Permanent Infrastructure
This harness is designed to outlive M3 and be reused by M4+ milestone tests. It is not a throwaway scaffold — treat it as production test infrastructure:
- Keep it idempotent.
- Do not hardcode test assumptions in the harness layer (put them in tests).
- Update the seed script when new scope variants are needed.
- Keep the README and harness in sync as the federation API evolves.