feat(federation): two-gateway test harness scaffold (FED-M3-02) (#505)
This commit was merged in pull request #505.
254
tools/federation-harness/README.md
Normal file
@@ -0,0 +1,254 @@
# Federation Test Harness

Local two-gateway federation test infrastructure for Mosaic Stack M3+.

This harness boots two real gateway instances (`gateway-a`, `gateway-b`) on a
shared Docker bridge network, each backed by its own Postgres (pgvector) +
Valkey, sharing a single Step-CA. It is the test bed for all M3+ federation
E2E tests.

## Prerequisites

- Docker with Compose v2 (`docker compose version` ≥ 2.20)
- pnpm (for running via repo scripts)
- `infra/step-ca/dev-password` must exist (copy from `infra/step-ca/dev-password.example`)

## Network Topology

```
Host machine
  ├── localhost:14001  → gateway-a   (Server A: home / requesting)
  ├── localhost:14002  → gateway-b   (Server B: work / serving)
  ├── localhost:15432  → postgres-a
  ├── localhost:15433  → postgres-b
  ├── localhost:16379  → valkey-a
  ├── localhost:16380  → valkey-b
  └── localhost:19000  → step-ca     (shared CA)

Docker network: fed-test-net (bridge)
  gateway-a ←──── mTLS ────→ gateway-b
       ↘                    ↗
             step-ca
```

Ports are chosen to avoid collision with the base dev stack (5433, 6380, 14242, 9000).

## Starting the Harness

```bash
# From repo root
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d

# Wait for all services to be healthy (~60-90s on first boot due to NestJS cold start)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml ps
```

## Seeding Test Data

The seed script provisions three grant scope variants (A, B, C) and walks the
full enrollment flow so Server A ends up with active peers pointing at Server B.

```bash
# Assumes stack is already running
pnpm tsx tools/federation-harness/seed.ts

# Or boot + seed in one step
pnpm tsx tools/federation-harness/seed.ts --boot
```

### Scope Variants

| Variant | Resources          | Filters                            | Excluded    | Purpose                         |
| ------- | ------------------ | ---------------------------------- | ----------- | ------------------------------- |
| A       | tasks, notes       | include_personal: true             | (none)      | Personal data federation        |
| B       | tasks              | include_teams: ['T1'], no personal | (none)      | Team-scoped, no personal        |
| C       | tasks, credentials | include_personal: true             | credentials | Sanity: excluded wins over list |

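
Variant C exists to check that an exclusion overrides an explicit resource listing. A minimal sketch of that precedence rule follows. It assumes a hypothetical `GrantScope` shape; the field names `resources` and `excludedResources` are illustrative and are not taken from the gateway's schema.

```typescript
// Hypothetical grant scope shape, loosely mirroring the table columns above.
interface GrantScope {
  resources: string[];
  excludedResources: string[];
}

// Exclusions take precedence: a resource listed in both arrays is dropped.
function effectiveResources(scope: GrantScope): string[] {
  const excluded = new Set(scope.excludedResources);
  return scope.resources.filter((resource) => !excluded.has(resource));
}

// Variant C lists "credentials" but also excludes it.
const variantC: GrantScope = {
  resources: ['tasks', 'credentials'],
  excludedResources: ['credentials'],
};

console.log(effectiveResources(variantC)); // [ 'tasks' ]
```

A test against variant C would then be expected to see tasks but never credentials, whichever way the real gateway represents the scope internally.
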
## Using from Vitest

```ts
import { afterAll, beforeAll, expect, test } from 'vitest';
import {
  bootHarness,
  tearDownHarness,
  serverA,
  serverB,
  seed,
} from '../../tools/federation-harness/harness.js';
import type { HarnessHandle } from '../../tools/federation-harness/harness.js';

let handle: HarnessHandle;

beforeAll(async () => {
  handle = await bootHarness();
}, 180_000); // allow 3 min for Docker pull + NestJS cold start

afterAll(async () => {
  await tearDownHarness(handle);
});

test('variant A: list tasks returns personal tasks', async () => {
  // NOTE: Only 'all' is supported for now; per-variant narrowing is M3-11.
  const seedResult = await seed(handle, 'all');
  const a = serverA(handle);

  const res = await fetch(`${a.baseUrl}/api/federation/tasks`, {
    headers: { 'x-federation-grant': seedResult.grants.variantA.id },
  });
  expect(res.status).toBe(200);
});
```

> **Note:** `seed()` bootstraps a fresh admin user on each gateway via
> `POST /api/bootstrap/setup`. Both gateways must have zero users (pristine DB).
> If either gateway already has users, `seed()` throws with a clear error.
> Reset state with `docker compose down -v`.

The `bootHarness()` function is **idempotent**: if both gateways are already
healthy, it reuses the running stack and returns `ownedStack: false`. Tests
should not call `tearDownHarness` when `ownedStack` is false unless they
explicitly want to shut down a shared stack.

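
The `ownedStack` guard can be wrapped in a small helper so that `afterAll` never shuts down a stack the test run did not start. This is a sketch only. It assumes the handle returned by `bootHarness()` exposes `ownedStack` as a boolean property; `HandleLike` and `tearDownIfOwned` are hypothetical names that are not part of `harness.ts`.

```typescript
// Minimal stand-in for the harness handle; only `ownedStack` comes from the README.
interface HandleLike {
  ownedStack: boolean;
}

// Calls `tearDown` only when this run booted the stack itself.
// Resolves to true if teardown ran, false if the stack was left alone.
async function tearDownIfOwned<H extends HandleLike>(
  handle: H,
  tearDown: (h: H) => Promise<void>,
): Promise<boolean> {
  if (!handle.ownedStack) {
    return false; // shared stack: leave it running for other users
  }
  await tearDown(handle);
  return true;
}
```

In a test file this might be used as `afterAll(async () => { await tearDownIfOwned(handle, tearDownHarness); });`, provided `HarnessHandle` has the assumed property.
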
## Vitest Config (pnpm test:federation)

Create a dedicated `vitest.federation.config.ts` at the repo root (or merge
these settings into the existing `vitest.config.ts`):

```ts
// vitest.federation.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['**/*.federation.test.ts'],
    testTimeout: 60_000,
    hookTimeout: 180_000,
    reporters: ['verbose'],
  },
});
```

Then add to the `scripts` section of the root `package.json`:

```json
"test:federation": "vitest run --config vitest.federation.config.ts"
```

## Nuking State

```bash
# Remove containers AND volumes (ephemeral state: CA keys, DBs, everything)
docker compose -f tools/federation-harness/docker-compose.two-gateways.yml down -v
```

On next `up`, Step-CA re-initialises from scratch and generates new CA keys.

## Step-CA Root Certificate

The CA root lives in the `fed-harness-step-ca` Docker volume at
`/home/step/certs/root_ca.crt`. To extract it to the host:

```bash
docker run --rm \
  -v fed-harness-step-ca:/home/step \
  alpine cat /home/step/certs/root_ca.crt > /tmp/fed-harness-root-ca.crt
```

## Troubleshooting

### Port conflicts

Default host ports: 14001, 14002, 15432, 15433, 16379, 16380, 19000.
Override via environment variables before `docker compose up`:

```bash
GATEWAY_A_HOST_PORT=14101 GATEWAY_B_HOST_PORT=14102 \
  docker compose -f tools/federation-harness/docker-compose.two-gateways.yml up -d
```

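
When host ports are overridden, any test code that builds gateway URLs needs to pick up the same values. The sketch below shows one way to do that. `hostPort` is a hypothetical helper, and whether `harness.ts` already honors these variables is not stated here, so treat this as an illustration and not as a description of the harness.

```typescript
// Resolves a host port from an env override, else the documented default.
function hostPort(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`${name} is not a valid port: ${raw}`);
  }
  return port;
}

// Explicit env object for the example; a test file would pass process.env.
const env: Record<string, string | undefined> = { GATEWAY_A_HOST_PORT: '14101' };
const serverAUrl = `http://localhost:${hostPort(env, 'GATEWAY_A_HOST_PORT', 14001)}`;
const serverBUrl = `http://localhost:${hostPort(env, 'GATEWAY_B_HOST_PORT', 14002)}`;

console.log(serverAUrl, serverBUrl); // http://localhost:14101 http://localhost:14002
```
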
### Image pull failures

The gateway image is digest-pinned to:

```
git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02
```

(sha-9f1a081, post-#491 IMG-FIX)

If the registry is unreachable, Docker will use the locally cached image if
present. If no local image exists, `docker compose up` will fail with a pull error.
In that case:

1. Ensure you can reach `git.mosaicstack.dev` (VPN, DNS, etc.).
2. Log in: `docker login git.mosaicstack.dev`
3. Pull manually: `docker pull git.mosaicstack.dev/mosaicstack/stack/gateway@sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02`

### NestJS cold start

Gateway containers take 40–60 seconds to become healthy on first boot (Node.js
module resolution + NestJS DI bootstrap). The `start_period: 60s` in the
compose healthcheck covers this. `bootHarness()` polls for up to 3 minutes.

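
For readers who want to wait on the gateways outside `bootHarness()`, the polling idea can be sketched generically. This is not the harness's actual implementation: `pollUntil` is a hypothetical helper, and the readiness check is injected by the caller because the gateway's health endpoint path is not documented here.

```typescript
// Polls `check` until it resolves true or `timeoutMs` elapses.
async function pollUntil(
  check: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs: number,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // A rejected check (e.g. connection refused) counts as "not ready yet".
    const ready = await check().catch(() => false);
    if (ready) return;
    // Give up if the next attempt would start after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`not ready after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller could pass something like `async () => (await fetch(healthUrl)).ok` with a 180 000 ms timeout to mirror the 3-minute budget, where `healthUrl` is whatever health route the gateway exposes.
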
### Step-CA startup

Step-CA initialises on first boot (generates CA keys). This takes ~5-10s.
The `start_period: 30s` in the healthcheck covers it. Both gateways wait for
Step-CA to be healthy before starting (`depends_on: step-ca: condition: service_healthy`).

### dev-password missing

The Step-CA container requires `infra/step-ca/dev-password` to be mounted.
Copy the example and set a local password:

```bash
cp infra/step-ca/dev-password.example infra/step-ca/dev-password
# Edit the file to set your preferred dev CA password
```

The file is `.gitignore`d; do not commit it.

## Image Digest Note

The gateway image is pinned to `sha256:1069117740e00ccfeba357cae38c43f3729fe5ae702740ce474f6512414d7c02`
(sha-9f1a081). This is the digest promoted by PR #491 (IMG-FIX). The `latest`
tag is forbidden per Mosaic image policy. When a new gateway build is promoted,
update the digest in `docker-compose.two-gateways.yml` and in this file.

## Known Limitations

### BETTER_AUTH_URL enrollment URL bug (upstream production code, not yet fixed)

`apps/gateway/src/federation/federation.controller.ts:145` constructs the
enrollment URL using `process.env['BETTER_AUTH_URL'] ?? 'http://localhost:14242'`.
This is an upstream bug: `BETTER_AUTH_URL` is the Better Auth origin (typically
the web app), not the gateway's own base URL. In non-harness deployments this
produces an enrollment URL pointing to the wrong host or port.

**How the harness handles this:**

1. **In-cluster calls (container-to-container):** The compose file sets
   `BETTER_AUTH_URL: 'http://gateway-b:3000'` so the enrollment URL returned by
   the gateway uses the Docker internal hostname. This lets other containers in the
   `fed-test-net` network resolve and reach Server B's enrollment endpoint.

2. **Host-side URL rewrite (seed script):** The `seed.ts` script runs on the host
   machine where `gateway-b` is not a resolvable hostname. Before calling
   `fetch(enrollmentUrl, ...)`, the seed script rewrites the URL: it extracts only
   the token path segment from `enrollmentUrl` and reassembles the URL using the
   host-accessible `serverBUrl` (default: `http://localhost:14002`). This lets the
   seed script redeem enrollment tokens from the host without being affected by the
   in-cluster hostname in the returned URL.

**TODO:** Fix `federation.controller.ts` to derive the enrollment URL from its own
listening address (e.g. `GATEWAY_BASE_URL` env var or a dedicated
`FEDERATION_ENROLLMENT_BASE_URL` env var) rather than reusing `BETTER_AUTH_URL`.
Tracked as a follow-up to PR #505; do not bundle with harness changes.

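
The host-side rewrite from step 2 above can be sketched as follows. This is an illustration under stated assumptions and not the code in `seed.ts`: the token is assumed to be the last path segment of the returned URL, `rewriteEnrollmentUrl` is a hypothetical name, and the `/hypothetical/enroll` prefix is a placeholder because the real enrollment route is not documented here.

```typescript
// Rebuilds an enrollment URL against a host-reachable base URL.
// `enrollPath` is a placeholder route prefix supplied by the caller.
function rewriteEnrollmentUrl(
  enrollmentUrl: string,
  hostBaseUrl: string,
  enrollPath: string,
): string {
  // Assumption: the token is the last non-empty path segment.
  const segments = new URL(enrollmentUrl).pathname.split('/').filter(Boolean);
  const token = segments[segments.length - 1];
  if (!token) {
    throw new Error(`no token segment in ${enrollmentUrl}`);
  }
  // Normalise slashes so the pieces join with exactly one separator.
  const base = hostBaseUrl.replace(/\/+$/, '');
  const prefix = enrollPath.replace(/^\/+|\/+$/g, '');
  return `${base}/${prefix}/${token}`;
}

const rewritten = rewriteEnrollmentUrl(
  'http://gateway-b:3000/hypothetical/enroll/abc123',
  'http://localhost:14002',
  '/hypothetical/enroll',
);
console.log(rewritten); // http://localhost:14002/hypothetical/enroll/abc123
```

Discarding everything except the token means the in-cluster hostname and port never reach the host-side `fetch`, which is the property the seed script relies on.
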
## Permanent Infrastructure

This harness is designed to outlive M3 and be reused by M4+ milestone tests.
It is not a throwaway scaffold; treat it as production test infrastructure:

- Keep it idempotent.
- Do not hardcode test assumptions in the harness layer (put them in tests).
- Update the seed script when new scope variants are needed.
- The README and harness should be kept in sync as the federation API evolves.