- CRIT-1: Validate cert subjectUserId against grant.subjectUserId from DB;
use authoritative DB value in FederationContext
- CRIT-2: Add @Inject(GrantsService) decorator (tsx/esbuild requirement)
- HIGH-1: Validate UTF8String TLV tag, length, and bounds in OID parser
- HIGH-2: Collapse all 403 wire messages to a generic string to prevent
grant enumeration; keep internal logger detail
- HIGH-3: Assert federation wire envelope shape in all guard tests
- HIGH-4: Regression test for subjectUserId cert/DB mismatch
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds FederationAuthGuard that validates inbound mTLS client certs on
federation API routes. Extracts custom OIDs (grantId, subjectUserId),
loads the grant+peer from DB in one query, asserts active status, and
validates cert serial as defense-in-depth. Attaches FederationContext
to requests on success and uses federation wire-format error envelopes
(not raw NestJS exceptions) for 401/403 responses.
New files:
- apps/gateway/src/federation/oid.util.ts — shared OID extraction (no dupe ASN.1 logic)
- apps/gateway/src/federation/server/federation-auth.guard.ts — guard impl
- apps/gateway/src/federation/server/federation-context.ts — FederationContext type + module augment
- apps/gateway/src/federation/server/index.ts — barrel export
- apps/gateway/src/federation/server/__tests__/federation-auth.guard.spec.ts — 11 unit tests
Modified:
- apps/gateway/src/federation/grants.service.ts — adds getGrantWithPeer() with join
- apps/gateway/src/federation/federation.module.ts — registers FederationAuthGuard as provider
Closes#462
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- HIGH-A: resolveEntry now uses promise-cache pattern so concurrent
callers serialize on a single in-flight build, eliminating duplicate
key material in heap and duplicate DB round-trips
- HIGH-B: flushPeer destroys the evicted undici Agent so stale TLS
connections close on cert rotation
- MED-C: add regression test for PEER_MISCONFIGURED when
STEP_CA_ROOT_CERT_PATH is unset
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CRIT-1: regenerate pnpm-lock.yaml so apps/gateway resolves undici@7.24.6
(prior PR pushed package.json without lockfile update; CI failed with
ERR_PNPM_OUTDATED_LOCKFILE). Incidentally cleans 57 lines of stale
peer-dep entries.
CRIT-2: cache-hit test no longer swallows resolveEntry errors. Calls the
private method directly twice and asserts identity equality plus a
single DB select, removing the silent-failure path the prior assertion
allowed.
HIGH-1: mTLS Agent now pins Step-CA root via STEP_CA_ROOT_CERT_PATH.
Without the env var resolveEntry throws PEER_MISCONFIGURED, refusing to
dial peers against the public trust store. PEM is read once and cached
on the service instance.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implements FederationClientService — a NestJS injectable that dials peer
gateways over mTLS (undici Agent with cert+sealed-key from federation_peers),
invokes list/get/capabilities verbs, validates responses via Zod, and surfaces
all failure modes as typed FederationClientError with a coherent error code
taxonomy (PEER_NOT_FOUND, PEER_INACTIVE, PEER_MISCONFIGURED, NETWORK,
FORBIDDEN, HTTP_{status}, INVALID_RESPONSE).
Per-peer Agent instances are cached in a Map for the service lifetime;
flushPeer(peerId) invalidates the cache for M5/M6 cert rotation and
revocation events.
Wired into FederationModule providers + exports so QuerySourceService
(M3-09) can inject it.
13 unit tests covering all required scenarios via undici MockAgent +
real sealClientKey/unsealClientKey round-trip.
Closes#462
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Kaniko fails when COPY --from=builder references a path that doesn't
exist. The web app had no public/ directory, causing build-web to fail
with 'no such file or directory' on the public assets COPY step.
BullMQ v5 RedisConnection constructor does:
Object.assign({ port: 6379, host: '127.0.0.1' }, opts)
When opts is a URL string (via 'as unknown as ConnectionOptions'),
Object.assign only copies character-index properties from the string,
so the default port 6379 was never overridden — causing ECONNREFUSED
against the wrong port instead of the configured 6380.
Fix: parse VALKEY_URL with new URL() and return a plain RedisOptions
object { host, port, ... } so Object.assign merges it correctly.