feat(federation): enrollment controller + single-use token flow (FED-M2-07) #497
Reference in New Issue
Block a user
Delete Branch "feat/federation-m2-enrollment"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
FED-M2-07: Enrollment Controller\n\nAdds single-use enrollment token flow for federation grant activation.\n\n### Changes\n- New DB table
federation_enrollment_tokens(migration 0010)\n-EnrollmentService: token generation, redemption with replay protection, cert issuance via CaService, grant activation\n-EnrollmentController:POST /api/federation/enrollment/tokens(admin) +POST /api/federation/enrollment/:token(unauthenticated)\n- Unit tests for all success and error paths (12 tests)\n\nCloses #461 (partial — M2-07 of M2 milestone)Independent Code Review — REQUEST_CHANGES
Reviewed by Opus 4.7 (independent agent, no shared context with author).
HIGH (must fix before merge)
H1 — TOCTOU: cert issued before token claimed (
enrollment.service.ts:137-170, confidence 88)Current order: check usedAt → issueCert → mark token used. Two concurrent requests both pass pre-checks, both mint a real cert at step-ca with distinct serials, then race on the UPDATE. The loser's cert is a valid mTLS credential with the correct mosaic OIDs, untracked in
federation_peers.cert_serial, outside any CRL workflow, valid for 300s.Fix: Move the
UPDATE ... WHERE used_at IS NULLto BEFOREcaService.issueCert(). If issueCert fails after the token is claimed, the token is consumed (pending grant stranded — admin must create new grant). This tradeoff prevents double cert issuance.H2 — No transaction wrapping (
enrollment.service.ts:154-196, confidence 84)Four serial non-atomic writes: mark-used → activateGrant → update peer → audit. Any mid-flight failure leaves inconsistent state (grant active but peer missing cert serial → all inbound mTLS fails; audit row silently lost).
Fix: Wrap
activateGrant+ peer update + audit inawait this.db.transaction(async (tx) => { ... }). KeepcaService.issueCert()outside the transaction (network call).MEDIUM (defer ok)
rowCountPGlite fallback — prefer.returning({ token: ... })+ check array length; works on both drivers without driver inspection.crypto.randomUUID()doc says UUIDv7 but token column is text — cosmetic, but add TODO.Checked OK
POST /tokens✓crypto.randomBytes(32).toString('hex')(256 bits) ✓verb,resource,statusCode,outcome) ✓anytypes, no hardcoded secrets, stack file untouched ✓updateRowCount: 0 → GoneException✓Verdict: REQUEST_CHANGES — H1 and H2 must be fixed before merge.
Independent Re-Review — APPROVE
Reviewed by Opus 4.7 (independent agent, no shared context with author).
H1 (TOCTOU: cert issued before token claimed): RESOLVED
Atomic claim (
UPDATE WHERE used_at IS NULL RETURNING token) at line 143 fires beforeissueCertat line 160.claimed.length === 0→ GoneException without cert mint. Two new tests verify ordering and no-cert-on-race-loss.H2 (no transaction): RESOLVED
activateGrant + peer update + auditwrapped inthis.db.transaction(async (tx) => { ... })at line 179. All three writes usetx.issueCertcorrectly stays outside. Test confirms: transaction called once,tx.update×2,tx.insert×1.M1 (rowCount portability): RESOLVED
.returning({ token })+claimed.length— works on both node-postgres and PGlite.Gates on remediated branch: typecheck ✓ lint ✓ format ✓ test 429/429 ✓
No new HIGH issues.
Verdict: APPROVE