feat(federation): Step-CA client service for grant certs (FED-M2-04) #494

Merged
jason.woltje merged 2 commits from feat/federation-m2-ca-service into main 2026-04-22 03:34:38 +00:00

Summary

  • CaService (apps/gateway/src/federation/ca.service.ts): NestJS injectable that submits CSRs to step-ca /1.0/sign over mTLS-pinned HTTPS. Builds an HS256 JWK-provisioner OTT carrying mosaic_grant_id, mosaic_subject_user_id, and step.sha (CSR fingerprint) as JWT claims. Returns IssuedCertDto with certPem, certChainPem (with fallback chain), and serialNumber.
  • CaServiceError: custom error class with cause + remediation on every throw path — fail-loud contract from M2-02 review is enforced; silent OID-stripping is never permitted.
  • IssueCertRequestDto / IssuedCertDto (ca.dto.ts): class-validator-annotated DTO pair at the federation module boundary.
  • FederationModule (federation.module.ts): wraps CaService and exports it; imported into AppModule.
  • federation.tpl (REPLACED): real step-ca Go template that emits both custom OID extensions as DER UTF8String (tag 0x0C, length 0x24, UUID bytes) base64-encoded:
    • 1.3.6.1.4.1.99999.1 = .Token.mosaic_grant_id
    • 1.3.6.1.4.1.99999.2 = .Token.mosaic_subject_user_id
  • init.sh (UPDATED): idempotent first-boot jq patch wires options.x509.templateFile into the mosaic-fed provisioner entry in ca.json after step ca init.
  • docs/federation/SETUP.md (UPDATED): appended OID assignment registry table, DER encoding spec, CA env var table, and fail-loud contract note.
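To make the extension encoding above concrete, here is a minimal sketch of how a 36-character UUID becomes a DER UTF8String with tag 0x0C and length 0x24. The helper name `derUtf8String` and the example UUID are hypothetical and not taken from the PR; the template itself does this in Go template syntax, not TypeScript.

```typescript
// Hypothetical helper for illustration; it is not taken from the PR.
function derUtf8String(value: string): Buffer {
  const body = Buffer.from(value, "utf8");
  // Short-form DER lengths fit in one byte and cover up to 127 content bytes.
  if (body.length === 0 || body.length > 127) {
    throw new RangeError(`expected 1..127 bytes, got ${body.length}`);
  }
  // Tag 0x0C (UTF8String), one length byte, then the content bytes.
  return Buffer.concat([Buffer.from([0x0c, body.length]), body]);
}

const exampleGrantId = "123e4567-e89b-12d3-a456-426614174000"; // made-up UUID
const encoded = derUtf8String(exampleGrantId);

console.log(encoded.subarray(0, 2).toString("hex")); // "0c24": tag, then length 36
console.log(encoded.toString("base64")); // base64 form of the extension value
```

Because the length byte is computed from the content here, a value that is not exactly 36 bytes still yields a consistent header.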

Acceptance

  • Custom OIDs: federation.tpl now unconditionally emits both OID extensions. If either JWT claim is missing, the DER value is malformed (zero-length UTF8String body) and step-ca rejects the certificate, ensuring no cert silently omits a required OID.
  • Unit test - JWT claims: ca.service.spec.ts decodes the OTT JWT and asserts payload.mosaic_grant_id === grantId and payload.mosaic_subject_user_id === subjectUserId.
  • Fail-loud carry-forward from M2-02 review: every error path in CaService throws CaServiceError with remediation; there is no code path that returns a partial/empty result without throwing.
  • 11 vitest unit tests pass: happy path, certChain fallbacks (certChain array, ca field, none), HTTP 401, HTTP 4xx, malformed CSR (no HTTP call made), non-JSON response, connection error, JWT claim assertions, CaServiceError shape.
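The fail-loud contract can be sketched roughly as follows. The `cause` and `remediation` fields come from the description above; the constructor shape, the `parseSignResponse` helper, and the assumption that the sign response carries the leaf certificate in a `crt` field are illustrative only and may differ from the actual service.

```typescript
// Sketch only: field names follow the PR description; the constructor shape is assumed.
class CaServiceError extends Error {
  constructor(
    message: string,
    readonly remediation: string,
    readonly cause?: unknown,
  ) {
    super(message);
    this.name = "CaServiceError";
    // Keeps instanceof working even when compiled to an older JS target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical parser: every failure throws, nothing returns a partial result.
function parseSignResponse(body: string): { crt: string } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch (err) {
    throw new CaServiceError(
      "step-ca returned a non-JSON response",
      "Check that STEP_CA_URL points at step-ca and not at a proxy error page.",
      err,
    );
  }
  const crt = (parsed as { crt?: unknown } | null)?.crt;
  if (typeof crt !== "string" || crt.length === 0) {
    throw new CaServiceError(
      "step-ca response did not contain a certificate",
      "Inspect the step-ca logs for the rejected signing request.",
    );
  }
  return { crt };
}
```

A caller can then surface `err.remediation` to operators and keep `err.cause` for logs.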

Test plan

  • [ ] Run pnpm --filter @mosaicstack/gateway test: ca.service.spec.ts (11 tests) passes
  • [ ] Run pnpm --filter @mosaicstack/gateway lint: no new lint errors
  • [ ] Run pnpm format:check: all files formatted
  • [ ] Integration smoke (requires running step-ca): set env vars, generate a CSR with openssl, call CaService.issueCert, verify returned cert has OIDs with openssl x509 -text

Generated with Claude Code

jason.woltje added 1 commit 2026-04-22 02:57:58 +00:00
feat(federation): Step-CA client service for grant certs (FED-M2-04)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
e5a2ebcf48
- Add CaService (@Injectable) that POSTs CSRs to step-ca /1.0/sign over
  HTTPS with a pinned CA root cert; builds HS256 OTT with custom claims
  mosaic_grant_id and mosaic_subject_user_id plus step.sha CSR fingerprint
- Add CaServiceError with cause + remediation for fail-loud contract
- Add IssueCertRequestDto and IssuedCertDto with class-validator decorators
- Add FederationModule exporting CaService; wire into AppModule
- Replace federation.tpl TODO placeholder with real step-ca Go template
  emitting OID 1.3.6.1.4.1.99999.1 (grantId) and .2 (subjectUserId) as
  DER UTF8String extensions (tag 0x0C, length 0x24, base64-encoded value)
- Update infra/step-ca/init.sh to patch mosaic-fed provisioner config with
  templateFile path via jq on first boot (idempotent)
- Append OID assignment registry and CA env var table to docs/federation/SETUP.md
- 11 unit tests pass: happy path, certChain fallbacks, HTTP 401/4xx, malformed
  CSR (no HTTP call), non-JSON response, connection error, JWT claim assertions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jason.woltje force-pushed feat/federation-m2-ca-service from e5a2ebcf48 to 79442a8e8e 2026-04-22 03:02:21 +00:00 Compare

Independent Code Review — BLOCK

Reviewed by Opus 4.7 (independent agent, no shared context with author).

HIGH severity (must fix before merge)

H1 — JWT signed with HS256 over password, not the JWK private key
apps/gateway/src/federation/ca.service.ts:629-659. step-ca's JWK provisioner verifies OTTs with the public JWK in ca.json (typically EC P-256/ES256). Signing with HS256 over the provisioner password will be rejected by step-ca with 401 in every realistic setup. The provisioner JWK file is parsed but its key material (d/x/y/k) is never used — only kid is read. Fix: parse JWK and sign with the appropriate algorithm using jose (new SignJWT(...).setProtectedHeader({alg:'ES256',kid}).sign(privateKey)). Drop the password from the OTT path entirely.

H2 — Cert TTL default 24h, max 1 year — violates PRD ("minutes, not hours/days")
apps/gateway/src/federation/ca.dto.ts:60-67. @Max(365 * 24 * 3600) allows 1-year certs; docstring says default is 24h but ttlSeconds! is required. Fix: @Max(15*60), default = 300, clamp again in issueCert().

H3 — CSR validation is a substring check; no PKCS#10 parse, no key strength enforcement
apps/gateway/src/federation/ca.service.ts:831-842. "Validation" looks for the literal string 'CERTIFICATE REQUEST'. No signature verification, no minimum key size, no algorithm allow-list, no SAN sanity check against the grant. A buggy/compromised grants service hands an arbitrary CSR (1024-bit RSA, MD5 signature, SANs for any identity) and step-ca returns a usable client cert. Fix: @peculiar/x509 or node-forge to parse + verify CSR self-signature, enforce key type/size, reject MD5/SHA-1, verify SANs.

H4 — Hardcoded DER UTF8String length byte (0x24) — fragile and silent on bad input
infra/step-ca/templates/federation.tpl:42,47. printf '\x0c\x24%s' hardcodes length=36. Missing/short/long claim → length/value mismatch; many parsers accept silently. buildOtt doesn't validate UUID format on the internal path (DTO @IsUUID only fires at HTTP boundary). Fix: enforce UUID regex in buildOtt; in template, compute length dynamically (printf '\x0c%c%s' (len .Token.mosaic_grant_id) ...) or use step-ca's asn1Enc 'utf8' helper. Add an integration test that round-trips an issued cert through crypto.X509Certificate and asserts both extensions decode to expected UUIDs.

H5 — Provisioner password and JWK held as long-lived plaintext on the singleton service
apps/gateway/src/federation/ca.service.ts:763-803. provisionerKeyJson (containing private d/k) re-parsed on every issueCert call. Visible in heap snapshots, core dumps, NestJS DI error traces. Fix: load JWK into KeyObject once in constructor, discard the JSON string, mark fields non-enumerable.

MEDIUM severity

  • M1: sub claim set to ${caUrl}/1.0/sign (meaningless); should be CSR CN/identity (ca.service.ts:637-638)
  • M2: No jti claim — degrades step-ca replay protection (ca.service.ts:635-648)
  • M3: Both top-level sha and nested step.sha emitted — pick one based on step-ca version (ca.service.ts:642,647)
  • M4: extractSerial returns 'unknown' on parse failure — should throw (ca.service.ts:746-753)
  • M5: No request/agent timeout on https calls — hung step-ca blocks gateway (ca.service.ts:817-820)
  • M6: Tests mock node:https/node:fs wholesale; never verify signature, never run real CSR through validator. The "malformed CSR" test passes a literal string 'not-a-valid-csr' — a real PEM-shaped malformed CSR would pass.
  • M7: STEP_CA_URL accepts http:// — could leak OTT in cleartext (ca.service.ts:669-681)
  • M8: env read in constructor; use ConfigService + useFactory

Verdict: BLOCK

H1 means OTT auth cannot work against a stock step-ca JWK provisioner (the failure would have to be "discovered" and worked around with a weaker config, which is dangerous). H3 means any caller can mint federation certs for arbitrary identities with weak keys. H4 means the OID extension is one bad input away from silent corruption. These must be fixed before this service is wired into the grants service (M2-06).

jason.woltje force-pushed feat/federation-m2-ca-service from 79442a8e8e to 48e50f27b3 2026-04-22 03:26:07 +00:00 Compare

Security remediation applied — FED-M2-04

All HIGH findings and MEDIUM findings M1–M7 from the security review have been addressed; M8 is deferred. Re-review requested.

HIGH severity (H1–H5) — all fixed

H1 — JWT signing: Replaced HS256/HMAC with real JWK asymmetric signing via jose. Algorithm auto-derived from JWK kty/crv (EC P-256 → ES256, EC P-384 → ES384, RSA → RS256). provisionerPassword no longer used as signing input.
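As a rough illustration of the H1 fix, the sketch below derives the algorithm from `kty`/`crv` and produces an ES256-signed compact token. It deliberately uses `node:crypto` in place of `jose` so that it runs without dependencies, so it is not the code in `ca.service.ts`. The names `algForJwk`, `signEs256`, and `verifyEs256`, the `example-kid` value, and the throwaway key pair are all assumptions, and a real OTT carries further claims that this sketch omits.

```typescript
import { generateKeyPairSync, randomUUID, sign, verify, KeyObject } from "node:crypto";

type OttAlg = "ES256" | "ES384" | "RS256";

// Mapping described in the remediation note; anything else is refused.
function algForJwk(jwk: { kty?: string; crv?: string }): OttAlg {
  if (jwk.kty === "EC" && jwk.crv === "P-256") return "ES256";
  if (jwk.kty === "EC" && jwk.crv === "P-384") return "ES384";
  if (jwk.kty === "RSA") return "RS256";
  throw new Error(`unsupported provisioner key: kty=${jwk.kty} crv=${jwk.crv}`);
}

const b64url = (value: object): string =>
  Buffer.from(JSON.stringify(value), "utf8").toString("base64url");

// ES256 only, to keep the sketch short. JWS expects the raw r||s signature,
// which node:crypto selects with dsaEncoding "ieee-p1363".
function signEs256(payload: object, kid: string, privateKey: KeyObject): string {
  const signingInput = `${b64url({ alg: "ES256", typ: "JWT", kid })}.${b64url(payload)}`;
  const signature = sign("sha256", Buffer.from(signingInput, "ascii"), {
    key: privateKey,
    dsaEncoding: "ieee-p1363",
  });
  return `${signingInput}.${signature.toString("base64url")}`;
}

function verifyEs256(token: string, publicKey: KeyObject): boolean {
  const [header, payload, signature] = token.split(".");
  if (!header || !payload || !signature) return false;
  return verify(
    "sha256",
    Buffer.from(`${header}.${payload}`, "ascii"),
    { key: publicKey, dsaEncoding: "ieee-p1363" },
    Buffer.from(signature, "base64url"),
  );
}

// Throwaway key pair standing in for the provisioner JWK.
const { privateKey, publicKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });
const alg = algForJwk(privateKey.export({ format: "jwk" }));

const ott = signEs256(
  { jti: randomUUID(), mosaic_grant_id: randomUUID(), mosaic_subject_user_id: randomUUID() },
  "example-kid",
  privateKey,
);

console.log(alg, verifyEs256(ott, publicKey));
```

The point of the change is that only the holder of the private key can produce a token the CA accepts, whereas an HMAC over the password would never verify against the public JWK.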

H2 — Cert TTL: @Max reduced to 15 * 60 (900 s). Default changed from 86400 to 300 (5 min). Hard clamp Math.min(ttlSeconds ?? 300, 900) applied in issueCert().
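The clamp expression quoted above behaves as in this small sketch. The constant names and the `clampTtl` wrapper are hypothetical; only the `Math.min(ttlSeconds ?? 300, 900)` expression comes from the remediation note.

```typescript
const DEFAULT_TTL_SECONDS = 300; // 5 minutes
const MAX_TTL_SECONDS = 15 * 60; // 900 seconds

// Applies the default when no TTL is given, then caps the result at the maximum.
function clampTtl(ttlSeconds?: number): number {
  return Math.min(ttlSeconds ?? DEFAULT_TTL_SECONDS, MAX_TTL_SECONDS);
}

console.log(clampTtl()); // 300
console.log(clampTtl(86400)); // 900
console.log(clampTtl(60)); // 60
```

Clamping in the service as well as in the DTO means callers that bypass HTTP validation still cannot request a long-lived cert.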

H3 — Real CSR validation: Added validateCsr() using @peculiar/x509. Verifies self-signature, rejects RSA < 2048 bits, rejects curves outside {P-256, P-384}, rejects MD5/SHA-1 signature algorithms. Throws CaServiceError with code: INVALID_CSR on failure.
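The key-strength part of that policy can be sketched with `node:crypto` alone, as below. This is not `validateCsr()`: parsing the PKCS#10 structure and verifying its self-signature needs a CSR parser such as `@peculiar/x509`, which is left out here. The `keyPolicyViolation` helper and its return convention are assumptions; the PR throws `CaServiceError` where this sketch returns a reason string.

```typescript
import { generateKeyPairSync, KeyObject } from "node:crypto";

// OpenSSL curve names for P-256 and P-384, as reported by asymmetricKeyDetails.
const ALLOWED_CURVES = new Set(["prime256v1", "secp384r1"]);
const MIN_RSA_BITS = 2048;

// Returns a rejection reason, or null when the key passes the policy.
function keyPolicyViolation(publicKey: KeyObject): string | null {
  const details = publicKey.asymmetricKeyDetails;
  switch (publicKey.asymmetricKeyType) {
    case "rsa": {
      const bits = details?.modulusLength ?? 0;
      return bits >= MIN_RSA_BITS ? null : `RSA key is ${bits} bits, need at least ${MIN_RSA_BITS}`;
    }
    case "ec": {
      const curve = details?.namedCurve ?? "unknown";
      return ALLOWED_CURVES.has(curve) ? null : `curve ${curve} is not allowed`;
    }
    default:
      return `key type ${publicKey.asymmetricKeyType} is not allowed`;
  }
}

// Throwaway keys: one on an allowed curve, one on a curve outside the allow-list.
const allowedKey = generateKeyPairSync("ec", { namedCurve: "prime256v1" }).publicKey;
const rejectedKey = generateKeyPairSync("ec", { namedCurve: "secp256k1" }).publicKey;

console.log(keyPolicyViolation(allowedKey));
console.log(keyPolicyViolation(rejectedKey));
```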

H4 — DER length encoding: Replaced hardcoded \x0c\x24 in federation.tpl with dynamic printf "\x0c%c%s" (len ...) .... Added UUID-shape validation in buildOtt() (code: INVALID_GRANT_ID).
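A sketch of the two halves of this fix: a UUID-shape check before signing, and a strict reader showing why a hardcoded length byte is dangerous. The exact regex used in `buildOtt()` is not shown in this thread, so `UUID_RE` here is an assumed pattern, and `assertUuid` and `decodeDerUtf8String` are hypothetical helpers.

```typescript
// Assumed pattern: 8-4-4-4-12 hex digits, case-insensitive.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function assertUuid(name: string, value: unknown): string {
  if (typeof value !== "string" || !UUID_RE.test(value)) {
    throw new Error(`${name} is not a UUID`);
  }
  return value;
}

// Strict reader for a short-form DER UTF8String: the declared length must match the body.
function decodeDerUtf8String(der: Buffer): string {
  if (der.length < 2 || der.readUInt8(0) !== 0x0c) {
    throw new Error("not a DER UTF8String");
  }
  const declared = der.readUInt8(1);
  if (declared > 0x7f) {
    throw new Error("long-form lengths are out of scope for this sketch");
  }
  if (der.length - 2 !== declared) {
    throw new Error(`declared length ${declared} but found ${der.length - 2} bytes`);
  }
  return der.subarray(2).toString("utf8");
}

// What a hardcoded 0x24 produces when the claim is missing: a header with no body.
const corrupt = Buffer.from([0x0c, 0x24]);
try {
  decodeDerUtf8String(corrupt);
} catch (err) {
  console.log((err as Error).message); // declared length 36 but found 0 bytes
}
```

A lenient parser might accept the corrupt value without complaint, which is why the check belongs on the issuing side.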

H5 — Secure key handling: JWK imported via jose and cached as KeyObject. Raw key JSON string not stored as class field. provisionerPassword not stored as class field.

MEDIUM severity (M1–M7) — all fixed

M1: JWT sub set to CSR CN (extracted via @peculiar/x509) instead of URL.

M2: jti: crypto.randomUUID() added to OTT claims.

M3: Top-level sha claim dropped; only step.sha retained.

M4: extractSerial() throws CaServiceError (code: CERT_PARSE) on failure instead of returning "unknown".
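A minimal sketch of the throw-instead-of-placeholder behaviour, assuming the serial is read with Node's built-in `X509Certificate`; the PR's actual parsing approach is not shown in this thread. `CertParseError` is a stand-in for `CaServiceError` with `code: CERT_PARSE`.

```typescript
import { X509Certificate } from "node:crypto";

// Stand-in for CaServiceError with code CERT_PARSE; the real class lives in the PR.
class CertParseError extends Error {
  readonly code = "CERT_PARSE";

  constructor(message: string, readonly cause?: unknown) {
    super(message);
    this.name = "CertParseError";
    // Keeps instanceof working even when compiled to an older JS target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Returns the serial as a hex string, or throws. It never returns a placeholder.
function extractSerial(certPem: string): string {
  try {
    return new X509Certificate(certPem).serialNumber;
  } catch (cause) {
    throw new CertParseError("could not parse issued certificate", cause);
  }
}
```

Callers that persist the serial for revocation then fail at issue time, not later when an "unknown" serial turns out to be unusable.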

M5: timeout: 5000 added to https.RequestOptions; req.setTimeout(5000, () => req.destroy(...)) added.

M6: Tests rewritten — OTT signature verified with jose.jwtVerify. Real P-256 CSR generated via @peculiar/x509. provisionerPassword leak-check test added.

M7: Constructor validates STEP_CA_URL must be https: — throws with clear message if not.
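The scheme check can be sketched like this. `assertHttpsUrl` is a hypothetical helper, the error messages are illustrative, and the hostnames in the usage note are placeholders.

```typescript
// Parses the configured URL and refuses anything that is not https.
function assertHttpsUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch (err) {
    throw new Error(`STEP_CA_URL is not a valid URL: ${raw}`);
  }
  // URL.protocol includes the trailing colon.
  if (url.protocol !== "https:") {
    throw new Error(`STEP_CA_URL must use https:, got ${url.protocol}`);
  }
  return url;
}
```

With this in place, `assertHttpsUrl("https://ca.example.internal:9000")` returns a parsed `URL`, while `assertHttpsUrl("http://ca.example.internal:9000")` throws before any OTT is sent.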

M8: Deferred (out of scope for this remediation).

Verification gates

  • pnpm --filter @mosaicstack/gateway typecheck ✅
  • pnpm --filter @mosaicstack/gateway test ✅ (385 passed, 16 new ca.service tests)
  • pnpm lint ✅
  • pnpm format:check ✅
  • HS256 / createHmac absent from ca.service.ts ✅
  • provisionerPassword not a class field ✅
  • Hardcoded \x24 absent from federation.tpl ✅
  • Stack file unchanged ✅

Head SHA: 48e50f27b3006c60dcfac0620f158c7949d9ca42


Independent Re-Review — APPROVE

Reviewed by Opus 4.7 (independent agent, no shared context with author).

H1–H5 Status: ALL RESOLVED

H1 (HS256 → asymmetric SignJWT): RESOLVED. SignJWT + importJWK used correctly. Algorithm derived from JWK kty/crv. OTT header carries alg/typ/kid. Test verifies with jwtVerify against matching public JWK.

H2 (TTL clamp): RESOLVED. DTO @Max(15*60), service clamps to 900s. Test asserts clamp for 86400 input.

H3 (CSR validation): RESOLVED. Pkcs10CertificateRequest.verify(), MD5/SHA-1 rejected, RSA >= 2048, EC P-256/P-384/Ed25519 allowed. SAN check present.

H4 (hardcoded DER length): RESOLVED. Template uses dynamic printf with computed length. UUID_RE enforced in buildOtt before signing.

H5 (JWK plaintext storage): RESOLVED. JSON string not stored; only parsed object + lazy KeyObject. Acceptable for asymmetric signing.

M1–M7: ALL RESOLVED

M1 sub=CN, M2 jti=UUID, M3 only step.sha, M4 extractSerial throws, M5 timeout 5000ms, M6 real P-256 JWK + jwtVerify in tests, M7 https-only constructor check.

No new HIGH issues.

Verdict: APPROVE — ready to merge.

jason.woltje force-pushed feat/federation-m2-ca-service from 48e50f27b3 to 7524d6e919 2026-04-22 03:34:16 +00:00 Compare
jason.woltje merged commit 1038ae76e1 into main 2026-04-22 03:34:38 +00:00
Reference: mosaicstack/stack#494