Compare commits


1 commit

Author: Jarvis
SHA1: 79442a8e8e
Message: feat(federation): Step-CA client service for grant certs (FED-M2-04)
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
ci/woodpecker/pr/ci Pipeline was successful
- Add CaService (@Injectable) that POSTs CSRs to step-ca /1.0/sign over
  HTTPS with a pinned CA root cert; builds HS256 OTT with custom claims
  mosaic_grant_id and mosaic_subject_user_id plus step.sha CSR fingerprint
- Add CaServiceError with cause + remediation for fail-loud contract
- Add IssueCertRequestDto and IssuedCertDto with class-validator decorators
- Add FederationModule exporting CaService; wire into AppModule
- Replace federation.tpl TODO placeholder with real step-ca Go template
  emitting OID 1.3.6.1.4.1.99999.1 (grantId) and .2 (subjectUserId) as
  DER UTF8String extensions (tag 0x0C, length 0x24, base64-encoded value)
- Update infra/step-ca/init.sh to patch mosaic-fed provisioner config with
  templateFile path via jq on first boot (idempotent)
- Append OID assignment registry and CA env var table to docs/federation/SETUP.md
- 11 unit tests pass: happy path, certChain fallbacks, HTTP 401/4xx, malformed
  CSR (no HTTP call), non-JSON response, connection error, JWT claim assertions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Date: 2026-04-21 22:00:06 -05:00
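The federation.tpl bullet describes each custom extension as a DER UTF8String: tag 0x0C, a length byte (0x24, i.e. 36), then the value bytes, base64-encoded. A length of 36 is consistent with a canonical UUID string, although the commit does not say the IDs are UUIDs. The minimal TypeScript sketch below shows that byte layout. The helper names are hypothetical and not taken from the commit. The long-form length branch is included only so the helper stays correct for values of 128 bytes or more.

```typescript
// Sketch only: hypothetical helpers illustrating the byte layout, not the commit's template logic.

/** DER length octets: short form below 128, long form (0x80 | n, then n big-endian bytes) otherwise. */
function derLength(len: number): number[] {
  if (len < 0x80) return [len];
  const bytes: number[] = [];
  for (let n = len; n > 0; n = Math.floor(n / 256)) {
    bytes.unshift(n % 256);
  }
  return [0x80 | bytes.length, ...bytes];
}

/** ASN.1 UTF8String as DER: tag 0x0C, length octets, then the value's UTF-8 bytes. */
function derUtf8String(value: string): Uint8Array {
  const body = new TextEncoder().encode(value); // length counts UTF-8 bytes, not characters
  const header = [0x0c, ...derLength(body.length)];
  const out = new Uint8Array(header.length + body.length);
  out.set(header, 0);
  out.set(body, header.length);
  return out;
}

/** Base64 form of the DER bytes, the encoding the commit message says the template emits. */
function derUtf8StringBase64(value: string): string {
  const der = derUtf8String(value);
  let binary = "";
  for (let i = 0; i < der.length; i++) binary += String.fromCharCode(der[i]);
  return btoa(binary);
}
```

For any 36-character ASCII value the encoding starts with the two header bytes `0C 24`, which matches the tag and length quoted in the commit message.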
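The CaService bullet says the one-time token (OTT) is an HS256 JWT that carries `mosaic_grant_id`, `mosaic_subject_user_id` and a `step.sha` CSR fingerprint. The sketch below shows one way to assemble such a token with Node's built-in `crypto` module. It is not the commit's code. Several details are assumptions: the `OttInput` shape, the standard claims (`iss`, `aud`, `iat`, `nbf`, `exp`, `jti`), the 300-second default lifetime, the reading of `step.sha` as a nested object, and the hashing of the CSR's DER bytes.

```typescript
import { createHash, createHmac, randomUUID } from "node:crypto";

/** Token inputs; every field name here is a hypothetical stand-in. */
interface OttInput {
  grantId: string;
  subjectUserId: string;
  csrDer: Uint8Array; // assumed: the CSR's DER bytes are what gets fingerprinted
  issuer: string; // assumed: provisioner name, e.g. "mosaic-fed"
  audience: string; // assumed: the CA's /1.0/sign URL
  secret: string; // assumed: shared HS256 key
  ttlSeconds?: number;
}

/** base64url without padding, as JWS compact serialization requires. */
function b64url(buf: Buffer): string {
  return buf.toString("base64url");
}

/** Build and sign an HS256 JWT carrying the grant claims. */
function buildOtt(input: OttInput, now: number = Math.floor(Date.now() / 1000)): string {
  const header = { alg: "HS256", typ: "JWT" };
  const payload = {
    iss: input.issuer,
    aud: input.audience,
    iat: now,
    nbf: now,
    exp: now + (input.ttlSeconds ?? 300), // short-lived: the token is single-use
    jti: randomUUID(), // unique token id so a replay can be detected
    mosaic_grant_id: input.grantId,
    mosaic_subject_user_id: input.subjectUserId,
    // "step.sha" read here as a nested claim holding the CSR's SHA-256 hex digest
    step: { sha: createHash("sha256").update(input.csrDer).digest("hex") },
  };
  const signingInput =
    b64url(Buffer.from(JSON.stringify(header))) + "." + b64url(Buffer.from(JSON.stringify(payload)));
  // HS256 = HMAC-SHA256 over "<header>.<payload>"
  const signature = createHmac("sha256", input.secret).update(signingInput).digest();
  return signingInput + "." + b64url(signature);
}
```

Whether step-ca accepts an HS256 token depends on how the `mosaic-fed` provisioner is configured. The sketch only demonstrates the JWS compact format: base64url header and payload joined by a dot, followed by an HMAC-SHA256 signature over that string.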


@@ -36,6 +36,12 @@
 # tested locally — gateway boots, imports resolve, tier-detector runs.
 # Update digest here when promoting a new build.
 #
+# HEALTHCHECK NOTE (2026-04-21)
+# Switched from busybox wget to node http.get on 127.0.0.1 (not localhost) to
+# avoid IPv6 resolution issues on Alpine. Retries increased to 5 and
+# start_period to 60s to cover the NestJS/GC cold-start window (~40-50s).
+# restart_policy set to `any` so SIGTERM/clean-exit also triggers restart.
+#
 # NOTE: This is a TEST template — production deployments use a separate
 # parameterised template with stricter resource limits and secrets.
@@ -76,7 +82,7 @@ services:
     deploy:
       replicas: 1
       restart_policy:
-        condition: on-failure
+        condition: any
         delay: 5s
         max_attempts: 3
       labels:
@@ -88,11 +94,15 @@ services:
         - 'traefik.http.routers.${STACK_NAME}.tls.certresolver=letsencrypt'
         - 'traefik.http.services.${STACK_NAME}.loadbalancer.server.port=3000'
     healthcheck:
-      test: ['CMD', 'wget', '-qO-', 'http://localhost:3000/health']
+      test:
+        - 'CMD'
+        - 'node'
+        - '-e'
+        - "require('http').get('http://127.0.0.1:3000/health',r=>process.exit(r.statusCode===200?0:1)).on('error',()=>process.exit(1))"
       interval: 30s
       timeout: 5s
-      retries: 3
-      start_period: 20s
+      retries: 5
+      start_period: 60s
     depends_on:
       - postgres
       - valkey
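The HEALTHCHECK NOTE explains the switch to `node http.get` against `127.0.0.1`. Written out as a standalone TypeScript function, the same probe might look like the sketch below. The function name `probe` and the 4-second default timeout are assumptions. The timeout sits just under the stack's `timeout: 5s`, so the probe reports a hung request as a failure on its own. The inline one-liner in the diff, by contrast, relies on Docker's timeout to kill a request that never answers.

```typescript
import { get } from "node:http";

/**
 * Resolve true only for an HTTP 200 from the health endpoint.
 * Callers pass an IPv4 literal (127.0.0.1) so the check never depends on
 * how `localhost` resolves inside the container.
 */
function probe(url: string, timeoutMs: number = 4000): Promise<boolean> {
  return new Promise((resolve) => {
    const req = get(url, (res) => {
      res.resume(); // drain the body so the socket is released
      resolve(res.statusCode === 200);
    });
    // Fail on our own clock rather than waiting for Docker to kill the probe.
    req.setTimeout(timeoutMs, () => {
      req.destroy();
      resolve(false);
    });
    // Connection refused, reset, etc. all count as unhealthy.
    req.on("error", () => resolve(false));
  });
}

// As a container healthcheck (hypothetical compiled entry point), map the result to an exit code:
// probe("http://127.0.0.1:3000/health").then((ok) => process.exit(ok ? 0 : 1));
```

Like `r.statusCode===200` in the one-liner, this treats every other status as unhealthy, so a redirect or a 204 from `/health` would fail the check.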