fix(api): MS22 Phase 1 post-coding audit (#625)

Co-authored-by: Jason Woltje <jason@diversecanvas.com> Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-03-01 19:53:49 +00:00
parent a25a77a43c
commit 3607554902
5 changed files with 233 additions and 27 deletions
--- a/docs/audits/ms22-phase1-audit.md
+++ b/docs/audits/ms22-phase1-audit.md
@@ -0,0 +1,157 @@
+# MS22 Phase 1 Module Audit
+
+Date: 2026-03-01
+Branch: `fix/ms22-audit`
+Scope:
+
+- `apps/api/src/container-lifecycle/`
+- `apps/api/src/crypto/`
+- `apps/api/src/agent-config/`
+- `apps/api/src/onboarding/`
+- `apps/api/src/fleet-settings/`
+- `apps/api/src/chat-proxy/`
+
+## Summary
+
+Audit completed for module wiring, security controls, input validation, and error handling.
+
+Findings:
+
+1. `chat-proxy`: raw internal/upstream error messages were returned to clients over SSE (fixed).
+2. `chat-proxy`: proxy requests to OpenClaw did not forward the container bearer token returned by lifecycle startup (fixed).
+3. `agent-config`: token validation returned early and used length-gated compare logic, creating avoidable timing side-channel behavior (hardened).
+
+## Module Review Results
+
+### 1) `container-lifecycle`
+
+- NestJS module dependency audit:
+  - `ContainerLifecycleModule` imports `ConfigModule`, `PrismaModule`, and `CryptoModule` required by `ContainerLifecycleService`.
+  - Providers/exports are correct (`ContainerLifecycleService` provided and exported).
+- Security review:
+  - Container operations are user-scoped by `userId` and do not expose cross-user selectors in this module.
+  - AES token generation/decryption delegated to `CryptoService`.
+- Input validation:
+  - No controller endpoints in this module; no direct request DTO surface here.
+- Error handling:
+  - No direct HTTP layer here; errors flow to callers/global filter.
+- Finding status: **No issues found in this module**.
+
+### 2) `crypto`
+
+- NestJS module dependency audit:
+  - `CryptoModule` correctly imports `ConfigModule` for `ConfigService`.
+  - `CryptoService` is correctly provided/exported.
+- Security review:
+  - AES-256-GCM is implemented correctly.
+  - 96-bit IV generated via `randomBytes(12)` per encryption.
+  - Auth tag captured and verified on decrypt (`setAuthTag` + `decipher.final()`).
+  - HKDF derives a fixed 32-byte key from `MOSAIC_SECRET_KEY`.
+- Input validation:
+  - No DTO/request surface in this module.
+- Error handling:
+  - Decrypt failures are normalized to `Failed to decrypt value`.
+- Finding status: **No issues found in this module**.
+
+### 3) `agent-config`
+
+- NestJS module dependency audit:
+  - `AgentConfigModule` imports `PrismaModule` + `CryptoModule`; `AgentConfigService` and `AgentConfigGuard` are provided.
+  - Controller/guard/service wiring is correct.
+- Security review:
+  - Bearer token comparisons used `timingSafeEqual`, but returned early on first match and performed length-gated comparison.
+  - Internal route (`/api/internal/agent-config/:id`) is access-controlled by bearer token guard and container-id match (`containerAuth.id === :id`).
+- Input validation:
+  - Header token extraction and route param are manually handled (no DTO for `:id`, acceptable for current use but should remain constrained).
+- Error handling:
+  - Service throws typed Nest exceptions for not-found paths.
+- Finding status: **Issue found and fixed**.
+
+### 4) `onboarding`
+
+- NestJS module dependency audit:
+  - `OnboardingModule` imports required dependencies (`PrismaModule`, `CryptoModule`; `ConfigModule` currently unused but harmless).
+  - Providers/controllers are correctly declared.
+- Security review:
+  - `OnboardingGuard` blocks all mutating onboarding routes once `onboarding.completed=true`.
+  - Onboarding cannot be re-run via guarded endpoints after completion.
+- Input validation:
+  - DTOs use `class-validator` decorators for all request bodies.
+- Error handling:
+  - Uses typed Nest exceptions (`ConflictException`, `BadRequestException`).
+- Finding status: **No issues found in this module**.
+
+### 5) `fleet-settings`
+
+- NestJS module dependency audit:
+  - `FleetSettingsModule` imports `AuthModule`, `PrismaModule`, `CryptoModule` required by its controller/service.
+  - Provider/export wiring is correct for `FleetSettingsService`.
+- Security review:
+  - Class-level `AuthGuard` protects all routes.
+  - Admin-only routes additionally use `AdminGuard` (`oidc` and `breakglass/reset-password`).
+  - Provider list/get responses do not expose `apiKey`.
+  - OIDC read response intentionally omits `clientSecret`.
+- Input validation:
+  - DTOs are decorated with `class-validator`.
+- Error handling:
+  - Ownership/not-found conditions use typed exceptions.
+- Finding status: **No issues found in this module**.
+
+### 6) `chat-proxy`
+
+- NestJS module dependency audit:
+  - `ChatProxyModule` imports `AuthModule`, `PrismaModule`, `ContainerLifecycleModule` needed by controller/service.
+  - Provider/controller wiring is correct.
+- Security review:
+  - User identity comes from `AuthGuard`; no user-provided container selector, so no cross-user container proxy path found.
+  - **Issue fixed:** gateway bearer token was not forwarded on proxied requests.
+  - **Issue fixed:** SSE error events exposed raw internal exception messages.
+- Input validation:
+  - `ChatStreamDto` + nested `ChatMessageDto` use `class-validator` decorators.
+- Error handling:
+  - **Issue fixed:** controller now emits safe client error messages and logs details server-side.
+- Finding status: **Issues found and fixed**.
+
+## Security Checklist Outcomes
+
+- `fleet-settings`: admin-only routes are guarded; non-admin users cannot access OIDC or breakglass reset routes. Provider secrets are not returned in provider read endpoints.
+- `agent-config`: token comparison hardened; route remains gated by bearer token + container id binding.
+- `onboarding`: guarded mutating endpoints cannot run after completion.
+- `crypto`: AES-256-GCM usage is correct (random IV, auth-tag verification, fixed 32-byte key derivation).
+- `chat-proxy`: user cannot target another user’s container; proxy now authenticates to OpenClaw using per-container bearer token.
+
+## Input Validation
+
+- DTO coverage is present in onboarding, fleet-settings, and chat-proxy request bodies.
+- No critical unvalidated body inputs found in scoped modules.
+
+## Error Handling
+
+- Global API layer has a sanitizing `GlobalExceptionFilter`.
+- `chat-proxy` used manual response handling (`@Res`) and bypassed global filter; this was corrected by sending safe generic SSE errors.
+- No additional critical sensitive-data leaks found in reviewed scope.
+
+## Changes Made
+
+1. Hardened token comparison behavior in:
+   - `apps/api/src/agent-config/agent-config.service.ts`
+   - Changes:
+     - Compare SHA-256 digests with `timingSafeEqual`.
+     - Avoid early return during scan to reduce timing signal differences.
+
+2. Fixed OpenClaw auth forwarding and error leak risk in:
+   - `apps/api/src/chat-proxy/chat-proxy.service.ts`
+   - `apps/api/src/chat-proxy/chat-proxy.controller.ts`
+   - `apps/api/src/chat-proxy/chat-proxy.service.spec.ts`
+   - Changes:
+     - Forward `Authorization: Bearer <gatewayToken>` when proxying chat requests.
+     - Stop returning raw internal/upstream error text to clients over SSE.
+     - Log details server-side and return safe client-facing messages.
+
+## Validation Commands
+
+Required quality gate command run:
+
+- `pnpm turbo lint typecheck --filter=@mosaic/api`
+
+(Results captured in session logs.)