stack/docs/audits/ms22-phase1-audit.md

# MS22 Phase 1 Module Audit

Date: 2026-03-01
Branch: `fix/ms22-audit`
Scope:

- `apps/api/src/container-lifecycle/`
- `apps/api/src/crypto/`
- `apps/api/src/agent-config/`
- `apps/api/src/onboarding/`
- `apps/api/src/fleet-settings/`
- `apps/api/src/chat-proxy/`

## Summary

Audit completed for module wiring, security controls, input validation, and error handling.

Findings:

1. `chat-proxy`: raw internal/upstream error messages were returned to clients over SSE (fixed).
2. `chat-proxy`: proxy requests to OpenClaw did not forward the container bearer token returned by lifecycle startup (fixed).
3. `agent-config`: token validation returned early and used length-gated compare logic, creating avoidable timing side-channel behavior (hardened).

## Module Review Results

### 1) `container-lifecycle`

- NestJS module dependency audit:
  - `ContainerLifecycleModule` imports `ConfigModule`, `PrismaModule`, and `CryptoModule` required by `ContainerLifecycleService`.
  - Providers/exports are correct (`ContainerLifecycleService` provided and exported).
- Security review:
  - Container operations are user-scoped by `userId` and do not expose cross-user selectors in this module.
  - AES token generation/decryption delegated to `CryptoService`.
- Input validation:
  - No controller endpoints in this module; no direct request DTO surface here.
- Error handling:
  - No direct HTTP layer here; errors flow to callers/global filter.
- Finding status: **No issues found in this module**.

### 2) `crypto`

- NestJS module dependency audit:
  - `CryptoModule` correctly imports `ConfigModule` for `ConfigService`.
  - `CryptoService` is correctly provided/exported.
- Security review:
  - AES-256-GCM is implemented correctly.
  - 96-bit IV generated via `randomBytes(12)` per encryption.
  - Auth tag captured and verified on decrypt (`setAuthTag` + `decipher.final()`).
  - HKDF derives a fixed 32-byte key from `MOSAIC_SECRET_KEY`.
- Input validation:
  - No DTO/request surface in this module.
- Error handling:
  - Decrypt failures are normalized to `Failed to decrypt value`.
- Finding status: **No issues found in this module**.

### 3) `agent-config`

- NestJS module dependency audit:
  - `AgentConfigModule` imports `PrismaModule` + `CryptoModule`; `AgentConfigService` and `AgentConfigGuard` are provided.
  - Controller/guard/service wiring is correct.
- Security review:
  - Bearer token comparisons used `timingSafeEqual`, but returned early on first match and performed length-gated comparison.
  - Internal route (`/api/internal/agent-config/:id`) is access-controlled by bearer token guard and container-id match (`containerAuth.id === :id`).
- Input validation:
  - Header token extraction and route param are manually handled (no DTO for `:id`, acceptable for current use but should remain constrained).
- Error handling:
  - Service throws typed Nest exceptions for not-found paths.
- Finding status: **Issue found and fixed**.

### 4) `onboarding`

- NestJS module dependency audit:
  - `OnboardingModule` imports required dependencies (`PrismaModule`, `CryptoModule`; `ConfigModule` currently unused but harmless).
  - Providers/controllers are correctly declared.
- Security review:
  - `OnboardingGuard` blocks all mutating onboarding routes once `onboarding.completed=true`.
  - Onboarding cannot be re-run via guarded endpoints after completion.
- Input validation:
  - DTOs use `class-validator` decorators for all request bodies.
- Error handling:
  - Uses typed Nest exceptions (`ConflictException`, `BadRequestException`).
- Finding status: **No issues found in this module**.

### 5) `fleet-settings`

- NestJS module dependency audit:
  - `FleetSettingsModule` imports `AuthModule`, `PrismaModule`, `CryptoModule` required by its controller/service.
  - Provider/export wiring is correct for `FleetSettingsService`.
- Security review:
  - Class-level `AuthGuard` protects all routes.
  - Admin-only routes additionally use `AdminGuard` (`oidc` and `breakglass/reset-password`).
  - Provider list/get responses do not expose `apiKey`.
  - OIDC read response intentionally omits `clientSecret`.
- Input validation:
  - DTOs are decorated with `class-validator`.
- Error handling:
  - Ownership/not-found conditions use typed exceptions.
- Finding status: **No issues found in this module**.

### 6) `chat-proxy`

- NestJS module dependency audit:
  - `ChatProxyModule` imports `AuthModule`, `PrismaModule`, `ContainerLifecycleModule` needed by controller/service.
  - Provider/controller wiring is correct.
- Security review:
  - User identity comes from `AuthGuard`; no user-provided container selector, so no cross-user container proxy path found.
  - **Issue fixed:** gateway bearer token was not forwarded on proxied requests.
  - **Issue fixed:** SSE error events exposed raw internal exception messages.
- Input validation:
  - `ChatStreamDto` + nested `ChatMessageDto` use `class-validator` decorators.
- Error handling:
  - **Issue fixed:** controller now emits safe client error messages and logs details server-side.
- Finding status: **Issues found and fixed**.

## Security Checklist Outcomes

- `fleet-settings`: admin-only routes are guarded; non-admin users cannot access OIDC or breakglass reset routes. Provider secrets are not returned in provider read endpoints.
- `agent-config`: token comparison hardened; route remains gated by bearer token + container id binding.
- `onboarding`: guarded mutating endpoints cannot run after completion.
- `crypto`: AES-256-GCM usage is correct (random IV, auth-tag verification, fixed 32-byte key derivation).
- `chat-proxy`: user cannot target another user’s container; proxy now authenticates to OpenClaw using per-container bearer token.

## Input Validation

- DTO coverage is present in onboarding, fleet-settings, and chat-proxy request bodies.
- No critical unvalidated body inputs found in scoped modules.

## Error Handling

- Global API layer has a sanitizing `GlobalExceptionFilter`.
- `chat-proxy` used manual response handling (`@Res`) and bypassed global filter; this was corrected by sending safe generic SSE errors.
- No additional critical sensitive-data leaks found in reviewed scope.

## Changes Made

1. Hardened token comparison behavior in:
   - `apps/api/src/agent-config/agent-config.service.ts`
   - Changes:
     - Compare SHA-256 digests with `timingSafeEqual`.
     - Avoid early return during scan to reduce timing signal differences.

2. Fixed OpenClaw auth forwarding and error leak risk in:
   - `apps/api/src/chat-proxy/chat-proxy.service.ts`
   - `apps/api/src/chat-proxy/chat-proxy.controller.ts`
   - `apps/api/src/chat-proxy/chat-proxy.service.spec.ts`
   - Changes:
     - Forward `Authorization: Bearer <gatewayToken>` when proxying chat requests.
     - Stop returning raw internal/upstream error text to clients over SSE.
     - Log details server-side and return safe client-facing messages.

## Validation Commands

Required quality gate command run:

- `pnpm turbo lint typecheck --filter=@mosaic/api`

(Results captured in session logs.)