Files
stack/docs/audits/ms22-phase1-audit.md
Jason Woltje 3607554902
All checks were successful
ci/woodpecker/push/ci Pipeline was successful
fix(api): MS22 Phase 1 post-coding audit (#625)
Co-authored-by: Jason Woltje <jason@diversecanvas.com>
Co-committed-by: Jason Woltje <jason@diversecanvas.com>
2026-03-01 19:53:49 +00:00

158 lines
6.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# MS22 Phase 1 Module Audit
Date: 2026-03-01
Branch: `fix/ms22-audit`
Scope:
- `apps/api/src/container-lifecycle/`
- `apps/api/src/crypto/`
- `apps/api/src/agent-config/`
- `apps/api/src/onboarding/`
- `apps/api/src/fleet-settings/`
- `apps/api/src/chat-proxy/`
## Summary
Audit completed for module wiring, security controls, input validation, and error handling.
Findings:
1. `chat-proxy`: raw internal/upstream error messages were returned to clients over SSE (fixed).
2. `chat-proxy`: proxy requests to OpenClaw did not forward the container bearer token returned by lifecycle startup (fixed).
3. `agent-config`: token validation returned early and used length-gated compare logic, creating avoidable timing side-channel behavior (hardened).
## Module Review Results
### 1) `container-lifecycle`
- NestJS module dependency audit:
- `ContainerLifecycleModule` imports `ConfigModule`, `PrismaModule`, and `CryptoModule` required by `ContainerLifecycleService`.
- Providers/exports are correct (`ContainerLifecycleService` provided and exported).
- Security review:
- Container operations are user-scoped by `userId` and do not expose cross-user selectors in this module.
- AES token generation/decryption delegated to `CryptoService`.
- Input validation:
- No controller endpoints in this module; no direct request DTO surface here.
- Error handling:
- No direct HTTP layer here; errors flow to callers/global filter.
- Finding status: **No issues found in this module**.
### 2) `crypto`
- NestJS module dependency audit:
- `CryptoModule` correctly imports `ConfigModule` for `ConfigService`.
- `CryptoService` is correctly provided/exported.
- Security review:
- AES-256-GCM is implemented correctly.
- 96-bit IV generated via `randomBytes(12)` per encryption.
- Auth tag captured and verified on decrypt (`setAuthTag` + `decipher.final()`).
- HKDF derives a fixed 32-byte key from `MOSAIC_SECRET_KEY`.
- Input validation:
- No DTO/request surface in this module.
- Error handling:
- Decrypt failures are normalized to `Failed to decrypt value`.
- Finding status: **No issues found in this module**.
### 3) `agent-config`
- NestJS module dependency audit:
- `AgentConfigModule` imports `PrismaModule` + `CryptoModule`; `AgentConfigService` and `AgentConfigGuard` are provided.
- Controller/guard/service wiring is correct.
- Security review:
- Bearer token comparisons used `timingSafeEqual`, but returned early on first match and performed length-gated comparison.
- Internal route (`/api/internal/agent-config/:id`) is access-controlled by bearer token guard and container-id match (`containerAuth.id === :id`).
- Input validation:
- Header token extraction and route param are manually handled (no DTO for `:id`, acceptable for current use but should remain constrained).
- Error handling:
- Service throws typed Nest exceptions for not-found paths.
- Finding status: **Issue found and fixed**.
### 4) `onboarding`
- NestJS module dependency audit:
- `OnboardingModule` imports required dependencies (`PrismaModule`, `CryptoModule`; `ConfigModule` currently unused but harmless).
- Providers/controllers are correctly declared.
- Security review:
- `OnboardingGuard` blocks all mutating onboarding routes once `onboarding.completed=true`.
- Onboarding cannot be re-run via guarded endpoints after completion.
- Input validation:
- DTOs use `class-validator` decorators for all request bodies.
- Error handling:
- Uses typed Nest exceptions (`ConflictException`, `BadRequestException`).
- Finding status: **No issues found in this module**.
### 5) `fleet-settings`
- NestJS module dependency audit:
- `FleetSettingsModule` imports `AuthModule`, `PrismaModule`, `CryptoModule` required by its controller/service.
- Provider/export wiring is correct for `FleetSettingsService`.
- Security review:
- Class-level `AuthGuard` protects all routes.
- Admin-only routes additionally use `AdminGuard` (`oidc` and `breakglass/reset-password`).
- Provider list/get responses do not expose `apiKey`.
- OIDC read response intentionally omits `clientSecret`.
- Input validation:
- DTOs are decorated with `class-validator`.
- Error handling:
- Ownership/not-found conditions use typed exceptions.
- Finding status: **No issues found in this module**.
### 6) `chat-proxy`
- NestJS module dependency audit:
- `ChatProxyModule` imports `AuthModule`, `PrismaModule`, `ContainerLifecycleModule` needed by controller/service.
- Provider/controller wiring is correct.
- Security review:
- User identity comes from `AuthGuard`; no user-provided container selector, so no cross-user container proxy path found.
- **Issue fixed:** gateway bearer token was not forwarded on proxied requests.
- **Issue fixed:** SSE error events exposed raw internal exception messages.
- Input validation:
- `ChatStreamDto` + nested `ChatMessageDto` use `class-validator` decorators.
- Error handling:
- **Issue fixed:** controller now emits safe client error messages and logs details server-side.
- Finding status: **Issues found and fixed**.
## Security Checklist Outcomes
- `fleet-settings`: admin-only routes are guarded; non-admin users cannot access OIDC or breakglass reset routes. Provider secrets are not returned in provider read endpoints.
- `agent-config`: token comparison hardened; route remains gated by bearer token + container id binding.
- `onboarding`: guarded mutating endpoints cannot run after completion.
- `crypto`: AES-256-GCM usage is correct (random IV, auth-tag verification, fixed 32-byte key derivation).
- `chat-proxy`: user cannot target another users container; proxy now authenticates to OpenClaw using per-container bearer token.
## Input Validation
- DTO coverage is present in onboarding, fleet-settings, and chat-proxy request bodies.
- No critical unvalidated body inputs found in scoped modules.
## Error Handling
- Global API layer has a sanitizing `GlobalExceptionFilter`.
- `chat-proxy` used manual response handling (`@Res`) and bypassed global filter; this was corrected by sending safe generic SSE errors.
- No additional critical sensitive-data leaks found in reviewed scope.
## Changes Made
1. Hardened token comparison behavior in:
- `apps/api/src/agent-config/agent-config.service.ts`
- Changes:
- Compare SHA-256 digests with `timingSafeEqual`.
- Avoid early return during scan to reduce timing signal differences.
2. Fixed OpenClaw auth forwarding and error leak risk in:
- `apps/api/src/chat-proxy/chat-proxy.service.ts`
- `apps/api/src/chat-proxy/chat-proxy.controller.ts`
- `apps/api/src/chat-proxy/chat-proxy.service.spec.ts`
- Changes:
- Forward `Authorization: Bearer <gatewayToken>` when proxying chat requests.
- Stop returning raw internal/upstream error text to clients over SSE.
- Log details server-side and return safe client-facing messages.
## Validation Commands
Required quality gate command run:
- `pnpm turbo lint typecheck --filter=@mosaic/api`
(Results captured in session logs.)