docs(#346): Add credential security architecture design document

Comprehensive design document for M7-CredentialSecurity milestone covering hybrid OpenBao Transit + PostgreSQL encryption approach, threat model, UserCredential data model, API design, RLS enforcement strategy, turnkey OpenBao Docker integration, and 5-phase implementation plan.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-07 11:15:58 -06:00
parent ec87c5479b
commit 51ce32cc76
1 changed files with 412 additions and 0 deletions
--- a/docs/design/credential-security.md
+++ b/docs/design/credential-security.md
@@ -0,0 +1,412 @@
+# Credential Security Architecture
+
+**Version:** 0.0.1
+**Status:** Approved
+**Author:** Mosaic Stack Team
+**Date:** 2026-02-07
+**Epic:** [#346](https://git.mosaicstack.dev/mosaic/stack/issues/346)
+**Milestone:** M7-CredentialSecurity
+
+## Table of Contents
+
+1. [Problem Statement](#problem-statement)
+2. [Threat Model](#threat-model)
+3. [Architecture Decision](#architecture-decision)
+4. [System Architecture](#system-architecture)
+5. [Data Model](#data-model)
+6. [API Design](#api-design)
+7. [RLS Enforcement](#rls-enforcement)
+8. [OpenBao Integration](#openbao-integration)
+9. [Federation Isolation](#federation-isolation)
+10. [Implementation Phases](#implementation-phases)
+11. [Risk Mitigation](#risk-mitigation)
+12. [Key Files Reference](#key-files-reference)
+
+---
+
+## Problem Statement
+
+Mosaic Stack stores sensitive user credentials with critical security gaps:
+
+1. **OAuth tokens stored plaintext** in the `accounts` table (`access_token`, `refresh_token`,
+   `id_token`)
+2. **LLM API keys stored plaintext** in `llm_provider_instances.config` JSON field
+3. **RLS enabled but never enforced** — all 23 tables have policies but no `FORCE ROW LEVEL
+SECURITY`, and Prisma connects as table owner, silently bypassing all policies
+4. **No RLS on auth tables** — `accounts`, `sessions`, `verifications` have no policies
+5. **No user credential management** — no model, API, or UI for storing user-provided tokens
+6. **Master encryption key on disk** — `ENCRYPTION_KEY` in `.env` file
+
+Users will store API keys, git tokens, and OAuth tokens for integrations. This data is private
+and must never leak between users or across federation boundaries.
+
+## Threat Model
+
+### At-Rest Threats (mitigated by encryption)
+
+| Threat                      | Impact                         | Mitigation                                  |
+| --------------------------- | ------------------------------ | ------------------------------------------- |
+| Database backup exposure    | All credentials leaked         | Column-level encryption via OpenBao Transit |
+| SQL injection               | Attacker reads encrypted blobs | Encrypted data useless without Transit key  |
+| Database admin access       | Full table reads               | Encrypted columns, RLS enforcement          |
+| Filesystem access to `.env` | Master key compromised         | OpenBao Shamir key splitting (production)   |
+
+### In-Use Threats (mitigated by access control)
+
+| Threat                  | Impact                            | Mitigation                           |
+| ----------------------- | --------------------------------- | ------------------------------------ |
+| Cross-user data access  | User A sees User B's tokens       | RLS policies with FORCE enforcement  |
+| Federation data leakage | Remote instance gets credentials  | Explicit deny-list in QueryService   |
+| Application logic bugs  | Wrong user gets wrong credential  | RLS as defense-in-depth layer        |
+| Compromised app server  | Memory access to decrypted values | Short-lived plaintext, audit logging |
+
+### Not Mitigated
+
+Full application server compromise with code execution grants access to decrypted credentials
+in memory. This is an accepted risk — no encryption scheme protects against a fully compromised
+application process.
+
+## Architecture Decision
+
+### Approach: Hybrid OpenBao + PostgreSQL Encryption
+
+After evaluating three approaches, the hybrid model was selected:
+
+| Concern                       | Pure DB (pgcrypto) | Pure Vault        | Hybrid (selected)              |
+| ----------------------------- | ------------------ | ----------------- | ------------------------------ |
+| Key on disk (turtles problem) | `.env` on disk     | Shamir-split      | Shamir-split                   |
+| Audit trail                   | Custom logging     | Built-in          | Built-in                       |
+| New infrastructure            | None               | OpenBao container | OpenBao container              |
+| Per-user isolation            | RLS only           | Vault policies    | RLS + encryption               |
+| Turnkey deployment            | Yes                | Manual unsealing  | Auto-unseal via init container |
+| Dynamic secrets               | No                 | Yes               | Yes                            |
+| License cost                  | Free               | Free (OpenBao)    | Free                           |
+
+**Why not pure DB?** The "turtles all the way down" problem — encrypting in the DB still
+requires a master key in an environment variable on disk. If the server is compromised, the
+key is compromised.
+
+**Why not pure Vault?** Operational complexity. Storing all credentials in Vault requires
+significant Vault policy management. PostgreSQL with RLS provides a more natural data model
+for user-scoped credentials.
+
+**Why hybrid?** Best of both worlds — PostgreSQL stores encrypted credentials with RLS
+enforcement, OpenBao handles key management via Transit engine. The master key never exists
+on disk as a single value (Shamir-split in production).
+
+### Why OpenBao (not HashiCorp Vault)?
+
+- Truly open-source (Linux Foundation project, OSI-approved MPL 2.0 license)
+- Drop-in Vault replacement (API-compatible)
+- No Business Source License concerns
+- Production-ready (v2.0)
+- Smaller, focused ecosystem
+
+## System Architecture
+
+```
+                     ┌──────────────────────┐
+                     │   Next.js Frontend   │
+                     │  /settings/creds     │
+                     └──────────┬───────────┘
+                                │ HTTPS
+                     ┌──────────▼───────────┐
+                     │     NestJS API       │
+                     │  CredentialsService  │
+                     │     VaultService     │
+                     └───┬──────────────┬───┘
+                         │              │
+              Ciphertext │              │ Transit API
+              (storage)  │              │ (encrypt/decrypt)
+                         │              │
+              ┌──────────▼──┐    ┌──────▼──────────┐
+              │ PostgreSQL  │    │    OpenBao       │
+              │ + RLS       │    │  Transit Engine  │
+              │ + pgcrypto  │    │  + AppRole Auth  │
+              └─────────────┘    │  + Audit Log     │
+                                 └─────────────────┘
+```
+
+### Data Flow: Store Credential
+
+1. User submits API key via frontend form
+2. NestJS `CredentialsController` receives plaintext value
+3. `CredentialsService` calls `VaultService.encrypt(value, TransitKey.CREDENTIALS)`
+4. `VaultService` calls OpenBao Transit API: `POST /v1/transit/encrypt/mosaic-credentials`
+5. Transit returns ciphertext: `vault:v1:base64data`
+6. Ciphertext stored in `user_credentials.encrypted_value`
+7. Masked value (`****abcd`) stored in `user_credentials.masked_value`
+8. Activity log entry: `CREDENTIAL_CREATED`
+9. Response includes masked value only — never the ciphertext or plaintext
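+
+As a rough illustration of steps 3 to 7, the sketch below masks a value and builds the Transit
+request body. The helper names (`maskValue`, `buildEncryptBody`, `parseEncryptResponse`) are
+hypothetical and not taken from the codebase; the only Transit details assumed are that the
+encrypt endpoint expects base64-encoded `plaintext` and returns `data.ciphertext`.
+
+```typescript
+import { Buffer } from "node:buffer";
+
+/** Mask a secret, keeping only the last `visible` characters (hypothetical helper). */
+export function maskValue(value: string, visible = 4): string {
+  // Very short secrets are fully masked so the mask never reveals the whole value.
+  if (value.length <= visible) return "****";
+  return "****" + value.slice(-visible);
+}
+
+/** Build the JSON body for POST /v1/transit/encrypt/<key-name>. */
+export function buildEncryptBody(plaintext: string): { plaintext: string } {
+  // Transit expects the plaintext to be base64-encoded.
+  return { plaintext: Buffer.from(plaintext, "utf8").toString("base64") };
+}
+
+/** Extract the ciphertext string from a Transit encrypt response body. */
+export function parseEncryptResponse(body: unknown): string {
+  const ciphertext = (body as { data?: { ciphertext?: unknown } } | null)?.data?.ciphertext;
+  if (typeof ciphertext !== "string" || !ciphertext.startsWith("vault:v")) {
+    throw new Error("Unexpected Transit encrypt response");
+  }
+  return ciphertext;
+}
+```
+
+Sending the request (HTTP client, AppRole token handling, retries) is deliberately left out.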
+
+### Data Flow: Retrieve Credential
+
+1. User clicks "Reveal" on credential card
+2. Frontend calls `GET /api/credentials/:id/value`
+3. RLS-scoped query fetches row (user can only see own rows)
+4. `VaultService.decrypt(ciphertext, TransitKey.CREDENTIALS)`
+5. Transit returns plaintext
+6. `lastUsedAt` updated on credential row
+7. Activity log entry: `CREDENTIAL_ACCESSED`
+8. Plaintext returned to frontend, auto-hidden after 30 seconds
+
+### Fallback: No OpenBao Available
+
+When OpenBao is unavailable (local dev, CI), `VaultService` falls back to the existing
+`CryptoService` (AES-256-GCM with `ENCRYPTION_KEY` from environment).
+
+Ciphertext format distinguishes the source:
+
+- `vault:v1:...` — OpenBao Transit ciphertext
+- `aes:iv:authTag:encrypted` — AES-256-GCM fallback
+- No prefix — legacy plaintext (backward compatible, triggers encryption on next write)
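+
+A minimal sketch of how the prefix could select a decryption path. `detectFormat` and
+`CiphertextFormat` are hypothetical names, and the check assumes the AES parts are hex or base64
+(so they never contain a colon).
+
+```typescript
+export type CiphertextFormat = "transit" | "aes" | "plaintext";
+
+/** Classify a stored value by its prefix (hypothetical helper). */
+export function detectFormat(stored: string): CiphertextFormat {
+  // OpenBao Transit ciphertext: vault:v<key-version>:<base64>
+  if (/^vault:v\d+:/.test(stored)) return "transit";
+  // Fallback format aes:iv:authTag:encrypted has exactly four colon-separated parts.
+  if (stored.startsWith("aes:") && stored.split(":").length === 4) return "aes";
+  // Anything else is treated as legacy plaintext.
+  return "plaintext";
+}
+```
+
+A legacy plaintext value that happens to start with `vault:v1:` would be misclassified by this
+check, so a real migration would likely need an additional guard.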
+
+## Data Model
+
+### UserCredential Table
+
+```
+user_credentials
+├── id              UUID (PK)
+├── user_id         UUID (FK -> users)
+├── workspace_id    UUID? (FK -> workspaces, nullable for user-global)
+├── name            VARCHAR       -- "GitHub Personal Token"
+├── provider        VARCHAR       -- "github", "openai", "custom"
+├── type            CredentialType (API_KEY, OAUTH_TOKEN, ACCESS_TOKEN, SECRET, PASSWORD, CUSTOM)
+├── scope           CredentialScope (USER, WORKSPACE, SYSTEM)
+├── encrypted_value TEXT          -- OpenBao Transit ciphertext
+├── masked_value    VARCHAR?      -- "****abcd"
+├── description     TEXT?
+├── expires_at      TIMESTAMPTZ?
+├── last_used_at    TIMESTAMPTZ?
+├── metadata        JSONB         -- provider-specific data
+├── is_active       BOOLEAN       -- soft delete
+├── rotated_at      TIMESTAMPTZ?
+├── created_at      TIMESTAMPTZ
+└── updated_at      TIMESTAMPTZ
+
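+UNIQUE(user_id, workspace_id, provider, name)
+-- workspace_id is nullable: PostgreSQL treats NULLs as distinct in unique constraints, so
+-- user-global rows need NULLS NOT DISTINCT (PostgreSQL 15+) or a partial unique index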
+```
+
+### Scope Semantics
+
+| Scope     | Who Can Access     | Use Case                      |
+| --------- | ------------------ | ----------------------------- |
+| USER      | Owner only         | Personal API keys, git tokens |
+| WORKSPACE | Workspace admins   | Shared integration tokens     |
+| SYSTEM    | System admins only | Platform-level secrets        |
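+
+The table can be read as the following application-level check. This is only a sketch: `Actor`,
+`CredentialRef`, and `canAccess` are hypothetical names, and the database policies remain the
+authoritative enforcement point.
+
+```typescript
+export type CredentialScope = "USER" | "WORKSPACE" | "SYSTEM";
+
+export interface Actor {
+  userId: string;
+  isSystemAdmin: boolean;
+  adminOfWorkspaceIds: ReadonlySet<string>;
+}
+
+export interface CredentialRef {
+  scope: CredentialScope;
+  userId: string;
+  workspaceId: string | null;
+}
+
+/** Mirror of the scope table: who may access a credential. */
+export function canAccess(actor: Actor, cred: CredentialRef): boolean {
+  switch (cred.scope) {
+    case "USER":
+      // Owner only.
+      return cred.userId === actor.userId;
+    case "WORKSPACE":
+      // Admins of the credential's workspace.
+      return cred.workspaceId !== null && actor.adminOfWorkspaceIds.has(cred.workspaceId);
+    case "SYSTEM":
+      // System admins only.
+      return actor.isSystemAdmin;
+  }
+}
+```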
+
+### Enum Additions
+
+- `EntityType`: add `CREDENTIAL`
+- `ActivityAction`: add `CREDENTIAL_CREATED`, `CREDENTIAL_ACCESSED`, `CREDENTIAL_ROTATED`,
+  `CREDENTIAL_REVOKED`
+
+## API Design
+
+### User Credential Endpoints
+
+```
+POST   /api/credentials              Create credential (encrypt + store)
+GET    /api/credentials              List credentials (masked values only)
+GET    /api/credentials/:id          Get single credential (masked)
+GET    /api/credentials/:id/value    Decrypt and return value (audit logged)
+PATCH  /api/credentials/:id          Update metadata (not value)
+POST   /api/credentials/:id/rotate   Replace with new encrypted value
+DELETE /api/credentials/:id          Soft-delete (isActive=false)
+```
+
+Guards: `AuthGuard` + `WorkspaceGuard` + `PermissionGuard`
+
+### Admin Secret Endpoints
+
+```
+POST   /api/admin/secrets            Create system-level secret
+GET    /api/admin/secrets            List system secrets (masked)
+PATCH  /api/admin/secrets/:id        Update system secret
+DELETE /api/admin/secrets/:id        Revoke system secret
+```
+
+Guards: `AuthGuard` + `AdminGuard`
+
+### Security Invariant
+
+**List and get endpoints never return plaintext or ciphertext.** Only `maskedValue` appears in
+list/get responses. Decryption requires an explicit `GET /api/credentials/:id/value` call,
+which is always audit-logged.
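+
+One way to make this invariant hard to break is to build responses from an explicit allow-list
+instead of spreading the database row. The sketch below assumes hypothetical `CredentialRow` and
+`CredentialResponse` shapes with a reduced set of fields.
+
+```typescript
+export interface CredentialRow {
+  id: string;
+  name: string;
+  provider: string;
+  encryptedValue: string;
+  maskedValue: string | null;
+}
+
+export interface CredentialResponse {
+  id: string;
+  name: string;
+  provider: string;
+  maskedValue: string | null;
+}
+
+/** Map a row to its API representation; encryptedValue is never copied. */
+export function toCredentialResponse(row: CredentialRow): CredentialResponse {
+  return {
+    id: row.id,
+    name: row.name,
+    provider: row.provider,
+    maskedValue: row.maskedValue,
+  };
+}
+```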
+
+## RLS Enforcement
+
+### Current Problem
+
+All 23 RLS-enabled tables use `ENABLE ROW LEVEL SECURITY` but never `FORCE ROW LEVEL SECURITY`.
+Prisma connects as the table owner role (`mosaic`), and PostgreSQL exempts table owners from RLS
+unless it is forced, so every policy is silently bypassed. The RLS context utilities in
+`apps/api/src/lib/db-context.ts` are fully implemented but never called by any service.
+
+### Solution
+
+1. **FORCE ROW LEVEL SECURITY** on auth and credential tables
+2. **Owner bypass policy** for migration compatibility
+3. **RLS context interceptor** sets session variables in every authenticated request
+
+```sql
+ALTER TABLE user_credentials FORCE ROW LEVEL SECURITY;
+
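+-- Owner bypass for migrations.
+-- Caveat: permissive policies are OR-ed together, so if the application also connects
+-- as `mosaic`, this policy exposes every row to it. Use a separate migration role or
+-- gate the bypass (for example, on app.current_user_id being unset).
+CREATE POLICY credentials_owner_bypass ON user_credentials
+  FOR ALL TO mosaic USING (true);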
+
+-- User access policy
+CREATE POLICY credentials_user_access ON user_credentials
+  FOR ALL USING (
+    (scope = 'USER' AND user_id = current_user_id())
+    OR (scope = 'WORKSPACE' AND workspace_id IS NOT NULL
+        AND is_workspace_admin(workspace_id, current_user_id()))
+  );
+```
+
+### RLS Context Interceptor
+
+Registered as `APP_INTERCEPTOR`, wraps all authenticated requests:
+
+1. Extracts `userId` from `AuthGuard`
+2. Extracts `workspaceId` from `WorkspaceGuard`
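+3. Sets `app.current_user_id` inside a Prisma transaction with
+   `SELECT set_config('app.current_user_id', $1, true)`, the bind-parameter-safe
+   equivalent of `SET LOCAL app.current_user_id = '{userId}'`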
+4. Uses `AsyncLocalStorage` to propagate transaction client to services
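+
+The propagation step could look roughly like the sketch below. `SqlClient` and `Db` are
+hypothetical stand-ins for the Prisma transaction client so the example runs without a database;
+`withRlsContext` and `currentTx` are likewise illustrative names, not the existing
+`db-context.ts` API.
+
+```typescript
+import { AsyncLocalStorage } from "node:async_hooks";
+
+/** Stand-in for a transaction client that runs parameterized SQL. */
+export interface SqlClient {
+  execute(sql: string, params: readonly string[]): Promise<void>;
+}
+
+/** Stand-in for a database handle that can open a transaction. */
+export interface Db {
+  transaction<T>(fn: (tx: SqlClient) => Promise<T>): Promise<T>;
+}
+
+const rlsContext = new AsyncLocalStorage<SqlClient>();
+
+/** Services call this to get the transaction client for the current request. */
+export function currentTx(): SqlClient {
+  const tx = rlsContext.getStore();
+  if (!tx) throw new Error("No RLS transaction in scope");
+  return tx;
+}
+
+/** Run `handler` inside a transaction whose session variable identifies the user. */
+export async function withRlsContext<T>(
+  db: Db,
+  userId: string,
+  handler: () => Promise<T>,
+): Promise<T> {
+  return db.transaction(async (tx) => {
+    // The third argument `true` makes the setting transaction-local, like SET LOCAL.
+    await tx.execute("SELECT set_config('app.current_user_id', $1, true)", [userId]);
+    return rlsContext.run(tx, handler);
+  });
+}
+```
+
+Passing the user ID as a bind parameter avoids interpolating request data into SQL.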
+
+## OpenBao Integration
+
+### Turnkey Docker Deployment
+
+Two containers added to `docker/docker-compose.yml`:
+
+1. **openbao** — OpenBao server with file storage backend
+2. **openbao-init** — Sidecar that auto-initializes, auto-unseals, and configures Transit
+
+On first `docker compose up -d`:
+
+- OpenBao initializes with 1-of-1 key share (turnkey simplicity)
+- Transit secrets engine enabled
+- Four named encryption keys created
+- AppRole created with Transit-only policy
+- Credentials saved to shared Docker volume
+
+On restart:
+
+- `openbao-init` reads stored unseal key and auto-unseals
+
+### Named Transit Keys
+
+| Key                     | Purpose                                          |
+| ----------------------- | ------------------------------------------------ |
+| `mosaic-credentials`    | User-stored credentials (API keys, git tokens)   |
+| `mosaic-account-tokens` | BetterAuth OAuth tokens in accounts table        |
+| `mosaic-federation`     | Federation private keys (replaces CryptoService) |
+| `mosaic-llm-config`     | LLM provider API keys                            |
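+
+In code, the table could map to a constant such as the one sketched here. Only
+`TransitKey.CREDENTIALS` appears elsewhere in this document; the other member names and the
+`encryptPath` helper are assumptions for illustration.
+
+```typescript
+export const TransitKey = {
+  CREDENTIALS: "mosaic-credentials",
+  ACCOUNT_TOKENS: "mosaic-account-tokens",
+  FEDERATION: "mosaic-federation",
+  LLM_CONFIG: "mosaic-llm-config",
+} as const;
+
+export type TransitKeyName = (typeof TransitKey)[keyof typeof TransitKey];
+
+/** Transit encrypt endpoint path for a named key. */
+export function encryptPath(key: TransitKeyName): string {
+  return `/v1/transit/encrypt/${key}`;
+}
+```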
+
+### Production Hardening
+
+For production deployments (documented in `docs/OPENBAO.md`):
+
+- Upgrade to 3-of-5 Shamir key splitting: `bao operator rekey -init -key-shares=5 -key-threshold=3`
+- Enable TLS on listener
+- Use external KMS for auto-unseal (AWS KMS, GCP CKMS, Azure Key Vault)
+- Enable audit logging: `bao audit enable file file_path=/bao/logs/audit.log`
+- Use the integrated Raft storage backend for HA (OpenBao does not include Vault's Consul storage backend)
+- Revoke root token after initial setup
+
+## Federation Isolation
+
+Credentials must never leak across federation boundaries:
+
+1. **RLS enforcement** — Federated queries go through `QueryService` which operates within a
+   specific workspace context. RLS policies restrict rows to the authenticated user.
+2. **Explicit deny-list** — `QueryService` denies queries for `UserCredential` entity type
+3. **Transit key isolation** — Each credential type uses a separate named key. Federation keys
+   (`mosaic-federation`) cannot decrypt user credentials (`mosaic-credentials`).
+4. **Endpoint isolation** — Credential API requires session auth. Federated requests use
+   signature-based auth and cannot access credential endpoints.
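+
+The deny-list in item 2 could be as small as the sketch below. The set name, error class, and
+`assertFederationQueryAllowed` are hypothetical; how `QueryService` actually identifies entity
+types is not specified here.
+
+```typescript
+/** Entity types that must never be served to a federated peer. */
+const FEDERATION_DENIED_ENTITIES: ReadonlySet<string> = new Set(["UserCredential"]);
+
+export class FederationQueryDeniedError extends Error {
+  constructor(entityType: string) {
+    super(`Federated queries for ${entityType} are not allowed`);
+    this.name = "FederationQueryDeniedError";
+  }
+}
+
+/** Throw before any database access if the entity type is denied. */
+export function assertFederationQueryAllowed(entityType: string): void {
+  if (FEDERATION_DENIED_ENTITIES.has(entityType)) {
+    throw new FederationQueryDeniedError(entityType);
+  }
+}
+```
+
+Checking before the query runs keeps the deny-list independent of RLS, so either layer failing
+alone does not expose credentials.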
+
+## Implementation Phases
+
+### Phase 1: Security Foundations (p0)
+
+Fix immediate security gaps:
+
+| Issue | Title                                                  |
+| ----- | ------------------------------------------------------ |
+| #351  | Create RLS context interceptor (fix SEC-API-4)         |
+| #350  | Add RLS policies to auth tables with FORCE enforcement |
+| #352  | Encrypt existing plaintext Account tokens              |
+
+### Phase 2: OpenBao Integration (p1)
+
+Add OpenBao and VaultService:
+
+| Issue | Title                                                      |
+| ----- | ---------------------------------------------------------- |
+| #357  | Add OpenBao to Docker Compose (turnkey setup)              |
+| #353  | Create VaultService NestJS module for OpenBao Transit      |
+| #354  | Write OpenBao documentation and production hardening guide |
+
+### Phase 3: User Credential Storage (p1)
+
+Build the credential management system:
+
+| Issue | Title                                                |
+| ----- | ---------------------------------------------------- |
+| #355  | Create UserCredential Prisma model with RLS policies |
+| #356  | Build credential CRUD API endpoints                  |
+
+### Phase 4: Frontend (p1)
+
+User-facing credential management:
+
+| Issue | Title                                      |
+| ----- | ------------------------------------------ |
+| #358  | Build frontend credential management pages |
+
+### Phase 5: Migration and Hardening (p1-p3)
+
+Encrypt remaining plaintext and harden federation:
+
+| Issue | Title                                     |
+| ----- | ----------------------------------------- |
+| #359  | Encrypt LLM provider API keys in database |
+| #360  | Federation credential isolation           |
+| #361  | Credential audit log viewer (stretch)     |
+
+### Phase Dependencies
+
+```
+Phase 1 (RLS + Token Encryption)
+  └── Phase 2 (OpenBao + VaultService)
+        ├── Phase 3 (Credential Model + API)
+        │     └── Phase 4 (Frontend)
+        └── Phase 5 (LLM Migration + Federation)
+```
+
+## Risk Mitigation
+
+| Risk                               | Mitigation                                                           |
+| ---------------------------------- | -------------------------------------------------------------------- |
+| FORCE RLS breaks Prisma migrations | Owner bypass policy grants full access to `mosaic` role              |
+| FORCE RLS breaks BetterAuth writes | Interceptor sets user context; BetterAuth uses same client           |
+| OpenBao container fails to start   | VaultService falls back to AES-256-GCM; app stays functional         |
+| Data migration corrupts tokens     | Run in transaction; backup first; format prefix tracking             |
+| BetterAuth reads encrypted tokens  | Prisma middleware or client extension decrypts transparently on read |
+| Transit key rotation               | OpenBao handles versioning transparently; old ciphertext stays valid |
+
+## Key Files Reference
+
+| Purpose                | Path                                                                       |
+| ---------------------- | -------------------------------------------------------------------------- |
+| Existing CryptoService | `apps/api/src/federation/crypto.service.ts`                                |
+| RLS context utilities  | `apps/api/src/lib/db-context.ts`                                           |
+| Prisma schema          | `apps/api/prisma/schema.prisma`                                            |
+| RLS migration          | `apps/api/prisma/migrations/20260129221004_add_rls_policies/migration.sql` |
+| Docker Compose         | `docker/docker-compose.yml`                                                |
+| App module             | `apps/api/src/app.module.ts`                                               |
+| Auth guards            | `apps/api/src/auth/guards/auth.guard.ts`                                   |
+| Workspace guard        | `apps/api/src/common/guards/workspace.guard.ts`                            |
+| Security review        | `docs/reports/codebase-review-2026-02-05/01-security-review.md`            |