docs(#346): Add credential security architecture design document

Comprehensive design document for M7-CredentialSecurity milestone covering
hybrid OpenBao Transit + PostgreSQL encryption approach, threat model,
UserCredential data model, API design, RLS enforcement strategy, turnkey
OpenBao Docker integration, and 5-phase implementation plan.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-07 11:15:58 -06:00
parent ec87c5479b
commit 51ce32cc76

@@ -0,0 +1,412 @@
# Credential Security Architecture
**Version:** 0.0.1
**Status:** Approved
**Author:** Mosaic Stack Team
**Date:** 2026-02-07
**Epic:** [#346](https://git.mosaicstack.dev/mosaic/stack/issues/346)
**Milestone:** M7-CredentialSecurity
## Table of Contents
1. [Problem Statement](#problem-statement)
2. [Threat Model](#threat-model)
3. [Architecture Decision](#architecture-decision)
4. [System Architecture](#system-architecture)
5. [Data Model](#data-model)
6. [API Design](#api-design)
7. [RLS Enforcement](#rls-enforcement)
8. [OpenBao Integration](#openbao-integration)
9. [Federation Isolation](#federation-isolation)
10. [Implementation Phases](#implementation-phases)
11. [Risk Mitigation](#risk-mitigation)
12. [Key Files Reference](#key-files-reference)
---
## Problem Statement
Mosaic Stack stores sensitive user credentials with critical security gaps:
1. **OAuth tokens stored plaintext** in the `accounts` table (`access_token`, `refresh_token`,
`id_token`)
2. **LLM API keys stored plaintext** in `llm_provider_instances.config` JSON field
3. **RLS enabled but never enforced** — all 23 tables have policies but no `FORCE ROW LEVEL
SECURITY`, and Prisma connects as table owner, silently bypassing all policies
4. **No RLS on auth tables** — `accounts`, `sessions`, `verifications` have no policies
5. **No user credential management** — no model, API, or UI for storing user-provided tokens
6. **Master encryption key on disk** — `ENCRYPTION_KEY` in `.env` file
Users will store API keys, git tokens, and OAuth tokens for integrations. This data is private
and must never leak between users or across federation boundaries.
## Threat Model
### At-Rest Threats (mitigated by encryption)
| Threat | Impact | Mitigation |
| --------------------------- | ------------------------------ | ------------------------------------------- |
| Database backup exposure | All credentials leaked | Column-level encryption via OpenBao Transit |
| SQL injection | Attacker reads encrypted blobs | Encrypted data useless without Transit key |
| Database admin access | Full table reads | Encrypted columns, RLS enforcement |
| Filesystem access to `.env` | Master key compromised | OpenBao Shamir key splitting (production) |
### In-Use Threats (mitigated by access control)
| Threat | Impact | Mitigation |
| ----------------------- | --------------------------------- | ------------------------------------ |
| Cross-user data access | User A sees User B's tokens | RLS policies with FORCE enforcement |
| Federation data leakage | Remote instance gets credentials | Explicit deny-list in QueryService |
| Application logic bugs | Wrong user gets wrong credential | RLS as defense-in-depth layer |
| Compromised app server | Memory access to decrypted values | Short-lived plaintext, audit logging |
### Not Mitigated
Full application server compromise with code execution grants access to decrypted credentials
in memory. This is an accepted risk — no encryption scheme protects against a fully compromised
application process.
## Architecture Decision
### Approach: Hybrid OpenBao + PostgreSQL Encryption
After evaluating three approaches, the hybrid model was selected:
| Concern | Pure DB (pgcrypto) | Pure Vault | Hybrid (selected) |
| ----------------------------- | ------------------ | ----------------- | ------------------------------ |
| Key on disk (turtles problem) | `.env` on disk | Shamir-split | Shamir-split |
| Audit trail | Custom logging | Built-in | Built-in |
| New infrastructure | None | OpenBao container | OpenBao container |
| Per-user isolation | RLS only | Vault policies | RLS + encryption |
| Turnkey deployment | Yes | Manual unsealing | Auto-unseal via init container |
| Dynamic secrets | No | Yes | Yes |
| License cost | Free | Free (OpenBao) | Free |
**Why not pure DB?** The "turtles all the way down" problem — encrypting in the DB still
requires a master key in an environment variable on disk. If the server is compromised, the
key is compromised.
**Why not pure Vault?** Operational complexity. Storing all credentials in Vault requires
significant Vault policy management. PostgreSQL with RLS provides a more natural data model
for user-scoped credentials.
**Why hybrid?** It combines the strengths of both: PostgreSQL stores encrypted credentials with
RLS enforcement, while OpenBao handles key management via the Transit engine. The master key
never exists on disk as a single value (Shamir-split in production).
### Why OpenBao (not HashiCorp Vault)?
- Truly open-source (Linux Foundation project, OSI-approved MPL 2.0 license)
- Drop-in Vault replacement (API-compatible)
- No Business Source License concerns
- Production-ready (v2.0)
- Smaller, focused ecosystem
## System Architecture
```
       ┌──────────────────────┐
       │   Next.js Frontend   │
       │   /settings/creds    │
       └──────────┬───────────┘
                  │ HTTPS
       ┌──────────▼───────────┐
       │      NestJS API      │
       │  CredentialsService  │
       │     VaultService     │
       └───┬──────────────┬───┘
           │              │
Ciphertext │              │ Transit API
 (storage) │              │ (encrypt/decrypt)
           │              │
┌──────────▼──┐    ┌──────▼──────────┐
│ PostgreSQL  │    │ OpenBao         │
│ + RLS       │    │ Transit Engine  │
│ + pgcrypto  │    │ + AppRole Auth  │
└─────────────┘    │ + Audit Log     │
                   └─────────────────┘
```
### Data Flow: Store Credential
1. User submits API key via frontend form
2. NestJS `CredentialsController` receives plaintext value
3. `CredentialsService` calls `VaultService.encrypt(value, TransitKey.CREDENTIALS)`
4. `VaultService` calls OpenBao Transit API: `POST /v1/transit/encrypt/mosaic-credentials`
5. Transit returns ciphertext: `vault:v1:base64data`
6. Ciphertext stored in `user_credentials.encrypted_value`
7. Masked value (`****abcd`) stored in `user_credentials.masked_value`
8. Activity log entry: `CREDENTIAL_CREATED`
9. Response includes masked value only — never the ciphertext or plaintext
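
The encryption and masking steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual `VaultService`: the function names, the injected `HttpPost` helper, and the choice to fully mask values of four characters or fewer are hypothetical. It relies on two properties of the Transit API: the plaintext must be sent base64-encoded, and the ciphertext is returned under `data.ciphertext`.

```typescript
// Hypothetical HTTP helper type so the sketch does not depend on a specific client.
type HttpPost = (url: string, headers: Record<string, string>, body: string) => Promise<string>;

// Build the Transit encrypt request. Transit expects base64-encoded plaintext.
function buildEncryptRequest(baseUrl: string, keyName: string, plaintext: string) {
  return {
    url: `${baseUrl}/v1/transit/encrypt/${keyName}`,
    body: JSON.stringify({ plaintext: Buffer.from(plaintext, "utf8").toString("base64") }),
  };
}

// Call Transit and return the ciphertext string (for example "vault:v1:...").
async function encryptWithTransit(
  post: HttpPost,
  baseUrl: string,
  token: string,
  keyName: string,
  plaintext: string,
): Promise<string> {
  const { url, body } = buildEncryptRequest(baseUrl, keyName, plaintext);
  const raw = await post(url, { "X-Vault-Token": token, "Content-Type": "application/json" }, body);
  const parsed = JSON.parse(raw) as { data: { ciphertext: string } };
  return parsed.data.ciphertext;
}

// Mask a secret so that only the last four characters stay visible.
// Assumption: values of four characters or fewer are masked completely.
function maskValue(value: string): string {
  return value.length <= 4 ? "****" : `****${value.slice(-4)}`;
}
```

Keeping request construction separate from the HTTP call makes the encoding step easy to unit test without a running OpenBao instance.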
### Data Flow: Retrieve Credential
1. User clicks "Reveal" on credential card
2. Frontend calls `GET /api/credentials/:id/value`
3. RLS-scoped query fetches row (user can only see own rows)
4. `VaultService.decrypt(ciphertext, TransitKey.CREDENTIALS)`
5. Transit returns plaintext
6. `lastUsedAt` updated on credential row
7. Activity log entry: `CREDENTIAL_ACCESSED`
8. Plaintext returned to frontend, auto-hidden after 30 seconds
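
The decrypt step in this flow might look like the sketch below. The helper names are hypothetical; the sketch assumes only that the Transit decrypt endpoint accepts the stored ciphertext and returns the plaintext base64-encoded under `data.plaintext`.

```typescript
// Build the Transit decrypt request for a stored ciphertext such as "vault:v1:...".
function buildDecryptRequest(baseUrl: string, keyName: string, ciphertext: string) {
  return {
    url: `${baseUrl}/v1/transit/decrypt/${keyName}`,
    body: JSON.stringify({ ciphertext }),
  };
}

// Transit returns the plaintext base64-encoded; decode it back to UTF-8.
function parseDecryptResponse(rawJson: string): string {
  const parsed = JSON.parse(rawJson) as { data: { plaintext: string } };
  return Buffer.from(parsed.data.plaintext, "base64").toString("utf8");
}
```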
### Fallback: No OpenBao Available
When OpenBao is unavailable (local dev, CI), `VaultService` falls back to the existing
`CryptoService` (AES-256-GCM with `ENCRYPTION_KEY` from environment).
Ciphertext format distinguishes the source:
- `vault:v1:...` — OpenBao Transit ciphertext
- `aes:iv:authTag:encrypted` — AES-256-GCM fallback
- No prefix — legacy plaintext (backward compatible, triggers encryption on next write)
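
A prefix check along these lines could route each stored value to the right decryption path. This is an illustrative sketch; the type and function names are assumptions. The version segment is matched loosely because Transit embeds the key version in the prefix, so ciphertext written after a rotation carries a version higher than `v1`.

```typescript
type CiphertextFormat = "transit" | "aes-fallback" | "legacy-plaintext";

// Classify a stored value by its prefix, following the three formats listed above.
function detectCiphertextFormat(stored: string): CiphertextFormat {
  // "vault:v<N>:..." where N is the Transit key version.
  if (/^vault:v\d+:/.test(stored)) return "transit";
  // "aes:iv:authTag:encrypted" has exactly four colon-separated parts.
  if (stored.startsWith("aes:") && stored.split(":").length === 4) return "aes-fallback";
  // Anything else is treated as legacy plaintext and re-encrypted on the next write.
  return "legacy-plaintext";
}
```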
## Data Model
### UserCredential Table
```
user_credentials
├── id               UUID (PK)
├── user_id          UUID (FK -> users)
├── workspace_id     UUID? (FK -> workspaces, nullable for user-global)
├── name             VARCHAR          -- "GitHub Personal Token"
├── provider         VARCHAR          -- "github", "openai", "custom"
├── type             CredentialType (API_KEY, OAUTH_TOKEN, ACCESS_TOKEN, SECRET, PASSWORD, CUSTOM)
├── scope            CredentialScope (USER, WORKSPACE, SYSTEM)
├── encrypted_value  TEXT             -- OpenBao Transit ciphertext
├── masked_value     VARCHAR?         -- "****abcd"
├── description      TEXT?
├── expires_at       TIMESTAMPTZ?
├── last_used_at     TIMESTAMPTZ?
├── metadata         JSONB            -- provider-specific data
├── is_active        BOOLEAN          -- soft delete
├── rotated_at       TIMESTAMPTZ?
├── created_at       TIMESTAMPTZ
└── updated_at       TIMESTAMPTZ
UNIQUE(user_id, workspace_id, provider, name)
```
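
On the application side, the table could be mirrored by a type such as the one below. This is a sketch only: the camelCase field names and the `isExpired` helper are assumptions, and the real types would presumably be generated from the Prisma schema.

```typescript
type CredentialType = "API_KEY" | "OAUTH_TOKEN" | "ACCESS_TOKEN" | "SECRET" | "PASSWORD" | "CUSTOM";
type CredentialScope = "USER" | "WORKSPACE" | "SYSTEM";

// Application-side shape of a user_credentials row (field names assumed).
interface UserCredential {
  id: string;
  userId: string;
  workspaceId: string | null; // null means user-global
  name: string;
  provider: string;
  type: CredentialType;
  scope: CredentialScope;
  encryptedValue: string;
  maskedValue: string | null;
  description: string | null;
  expiresAt: Date | null;
  lastUsedAt: Date | null;
  metadata: Record<string, unknown>;
  isActive: boolean;
  rotatedAt: Date | null;
  createdAt: Date;
  updatedAt: Date;
}

// Hypothetical helper: a credential with no expiry date never expires.
function isExpired(cred: Pick<UserCredential, "expiresAt">, now: Date = new Date()): boolean {
  return cred.expiresAt !== null && cred.expiresAt.getTime() <= now.getTime();
}
```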
### Scope Semantics
| Scope | Who Can Access | Use Case |
| --------- | ------------------ | ----------------------------- |
| USER | Owner only | Personal API keys, git tokens |
| WORKSPACE | Workspace admins | Shared integration tokens |
| SYSTEM | System admins only | Platform-level secrets |
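
The scope table translates into an application-level check like the following sketch. The `Actor` shape and all names here are hypothetical; in the design, RLS remains the enforcing layer and a check like this would only be an additional guard.

```typescript
type Scope = "USER" | "WORKSPACE" | "SYSTEM";

// Hypothetical view of the caller; the real data would come from the guards.
interface Actor {
  userId: string;
  isSystemAdmin: boolean;
  adminOfWorkspaceIds: string[];
}

interface ScopedCredential {
  scope: Scope;
  userId: string;
  workspaceId: string | null;
}

// Apply the scope semantics: owner only, workspace admins, or system admins only.
function canAccess(actor: Actor, cred: ScopedCredential): boolean {
  if (cred.scope === "USER") return cred.userId === actor.userId;
  if (cred.scope === "WORKSPACE") {
    return cred.workspaceId !== null && actor.adminOfWorkspaceIds.includes(cred.workspaceId);
  }
  if (cred.scope === "SYSTEM") return actor.isSystemAdmin;
  return false;
}
```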
### Enum Additions
- `EntityType`: add `CREDENTIAL`
- `ActivityAction`: add `CREDENTIAL_CREATED`, `CREDENTIAL_ACCESSED`, `CREDENTIAL_ROTATED`,
`CREDENTIAL_REVOKED`
## API Design
### User Credential Endpoints
```
POST   /api/credentials              Create credential (encrypt + store)
GET    /api/credentials              List credentials (masked values only)
GET    /api/credentials/:id          Get single credential (masked)
GET    /api/credentials/:id/value    Decrypt and return value (audit logged)
PATCH  /api/credentials/:id          Update metadata (not value)
POST   /api/credentials/:id/rotate   Replace with new encrypted value
DELETE /api/credentials/:id          Soft-delete (isActive=false)
```
Guards: `AuthGuard` + `WorkspaceGuard` + `PermissionGuard`
### Admin Secret Endpoints
```
POST   /api/admin/secrets       Create system-level secret
GET    /api/admin/secrets       List system secrets (masked)
PATCH  /api/admin/secrets/:id   Update system secret
DELETE /api/admin/secrets/:id   Revoke system secret
```
Guards: `AuthGuard` + `AdminGuard`
### Security Invariant
**List and get endpoints never return plaintext or ciphertext.** Only `maskedValue` appears in
list/get responses. Decryption requires an explicit `GET /api/credentials/:id/value` call,
which is always audit-logged.
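
One way to uphold this invariant is to map rows to response objects through an explicit allow-list, so a newly added column cannot leak by accident. The sketch below assumes hypothetical `CredentialRow` and `CredentialResponse` shapes.

```typescript
// Subset of the stored row, including the sensitive column (names assumed).
interface CredentialRow {
  id: string;
  name: string;
  provider: string;
  encryptedValue: string;
  maskedValue: string | null;
}

// Shape returned by list/get endpoints: no ciphertext, no plaintext.
interface CredentialResponse {
  id: string;
  name: string;
  provider: string;
  maskedValue: string | null;
}

// Copy fields one by one instead of spreading the row, so only listed fields are exposed.
function toCredentialResponse(row: CredentialRow): CredentialResponse {
  return { id: row.id, name: row.name, provider: row.provider, maskedValue: row.maskedValue };
}
```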
## RLS Enforcement
### Current Problem
All 23 RLS-enabled tables use `ENABLE ROW LEVEL SECURITY` but never `FORCE ROW LEVEL SECURITY`.
Prisma connects as the database owner role (`mosaic`), which bypasses all RLS policies by default.
The RLS context utilities in `apps/api/src/lib/db-context.ts` are fully implemented but never
called by any service.
### Solution
1. **FORCE ROW LEVEL SECURITY** on auth and credential tables
2. **Owner bypass policy** for migration compatibility
3. **RLS context interceptor** sets session variables in every authenticated request
```sql
ALTER TABLE user_credentials FORCE ROW LEVEL SECURITY;

-- Owner bypass for migrations
CREATE POLICY credentials_owner_bypass ON user_credentials
  FOR ALL TO mosaic USING (true);

-- User access policy
CREATE POLICY credentials_user_access ON user_credentials
  FOR ALL USING (
    (scope = 'USER' AND user_id = current_user_id())
    OR (scope = 'WORKSPACE' AND workspace_id IS NOT NULL
        AND is_workspace_admin(workspace_id, current_user_id()))
  );
```
### RLS Context Interceptor
Registered as `APP_INTERCEPTOR`, wraps all authenticated requests:
1. Extracts `userId` from `AuthGuard`
2. Extracts `workspaceId` from `WorkspaceGuard`
3. Executes `SET LOCAL app.current_user_id = '{userId}'` inside a Prisma transaction
4. Uses `AsyncLocalStorage` to propagate the transaction client to services
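
The interceptor steps can be sketched as below. This is not the actual interceptor: `TxClient`, `currentTx`, and `withRlsContext` are hypothetical names, and the transaction client is reduced to a single `execute` method standing in for a Prisma transaction client. The sketch uses `set_config(name, value, true)`, the function form of `SET LOCAL`, because it accepts the user ID as a bound parameter instead of requiring string interpolation.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Minimal stand-in for a transaction client (assumption, not the Prisma API).
interface TxClient {
  execute(sql: string, params: string[]): Promise<void>;
}

const txStorage = new AsyncLocalStorage<TxClient>();

// Services call this to obtain the request-scoped transaction client.
function currentTx(): TxClient {
  const tx = txStorage.getStore();
  if (!tx) throw new Error("No RLS context: called outside an authenticated request");
  return tx;
}

// Set the session variable for this transaction only, then run the request
// handler with the transaction client available through AsyncLocalStorage.
async function withRlsContext<T>(tx: TxClient, userId: string, work: () => Promise<T>): Promise<T> {
  await tx.execute("SELECT set_config('app.current_user_id', $1, true)", [userId]);
  return txStorage.run(tx, work);
}
```

Because the store is bound to the async call chain, services invoked during the request see the same transaction without it being passed through every function signature.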
## OpenBao Integration
### Turnkey Docker Deployment
Two containers added to `docker/docker-compose.yml`:
1. **openbao** — OpenBao server with file storage backend
2. **openbao-init** — Sidecar that auto-initializes, auto-unseals, and configures Transit
On first `docker compose up -d`:
- OpenBao initializes with 1-of-1 key share (turnkey simplicity)
- Transit secrets engine enabled
- Four named encryption keys created
- AppRole created with Transit-only policy
- Credentials saved to shared Docker volume
On restart:
- `openbao-init` reads stored unseal key and auto-unseals
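
The sidecar's decision logic can be illustrated with a small planning function. This is a sketch under assumptions: the step names are hypothetical, and the status fields mirror the `initialized` and `sealed` flags that OpenBao reports. The unseal step is placed before Transit setup because a sealed server cannot mount or configure secrets engines.

```typescript
interface BaoStatus {
  initialized: boolean;
  sealed: boolean;
}

type InitStep =
  | "initialize"
  | "save-unseal-key"
  | "unseal"
  | "enable-transit"
  | "create-keys"
  | "create-approle";

// Decide which steps the init sidecar still has to run for the given server state.
function planInitSteps(status: BaoStatus): InitStep[] {
  if (!status.initialized) {
    // First start: full bootstrap.
    return ["initialize", "save-unseal-key", "unseal", "enable-transit", "create-keys", "create-approle"];
  }
  // Restart: only unseal if needed; configuration already exists.
  return status.sealed ? ["unseal"] : [];
}
```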
### Named Transit Keys
| Key | Purpose |
| ----------------------- | ------------------------------------------------ |
| `mosaic-credentials` | User-stored credentials (API keys, git tokens) |
| `mosaic-account-tokens` | BetterAuth OAuth tokens in accounts table |
| `mosaic-federation` | Federation private keys (replaces CryptoService) |
| `mosaic-llm-config` | LLM provider API keys |
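
In code, the key names could be centralized in a single constant so that callers never pass raw strings. In the sketch below, `TransitKey.CREDENTIALS` matches the identifier used in the data flows above; the other member names and the `transitPath` helper are assumptions.

```typescript
// Named Transit keys from the table above. Only CREDENTIALS appears in the
// document's data flows; the remaining member names are hypothetical.
const TransitKey = {
  CREDENTIALS: "mosaic-credentials",
  ACCOUNT_TOKENS: "mosaic-account-tokens",
  FEDERATION: "mosaic-federation",
  LLM_CONFIG: "mosaic-llm-config",
} as const;

type TransitKeyName = (typeof TransitKey)[keyof typeof TransitKey];

// Build the Transit API path for an operation on a named key.
function transitPath(operation: "encrypt" | "decrypt", key: TransitKeyName): string {
  return `/v1/transit/${operation}/${key}`;
}
```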
### Production Hardening
For production deployments (documented in `docs/OPENBAO.md`):
- Upgrade to 3-of-5 Shamir key splitting: `bao operator rekey -init -key-shares=5 -key-threshold=3`
- Enable TLS on listener
- Use external KMS for auto-unseal (AWS KMS, GCP Cloud KMS, Azure Key Vault)
- Enable audit logging: `bao audit enable file file_path=/bao/logs/audit.log`
- Use Raft or Consul storage backend for HA
- Revoke root token after initial setup
## Federation Isolation
Credentials must never leak across federation boundaries:
1. **RLS enforcement** — Federated queries go through `QueryService` which operates within a
   specific workspace context. RLS policies restrict rows to the authenticated user.
2. **Explicit deny-list** — `QueryService` denies queries for `UserCredential` entity type
3. **Transit key isolation** — Each credential type uses a separate named key. Federation keys
(`mosaic-federation`) cannot decrypt user credentials (`mosaic-credentials`).
4. **Endpoint isolation** — Credential API requires session auth. Federated requests use
signature-based auth and cannot access credential endpoints.
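
The explicit deny-list could be as small as the following sketch. The constant, error class, and function names are hypothetical, and how `QueryService` would call it is an assumption.

```typescript
// Entity types that must never be served to a remote federated instance.
const FEDERATION_DENIED_ENTITIES: ReadonlySet<string> = new Set(["UserCredential"]);

class FederationQueryDeniedError extends Error {}

// Throw before any query is built if the requested entity type is denied.
function assertFederationQueryAllowed(entityType: string): void {
  if (FEDERATION_DENIED_ENTITIES.has(entityType)) {
    throw new FederationQueryDeniedError(`Entity type "${entityType}" is not queryable over federation`);
  }
}
```

Checking the entity type before any database access means a denied request fails even if an RLS policy is later misconfigured.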
## Implementation Phases
### Phase 1: Security Foundations (p0)
Fix immediate security gaps:
| Issue | Title |
| ----- | ------------------------------------------------------ |
| #351 | Create RLS context interceptor (fix SEC-API-4) |
| #350 | Add RLS policies to auth tables with FORCE enforcement |
| #352 | Encrypt existing plaintext Account tokens |
### Phase 2: OpenBao Integration (p1)
Add OpenBao and VaultService:
| Issue | Title |
| ----- | ---------------------------------------------------------- |
| #357 | Add OpenBao to Docker Compose (turnkey setup) |
| #353 | Create VaultService NestJS module for OpenBao Transit |
| #354 | Write OpenBao documentation and production hardening guide |
### Phase 3: User Credential Storage (p1)
Build the credential management system:
| Issue | Title |
| ----- | ---------------------------------------------------- |
| #355 | Create UserCredential Prisma model with RLS policies |
| #356 | Build credential CRUD API endpoints |
### Phase 4: Frontend (p1)
User-facing credential management:
| Issue | Title |
| ----- | ------------------------------------------ |
| #358 | Build frontend credential management pages |
### Phase 5: Migration and Hardening (p1-p3)
Encrypt remaining plaintext and harden federation:
| Issue | Title |
| ----- | ----------------------------------------- |
| #359 | Encrypt LLM provider API keys in database |
| #360 | Federation credential isolation |
| #361 | Credential audit log viewer (stretch) |
### Phase Dependencies
```
Phase 1 (RLS + Token Encryption)
  └── Phase 2 (OpenBao + VaultService)
        ├── Phase 3 (Credential Model + API)
        │     └── Phase 4 (Frontend)
        └── Phase 5 (LLM Migration + Federation)
```
## Risk Mitigation
| Risk | Mitigation |
| ---------------------------------- | -------------------------------------------------------------------- |
| FORCE RLS breaks Prisma migrations | Owner bypass policy grants full access to `mosaic` role |
| FORCE RLS breaks BetterAuth writes | Interceptor sets user context; BetterAuth uses same client |
| OpenBao container fails to start | VaultService falls back to AES-256-GCM; app stays functional |
| Data migration corrupts tokens | Run in transaction; backup first; format prefix tracking |
| BetterAuth reads encrypted tokens | Prisma middleware transparently decrypts on read |
| Transit key rotation | OpenBao handles versioning transparently; old ciphertext stays valid |
## Key Files Reference
| Purpose | Path |
| ---------------------- | -------------------------------------------------------------------------- |
| Existing CryptoService | `apps/api/src/federation/crypto.service.ts` |
| RLS context utilities | `apps/api/src/lib/db-context.ts` |
| Prisma schema | `apps/api/prisma/schema.prisma` |
| RLS migration | `apps/api/prisma/migrations/20260129221004_add_rls_policies/migration.sql` |
| Docker Compose | `docker/docker-compose.yml` |
| App module | `apps/api/src/app.module.ts` |
| Auth guards | `apps/api/src/auth/guards/auth.guard.ts` |
| Workspace guard | `apps/api/src/common/guards/workspace.guard.ts` |
| Security review | `docs/reports/codebase-review-2026-02-05/01-security-review.md` |