# MS22 Phase 1: DB-Centric Multi-User Agent Architecture

## Design Principles

1. **2 env vars to bootstrap**: `DATABASE_URL` + `MOSAIC_SECRET_KEY`
2. **DB-centric config**: All runtime config in Postgres, managed via WebUI
3. **Mosaic is the gatekeeper**: Users authenticate to Mosaic, never to OpenClaw directly
4. **Per-user agent isolation**: Each user gets their own OpenClaw container(s) with their own credentials
5. **Onboarding-first**: Breakglass user + wizard on first boot
6. **Generic product**: No hardcoded names, models, providers, or endpoints

## Architecture Overview

```
┌─────────────────────────────────────────────────────┐
│                    MOSAIC WEBUI                     │
│    (Auth: breakglass local + OIDC via settings)     │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│                     MOSAIC API                      │
│                                                     │
│  ┌──────────────┐  ┌────────────────┐  ┌─────────┐  │
│  │ Onboarding   │  │ Container      │  │ Config  │  │
│  │ Wizard       │  │ Lifecycle Mgr  │  │ Store   │  │
│  └──────────────┘  └───────┬────────┘  └─────────┘  │
│                            │                        │
└────────────────────────────┼────────────────────────┘
                             │ Docker API
          ┌──────────────────┼──────────────────┐
          │                  │                  │
          ▼                  ▼                  ▼
   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
   │  OpenClaw   │    │  OpenClaw   │    │  OpenClaw   │
   │  User A     │    │  User B     │    │  System     │
   │             │    │             │    │  (admin)    │
   │ Claude Max  │    │ Z.ai key    │    │ Shared key  │
   │ own memory  │    │ own memory  │    │ monitoring  │
   └─────────────┘    └─────────────┘    └─────────────┘
    Scale to zero      Scale to zero      Always on
    after idle         after idle
```

## Container Lifecycle

### User containers (on-demand)

1. User logs in → Mosaic checks the `UserContainer` table
2. No running container → Mosaic calls the Docker API to create one
3. The user's API keys (stored encrypted, decrypted from the DB) are delivered via the config endpoint
4. Mosaic routes chat requests to the user's container
5. Idle timeout (configurable, default 30 min) → scale to zero
6. State volume persists (sessions, memory, auth tokens)
7. Next request → container restarts, picks up state from volume

### System containers (always-on, optional)

- Admin-provisioned for system tasks (monitoring, scheduled jobs)
- Use admin-configured shared API keys
- Not tied to any user

## Auth Layers

| Flow | Method |
| --- | --- |
| User → Mosaic WebUI | Breakglass (local) or OIDC (configured in settings) |
| Mosaic API → OpenClaw container | Bearer token (auto-generated per container, stored encrypted in DB) |
| OpenClaw → LLM providers | User's own API keys (delivered via config endpoint, decrypted from DB) |
| Admin → System settings | RBAC (admin role required) |
| Internal config endpoint | Bearer token (container authenticates to fetch its config) |

## Database Schema

### System Tables

```prisma
model SystemConfig {
  id        String   @id @default(cuid())
  key       String   @unique // "oidc.issuerUrl", "oidc.clientId", "onboarding.completed"
  value     String   // plaintext or encrypted (prefix: "enc:")
  encrypted Boolean  @default(false)
  updatedAt DateTime @updatedAt
}

model BreakglassUser {
  id           String   @id @default(cuid())
  username     String   @unique
  passwordHash String   // bcrypt
  isActive     Boolean  @default(true)
  createdAt    DateTime @default(now())
  updatedAt    DateTime @updatedAt
}
```

### Provider Tables (per-user)

```prisma
model LlmProvider {
  id          String   @id @default(cuid())
  userId      String   // owner; each user manages their own providers
  name        String   // "my-zai", "work-openai", "local-ollama"
  displayName String   // "Z.ai", "OpenAI (Work)", "Local Ollama"
  type        String   // "zai" | "openai" | "anthropic" | "ollama" | "custom"
  baseUrl     String?  // null for built-in, URL for custom/ollama
  apiKey      String?  // encrypted
  apiType     String   @default("openai-completions")
  models      Json     @default("[]") // [{id, name, contextWindow, maxTokens}]
  isActive    Boolean  @default(true)
  createdAt   DateTime @default(now())
  updatedAt   DateTime @updatedAt

  @@unique([userId, name])
}
```

### Container Tables

```prisma
model UserContainer {
  id             String    @id @default(cuid())
  userId         String    @unique
  containerId    String?   // Docker container ID (null = not running)
  containerName  String    // "mosaic-user-{userId}"
  gatewayPort    Int?      // assigned port (null = not running)
  gatewayToken   String    // encrypted, auto-generated
  status         String    @default("stopped") // "running" | "stopped" | "starting" | "error"
  lastActiveAt   DateTime?
  idleTimeoutMin Int       @default(30)
  config         Json      @default("{}") // cached openclaw.json for this user
  createdAt      DateTime  @default(now())
  updatedAt      DateTime  @updatedAt
}

model SystemContainer {
  id           String   @id @default(cuid())
  name         String   @unique // "mosaic-system-ops", "mosaic-system-monitor"
  role         String   // "operations" | "monitor" | "scheduler"
  containerId  String?
  gatewayPort  Int?
  gatewayToken String   // encrypted
  status       String   @default("stopped")
  providerId   String?  // references admin-level LlmProvider
  primaryModel String   // "zai/glm-5", etc.
  isActive     Boolean  @default(true)
  createdAt    DateTime @default(now())
  updatedAt    DateTime @updatedAt
}
```

### User Agent Preferences

```prisma
model UserAgentConfig {
  id             String   @id @default(cuid())
  userId         String   @unique
  primaryModel   String?  // user's preferred model
  fallbackModels Json     @default("[]")
  personality    String?  // custom SOUL.md content
  providerId     String?  // default provider for this user
  createdAt      DateTime @default(now())
  updatedAt      DateTime @updatedAt
}
```

## Internal Config Endpoint

`GET /api/internal/agent-config/:containerType/:id`

- Auth: Bearer token (container's own gateway token)
- Returns: Complete `openclaw.json` generated from DB
- For user containers: includes user's providers, model prefs, personality
- For system containers: includes admin provider config

The response is an `openclaw.json` assembled dynamically from the DB:

```json
{
  "gateway": {
    "mode": "local",
    "port": 18789,
    "bind": "lan",
    "auth": { "mode": "token" }
    ...
  },
  "agents": {
    "defaults": {
      "model": { "primary": "" }
    }
  },
  "models": {
    "providers": {
      "": { ... }
    }
  }
}
```

## Container Lifecycle Manager

NestJS service that manages Docker containers:

```typescript
// Declaration-only sketch of the service's public surface.
// Only ensureRunning's return type is given in the source;
// the Promise<void> return types below are assumed.
declare class ContainerLifecycleService {
  // Create and start a user's OpenClaw container
  ensureRunning(userId: string): Promise<{ url: string; token: string }>;

  // Stop idle containers (called by cron/scheduler)
  reapIdle(): Promise<void>;

  // Stop a specific user's container
  stop(userId: string): Promise<void>;

  // Health check all running containers
  healthCheckAll(): Promise<void>;

  // Restart container with updated config
  restart(userId: string): Promise<void>;
}
```

Uses the Docker Engine API (`/var/run/docker.sock` or TCP) via the `dockerode` npm package.
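The idle check behind `reapIdle()` can be sketched as a pure function over `UserContainer` rows, which keeps the timing rule testable without touching Docker. This is a minimal sketch, not the service's actual implementation: the `IdleCandidate` type and `selectIdleUserIds` function are hypothetical names, and skipping running containers that have no `lastActiveAt` is an assumption.

```typescript
// Hypothetical row shape mirroring the UserContainer fields used for reaping.
interface IdleCandidate {
  userId: string;
  status: "running" | "stopped" | "starting" | "error";
  lastActiveAt: Date | null;
  idleTimeoutMin: number;
}

// Returns the userIds whose running containers have been idle
// for at least their configured timeout.
function selectIdleUserIds(rows: IdleCandidate[], now: Date): string[] {
  return rows
    .filter((row) => {
      // Assumption: a running container with no recorded activity is left alone.
      if (row.status !== "running" || row.lastActiveAt === null) return false;
      const idleMs = now.getTime() - row.lastActiveAt.getTime();
      return idleMs >= row.idleTimeoutMin * 60 * 1000;
    })
    .map((row) => row.userId);
}

// Example: only u1 is running and past its 30-minute timeout.
const exampleNow = new Date("2025-01-01T12:00:00Z");
const exampleRows: IdleCandidate[] = [
  { userId: "u1", status: "running", lastActiveAt: new Date("2025-01-01T11:15:00Z"), idleTimeoutMin: 30 },
  { userId: "u2", status: "running", lastActiveAt: new Date("2025-01-01T11:50:00Z"), idleTimeoutMin: 30 },
  { userId: "u3", status: "stopped", lastActiveAt: new Date("2025-01-01T09:00:00Z"), idleTimeoutMin: 30 },
];
console.log(selectIdleUserIds(exampleRows, exampleNow)); // [ 'u1' ]
```

A scheduler would then call `stop(userId)` for each returned ID; keeping the selection separate from the Docker calls means the timeout rule can be unit-tested on its own.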
## Onboarding Wizard

### First-Boot Detection

- API checks: `SystemConfig.get("onboarding.completed")` → null = first boot
- WebUI redirects to `/onboarding` if not completed

### Steps

**Step 1: Create Breakglass Admin**

- Username + password → bcrypt → `BreakglassUser` table
- This user always works, even if OIDC is misconfigured

**Step 2: Configure Authentication (optional)**

- OIDC: provider URL, client ID, client secret → encrypted in `SystemConfig`
- Skip = breakglass-only auth (can add OIDC later in settings)

**Step 3: Add Your First LLM Provider**

- Pick type → enter API key/endpoint → test connection → save to `LlmProvider`
- This becomes the admin's default provider

**Step 4: System Agents (optional)**

- Configure always-on system agents for monitoring/ops
- Or skip: users can just use their own personal agents

**Step 5: Complete**

- Sets `SystemConfig("onboarding.completed") = true`
- Redirects to dashboard

### Post-Onboarding: User Self-Service

- Each user adds their own LLM providers in profile settings
- Each user configures their preferred model and personality
- First chat request triggers container creation

## Docker Compose (final)

```yaml
services:
  mosaic-api:
    image: mosaic/api:latest
    environment:
      DATABASE_URL: ${DATABASE_URL}
      MOSAIC_SECRET_KEY: ${MOSAIC_SECRET_KEY}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock # Docker API access
    networks:
      - internal

  mosaic-web:
    image: mosaic/web:latest
    environment:
      NEXT_PUBLIC_API_URL: http://mosaic-api:4000
    networks:
      - internal

  postgres:
    image: postgres:17
    environment:
      POSTGRES_DB: mosaic
      POSTGRES_USER: mosaic
      POSTGRES_PASSWORD: ${DATABASE_PASSWORD}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - internal

  # System agent (optional, admin-provisioned)
  # mosaic-system:
  #   image: alpine/openclaw:latest
  #   ... (managed by ContainerLifecycleService)

  # User containers are NOT in this file.
  # They are dynamically created by ContainerLifecycleService
  # via the Docker API at runtime.

networks:
  internal:
    driver: overlay

volumes:
  postgres-data:
```

Note: User OpenClaw containers are **not** defined in docker-compose. They are created dynamically by the `ContainerLifecycleService` when users start chatting.

## Entrypoint (for dynamically created containers)

```sh
#!/bin/sh
set -e

# Fail fast if any required variable is missing
: "${MOSAIC_API_URL:?required}"
: "${AGENT_TOKEN:?required}"
: "${AGENT_ID:?required}"

# Fetch config from Mosaic API
curl -sf "${MOSAIC_API_URL}/api/internal/agent-config/${AGENT_ID}" \
  -H "Authorization: Bearer ${AGENT_TOKEN}" \
  -o /tmp/openclaw.json

export OPENCLAW_CONFIG_PATH=/tmp/openclaw.json
exec openclaw gateway run --bind lan --auth token
```

Container env vars (injected by ContainerLifecycleService):

- `MOSAIC_API_URL`: internal API URL
- `AGENT_TOKEN`: this container's bearer token (from DB)
- `AGENT_ID`: container ID for config lookup

## Config Update Strategy

When a user changes settings (model, provider, personality):

1. Mosaic API updates DB
2. API calls `ContainerLifecycleService.restart(userId)`
3. Container restarts, fetches fresh config from API
4. OpenClaw gateway starts with new config
5. State volume preserves sessions/memory across restarts

## Task Breakdown

| Task | Phase | Scope | Dependencies |
| --- | --- | --- | --- |
| MS22-P1a | Schema | Prisma models: SystemConfig, BreakglassUser, LlmProvider, UserContainer, SystemContainer, UserAgentConfig. Migration. | None |
| MS22-P1b | Crypto | Encryption service for API keys/tokens (AES-256-GCM using MOSAIC_SECRET_KEY) | P1a |
| MS22-P1c | Config API | Internal config endpoint: assembles openclaw.json from DB | P1a, P1b |
| MS22-P1d | Container Mgr | ContainerLifecycleService: Docker API integration (dockerode), start/stop/health/reap | P1a |
| MS22-P1e | Onboarding API | Onboarding endpoints: breakglass, OIDC, provider, complete | P1a, P1b |
| MS22-P1f | Onboarding UI | Multi-step wizard in WebUI | P1e |
| MS22-P1g | Settings API | CRUD: providers, agent config, OIDC, breakglass | P1a, P1b |
| MS22-P1h | Settings UI | Settings pages: Providers, Agent Config, Auth | P1g |
| MS22-P1i | Chat Proxy | Route WebUI chat → user's OpenClaw container (SSE) | P1c, P1d |
| MS22-P1j | Docker | Entrypoint script, health checks, compose for core services | P1c |
| MS22-P1k | Idle Reaper | Cron service to stop idle user containers | P1d |

## Open Questions (Resolved)

1. ~~Config updates → restart?~~ **Yes.** Mosaic restarts the container, which fetches fresh config on boot.
2. ~~CLI alternative for breakglass?~~ **Yes.** Both WebUI wizard and CLI (`mosaic admin create-breakglass`).
3. ~~Config cache TTL?~~ **No TTL.** Config is fetched once at startup; changes trigger a restart.

## Security Isolation Model

### Core Principle: ZERO cross-user access

Every user is fully sandboxed. No exceptions.
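One way to read this principle in code: every lookup takes the authenticated user's ID and either filters on it or refuses. The sketch below uses in-memory arrays in place of Prisma queries; `ProviderRow`, `OwnedContainer`, `listProviders`, and `resolveOwnedContainer` are hypothetical names introduced only for illustration, not part of the design.

```typescript
// Hypothetical minimal shapes; the real rows are LlmProvider and UserContainer.
interface ProviderRow {
  id: string;
  userId: string;
  name: string;
}

interface OwnedContainer {
  userId: string;
  containerName: string;
}

// Provider listing is always filtered by the authenticated user's ID.
function listProviders(all: ProviderRow[], authenticatedUserId: string): ProviderRow[] {
  return all.filter((p) => p.userId === authenticatedUserId);
}

// The chat proxy resolves a container only if the caller owns it.
// A missing container and a foreign container produce the same error,
// so callers cannot probe for other users' container names.
function resolveOwnedContainer(
  containers: OwnedContainer[],
  authenticatedUserId: string,
  containerName: string
): OwnedContainer {
  const found = containers.find((c) => c.containerName === containerName);
  if (!found || found.userId !== authenticatedUserId) {
    throw new Error("forbidden: container not owned by authenticated user");
  }
  return found;
}

const providers: ProviderRow[] = [
  { id: "p1", userId: "alice", name: "my-zai" },
  { id: "p2", userId: "bob", name: "work-openai" },
];
console.log(listProviders(providers, "alice").map((p) => p.name)); // [ 'my-zai' ]
```

In the real service the same rule would be expressed as a `where: { userId }` clause on each query rather than a filter over loaded rows.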
### Container Isolation

- Each user gets their **own** OpenClaw container (separate process, PID namespace)
- Each container has its **own** Docker volume (sessions, memory, workspace)
- Containers run on an **internal-only** Docker network with no external exposure
- Users NEVER talk to OpenClaw directly; Mosaic proxies all requests
- Container gateway tokens are unique per-user and single-purpose

### Data Isolation (enforced at API + DB level)

| Data | Isolation | Enforcement |
| --- | --- | --- |
| LLM API keys | Per-user, encrypted | `LlmProvider.userId`: all queries scoped by authenticated user |
| Chat history | Per-user container volume | Separate Docker volume per user, not shared |
| Agent memory | Per-user container volume | Separate Docker volume per user |
| Agent config | Per-user | `UserAgentConfig.userId`: scoped queries |
| Container access | Per-user | `UserContainer.userId`: Mosaic validates user owns the container before proxying |

### API Enforcement

- **All user-facing endpoints** include `WHERE userId = authenticatedUser.id`
- **No admin endpoint** exposes another user's API keys (even to admins)
- **Chat proxy** validates: authenticated user → owns target container → forwards request
- **Config endpoint** validates: container token matches the container requesting config
- **Provider CRUD** is fully user-scoped: User A cannot list, read, or modify User B's providers

### What admins CAN see

- Container status (running/stopped), not contents
- User list and roles
- System-level config (OIDC, system agents)
- Aggregate usage metrics (not individual conversations)

### What admins CANNOT see

- Other users' API keys (encrypted, no decrypt endpoint)
- Other users' chat history (in container volumes, not in Mosaic DB)
- Other users' agent memory/workspace contents

### Future: Team Workspaces (NOT in scope)

Team/shared workspaces are a potential future feature where users opt in to shared agent contexts. This requires explicit consent, shared-key management, and a different isolation model. **Not designed here. Not built now.**

### Attack Surface Notes

- Docker socket access (`/var/run/docker.sock`) is required by Mosaic API for container management. This is a privileged operation, so the Mosaic API container must be trusted.
- `MOSAIC_SECRET_KEY` is the root of trust for encryption. Rotation requires re-encrypting all secrets in DB.
- Container-to-container communication is blocked by default (no shared network between user containers unless explicitly configured).
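To make the rotation note concrete, here is a minimal sketch of AES-256-GCM value encryption with the `enc:` prefix from `SystemConfig`, using Node's built-in `crypto` module. Several details are assumptions rather than part of the design: the key is derived by SHA-256 hashing the secret, the stored layout is base64 of IV, auth tag, and ciphertext in that order, and the names `encryptValue`, `decryptValue`, and `rotateValue` are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

const ENC_PREFIX = "enc:";
const IV_BYTES = 12; // standard GCM nonce length
const TAG_BYTES = 16; // default GCM auth tag length

// Assumption: derive a 32-byte key by hashing the secret.
// A production service might prefer a proper KDF such as scrypt.
function deriveKey(secret: string): Buffer {
  return createHash("sha256").update(secret, "utf8").digest();
}

function encryptValue(plaintext: string, secret: string): string {
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", deriveKey(secret), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  // Assumed layout: enc: + base64(iv | tag | ciphertext)
  return ENC_PREFIX + Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptValue(stored: string, secret: string): string {
  if (!stored.startsWith(ENC_PREFIX)) return stored; // plaintext value
  const raw = Buffer.from(stored.slice(ENC_PREFIX.length), "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(secret), iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the data was tampered with.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Rotation: decrypt with the old secret, re-encrypt with the new one.
// Plaintext values pass through unchanged.
function rotateValue(stored: string, oldSecret: string, newSecret: string): string {
  if (!stored.startsWith(ENC_PREFIX)) return stored;
  return encryptValue(decryptValue(stored, oldSecret), newSecret);
}

const stored = encryptValue("sk-example", "old-secret");
const rotated = rotateValue(stored, "old-secret", "new-secret");
console.log(decryptValue(rotated, "new-secret")); // sk-example
```

Rotating the key then amounts to running `rotateValue` over every encrypted column (`LlmProvider.apiKey`, the gateway tokens, encrypted `SystemConfig` values), ideally inside a single transaction so a failure does not leave the DB with mixed keys.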