
PRD: MS22 — Fleet Evolution (DB-Centric Agent Architecture)

Metadata

  • Owner: Jason Woltje
  • Date: 2026-03-01
  • Status: in-progress
  • Design Doc: docs/design/MS22-DB-CENTRIC-ARCHITECTURE.md

Problem Statement

Mosaic Stack needs a multi-user agent fleet in which each user gets an isolated OpenClaw instance with their own LLM provider credentials and agent config. The system must be Docker-first, require only a minimal set of environment variables, and manage all other configuration through the WebUI.

Objectives

  1. Minimal bootstrap — 2 env vars (DATABASE_URL, MOSAIC_SECRET_KEY) to start the entire stack
  2. DB-centric config — All runtime config in Postgres, managed via WebUI
  3. Per-user isolation — Each user gets their own OpenClaw container with own API keys, memory, sessions
  4. Onboarding wizard — First-boot experience: breakglass admin → OIDC → LLM provider → agent config
  5. Settings UI — Runtime management of providers, agents, and auth config
  6. Mosaic as gatekeeper — Users never talk to OpenClaw directly; Mosaic proxies all requests
  7. Zero cross-user access — Full container, volume, and DB isolation between users
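A minimal sketch of the bootstrap check behind objective 1, assuming the process reads nothing else from the environment and that Postgres URLs use the `postgres://` or `postgresql://` scheme. `loadBootstrapConfig` and `BootstrapConfig` are hypothetical names. The loader reports every missing variable at once rather than failing on the first.

```typescript
// Hypothetical bootstrap loader: these are the only values read from the environment.
interface BootstrapConfig {
  databaseUrl: string;
  secretKey: string;
}

const REQUIRED_VARS = ["DATABASE_URL", "MOSAIC_SECRET_KEY"] as const;

function loadBootstrapConfig(env: Record<string, string | undefined>): BootstrapConfig {
  // Collect every missing variable so the operator sees them all at once.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const databaseUrl = env.DATABASE_URL as string;
  // Assumption: Postgres connection strings use the postgres:// or postgresql:// scheme.
  if (!/^postgres(ql)?:\/\//.test(databaseUrl)) {
    throw new Error("DATABASE_URL must be a postgres:// or postgresql:// URL");
  }
  return { databaseUrl, secretKey: env.MOSAIC_SECRET_KEY as string };
}

// Usage at process start (Node): const config = loadBootstrapConfig(process.env);
```

Everything else (OIDC, providers, agent config) would then be loaded from Postgres through `databaseUrl`.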

Security Requirements

  • User A cannot access User B's API keys, chat history, or agent memory
  • All API keys stored encrypted (AES-256-GCM) in database
  • Breakglass admin login always works as a fallback to OIDC
  • OIDC config stored in DB (not env vars) — configured via settings UI
  • Container-to-container communication blocked by default
  • Admin cannot decrypt other users' API keys
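A sketch of how the first and last requirements might be enforced at the data-access layer, assuming each provider row carries an owner id. The types and field names (`ProviderRecord`, `ownerId`, `encryptedApiKey`) are hypothetical. The key point is that the admin role grants no bypass for per-user secrets. Note that this enforces the rule in application code only: because CryptoService uses the server-wide MOSAIC_SECRET_KEY, anyone holding both that key and database access could still decrypt.

```typescript
// Hypothetical types; field names are illustrative, not the actual Mosaic schema.
type Role = "admin" | "user";

interface Requester {
  userId: string;
  role: Role;
}

interface ProviderRecord {
  id: string;
  ownerId: string;
  encryptedApiKey: string;
}

// Only the owner may ever receive a decrypted key; the admin role deliberately grants no bypass.
function canDecryptProviderKey(requester: Requester, record: ProviderRecord): boolean {
  return requester.userId === record.ownerId;
}

// Per-user listings are filtered by owner. In production this filter belongs in the
// SQL WHERE clause so other users' rows are never loaded at all.
function providersVisibleTo(requester: Requester, rows: ProviderRecord[]): ProviderRecord[] {
  return rows.filter((row) => row.ownerId === requester.userId);
}
```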

Phase 0: Knowledge Layer — COMPLETE

  • Findings API (pgvector, CRUD, similarity search)
  • AgentMemory API (key/value store)
  • ConversationArchive API (pgvector, ingest, search)
  • OpenClaw mosaic skill
  • Session log ingestion pipeline
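The similarity search in the Findings API could be expressed as a parameterized pgvector query. This sketch assumes a hypothetical `findings` table with `id`, `content`, `user_id`, and `embedding vector(n)` columns; `buildFindingSearch` is an illustrative name. It uses pgvector's `<=>` cosine-distance operator and returns a `{ text, values }` object, the shape node-postgres accepts in `client.query()`.

```typescript
interface SimilarityQuery {
  text: string;
  values: unknown[];
}

// pgvector accepts vectors in the text form '[x1,x2,...]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding must be a non-empty array of finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// Hypothetical table and column names; scoped to one user to preserve isolation.
function buildFindingSearch(userId: string, embedding: number[], limit = 10): SimilarityQuery {
  return {
    // <=> is cosine distance, so 1 - distance is cosine similarity.
    text:
      "SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity " +
      "FROM findings WHERE user_id = $2 " +
      "ORDER BY embedding <=> $1::vector LIMIT $3",
    values: [toVectorLiteral(embedding), userId, limit],
  };
}
```

Ordering by the raw operator expression (rather than the computed similarity) lets Postgres use an HNSW or IVFFlat index built with `vector_cosine_ops`.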

Phase 1: DB-Centric Agent Fleet

Phase 1a: DB Schema — COMPLETE

  • SystemConfig, BreakglassUser, LlmProvider, UserContainer, SystemContainer, UserAgentConfig tables

Phase 1b: Encryption Service — COMPLETE

  • CryptoService (AES-256-GCM using MOSAIC_SECRET_KEY)
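A minimal AES-256-GCM sketch using Node's built-in `node:crypto`. The PRD does not specify the format of MOSAIC_SECRET_KEY or the stored ciphertext, so two assumptions are labeled here: the 32-byte key is derived by hashing the secret with SHA-256, and ciphertext is stored as `base64(iv):base64(tag):base64(data)`. A fresh random 12-byte IV per encryption is essential, since reusing a nonce under the same GCM key breaks both confidentiality and authenticity.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Assumption: key = SHA-256(MOSAIC_SECRET_KEY). A KDF such as HKDF or scrypt may be preferable.
function deriveKey(secret: string): Buffer {
  return createHash("sha256").update(secret, "utf8").digest();
}

// Assumed storage format: base64(iv):base64(authTag):base64(ciphertext)
function encrypt(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(":");
}

function decrypt(payload: string, key: Buffer): string {
  const [ivB64, tagB64, dataB64] = payload.split(":");
  if (!ivB64 || !tagB64 || dataB64 === undefined) {
    throw new Error("malformed ciphertext");
  }
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(ivB64, "base64"));
  decipher.setAuthTag(Buffer.from(tagB64, "base64"));
  // final() throws if the auth tag does not verify (tampered data or wrong key).
  return Buffer.concat([
    decipher.update(Buffer.from(dataB64, "base64")),
    decipher.final(),
  ]).toString("utf8");
}
```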

Phase 1c: Internal Config API

  • GET /api/internal/agent-config/:id — assembles openclaw.json from DB
  • Auth: bearer token (container's own gateway token)
  • Returns complete openclaw.json with decrypted provider credentials
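A sketch of the endpoint's core logic, assuming the container presents its gateway token as `Authorization: Bearer <token>` and Mosaic stores the expected token per container. The row shapes and `assembleAgentConfig` are hypothetical, and the output is not the real openclaw.json schema, which is defined by OpenClaw. Token comparison is constant-time, and provider rows are loaded (and decrypted) only after authentication succeeds.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compare SHA-256 digests so both buffers have equal length, as timingSafeEqual requires.
function tokensMatch(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

function extractBearer(authorization: string | undefined): string | null {
  const match = /^Bearer\s+(\S+)$/i.exec(authorization ?? "");
  return match?.[1] ?? null;
}

// Hypothetical row shapes, not the actual Mosaic or OpenClaw schema.
interface ProviderRow {
  name: string;
  baseUrl: string;
  apiKey: string; // decrypted by the loader
}

interface AgentConfigRow {
  model: string;
  personality?: string;
}

function assembleAgentConfig(
  authorization: string | undefined,
  storedGatewayToken: string,
  loadProviders: () => ProviderRow[], // only called after auth succeeds
  agent: AgentConfigRow,
): { providers: Record<string, { baseUrl: string; apiKey: string }>; agent: AgentConfigRow } {
  const token = extractBearer(authorization);
  if (token === null || !tokensMatch(token, storedGatewayToken)) {
    throw new Error("401 Unauthorized: invalid gateway token");
  }
  const providers: Record<string, { baseUrl: string; apiKey: string }> = {};
  for (const p of loadProviders()) {
    providers[p.name] = { baseUrl: p.baseUrl, apiKey: p.apiKey };
  }
  return { providers, agent };
}
```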

Phase 1d: Container Lifecycle Manager

  • Docker API integration via dockerode npm package
  • Start/stop/health-check/reap user containers
  • Auto-generate gateway tokens, assign ports
  • Docker socket access required (/var/run/docker.sock)
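A sketch of the token and port bookkeeping, plus the container-create body the manager might pass to dockerode's `createContainer()`, which forwards it to the Docker Engine API. The port range, image name, internal gateway port (18789), `GATEWAY_TOKEN` env name, volume mount path, and label are all hypothetical. Publishing only on `127.0.0.1` keeps the gateway off external interfaces.

```typescript
import { randomBytes } from "node:crypto";

// 256-bit random gateway token, hex-encoded.
function generateGatewayToken(): string {
  return randomBytes(32).toString("hex");
}

// Lowest free host port in an assumed range.
function allocatePort(used: Set<number>, min = 20000, max = 20999): number {
  for (let port = min; port <= max; port++) {
    if (!used.has(port)) return port;
  }
  throw new Error(`no free ports in ${min}-${max}`);
}

// Docker Engine API ContainerCreate body; names and paths are hypothetical.
function buildCreateOptions(
  userId: string,
  hostPort: number,
  gatewayToken: string,
  image = "openclaw:latest",
  internalPort = 18789,
) {
  const portKey = `${internalPort}/tcp`;
  return {
    name: `mosaic-user-${userId}`,
    Image: image,
    Env: [`GATEWAY_TOKEN=${gatewayToken}`],
    Labels: { "mosaic.user": userId },
    ExposedPorts: { [portKey]: {} },
    HostConfig: {
      // Publish on loopback only so the gateway is not reachable from other hosts.
      PortBindings: { [portKey]: [{ HostIp: "127.0.0.1", HostPort: String(hostPort) }] },
      // Named per-user volume, so state survives container stop/removal.
      Binds: [`mosaic-user-${userId}-state:/data`],
    },
  };
}
```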

Phase 1e: Onboarding API

  • First-boot detection (SystemConfig.onboarding.completed)
  • POST /api/onboarding/breakglass — create admin user
  • POST /api/onboarding/oidc — save OIDC provider config
  • POST /api/onboarding/provider — add LLM provider + test connection
  • POST /api/onboarding/complete — mark done
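A sketch of first-boot detection and the completion guard, modeling SystemConfig as an in-memory key/value map (a hypothetical stand-in for the table; only the `onboarding.completed` key comes from the PRD). Because the breakglass endpoint must be callable without authentication on first boot, every onboarding handler should refuse requests once onboarding is complete. Requiring a verified LLM provider before completion is an assumption.

```typescript
// Hypothetical in-memory stand-in for the SystemConfig table.
type SystemConfig = Map<string, string>;
const COMPLETED_KEY = "onboarding.completed";

function isFirstBoot(config: SystemConfig): boolean {
  return config.get(COMPLETED_KEY) !== "true";
}

// Called first by every /api/onboarding/* handler, so the unauthenticated
// breakglass endpoint cannot be replayed after setup.
function assertOnboardingOpen(config: SystemConfig): void {
  if (!isFirstBoot(config)) {
    throw new Error("403 Forbidden: onboarding already completed");
  }
}

interface OnboardingState {
  breakglassCreated: boolean;
  providerVerified: boolean; // OIDC is optional, so it is not required here
}

function completeOnboarding(config: SystemConfig, state: OnboardingState): void {
  assertOnboardingOpen(config);
  if (!state.breakglassCreated || !state.providerVerified) {
    throw new Error("400 Bad Request: breakglass admin and a verified LLM provider are required");
  }
  config.set(COMPLETED_KEY, "true");
}
```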

Phase 1f: Onboarding Wizard UI

  • Multi-step wizard component
  • Skippable OIDC step
  • LLM provider connection test
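The wizard's navigation rules can be captured in a small reducer, in the style React's `useReducer` accepts (the PRD does not name the UI framework, so this is an assumption). The step names and actions are hypothetical. Only the OIDC step can be skipped, and the provider step cannot advance until a connection test has passed.

```typescript
// Hypothetical step list mirroring the first-boot flow in the objectives.
type Step = "breakglass" | "oidc" | "provider" | "agent" | "done";
const STEPS: Step[] = ["breakglass", "oidc", "provider", "agent", "done"];
const SKIPPABLE = new Set<Step>(["oidc"]);

interface WizardState {
  step: Step;
  providerTestPassed: boolean;
}

type WizardAction =
  | { type: "next" }
  | { type: "skip" }
  | { type: "providerTestResult"; ok: boolean };

function advance(step: Step): Step {
  const i = STEPS.indexOf(step);
  return STEPS[Math.min(i + 1, STEPS.length - 1)];
}

function wizardReducer(state: WizardState, action: WizardAction): WizardState {
  switch (action.type) {
    case "providerTestResult":
      return { ...state, providerTestPassed: action.ok };
    case "skip":
      // Only skippable steps advance; elsewhere "skip" is a no-op.
      return SKIPPABLE.has(state.step) ? { ...state, step: advance(state.step) } : state;
    case "next":
      // The provider step is gated on a successful connection test.
      if (state.step === "provider" && !state.providerTestPassed) return state;
      return { ...state, step: advance(state.step) };
  }
}
```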

Phase 1g: Settings API

  • CRUD: LLM providers (per-user scoped)
  • CRUD: Agent config (model assignments, personalities)
  • CRUD: OIDC config (admin only)
  • Breakglass password reset (admin only)
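The access rules above reduce to a small policy check that each settings handler could run first. The resource keys and `authorizeSettings` are hypothetical. Admin-only resources are system-wide; per-user resources are always scoped to the caller, including when the caller is an admin.

```typescript
// Hypothetical resource keys; the rules follow the Settings API list.
type Role = "admin" | "user";
type SettingsResource = "providers" | "agentConfig" | "oidc" | "breakglassReset";

const ADMIN_ONLY = new Set<SettingsResource>(["oidc", "breakglassReset"]);

interface Caller {
  userId: string;
  role: Role;
}

// Returns the owner id to scope queries by, or null for system-wide admin resources.
function authorizeSettings(caller: Caller, resource: SettingsResource): { scopeUserId: string | null } {
  if (ADMIN_ONLY.has(resource)) {
    if (caller.role !== "admin") throw new Error("403 Forbidden: admin only");
    return { scopeUserId: null };
  }
  // Per-user resources are always scoped to the caller, admins included.
  return { scopeUserId: caller.userId };
}
```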

Phase 1h: Settings UI

  • Settings/Providers page
  • Settings/Agent Config page
  • Settings/Auth page (OIDC + breakglass)

Phase 1i: Chat Proxy

  • Route WebUI chat to user's OpenClaw container
  • SSE streaming pass-through
  • Ensure container is running before proxying (auto-start)
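A sketch of the auto-start step, assuming hypothetical `isRunning`/`start` hooks that would wrap dockerode calls in practice. Concurrent chat requests for the same stopped container share one in-flight start instead of each triggering its own. The header set is one a Node handler might send before piping the container's event stream; `X-Accel-Buffering: no` only matters when nginx sits in front.

```typescript
// Hypothetical lifecycle hooks; in Mosaic these would wrap Docker API calls.
interface ContainerOps {
  isRunning(userId: string): Promise<boolean>;
  start(userId: string): Promise<void>;
}

// Response headers for SSE pass-through.
const SSE_HEADERS: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  "X-Accel-Buffering": "no", // disables nginx response buffering
};

const pendingStarts = new Map<string, Promise<void>>();

// Collapses concurrent requests for the same user into a single container start.
async function ensureRunning(ops: ContainerOps, userId: string): Promise<void> {
  if (await ops.isRunning(userId)) return;
  let pending = pendingStarts.get(userId);
  if (!pending) {
    pending = ops.start(userId).finally(() => pendingStarts.delete(userId));
    pendingStarts.set(userId, pending);
  }
  await pending;
}
```

A real start would also wait for the container's health check to pass before the proxy opens the upstream stream.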

Phase 1j: Docker Compose + Entrypoint

  • Simplified compose (core services only — user containers are dynamic)
  • Entrypoint: fetch config from API, write openclaw.json, start gateway
  • Health check integration
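A sketch of the first two entrypoint steps in Node. The fetch function is injected through a minimal hypothetical `FetchLike` type so the sketch needs no DOM typings; the API base URL, container id, and token would be supplied to the container by the lifecycle manager. The final step, launching the gateway, is omitted because its command is not specified in this PRD. The config file is written to a temporary path and renamed into place so the gateway never reads a half-written file, with mode `0o600` because it contains decrypted credentials.

```typescript
import { renameSync, writeFileSync } from "node:fs";

// Minimal fetch signature (a subset of the standard fetch API).
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function fetchAgentConfig(
  fetchImpl: FetchLike,
  apiBase: string,
  containerId: string,
  token: string,
): Promise<unknown> {
  const url = `${apiBase}/api/internal/agent-config/${encodeURIComponent(containerId)}`;
  const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${token}` } });
  if (!res.ok) throw new Error(`config fetch failed with HTTP ${res.status}`);
  return res.json();
}

// Write to a temp file, then rename (atomic on the same POSIX filesystem).
function writeConfigAtomically(path: string, config: unknown): void {
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(config, null, 2), { mode: 0o600 });
  renameSync(tmp, path);
}
```

Failing hard on a non-2xx response lets the container's health check and restart policy surface the problem instead of starting a gateway with stale config.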

Phase 1k: Idle Reaper

  • Cron job to stop inactive user containers
  • Configurable idle timeout (default 30 min)
  • Preserve state volumes
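The reaper's selection step is a simple cutoff comparison. This sketch assumes a hypothetical `UserContainerRow` with a `lastActiveAt` timestamp that the chat proxy updates on each request. The reaper would then stop each selected container (for example with dockerode's `container.stop()`); stopping a container leaves its named volumes untouched.

```typescript
// Hypothetical row shape for the UserContainer table.
interface UserContainerRow {
  userId: string;
  containerId: string;
  state: "running" | "stopped";
  lastActiveAt: Date;
}

const DEFAULT_IDLE_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes, the PRD default

function selectIdleContainers(
  rows: UserContainerRow[],
  now: Date,
  idleTimeoutMs = DEFAULT_IDLE_TIMEOUT_MS,
): UserContainerRow[] {
  const cutoff = now.getTime() - idleTimeoutMs;
  return rows.filter((r) => r.state === "running" && r.lastActiveAt.getTime() < cutoff);
}
```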

Future Phases (out of scope)

  • Phase 2: Agent fleet standup (predefined agent roles)
  • Phase 3: WebUI chat + task management integration
  • Phase 4: Multi-LLM provider management UI (advanced)
  • Team workspaces (shared agent contexts) — explicitly out of scope