Co-authored-by: Jason Woltje <jason@diversecanvas.com> Co-committed-by: Jason Woltje <jason@diversecanvas.com>
# PRD: MS22 — Fleet Evolution (DB-Centric Agent Architecture)
## Metadata
- Owner: Jason Woltje
- Date: 2026-03-01
- Status: in-progress
- Design Doc: `docs/design/MS22-DB-CENTRIC-ARCHITECTURE.md`
## Problem Statement
Mosaic Stack needs a multi-user agent fleet where each user gets their own isolated OpenClaw instance with their own LLM provider credentials and agent config. The system must be Docker-first with minimal environment variables and all configuration managed through the WebUI.
## Objectives
1. **Minimal bootstrap** — 2 env vars (`DATABASE_URL`, `MOSAIC_SECRET_KEY`) to start the entire stack
2. **DB-centric config** — All runtime config in Postgres, managed via WebUI
3. **Per-user isolation** — Each user gets their own OpenClaw container with own API keys, memory, sessions
4. **Onboarding wizard** — First-boot experience: breakglass admin → OIDC → LLM provider → agent config
5. **Settings UI** — Runtime management of providers, agents, and auth config
6. **Mosaic as gatekeeper** — Users never talk to OpenClaw directly; Mosaic proxies all requests
7. **Zero cross-user access** — Full container, volume, and DB isolation between users
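As an illustration of the minimal-bootstrap objective, the following TypeScript sketch validates the two required variables at startup and fails fast when either is missing. The function name `loadBootstrapEnv` and the `BootstrapEnv` shape are hypothetical and not taken from the Mosaic codebase.

```typescript
// Hypothetical shape for the two bootstrap values named in objective 1.
interface BootstrapEnv {
  databaseUrl: string;
  secretKey: string;
}

// Reads DATABASE_URL and MOSAIC_SECRET_KEY and throws if either is unset or empty.
function loadBootstrapEnv(env: Record<string, string | undefined>): BootstrapEnv {
  const missing = ["DATABASE_URL", "MOSAIC_SECRET_KEY"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env.DATABASE_URL as string,
    secretKey: env.MOSAIC_SECRET_KEY as string,
  };
}
```

In this sketch the function would be called once at startup with `process.env`; all other configuration would then come from Postgres, as objective 2 describes.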
## Security Requirements
- User A cannot access User B's API keys, chat history, or agent memory
- All API keys stored encrypted (AES-256-GCM) in database
- Breakglass admin login always works as a fallback when OIDC is unavailable or misconfigured
- OIDC config stored in DB (not env vars) — configured via settings UI
- Container-to-container communication blocked by default
- Admin cannot decrypt other users' API keys
## Phase 0: Knowledge Layer — COMPLETE
- Findings API (pgvector, CRUD, similarity search)
- AgentMemory API (key/value store)
- ConversationArchive API (pgvector, ingest, search)
- OpenClaw mosaic skill
- Session log ingestion pipeline
## Phase 1: DB-Centric Agent Fleet
### Phase 1a: DB Schema — COMPLETE
- SystemConfig, BreakglassUser, LlmProvider, UserContainer, SystemContainer, UserAgentConfig tables
### Phase 1b: Encryption Service — COMPLETE
- CryptoService (AES-256-GCM using MOSAIC_SECRET_KEY)
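A minimal sketch of what an AES-256-GCM service could look like with Node's built-in `crypto` module. It is not the actual `CryptoService`: the key derivation (scrypt with a fixed example salt) and the `iv.tag.ciphertext` storage format are assumptions made for illustration, and the function names are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Assumption: derive a 32-byte key from the MOSAIC_SECRET_KEY string with scrypt.
// The salt value is an example only; the real service may derive its key differently.
function deriveKey(secret: string): Buffer {
  return scryptSync(secret, "mosaic-example-salt", 32);
}

// Encrypts a string and returns "iv.tag.ciphertext", each part base64-encoded.
function encrypt(plaintext: string, secret: string): string {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", deriveKey(secret), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // authentication tag, checked on decrypt
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

// Reverses encrypt(); throws if the key is wrong or the payload was modified.
function decrypt(payload: string, secret: string): string {
  const [iv, tag, ciphertext] = payload.split(".").map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(secret), iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, decrypting with a different secret fails with an exception rather than returning garbage, which is the property the encrypted API key columns rely on.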
### Phase 1c: Internal Config API
- `GET /api/internal/agent-config/:id` — assembles openclaw.json from DB
- Auth: bearer token (container's own gateway token)
- Returns complete openclaw.json with decrypted provider credentials
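The bearer-token check on the internal config endpoint could be sketched as follows. The helper names are hypothetical, and how the expected gateway token is looked up for a given container is left out; the sketch only shows header parsing and a constant-time comparison.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Extracts the token from an "Authorization: Bearer <token>" header value.
function extractBearer(header: string | undefined): string | null {
  const match = /^Bearer (.+)$/.exec(header ?? "");
  return match ? match[1] : null;
}

// Compares the presented token with the container's stored gateway token.
// Both values are hashed first so the buffers have equal length,
// which timingSafeEqual requires.
function tokenMatches(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```

A handler for `GET /api/internal/agent-config/:id` would, under these assumptions, reject the request unless `extractBearer` returns a token and `tokenMatches` succeeds against the token stored for container `:id`.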
### Phase 1d: Container Lifecycle Manager
- Docker API integration via `dockerode` npm package
- Start/stop/health-check/reap user containers
- Auto-generate gateway tokens, assign ports
- Docker socket access required (`/var/run/docker.sock`)
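The token generation and port assignment steps can be illustrated without touching Docker. In this sketch the function names and the port range 19000 to 19999 are assumptions; the PRD does not specify a range or a token format.

```typescript
import { randomBytes } from "node:crypto";

// Generates a random gateway token: 32 random bytes as 64 hex characters.
function generateGatewayToken(): string {
  return randomBytes(32).toString("hex");
}

// Returns the lowest port in [min, max] that is not already in use.
// The default range is an example value, not a documented Mosaic setting.
function assignPort(used: ReadonlySet<number>, min = 19000, max = 19999): number {
  for (let port = min; port <= max; port++) {
    if (!used.has(port)) return port;
  }
  throw new Error(`No free port in range ${min}-${max}`);
}
```

The set of used ports would presumably be read from the `UserContainer` table before a new container is created through `dockerode`.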
### Phase 1e: Onboarding API
- First-boot detection (`SystemConfig.onboarding.completed`)
- `POST /api/onboarding/breakglass` — create admin user
- `POST /api/onboarding/oidc` — save OIDC provider config
- `POST /api/onboarding/provider` — add LLM provider + test connection
- `POST /api/onboarding/complete` — mark done
### Phase 1f: Onboarding Wizard UI
- Multi-step wizard component
- Skippable OIDC step
- LLM provider connection test
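One possible way to model the wizard flow is an ordered step table that maps each step to its Phase 1e endpoint and marks the OIDC step as skippable. The type and function names below are hypothetical; only the endpoint paths and the step order come from this document.

```typescript
type Step = "breakglass" | "oidc" | "provider" | "complete";

interface StepInfo {
  endpoint: string;
  skippable: boolean;
}

// Ordered wizard steps, each paired with the onboarding endpoint it submits to.
const STEPS: ReadonlyArray<[Step, StepInfo]> = [
  ["breakglass", { endpoint: "/api/onboarding/breakglass", skippable: false }],
  ["oidc", { endpoint: "/api/onboarding/oidc", skippable: true }],
  ["provider", { endpoint: "/api/onboarding/provider", skippable: false }],
  ["complete", { endpoint: "/api/onboarding/complete", skippable: false }],
];

// Returns the step that follows `current`, or null after the last step.
function nextStep(current: Step): Step | null {
  const index = STEPS.findIndex(([name]) => name === current);
  return index >= 0 && index < STEPS.length - 1 ? STEPS[index + 1][0] : null;
}

// True only for steps the user may skip (the OIDC step in this sketch).
function canSkip(step: Step): boolean {
  return STEPS.some(([name, info]) => name === step && info.skippable);
}
```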
### Phase 1g: Settings API
- CRUD: LLM providers (per-user scoped)
- CRUD: Agent config (model assignments, personalities)
- CRUD: OIDC config (admin only)
- Breakglass password reset (admin only)
### Phase 1h: Settings UI
- Settings/Providers page
- Settings/Agent Config page
- Settings/Auth page (OIDC + breakglass)
### Phase 1i: Chat Proxy
- Route WebUI chat to user's OpenClaw container
- SSE streaming pass-through
- Ensure container is running before proxying (auto-start)
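The auto-start behaviour can be sketched independently of HTTP and SSE details. The `ContainerControl` interface and `proxyChat` function are hypothetical; the actual forwarding (including SSE pass-through) is represented by a caller-supplied `forward` callback.

```typescript
// Hypothetical view of the container lifecycle manager, as seen by the proxy.
interface ContainerControl {
  isRunning(userId: string): Promise<boolean>;
  start(userId: string): Promise<void>;
}

// Ensures the user's container is running, then hands off to `forward`,
// which stands in for the real request/stream forwarding.
async function proxyChat<T>(
  control: ContainerControl,
  userId: string,
  forward: (userId: string) => Promise<T>,
): Promise<T> {
  if (!(await control.isRunning(userId))) {
    await control.start(userId); // auto-start a stopped container
  }
  return forward(userId);
}
```

Keeping the start check inside the proxy path means a container stopped for inactivity is brought back transparently on the user's next chat request.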
### Phase 1j: Docker Compose + Entrypoint
- Simplified compose (core services only — user containers are dynamic)
- Entrypoint: fetch config from API, write openclaw.json, start gateway
- Health check integration
### Phase 1k: Idle Reaper
- Cron job to stop inactive user containers
- Configurable idle timeout (default: 30 minutes)
- Preserve state volumes
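The selection step of the reaper might look like the sketch below. The `ContainerActivity` shape and the `findIdleContainers` name are assumptions; only the 30-minute default comes from this document. Stopping the selected containers while leaving their volumes in place would be done by the lifecycle manager and is not shown.

```typescript
// Hypothetical record of when a user's container last handled a request.
interface ContainerActivity {
  userId: string;
  lastActiveAt: Date;
}

const DEFAULT_IDLE_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes

// Returns the user IDs whose containers have been idle for at least timeoutMs.
function findIdleContainers(
  containers: readonly ContainerActivity[],
  now: Date,
  timeoutMs: number = DEFAULT_IDLE_TIMEOUT_MS,
): string[] {
  return containers
    .filter((c) => now.getTime() - c.lastActiveAt.getTime() >= timeoutMs)
    .map((c) => c.userId);
}
```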
## Future Phases (out of scope)
- Phase 2: Agent fleet standup (predefined agent roles)
- Phase 3: WebUI chat + task management integration
- Phase 4: Multi-LLM provider management UI (advanced)
- Team workspaces (shared agent contexts) — explicitly out of scope