PRD: Mosaic Stack v0.1.0

Metadata

  • Owner: Jason Woltje
  • Date: 2026-03-12
  • Status: draft
  • Best-Guess Mode: true
  • Repo (target): git.mosaicstack.dev/mosaic/mosaic-stack
  • Baseline: ~/src/jarvis-old (jarvis v0.2.0)
  • Package source: ~/src/mosaic-mono-v0 (@mosaic/* packages)
  • Agent harness: pi (v0.57.1)
  • Remote control reference: OpenClaw (upstream, canonical)

Problem Statement

Jarvis (v0.2.0) is a self-hosted AI assistant with a Python FastAPI backend and Next.js frontend. It handles chat, projects, tasks, and LLM routing but lacks orchestration depth, agent coordination, shared memory, and remote access. The Mosaic framework (~/.config/mosaic) provides agent guides, shell-based orchestration tools, and quality rails — but these are loose scripts, not an integrated platform. The @mosaic/* packages in mosaic-mono-v0 began consolidating these into TypeScript packages (brain, queue, coord, cli, prdy, quality-rails) but have no UI, no auth, and no agent runtime integration.

The gap: Three codebases with overlapping concerns, no unified runtime, no remote control surface (Discord/Telegram), no gateway orchestrator, and a Python backend that doesn't align with the target TypeScript-everywhere stack.

What Mosaic Stack solves: A single monorepo that brings together many pieces to make one beautiful picture — a self-hosted, multi-user AI agent platform with web dashboard, TUI, remote control, shared memory, mission orchestration, and extensible skill/plugin architecture. All TypeScript. Pi as the agent harness. Brain as the knowledge layer. Queue as the coordination backbone.


Objectives

  1. Unified TypeScript monorepo — One repo, one language, one build pipeline for all Mosaic Stack components
  2. Pi-powered agent runtime — Pi SDK embedded as the core agent loop; Pi TUI as the terminal interface
  3. Web + TUI + Remote — Next.js dashboard for visual management, Pi TUI for terminal work, Discord/Telegram for remote control
  4. Gateway orchestrator — Central routing layer that dispatches tasks to appropriate agents based on capability, cost, and context
  5. Shared memory — PostgreSQL canonical store + vector DB for semantic search + tiered log summarization to prevent context creep
  6. Multi-user with SSO — BetterAuth with Authentik/WorkOS/Keycloak SSO, RBAC for family/team/business use
  7. Full @mosaic/* package integration — brain, queue, coord, mosaic, prdy, quality-rails, cli all integrated
  8. Extensible — MCP capability, skill import interface, plugin architecture for LLM providers and remote channels

Scope

In Scope (v0.1.0 Beta)

  1. Chat/conversation UI (web) — carry forward from jarvis-old, rewrite frontend to work with new backend
  2. Pi TUI integration — terminal-based agent interaction using Pi SDK
  3. Web dashboard — settings, task management, projects, PRDs, missions, agent status
  4. Gateway orchestrator (@mosaic/gateway) — central dispatch for agent tasks with routing logic
  5. Task management — CRUD, kanban, mission-scoped tasks, dependency tracking
  6. Project management — projects, milestones, PRDs linked to missions
  7. Shared memory system — learned preferences, behaviors, defaults; tiered storage with summarization
  8. User management — RBAC (admin, member, viewer), multi-user capable
  9. SSO — BetterAuth with Authentik/WorkOS/Keycloak adapter
  10. Remote control — Discord plugin (high priority), Telegram plugin
  11. LLM provider support — Anthropic, Codex, and Z.ai subscriptions; other API-based providers; local runtimes (Ollama, LM Studio, llama.cpp)
  12. Agent routing — task-based model/provider selection (cost/capability matrix)
  13. MCP capability — server and client, tool registration
  14. Skill import interface — browse, install, manage agent skills
  15. @mosaic/brain — structured data layer (migrated to PG + vector DB backend)
  16. @mosaic/queue — Valkey-backed task queue with MCP tools
  17. @mosaic/coord — mission coordination engine
  18. @mosaic/mosaic — install wizard / bootstrap
  19. @mosaic/prdy — PRD wizard
  20. @mosaic/quality-rails — code quality scaffolder
  21. @mosaic/cli — unified mosaic CLI
  22. Docker Compose deployment + bare-metal capability
  23. Agent log service — ingest, parse, tier, summarize agent interaction logs

Out of Scope (v0.1.0)

  1. SaaS / multi-tenant revenue model — this is a personal/family/team tool
  2. Mobile native apps — web responsive is sufficient
  3. Public npm registry publishing — Gitea registry only
  4. Video/voice agent interaction
  5. Full OpenClaw feature parity — we take inspiration, not wholesale migration
  6. Calendar integration (deferred — brain tracks events, but no gcal sync yet)
  7. GLPI/helpdesk ticket sync (deferred)
  8. Woodpecker CI integration tooling (deferred — focus on core platform first)

Architecture

High-Level System Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        Mosaic Stack                              │
│                                                                  │
│  ┌──────────┐  ┌──────────┐  ┌─────────────┐  ┌──────────────┐ │
│  │ Next.js  │  │ Pi TUI   │  │  Discord    │  │  Telegram    │ │
│  │ Web App  │  │ Terminal │  │  Plugin     │  │  Plugin      │ │
│  └────┬─────┘  └────┬─────┘  └──────┬──────┘  └──────┬───────┘ │
│       │              │               │                │          │
│       └──────────────┴───────┬───────┴────────────────┘          │
│                              │                                   │
│                    ┌─────────▼──────────┐                        │
│                    │  @mosaic/gateway   │  ← Central Orchestrator│
│                    │  (NestJS+Fastify)  │                        │
│                    └────┬────┬────┬─────┘                        │
│                         │    │    │                               │
│          ┌──────────────┤    │    ├──────────────┐               │
│          │              │    │    │              │               │
│  ┌───────▼──────┐ ┌────▼────▼──┐ │  ┌───────────▼────────┐     │
│  │ @mosaic/brain│ │ @mosaic/   │ │  │ Agent Pool         │     │
│  │ (Data Layer) │ │ queue      │ │  │ (Pi SDK sessions)  │     │
│  └───────┬──────┘ └────────────┘ │  │ - Anthropic        │     │
│          │                       │  │ - Codex            │     │
│  ┌───────▼──────────────────┐    │  │ - Z.ai             │     │
│  │  PostgreSQL  │  VectorDB │    │  │ - Ollama           │     │
│  │  (canonical) │ (semantic)│    │  │ - LM Studio        │     │
│  └──────────────┴───────────┘    │  │ - llama.cpp        │     │
│                                  │  └────────────────────┘     │
│                    ┌─────────────▼──────┐                       │
│                    │  @mosaic/coord     │                        │
│                    │  Mission lifecycle │                        │
│                    └────────────────────┘                        │
│                                                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐      │
│  │ @mosaic/cli  │  │ @mosaic/prdy │  │ @mosaic/         │      │
│  │              │  │              │  │ quality-rails    │      │
│  └──────────────┘  └──────────────┘  └──────────────────┘      │
│                                                                  │
│  ┌──────────────────────────────────────────────────────┐       │
│  │  Valkey (queue backend)  │  BetterAuth (SSO/RBAC)   │       │
│  └──────────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────────┘

Technology Decisions

| Layer | Technology | Rationale |
| --- | --- | --- |
| Web Frontend | Next.js 16 + React 19 + Tailwind CSS | SSR, RSC; design tokens from @mosaic/design-tokens (mosaic-stack-website) |
| API / Gateway | NestJS + Fastify adapter | Module system, DI, guards/interceptors for complex gateway; Fastify performance underneath |
| Agent Runtime | Pi SDK (embedded) | Extensible harness with tools, skills, session management |
| TUI | Pi interactive mode | Native terminal agent interaction |
| Auth | BetterAuth + SSO adapters | Multi-user RBAC with Authentik/WorkOS/Keycloak |
| Database | PostgreSQL 17 + pgvector | Canonical store; pgvector for embedding search |
| Vector DB | pgvector + VectorStore interface | pgvector for v0.1.0; VectorStore abstraction in @mosaic/memory makes Qdrant a drop-in later |
| Cache / Queue | Valkey 8 | Redis-compatible; proven in @mosaic/queue |
| ORM | Drizzle ORM | TypeScript-native, lightweight, good migration story |
| Validation | Zod | Already used across @mosaic/* packages |
| Build | pnpm workspaces + Turborepo | Proven in both jarvis-old and mosaic-mono-v0 |
| Testing | Vitest + Playwright | Unit/integration via Vitest, E2E via Playwright |
| Remote Control | Discord.js + Telegraf | Inspired by OpenClaw plugin architecture |
| MCP | @modelcontextprotocol/sdk | Already used in @mosaic/brain and @mosaic/queue |
| Container | Docker Compose | Self-hosted; bare-metal also supported |
| CI | Woodpecker CI | Existing infrastructure at git.mosaicstack.dev |
| Observability | OpenTelemetry + SigNoz | Wide-event logging from day one; OTEL auto-instrumentation for NestJS/PG/HTTP; SigNoz as all-in-one backend |
| Log Processing | Custom ingest service | Parse agent logs → tiered storage → summarization |
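
The Vector DB row relies on a VectorStore abstraction so that pgvector can later be swapped for Qdrant. The PRD does not define that interface, so the following is only a sketch of one possible shape. Every name in it (`VectorStore`, `VectorRecord`, `VectorHit`, `InMemoryVectorStore`) is an assumption, and the in-memory class exists only so the example runs without a database. A real implementation would almost certainly be asynchronous.

```typescript
// Hypothetical contract for the VectorStore abstraction; not taken from @mosaic/memory.
interface VectorRecord {
  id: string;
  embedding: number[];
  text: string;
}

interface VectorHit {
  id: string;
  text: string;
  score: number; // cosine similarity; higher means closer
}

interface VectorStore {
  upsert(record: VectorRecord): void;
  search(query: number[], limit: number): VectorHit[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in: brute-force scan over every stored record.
class InMemoryVectorStore implements VectorStore {
  private readonly records = new Map<string, VectorRecord>();

  upsert(record: VectorRecord): void {
    this.records.set(record.id, record);
  }

  search(query: number[], limit: number): VectorHit[] {
    return [...this.records.values()]
      .map((r) => ({ id: r.id, text: r.text, score: cosineSimilarity(query, r.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit);
  }
}
```

Callers that depend only on `VectorStore` would not need to change if a pgvector-backed or Qdrant-backed class replaced the in-memory one.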

Key Architecture Decisions

AD-1: TypeScript everywhere (no Python backend)

The jarvis-old FastAPI backend is not carried forward as code. Its domain logic (conversation management, LLM routing, task/project CRUD, auth) is reimplemented in TypeScript. The Python plugin system is replaced by Pi's extension/skill system and MCP tool registration.

AD-2: Pi SDK as the agent runtime

Instead of a custom LLM provider abstraction (jarvis-old's BaseLLMProvider), Pi SDK manages agent sessions. Pi handles model selection, tool calling, context management, and compaction. The gateway dispatches work to Pi sessions configured with appropriate providers.

AD-3: Gateway as the central nervous system (NestJS + Fastify adapter)

@mosaic/gateway is the single API surface. The web app, TUI, Discord, and Telegram all talk to the gateway. The gateway routes to brain (data), queue (coordination), agent pool (LLM work), and coord (mission lifecycle). This replaces the direct FastAPI-to-DB pattern from jarvis-old.

NestJS was chosen over raw Fastify because the gateway is inherently complex — it hosts channel plugins, agent pool management, routing engine, WebSocket hub, MCP server, auth middleware, and integrates brain, queue, memory, and log services. NestJS provides the module system, dependency injection, guards, and interceptors needed to organize this cleanly. NestJS uses Fastify as its HTTP adapter, so Fastify's performance is preserved. This also aligns with the stated stack preference in USER.md ("NestJS API + Next.js web"). @mosaic/brain's existing Fastify code migrates naturally into a NestJS module with Fastify adapter.

AD-4: Brain migrates from JSON files to PostgreSQL

@mosaic/brain currently uses a JSON file store. For Mosaic Stack, brain's data model (tasks, projects, events, agents, missions, tickets) moves to PostgreSQL via Drizzle ORM. Brain's REST + MCP interface is preserved — only the storage backend changes.

AD-5: Tiered memory with summarization

Agent interaction logs are ingested into a log service. Raw logs are stored short-term. A summarization pipeline (using a cheap LLM) periodically compresses logs into structured insights stored in the vector DB. This prevents unbounded log growth while preserving searchable context.

AD-6: Remote control via plugin architecture

Discord and Telegram plugins follow a channel plugin pattern inspired by OpenClaw (https://github.com/openclaw/openclaw). Each plugin registers as a channel with the gateway, receives messages, and dispatches them through the same routing pipeline as web/TUI messages.
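
The channel plugin pattern in AD-6 could look roughly like the sketch below. The PRD names the pattern but not its API, so `ChannelPlugin`, `ChannelRegistry`, `InboundMessage`, and `FakeChannel` are all hypothetical, and the handler is synchronous only to keep the example short. It is not how OpenClaw or the eventual NestJS dynamic module is implemented.

```typescript
// Hypothetical message envelope shared by every channel.
interface InboundMessage {
  channel: string; // e.g. "discord", "telegram", "web"
  externalUserId: string;
  text: string;
}

type MessageHandler = (message: InboundMessage) => string;

interface ChannelPlugin {
  readonly name: string;
  // Called once by the gateway; the plugin forwards inbound messages to `dispatch`.
  register(dispatch: MessageHandler): void;
}

class ChannelRegistry {
  private readonly plugins = new Map<string, ChannelPlugin>();
  private readonly route: MessageHandler;

  constructor(route: MessageHandler) {
    this.route = route;
  }

  add(plugin: ChannelPlugin): void {
    if (this.plugins.has(plugin.name)) {
      throw new Error(`channel already registered: ${plugin.name}`);
    }
    this.plugins.set(plugin.name, plugin);
    plugin.register(this.route); // every channel shares one routing pipeline
  }

  names(): string[] {
    return [...this.plugins.keys()];
  }
}

// Test double standing in for a real Discord or Telegram client.
class FakeChannel implements ChannelPlugin {
  readonly name: string;
  private dispatch: MessageHandler | undefined;

  constructor(name: string) {
    this.name = name;
  }

  register(dispatch: MessageHandler): void {
    this.dispatch = dispatch;
  }

  simulate(text: string): string {
    if (!this.dispatch) throw new Error("plugin not registered");
    return this.dispatch({ channel: this.name, externalUserId: "u1", text });
  }
}
```

Because every plugin receives the same `route` handler, a message arriving from Discord takes the same path as one from the web app, which is the property AD-6 asks for.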

AD-7: Gateway state persistence via Valkey (restart resilience)

The gateway persists its orchestration state (active sessions, pending dispatches, routing context, agent assignments) to Valkey. On restart, the gateway reads Valkey state and resumes operations — active agent sessions are reconnected or gracefully recovered. mosaic gateway restart --fresh is the nuclear option: clears the Valkey queue and all in-flight state, starting with a clean slate. This prevents context/focus/direction loss that would otherwise occur on every restart.
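
A minimal sketch of the persist-and-resume behavior in AD-7, under several assumptions: the state shape (`GatewayState`), the key name (`mosaic:gateway:state`), and the `KeyValueStore` interface are invented for illustration, and `MemoryStore` stands in for a Valkey client, which would be asynchronous in practice.

```typescript
// Minimal key-value surface; a real deployment would back this with a Valkey client.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
  delete(key: string): void;
}

// Hypothetical subset of the orchestration state listed in AD-7.
interface GatewayState {
  activeSessions: string[];
  pendingDispatches: { taskId: string; sessionId: string }[];
}

const STATE_KEY = "mosaic:gateway:state"; // hypothetical key name

function emptyState(): GatewayState {
  return { activeSessions: [], pendingDispatches: [] };
}

function persistState(store: KeyValueStore, state: GatewayState): void {
  store.set(STATE_KEY, JSON.stringify(state));
}

// `fresh` mirrors the --fresh flag: discard stored state and start clean.
function restoreState(store: KeyValueStore, fresh: boolean): GatewayState {
  if (fresh) {
    store.delete(STATE_KEY);
    return emptyState();
  }
  const raw = store.get(STATE_KEY);
  return raw === undefined ? emptyState() : (JSON.parse(raw) as GatewayState);
}

// In-memory stand-in so the sketch runs without Valkey.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  set(key: string, value: string): void {
    this.data.set(key, value);
  }
  delete(key: string): void {
    this.data.delete(key);
  }
}
```

A production version would also need to validate the parsed JSON (for example with Zod, which the stack already uses) before trusting it.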

AD-8: Multi-session agent architecture

Each agent operates in a distinct session. Multiple authorized input channels (TUI, web UI, Discord) can connect to the same agent session simultaneously. This means a user can start a conversation in Discord, continue in the web UI, and monitor via TUI — all feeding into the same agent context. OpenClaw has this concept; Mosaic Stack evolves it with proper session authorization and channel multiplexing at the gateway level.
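
The multiplexing described in AD-8 can be illustrated with a small sketch. It assumes a session keeps one shared transcript and fans each message out to every attached channel; `AgentSession`, `Attachment`, and the authorization-by-user-ID check are hypothetical simplifications, not the gateway's actual design.

```typescript
type ChannelKind = "tui" | "web" | "discord" | "telegram";

interface Attachment {
  kind: ChannelKind;
  userId: string;
  deliver: (text: string) => void; // pushes a line to the connected client
}

class AgentSession {
  readonly id: string;
  readonly transcript: string[] = []; // single shared context for all channels
  private readonly authorizedUsers: Set<string>;
  private readonly attachments: Attachment[] = [];

  constructor(id: string, authorizedUsers: string[]) {
    this.id = id;
    this.authorizedUsers = new Set(authorizedUsers);
  }

  attach(attachment: Attachment): void {
    if (!this.authorizedUsers.has(attachment.userId)) {
      throw new Error(`user ${attachment.userId} may not join session ${this.id}`);
    }
    this.attachments.push(attachment);
  }

  // Record the message once, then fan it out to every attached channel.
  post(from: ChannelKind, text: string): void {
    const line = `[${from}] ${text}`;
    this.transcript.push(line);
    for (const attachment of this.attachments) {
      attachment.deliver(line);
    }
  }
}
```

With this shape, a message posted from Discord appears in the web UI and the TUI because all three are attached to the same session object.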

AD-9: Discord channel-to-agent binding

Discord channels pair to specific agent/session combinations via channel ID binding. This provides data segregation — messages in #project-alpha route to the project-alpha agent session, messages in #general route to a general-purpose session. Prevents cross-contamination between contexts and provides clear boundaries for multi-channel use.
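
Channel ID binding as described in AD-9 amounts to a lookup table from Discord channel ID to agent session. The sketch below assumes one session per channel and rejects silent rebinding; the class name and the example IDs are made up.

```typescript
// Maps a Discord channel ID (a numeric string) to exactly one agent session ID.
class ChannelBindings {
  private readonly sessionByChannel = new Map<string, string>();

  bind(channelId: string, sessionId: string): void {
    const existing = this.sessionByChannel.get(channelId);
    if (existing !== undefined && existing !== sessionId) {
      // Refuse to rebind silently; that would mix two contexts in one channel.
      throw new Error(`channel ${channelId} is already bound to ${existing}`);
    }
    this.sessionByChannel.set(channelId, sessionId);
  }

  // Returns undefined for unbound channels so the caller can ignore the message.
  resolve(channelId: string): string | undefined {
    return this.sessionByChannel.get(channelId);
  }
}
```

Treating an unbound channel as "no session" rather than falling back to a default is one way to keep the data segregation that AD-9 calls for.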

AD-10: Agent session barge-in via tmux

Each agent session runs in a dedicated, named tmux session (e.g., mosaic-agent-project-alpha). This enables barge-in — a user can attach to any active agent's tmux session to observe, interrupt, or redirect. mosaic agent attach <session-name> connects to the tmux session. This provides direct low-level access when the normal channel interfaces are insufficient.
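
The naming convention in AD-10 could be enforced with helpers like the ones below. The `mosaic-agent-` prefix comes from the example above; the sanitizing rule and the function names are assumptions. The argument lists use the standard tmux subcommands `new-session -d -s` and `attach-session -t`.

```typescript
const SESSION_PREFIX = "mosaic-agent-";

// tmux gives "." and ":" special meaning in target names, so keep only [a-z0-9_-].
function tmuxSessionName(agentSlug: string): string {
  const safe = agentSlug
    .toLowerCase()
    .replace(/[^a-z0-9_-]+/g, "-")
    .replace(/^-+|-+$/g, "");
  if (safe.length === 0) throw new Error("agent slug has no usable characters");
  return SESSION_PREFIX + safe;
}

// Arguments for starting a detached session that runs `command`.
function tmuxStartArgs(sessionName: string, command: string): string[] {
  return ["new-session", "-d", "-s", sessionName, command];
}

// Arguments for attaching a terminal to an existing session (the barge-in path).
function tmuxAttachArgs(sessionName: string): string[] {
  return ["attach-session", "-t", sessionName];
}
```

These arrays are meant to be passed to a process spawner with `tmux` as the executable, which avoids building a shell string by hand.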

AD-11: Cron-based scheduled jobs

The gateway includes a cron scheduler for recurring tasks: log summarization runs, stale task detection, memory decay, provider health checks, scheduled agent dispatches. Uses node-cron or similar; schedules are configurable via web dashboard and stored in PG. Each cron job is a gateway-dispatched task that goes through the normal routing pipeline.

AD-12: Web search tool (DuckDuckGo MCP)

Agent sessions include a web search tool for information retrieval. DuckDuckGo via MCP server is the primary option (privacy-respecting, no API key required). Falls back to other search MCP providers if configured. Registered as a standard MCP tool available to all agent sessions.

AD-13: Design system from @mosaic/design-tokens

The web dashboard uses the Mosaic Stack design system established in mosaic-stack-website. The @mosaic/design-tokens package provides CSS custom properties, Tailwind preset, and TS color/font/radius exports. Dark theme default with light theme support. Fonts: Outfit (sans), Fira Code (mono). Color palette: deep blue-grays with blue/purple/teal accents.

AD-14: Multi-tier deployment readiness

Code is structured assuming eventual multi-node deployment with dedicated roles (gateway nodes, agent worker nodes, brain/DB nodes). Packages communicate via well-defined APIs (HTTP/WS/MCP), not in-process calls where avoidable. Service boundaries are clean: gateway is stateless (state in PG/Valkey), agent pool can scale independently, brain is a separate service. v0.1.0 runs single-node; the architecture doesn't fight horizontal scaling later.


Package Structure

Monorepo Layout

mosaic-mono-v1/
├── apps/
│   ├── web/                    Next.js 16 web dashboard
│   └── gateway/                @mosaic/gateway — NestJS API + WebSocket
├── packages/
│   ├── types/                  @mosaic/types — shared type contracts
│   ├── brain/                  @mosaic/brain — data layer (PG-backed)
│   ├── queue/                  @mosaic/queue — Valkey task queue + MCP
│   ├── coord/                  @mosaic/coord — mission coordination
│   ├── mosaic/                 @mosaic/mosaic — install wizard
│   ├── prdy/                   @mosaic/prdy — PRD wizard
│   ├── quality-rails/          @mosaic/quality-rails — code quality scaffolder
│   ├── cli/                    @mosaic/cli — unified CLI
│   ├── auth/                   @mosaic/auth — BetterAuth config + SSO adapters
│   ├── db/                     @mosaic/db — Drizzle schema, migrations, connection
│   ├── agent/                  @mosaic/agent — Pi SDK integration, agent pool manager
│   ├── memory/                 @mosaic/memory — tiered memory + summarization service
│   ├── log/                    @mosaic/log — agent log ingest + processing
│   └── design-tokens/          @mosaic/design-tokens — CSS vars, Tailwind preset, colors
├── plugins/
│   ├── discord/                @mosaic/discord-plugin — Discord channel
│   └── telegram/               @mosaic/telegram-plugin — Telegram channel
├── docker/
│   ├── gateway.Dockerfile
│   ├── web.Dockerfile
│   └── init-db.sql
├── docs/
│   ├── PRD.md                  (this file)
│   ├── TASKS.md
│   └── scratchpads/
├── docker-compose.yml
├── pnpm-workspace.yaml
├── turbo.json
├── tsconfig.base.json
├── vitest.workspace.ts
├── AGENTS.md
├── CLAUDE.md
└── README.md

Package Responsibilities

apps/gateway — @mosaic/gateway (NEW — critical path)

The central nervous system. All clients connect here. Built with NestJS (Fastify adapter).

  • NestJS modules — Each concern (chat, brain, agent, auth, queue, memory, plugins) is a module with clear boundaries
  • Fastify adapter — Fastify performance under NestJS's organizational structure
  • WebSocket gateway — NestJS built-in WebSocket support for chat streaming, agent status, notifications
  • Agent routing engine — Routes tasks to appropriate LLM provider/model based on task type, cost tier, capability requirements
  • Session management — Tracks active conversations, agent sessions, user contexts
  • MCP server — Exposes Mosaic capabilities as MCP tools
  • Plugin host — Loads and manages channel plugins (Discord, Telegram)
  • Auth middleware — BetterAuth session validation, RBAC enforcement

Key routes:

POST   /api/chat                    Send message, get streamed response
GET    /api/conversations           List conversations
POST   /api/conversations           Create conversation
GET    /api/conversations/:id       Get conversation with messages
DELETE /api/conversations/:id       Delete conversation
POST   /api/tasks                   Create task (brain-backed)
GET    /api/tasks                   List/filter tasks
PATCH  /api/tasks/:id               Update task
GET    /api/projects                List projects
POST   /api/projects                Create project
GET    /api/missions                List missions
POST   /api/missions                Create mission
GET    /api/missions/:id            Mission summary with tasks
POST   /api/agents/dispatch         Dispatch work to agent pool
GET    /api/agents/status           Active agent sessions
GET    /api/memory/search           Semantic search across memory
POST   /api/memory/preferences      Store learned preference
GET    /api/skills                  List available skills
POST   /api/skills/install          Install a skill
GET    /api/providers               List configured LLM providers
POST   /api/providers               Configure LLM provider
GET    /api/admin/users             User management (admin)
POST   /api/admin/users             Create user (admin)
WS     /ws/chat/:conversationId     Streaming chat via WebSocket
WS     /ws/agents                   Agent status stream
POST   /mcp                         MCP endpoint (streamable HTTP; GET on the same path opens the optional SSE stream)

apps/web — Next.js Web Dashboard

Carried forward from jarvis-old with significant refactoring.

  • Chat/conversation UI (primary interaction surface)
  • Settings management (providers, integrations, profile)
  • Task management (list, kanban, detail views)
  • Project management (list, detail, linked missions)
  • Mission dashboard (status, progress, task breakdown)
  • PRD viewer/editor
  • Agent status dashboard (active sessions, routing stats)
  • Skill browser and installer
  • User management (admin RBAC panel)
  • Auth pages (login, SSO redirect, registration)

packages/types — @mosaic/types

Migrated from mosaic-mono-v0. Extended with:

  • Gateway types (routing, dispatch, agent pool)
  • Auth types (user, role, permission)
  • Conversation/message types (from jarvis-old domain)
  • Memory types (preference, insight, summary)
  • Plugin channel types (Discord, Telegram message mapping)

packages/brain — @mosaic/brain

Migrated from mosaic-mono-v0. Storage backend changes from JSON to PostgreSQL.

  • REST API preserved (mounted as gateway sub-router or standalone)
  • MCP tools preserved
  • Collections layer rewritten to use Drizzle ORM queries instead of JSON file I/O
  • Same entity model: tasks, projects, events, agents, missions, mission-tasks, tickets
  • New: computed endpoints (today, stale, stats, search, audit) run against PG
  • Appreciation collection preserved for family use

packages/queue — @mosaic/queue

Migrated from mosaic-mono-v0 with minimal changes.

  • Valkey-backed task queue with atomic WATCH/MULTI/EXEC
  • MCP server with 8 tools
  • Used by gateway for agent task dispatch and coordination

packages/coord — @mosaic/coord

Migrated from mosaic-mono-v0.

  • Mission lifecycle: init, run, resume, status, drain
  • TASKS.md parsing and management
  • Session lock management
  • Continuation prompt generation
  • Integration with gateway for mission-driven orchestration

packages/db — @mosaic/db (NEW)

Shared database package.

  • Drizzle ORM schema definitions (all tables)
  • Migration management
  • Connection pool configuration
  • Shared by gateway, brain, auth, memory

packages/auth — @mosaic/auth (NEW)

Authentication and authorization.

  • BetterAuth configuration
  • SSO adapters: Authentik, WorkOS, Keycloak
  • RBAC: roles (admin, member, viewer), permissions
  • API key generation for brain/MCP access
  • Session management middleware

packages/agent — @mosaic/agent (NEW — critical path)

Pi SDK integration layer.

  • Agent pool manager — spawns and manages Pi agent sessions
  • Provider configuration — Anthropic, Codex, Z.ai, Ollama, LM Studio, llama.cpp
  • Agent routing logic — selects provider/model based on task characteristics
  • Tool registration — registers Mosaic-specific tools (brain access, queue ops, memory search)
  • Skill management — loads and configures Pi skills for agent sessions
  • Session lifecycle — create, monitor, complete, fail, timeout

packages/memory — @mosaic/memory (NEW)

Tiered memory system.

  • Preference store — learned user preferences, behaviors, defaults (PG)
  • Insight store — distilled knowledge from agent interactions (PG + vector)
  • Semantic search — query across memory using pgvector embeddings
  • Summarization pipeline — compress raw logs into structured insights
  • Memory API — used by gateway and agent sessions

packages/log — @mosaic/log (NEW)

Agent log service.

  • Log ingest — receives structured logs from agent sessions
  • Log parsing — extracts decisions, learnings, tool usage patterns
  • Tiered storage — hot (recent, full detail), warm (summarized), cold (archived)
  • Summarization trigger — invokes cheap LLM to compress aging logs
  • Retention policy — configurable TTLs per tier

packages/mosaic — @mosaic/mosaic

Migrated from mosaic-mono-v0, updated for v1.

  • Install wizard for Mosaic Stack setup
  • Detects existing installations, offers upgrade path
  • Configures ~/.config/mosaic/ with guides, tools, runtime configs

packages/prdy — @mosaic/prdy

Migrated from mosaic-mono-v0.

  • PRD generation wizard
  • Template-based PRD creation with Zod validation
  • CLI integration via mosaic prdy

packages/quality-rails — @mosaic/quality-rails

Migrated from mosaic-mono-v0.

  • TypeScript scaffolder for project quality config
  • Generates ESLint, tsconfig, Woodpecker, husky, lint-staged configs
  • Supports project types: monorepo, typescript-node, nextjs

packages/cli — @mosaic/cli

Migrated from mosaic-mono-v0, extended.

  • Unified mosaic binary
  • Subcommands: mosaic coord, mosaic prdy, mosaic queue, mosaic quality, mosaic gateway, mosaic brain
  • Plugin discovery for installed @mosaic/* packages

plugins/discord — @mosaic/discord-plugin (NEW — high priority)

Discord remote control channel. Architecture inspired by OpenClaw (https://github.com/openclaw/openclaw).

  • Channel plugin that registers with the gateway as a NestJS dynamic module
  • Single-guild binding only (v0.1.0) — prevents data leaks between servers
  • Receives Discord messages, dispatches through gateway routing
  • Streams agent responses back to Discord (chunked for 2000-char limit)
  • Supports mention-based activation, thread management for multi-turn
  • Bot pairing and permission management (Discord user → Mosaic user mapping)
  • DM support for private conversations

plugins/telegram — @mosaic/telegram-plugin (NEW)

Telegram remote control channel.

  • Same channel plugin pattern as Discord
  • Telegraf-based bot
  • Message routing through gateway
  • Inline keyboard for interactive responses

User/Stakeholder Requirements

US-001 Multi-Channel Chat

As a user, I can chat with an AI assistant via web browser, terminal (Pi TUI), Discord, or Telegram and get consistent responses regardless of channel.

US-002 Task & Project Dashboard

As a user, I can manage my tasks, projects, and missions from the web dashboard with kanban and list views.

US-003 PRD Management

As a user, I can view and edit PRDs for active missions from the web dashboard.

US-004 Agent Visibility

As a user, I can see which agents are active, what they're working on, and their status in real-time.

US-005 Provider Configuration

As a user, I can configure which LLM providers to use and set routing preferences (cost vs capability).

US-006 Skill Management

As a user, I can install and manage agent skills through the web dashboard.

US-007 Persistent Memory

As a user, I can rely on the system to remember my preferences, learned behaviors, and past decisions across sessions.

US-008 Semantic Search

As a user, I can search across my memory, conversations, and knowledge semantically.

US-009 User Management

As an admin, I can manage users, assign roles, and control access.

US-010 SSO Configuration

As an admin, I can configure SSO via Authentik, WorkOS, or Keycloak.

US-011 Self-Hosted Deployment

As a user, I can run Mosaic Stack via Docker Compose or directly on bare metal.

US-012 Intelligent Routing

As an agent operator, I can rely on the gateway to route each task to the cheapest capable model.

US-013 CLI Tooling

As a user, I can use the mosaic CLI for PRD creation, quality rail setup, queue management, and mission coordination.


Functional Requirements

  • FR-1: Chat System
  • FR-2: Gateway Orchestrator
  • FR-3: Agent Pool
  • FR-4: Task Management
  • FR-5: Project Management
  • FR-6: Mission System
  • FR-7: Memory System
  • FR-8: Authentication & Authorization
  • FR-9: Remote Control — Discord
  • FR-10: Remote Control — Telegram
  • FR-11: LLM Provider Management
  • FR-12: Agent Routing
  • FR-13: MCP Capability
  • FR-14: Skill Management
  • FR-15: CLI Integration
  • FR-16: Log Service
  • FR-17: Gateway State Persistence
  • FR-18: Multi-Session Agent Architecture
  • FR-19: Cron Scheduler
  • FR-20: Web Search Tool
  • FR-21: Skill Import from skills.sh

FR-1: Chat System

  • Conversation CRUD (create, list, get with messages, delete)
  • Real-time streaming responses via WebSocket
  • Multi-provider support (route to configured LLM)
  • Conversation history with search
  • Project-scoped conversations
  • System prompt per project/conversation
  • Message rendering with markdown, code blocks, tool call display

FR-2: Gateway Orchestrator

  • Central API surface for all clients (web, TUI, Discord, Telegram)
  • Agent dispatch — receive task, select provider/model, spawn Pi session, return result
  • Routing engine — cost/capability matrix, user preference overrides, task-type heuristics
  • Plugin host — load channel plugins at startup, manage lifecycle
  • MCP server — expose Mosaic tools via MCP protocol
  • WebSocket hub — real-time updates for chat, agent status, notifications
  • Rate limiting and request validation

FR-3: Agent Pool (@mosaic/agent)

  • Manage concurrent Pi SDK sessions
  • Provider configuration: API key management, endpoint URLs, model lists
  • Support providers: Anthropic (subscription + API), OpenAI/Codex (subscription + API), Z.ai, Ollama (local), LM Studio (local), llama.cpp (local)
  • Tool injection — all agent sessions get Mosaic tools (brain, queue, memory)
  • Skill loading — configure skills per agent session based on task type
  • Session monitoring — track active sessions, token usage, duration
  • Graceful shutdown — drain active sessions on shutdown

FR-4: Task Management

  • Brain-backed task CRUD with full filter/sort
  • Task statuses: backlog, scheduled, in-progress, blocked, done, cancelled
  • Priority levels: critical, high, medium, low
  • Domain categorization
  • Dependency tracking (blocks/blocked_by)
  • Project association
  • Assignee tracking
  • Kanban board view in web dashboard
  • Due date tracking with stale detection
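
The status and priority values above map directly onto TypeScript union types. The sketch below also shows one possible reading of stale detection and dependency tracking. The PRD does not define "stale", so the rule used here (an open task whose due date has passed) is an assumption, as are the `Task` field names.

```typescript
type TaskStatus = "backlog" | "scheduled" | "in-progress" | "blocked" | "done" | "cancelled";
type Priority = "critical" | "high" | "medium" | "low";

// Hypothetical task shape; the real schema lives in @mosaic/brain.
interface Task {
  id: string;
  title: string;
  status: TaskStatus;
  priority: Priority;
  blockedBy: string[]; // IDs of tasks that must finish first
  dueDate?: string; // ISO 8601 date or timestamp
}

// Assumed rule: a task is stale when it is still open and its due date has passed.
function isStale(task: Task, now: Date): boolean {
  if (task.status === "done" || task.status === "cancelled") return false;
  if (task.dueDate === undefined) return false;
  return new Date(task.dueDate).getTime() < now.getTime();
}

// Returns the blocker IDs that are not yet done; unknown IDs count as unresolved.
function unresolvedBlockers(task: Task, allTasks: Map<string, Task>): string[] {
  return task.blockedBy.filter((id) => allTasks.get(id)?.status !== "done");
}
```

Counting a missing blocker as unresolved is the conservative choice: a task is never reported as unblocked because of a dangling reference.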

FR-5: Project Management

  • Project CRUD with domain, status, priority
  • Link to repository, branch, current/next milestone
  • Progress tracking
  • Blocker tracking
  • Owner assignment

FR-6: Mission System

  • Mission CRUD (linked to project and PRD)
  • Mission tasks with phases, dependencies, ordering
  • Mission summary with computed progress
  • Mission coordination via @mosaic/coord
  • Active mission dashboard in web UI

FR-7: Memory System

  • Preferences: Key-value store for learned user preferences (e.g., "prefers tables over paragraphs", "timezone: America/Chicago")
  • Insights: Distilled knowledge from agent interactions, stored with embeddings
  • Semantic search: Query across all memory using natural language
  • Auto-capture: Agent sessions automatically log decisions and learnings
  • Summarization: Periodic compression of raw logs into structured insights
  • Decay: Old, unused insights decay in relevance score over time
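
The decay requirement does not name a formula. One common choice, shown here purely as an illustration, is exponential decay with a half-life: the score halves for every `halfLifeDays` the insight goes unused. The function name and the 30-day default are assumptions.

```typescript
const MS_PER_DAY = 86400000;

// Relevance halves for every `halfLifeDays` since the insight was last used.
function decayedRelevance(
  baseScore: number,
  lastUsedAt: Date,
  now: Date,
  halfLifeDays: number = 30,
): number {
  if (halfLifeDays <= 0) throw new Error("halfLifeDays must be positive");
  // Clamp at zero so a clock skew never boosts the score above its base value.
  const ageDays = Math.max(0, (now.getTime() - lastUsedAt.getTime()) / MS_PER_DAY);
  return baseScore * Math.pow(0.5, ageDays / halfLifeDays);
}
```

In formula form this is score = baseScore × 0.5^(ageDays / halfLifeDays). Resetting `lastUsedAt` whenever an insight is retrieved would keep frequently used insights from fading.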

FR-8: Authentication & Authorization

  • BetterAuth integration with Next.js
  • Email/password registration and login
  • SSO via OIDC/SAML: Authentik, WorkOS, Keycloak
  • RBAC roles: admin (full access), member (own resources + shared), viewer (read-only)
  • API key generation for programmatic/MCP access
  • Session management (web + API)
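
The three roles above could be enforced by a single check such as the one sketched here. The action names, the `Resource` shape, and the reading of "own resources + shared" (members write only what they own and may read shared resources) are assumptions. BetterAuth's own access-control API is not used in this example.

```typescript
type Role = "admin" | "member" | "viewer";
type Action = "read" | "write" | "manage-users"; // hypothetical action set

interface User {
  id: string;
  role: Role;
}

interface Resource {
  ownerId: string;
  shared: boolean;
}

function can(user: User, action: Action, resource?: Resource): boolean {
  if (user.role === "admin") return true; // full access
  if (action === "manage-users") return false; // admin only
  if (resource === undefined) return false;
  const owns = resource.ownerId === user.id;
  if (action === "read") return owns || resource.shared;
  // Remaining action is "write": viewers are read-only, members write only their own.
  return user.role === "member" && owns;
}
```

Keeping the policy in one pure function makes it straightforward to call from a gateway guard and to unit test.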

FR-9: Remote Control — Discord

  • Discord bot that connects to the gateway
  • Mention-based activation in channels
  • DM support for private conversations
  • Thread creation for multi-turn conversations
  • Chunked message delivery (Discord 2000-char limit)
  • Bot configuration via web dashboard
  • Permission management (which Discord users/roles can interact)
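
Chunked delivery for the 2000-character limit can be sketched as below. This version prefers to break at a newline, then at a space, and hard-splits only when no whitespace is available. The function name is hypothetical, and the sketch ignores Markdown code fences that might straddle a chunk boundary.

```typescript
const DISCORD_MAX_LENGTH = 2000;

// Splits `text` into pieces no longer than `limit`, preferring whitespace boundaries.
function chunkMessage(text: string, limit: number = DISCORD_MAX_LENGTH): string[] {
  if (limit <= 0) throw new Error("limit must be positive");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > limit) {
    // Look for the last newline, then the last space, at or before the limit.
    let cut = rest.lastIndexOf("\n", limit);
    if (cut <= 0) cut = rest.lastIndexOf(" ", limit);
    // No usable whitespace: hard split (may cut through a surrogate pair).
    if (cut <= 0) cut = limit;
    chunks.push(rest.slice(0, cut));
    // Drop the single separator character the split happened on, if any.
    rest = rest.slice(cut).replace(/^\s/, "");
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

For streamed agent responses, the same function could be applied to the accumulated buffer each time it approaches the limit.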

FR-10: Remote Control — Telegram

  • Telegram bot via Telegraf
  • Private and group chat support
  • Command-based interaction (/ask, /task, /status)
  • Inline keyboard for task management
  • Message routing through gateway
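
Command-based interaction needs a small parser in front of the gateway call. The sketch below handles the three commands listed above and the `/command@botname` form that Telegram clients send in group chats. It is plain string handling and does not use Telegraf; the function and type names are illustrative.

```typescript
const COMMANDS = ["ask", "task", "status"] as const;
type CommandName = (typeof COMMANDS)[number];

interface BotCommand {
  name: CommandName;
  args: string; // everything after the command, trimmed
}

// Parses "/ask question", "/ask@some_bot question", or "/status"; returns undefined otherwise.
function parseCommand(text: string): BotCommand | undefined {
  const match = /^\/([a-z0-9_]+)(?:@\w+)?(?:\s+([\s\S]*))?$/.exec(text.trim());
  if (!match) return undefined;
  const raw = match[1];
  const name = COMMANDS.find((candidate) => candidate === raw);
  if (name === undefined) return undefined; // not one of the supported commands
  return { name, args: (match[2] ?? "").trim() };
}
```

Returning `undefined` for unknown commands lets the plugin decide whether to ignore the message or reply with usage help.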

FR-11: LLM Provider Management

  • Provider configuration UI in web dashboard
  • Per-provider: API key/endpoint, enabled models, cost per token
  • Subscription-based providers: detect available models from subscription
  • Local providers: Ollama model list, LM Studio endpoint, llama.cpp binary path
  • Provider health monitoring
  • Usage tracking per provider/model

FR-12: Agent Routing

  • Task-type to model-tier mapping (from AGENTS.md cost matrix)
  • User preference overrides (e.g., "always use Claude for code review")
  • Fallback chains (if primary provider unavailable, try next)
  • Cost tracking and budget enforcement
  • Routing transparency — user can see why a particular model was chosen
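
One way to combine tier mapping, user overrides, fallback chains, and routing transparency is to return a human-readable reason with every decision. The sketch below is illustrative only: the tier names, the `routeTask` signature, and the default tier are assumptions, and budget enforcement is left out.

```typescript
type Tier = "cheap" | "standard" | "premium";

interface ModelOption {
  provider: string;
  model: string;
  tier: Tier;
  available: boolean;
}

interface RoutingDecision {
  provider: string;
  model: string;
  reason: string; // surfaced to the user for transparency
}

// `chain` is ordered: primary option first, fallbacks after it.
function routeTask(
  taskType: string,
  tierByTask: Record<string, Tier>,
  chain: ModelOption[],
  override?: { provider: string; model: string },
): RoutingDecision | null {
  if (override) {
    const pinned = chain.find(
      (o) => o.provider === override.provider && o.model === override.model && o.available,
    );
    if (pinned) {
      return { provider: pinned.provider, model: pinned.model, reason: "user preference override" };
    }
  }
  const tier: Tier = tierByTask[taskType] ?? "standard"; // assumed default tier
  const candidates = chain.filter((o) => o.tier === tier);
  const chosen = candidates.find((o) => o.available);
  if (!chosen) return null; // nothing available in this tier
  const skipped = candidates.indexOf(chosen);
  const reason =
    skipped === 0
      ? `primary for tier "${tier}"`
      : `fallback after ${skipped} unavailable option(s) in tier "${tier}"`;
  return { provider: chosen.provider, model: chosen.model, reason };
}
```

Logging the returned `reason` alongside the dispatch would make each routing decision inspectable after the fact.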

FR-13: MCP Capability

  • Gateway exposes MCP server (streamable HTTP transport)
  • Brain tools registered as MCP tools
  • Queue tools registered as MCP tools
  • Memory search registered as MCP tool
  • Agent sessions can call MCP tools from other services
  • External MCP server connectivity (agent can use third-party MCP servers)

FR-14: Skill Management

  • Skill catalog — list available skills from configured sources
  • Skill install — install skill to ~/.config/mosaic/skills/ or project-local
  • Skill configuration — per-skill settings
  • Skill status — installed, available, update available
  • Web UI for browsing and managing skills

FR-15: CLI Integration

  • mosaic gateway start — start the gateway server
  • mosaic brain — brain data management
  • mosaic queue — queue operations
  • mosaic coord — mission coordination
  • mosaic prdy — PRD wizard
  • mosaic quality — quality rail management
  • mosaic tui — launch Pi TUI connected to gateway

FR-16: Log Service

  • Structured log ingest from agent sessions
  • Parse logs for: decisions made, tools used, errors encountered, learnings captured
  • Tier management: hot (7 days, full detail), warm (30 days, summarized), cold (90 days, key facts only)
  • Summarization pipeline: cheap LLM compresses aging logs on schedule
  • Query interface for log search
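
Tier assignment could be a simple function of log age. This sketch assumes the 7/30/90-day figures are cumulative ages measured from ingest, and it introduces an `expired` state for logs older than 90 days; the PRD does not say what happens after the cold tier, so that state is purely an assumption.

```typescript
type LogTier = "hot" | "warm" | "cold" | "expired";

// Thresholds follow the tier list above; "expired" is an assumed fourth state.
function tierForAge(ageDays: number): LogTier {
  if (ageDays < 0) throw new RangeError("ageDays must be non-negative");
  if (ageDays < 7) return "hot"; // full detail
  if (ageDays < 30) return "warm"; // summarized
  if (ageDays < 90) return "cold"; // key facts only
  return "expired";
}
```

The scheduled summarization pipeline might call this for each log batch and compress any batch whose tier changed since the last run.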

FR-17: Gateway State Persistence

  • Orchestration state persisted to Valkey (active sessions, pending dispatches, routing context)
  • On restart, gateway reads Valkey state and resumes — reconnects to active agent sessions
  • mosaic gateway restart --fresh clears Valkey queue and all in-flight state (nuclear option)
  • Session recovery: detect orphaned agent sessions, offer reconnect or cleanup

FR-18: Multi-Session Agent Architecture

  • Each agent has a distinct session with dedicated context
  • Multiple input channels (TUI, web, Discord, Telegram) can connect to same agent session
  • Channel multiplexing at gateway level with proper authorization
  • Discord channel ID paired to specific agent/session (prevents cross-contamination)
  • Agent session runs in named tmux session for barge-in capability
  • mosaic agent attach <session> connects to agent's tmux session
  • mosaic agent list shows active sessions with connected channels
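
Channel multiplexing with strict pairing might be modeled as a registry in which many channels can point to one session, but a given channel can never be rebound to a different session. `SessionBindings` and its methods are hypothetical names; the real gateway would presumably persist this mapping and check authorization before binding.

```typescript
type ChannelKind = "tui" | "web" | "discord" | "telegram";

class SessionBindings {
  // channel key (e.g. "discord:123") -> agent session ID
  private readonly sessionByChannel = new Map<string, string>();

  private static key(kind: ChannelKind, channelId: string): string {
    return `${kind}:${channelId}`;
  }

  // Binding the same channel to a second session is rejected,
  // which is what prevents cross-contamination between sessions.
  bind(kind: ChannelKind, channelId: string, sessionId: string): void {
    const key = SessionBindings.key(kind, channelId);
    const current = this.sessionByChannel.get(key);
    if (current !== undefined && current !== sessionId) {
      throw new Error(`${key} is already bound to session ${current}`);
    }
    this.sessionByChannel.set(key, sessionId);
  }

  sessionFor(kind: ChannelKind, channelId: string): string | undefined {
    return this.sessionByChannel.get(SessionBindings.key(kind, channelId));
  }

  // Channels currently connected to a session, in binding order.
  channelsFor(sessionId: string): string[] {
    const keys: string[] = [];
    this.sessionByChannel.forEach((boundSession, key) => {
      if (boundSession === sessionId) keys.push(key);
    });
    return keys;
  }
}
```

A listing command such as `mosaic agent list` could be backed by `channelsFor` to show which channels are attached to each session.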

FR-19: Cron Scheduler

  • Built-in cron scheduler in gateway for recurring tasks
  • Default schedules: log summarization, stale task detection, memory decay, provider health checks
  • Custom schedules: user-defined agent dispatches on cron expressions
  • Schedule management via web dashboard and CLI
  • Cron jobs dispatched through normal gateway routing pipeline
  • Persistence: schedules stored in PG, survive gateway restart

FR-20: Web Search Tool

  • DuckDuckGo web search via MCP server (primary — privacy-respecting, no API key)
  • Registered as standard MCP tool available to all agent sessions
  • Configurable: can swap to other search providers (Brave, SearXNG, Tavily)
  • Results formatted for agent consumption (title, snippet, URL)

FR-21: Skill Import from skills.sh

  • Browse skills from https://skills.sh directory via API
  • Import skills into ~/.config/mosaic/skills/ or project-local .mosaic/skills/
  • Vetting workflow: imported skills marked as "unvetted" until admin approves
  • Skill review interface in web dashboard (view skill content before approval)
  • Vetted skills auto-available to agent sessions; unvetted require explicit enable
  • mosaic skill import <source/skillId> CLI command
  • Track installed skills, versions, update availability
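
The vetting rule above reduces to a small filter: a skill reaches agent sessions if an admin has vetted it or if it has been explicitly enabled. The `InstalledSkill` shape and `skillsForSession` name are illustrative assumptions.

```typescript
interface InstalledSkill {
  id: string;
  vetted: boolean; // set once an admin approves the skill
  explicitlyEnabled: boolean; // manual opt-in for an unvetted skill
}

// IDs of the skills that should be exposed to an agent session.
function skillsForSession(skills: InstalledSkill[]): string[] {
  return skills.filter((s) => s.vetted || s.explicitlyEnabled).map((s) => s.id);
}
```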

Non-Functional Requirements

Security

  • No hardcoded secrets — all secrets via environment variables or vault
  • API key rotation capability
  • RBAC enforcement at gateway level
  • Input validation (Zod) on all API endpoints
  • Rate limiting on public endpoints
  • CORS configuration for web app
  • Secure WebSocket connections
  • SSO token validation
  • Database connection encryption (SSL)

Performance

  • Chat response streaming latency < 200ms TTFB (gateway overhead, not LLM latency)
  • Dashboard page loads < 2s
  • Brain query responses < 100ms for filtered reads
  • Semantic search < 500ms
  • Support 10+ concurrent agent sessions
  • WebSocket connection handling for 50+ concurrent users

Reliability

  • Graceful degradation when LLM provider is unavailable (fallback chain)
  • Queue persistence — tasks survive gateway restart
  • Database connection pooling with retry
  • Health check endpoints for all services
  • Structured error responses with correlation IDs

Observability (Wide-Event Logging — Required from Phase 0)

  • OpenTelemetry instrumentation across all services from day one
    • @opentelemetry/sdk-node + @opentelemetry/auto-instrumentations-node for auto-instrumentation (HTTP, PG, Fastify/NestJS)
    • NestJS interceptors for custom spans on agent dispatch, routing decisions, memory writes, summarization runs
    • Every significant operation emits a structured event with rich context (wide events, not just request/response)
  • SigNoz as OTEL backend (single Docker service: traces, metrics, logs, built-in UI)
  • Request tracing with correlation IDs (trace-id propagated across gateway → agent → brain → queue)
  • Agent session metrics (duration, tokens, cost, success/failure, model used, routing reason)
  • Provider availability monitoring (health check spans)
  • Queue depth monitoring (periodic gauge metrics)
  • Memory usage metrics (embedding count, search latency, summarization runs)
  • Migrate to Grafana stack (Tempo + Loki + Grafana) post-beta if more customization is needed

Scalability (Multi-Tier Readiness)

  • Single-node deployment is the MVP target for v0.1.0
  • Code is structured on the assumption that a multi-tier deployment will follow: dedicated gateway nodes, agent worker nodes, brain/DB nodes
  • Service boundaries communicate via HTTP/WS/MCP APIs, not in-process calls where avoidable
  • Gateway is stateless (all state in PG/Valkey) to enable horizontal scaling
  • Agent pool designed as independently scalable service
  • Database migrations support forward-only schema evolution
  • Hierarchical deployment with dedicated roles/specialties is the post-beta target

Acceptance Criteria

AC-1: Core Chat Flow

  • User can log in via web UI, send a message, and receive a streamed response
  • Conversation persists across page refreshes
  • User can create, list, search, and delete conversations
  • Conversations can be scoped to projects

AC-2: TUI Integration

  • mosaic tui launches Pi interactive mode connected to gateway
  • User can chat with same conversation context as web UI
  • Agent has access to brain, queue, and memory tools

AC-3: Discord Remote Control

  • Discord bot connects and responds to mentions
  • Messages route through gateway to agent pool
  • Responses stream back to Discord (chunked)
  • Thread creation for multi-turn conversations

AC-4: Gateway Orchestration

  • Gateway dispatches tasks to appropriate provider/model
  • Routing decision logged and inspectable
  • Fallback engages when the primary provider is unavailable
  • Multiple concurrent agent sessions managed correctly

AC-5: Task & Project Management

  • CRUD operations for tasks, projects, missions via web dashboard
  • Kanban board view for tasks
  • Mission progress tracking with computed stats
  • Brain MCP tools accessible from agent sessions

AC-6: Memory System

  • Agent sessions auto-capture decisions and learnings
  • Semantic search returns relevant past context
  • Learned preferences are applied in new sessions
  • Log summarization runs on schedule and compresses old logs

AC-7: Authentication & RBAC

  • Email/password login works
  • At least one SSO provider (Authentik) works end-to-end
  • Admin can create users and assign roles
  • RBAC enforced on API endpoints

AC-8: Multi-Provider LLM Support

  • At least 3 providers configured and routing correctly (e.g., Anthropic + Ollama + Z.ai)
  • Agent routing selects appropriate model for task type
  • Provider configuration manageable from web UI

AC-9: MCP

  • Gateway exposes MCP endpoint
  • Brain and queue tools callable via MCP
  • Agent sessions can connect to external MCP servers

AC-10: Deployment

  • docker compose up starts full stack from clean state
  • mosaic CLI installable and functional on bare metal
  • Database migrations run automatically on first start
  • .env.example documents all required configuration

AC-11: @mosaic/* Packages

  • All 7 migrated packages build, pass tests, and integrate with gateway
  • mosaic CLI provides subcommands for each package
  • Types package is the single source of shared interfaces

Constraints and Dependencies

  1. Pi SDK — Core dependency; any Pi breaking changes affect the agent layer. Pin to known-good version.
  2. BetterAuth — Auth framework; must support SSO adapters. Verify Authentik/WorkOS/Keycloak support before committing.
  3. Drizzle ORM — Database layer; must support PostgreSQL + pgvector extension.
  4. Discord API — Rate limits, intent requirements, message size limits (2000 chars).
  5. Valkey — Queue backend; must be available for queue and caching.
  6. Gitea registry — Package publishing target; .npmrc must be configured.
  7. OpenClaw — Reference architecture for Discord/Telegram plugin pattern (https://github.com/openclaw/openclaw). Inspiration only, not a dependency.

Risks and Open Questions

Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Pi SDK API instability (pre-1.0) | Medium | High | Pin version, abstract behind @mosaic/agent interface |
| Brain PG migration complexity | Medium | Medium | Preserve Brain REST/MCP API contract; only storage changes |
| Discord plugin complexity (OpenClaw has ~60 files) | Medium | Medium | Start minimal (DM + mention in channel), single-guild only; expand iteratively post-beta |
| LLM provider subscription auth varies by provider | Medium | Medium | Abstract behind provider interface; implement per-provider adapters |
| Drizzle + pgvector extension compatibility | Low | Medium | Validate in Phase 0 with spike |
| Agent log volume overwhelming storage | Medium | High | Tiered storage with aggressive summarization; configurable retention |
| Scope creep from jarvis-old feature surface | High | High | Strict v0.1.0 scope; features not listed above are post-beta |

Open Questions

| # | Question | Priority | Status |
| --- | --- | --- | --- |
| 1 | Pi SDK version to pin for v0.1.0? | High | Resolved: Pin @mariozechner/pi-coding-agent@~0.57.1 (current stable). Abstract behind @mosaic/agent interface to insulate from breaking changes. Bump deliberately after testing. |
| 2 | Authentik vs WorkOS vs Keycloak: which SSO provider to implement first? | Medium | Resolved: Authentik first (already in Jason's infrastructure) |
| 3 | Vector DB: pgvector sufficient or need Qdrant from the start? | Medium | Resolved: pgvector with VectorStore interface abstraction. Qdrant drops in later if needed. |
| 4 | Summarization LLM: which model for log compression? | Medium | Resolved: Haiku-tier default with structured output guardrails, configurable via routing engine. |
| 5 | LM Studio and llama.cpp: provider adapters exist in Pi or need custom? | Medium | Resolved: Pi handles both natively. LM Studio and llama.cpp (server mode) expose OpenAI-compatible APIs; configure via Pi's models.json with openai-completions API type. No custom adapters needed. |
| 6 | Discord bot: single guild or multi-guild from day one? | Medium | Resolved: Single-guild only for v0.1.0 to prevent data leaks. Bot binds to one guild. Multi-guild with tenant isolation is a post-beta feature requiring explicit data boundary design. |
| 7 | Bare-metal install: systemd units or just docs? | Low | ASSUMPTION: Docs + CLI launch commands; systemd units post-beta |

Testing and Verification Expectations

  1. Baseline checks: pnpm typecheck && pnpm lint && pnpm test must pass across all packages
  2. Unit tests: Vitest for all packages; mocked dependencies for isolation
  3. Integration tests: Gateway + Brain + Queue with test PG + Valkey (Docker services in CI)
  4. E2E tests: Playwright for web dashboard critical paths (login, chat, task CRUD)
  5. Agent tests: Pi SDK session tests with mock provider (verify tool registration, routing)
  6. Evidence format: CI pipeline green + test count report per package

Milestone / Delivery Intent

All work is alpha (< 0.1.0) until Jason approves 0.1.0 beta release.

Phase 0: Foundation (v0.0.1)

  • Scaffold monorepo (pnpm + turbo + tsconfig + eslint + vitest)
  • @mosaic/types — migrate and extend from v0
  • @mosaic/db — Drizzle schema, PG connection, migrations
  • @mosaic/auth — BetterAuth setup with email/password
  • OTEL foundation — @opentelemetry/sdk-node setup, SigNoz in docker-compose, trace propagation wired
  • Docker Compose (PG 17 + Valkey + SigNoz)
  • CI pipeline (Woodpecker)
  • AGENTS.md, CLAUDE.md, README.md

Phase 1: Core API (v0.0.2)

  • apps/gateway — NestJS server (Fastify adapter), auth middleware, health endpoints
  • @mosaic/brain — migrate from v0, swap JSON store for PG via @mosaic/db
  • @mosaic/queue — migrate from v0 (minimal changes)
  • Gateway routes: conversations, tasks, projects, missions
  • WebSocket server for chat streaming
  • Basic agent dispatch (single provider, no routing)

Phase 2: Agent Layer (v0.0.3)

  • @mosaic/agent — Pi SDK integration, agent pool manager
  • Multi-provider support (Anthropic + Ollama minimum)
  • Agent routing engine (cost/capability matrix)
  • Tool registration (brain, queue, memory tools injected into agent sessions)
  • @mosaic/coord — migrate from v0, integrate with gateway

Phase 3: Web Dashboard (v0.0.4)

  • apps/web — Next.js app with BetterAuth
  • Chat UI (conversation list, message display, streaming input)
  • Task management (list + kanban)
  • Project and mission views
  • Settings (provider config, profile)
  • Admin panel (user management, RBAC)

Phase 4: Memory & Intelligence (v0.0.5)

  • @mosaic/memory — preference store, insight store, semantic search
  • @mosaic/log — log ingest, parsing, tiered storage
  • Summarization pipeline
  • Memory integration into agent sessions
  • Skill management interface (web UI + CLI)

Phase 5: Remote Control (v0.0.6)

  • @mosaic/discord-plugin — Discord channel plugin
  • @mosaic/telegram-plugin — Telegram channel plugin
  • Plugin host in gateway
  • SSO configuration (Authentik)

Phase 6: CLI & Tools (v0.0.7)

  • @mosaic/cli — unified CLI with all subcommands
  • @mosaic/prdy — migrate from v0
  • @mosaic/quality-rails — migrate from v0
  • @mosaic/mosaic — install wizard updated for v1
  • Pi TUI integration (mosaic tui)

Phase 7: Polish & Beta (v0.0.8 → v0.1.0)

  • MCP endpoint hardening
  • Additional SSO providers (WorkOS/Keycloak)
  • Additional LLM providers (Codex, Z.ai, LM Studio, llama.cpp)
  • Bare-metal deployment documentation
  • E2E test suite
  • Performance optimization
  • Documentation: user guide, admin guide, developer guide
  • Jason approval gate → v0.1.0 beta release

Assumptions

  1. RESOLVED: pgvector is sufficient for semantic search at v0.1.0 scale (personal/family/team = thousands to low hundreds-of-thousands of vectors). @mosaic/memory defines a VectorStore interface with pgvector as the default adapter. The interface boundary makes Qdrant a drop-in migration if PG resource contention or scale demands it later. Zero additional infrastructure for v0.1.0. Rationale: Reduces ops burden; pgvector HNSW indexes are fast at this scale; interface abstraction costs almost nothing now.

  2. RESOLVED: Authentik is the first SSO provider — confirmed, already running in Jason's infrastructure. WorkOS and Keycloak adapters follow in Phase 7.

  3. RESOLVED: NestJS with Fastify adapter for the gateway. The gateway's complexity (plugin host, agent pool, routing engine, WebSocket hub, MCP server, auth, brain/queue/memory/log integration) warrants NestJS's module system, DI, and guards. Fastify performance preserved via adapter. Aligns with USER.md stated stack ("NestJS API + Next.js web"). @mosaic/brain's Fastify code migrates into a NestJS module.

  4. RESOLVED: OpenTelemetry from Phase 0. Wide-event logging is required from the start. OTEL auto-instrumentation for NestJS/PG/HTTP via @opentelemetry/sdk-node. SigNoz as the all-in-one OTEL backend (single Docker service). Every significant operation emits structured events with rich context. Custom spans for agent dispatch, routing decisions, memory writes. Rationale: Retrofitting observability is painful; baking it in from day one means consistent instrumentation across all services.

  5. ASSUMPTION: Single-node deployment for v0.1.0, but code structured for multi-tier. No Kubernetes yet. Docker Compose + bare metal. Service boundaries use HTTP/WS/MCP APIs (not in-process) so gateway, agent pool, and brain can split to separate nodes later. Rationale: Ship single-node MVP; the architecture doesn't fight horizontal scaling when needed.

  6. ASSUMPTION: Log summarization uses Haiku-tier LLM by default, configurable. Haiku is well-suited for summarization (compression, not generation — source material is in context). Guardrails: structured output via Zod schema (force extraction of decisions/tools/outcomes/errors as discrete fields), chunked per-session processing (no bulk conflation), extraction-focused prompts. Raw logs stay in hot tier (7 days) as safety net. Users can override the summarization model via routing engine config if they want higher fidelity. Rationale: Haiku is 10-20x cheaper than Sonnet; log summarization runs on schedule against large volumes where cost matters.

  7. ASSUMPTION: Discord plugin starts minimal and single-guild only — DM support, mention-based channel activation, thread management, chunked responses. Single guild binding to prevent data leaks between servers. Advanced features (voice, components, slash commands, multi-guild) are post-beta. Rationale: Proven pattern from OpenClaw; ship core interaction first; data isolation is non-negotiable.

  8. ASSUMPTION: Telegram plugin is lower priority than Discord and may ship as v0.0.7 or later if Discord takes longer than expected. Rationale: Jason indicated Discord as the high-priority remote channel.

  9. ASSUMPTION: Brain's REST API is preserved as a gateway sub-router (mounted at /api/brain/* or similar). Existing MCP tools continue to work. Only the storage backend changes. Rationale: Minimize migration risk; brain's API contract is proven.

  10. ASSUMPTION: Conversations and messages get their own PG tables (not stored in brain's entity model). They follow a chat-specific schema with proper foreign keys to users and projects. Rationale: Chat has different access patterns (streaming, pagination, search) than brain entities.

  11. RESOLVED: Pi handles all target LLM providers natively. Anthropic, OpenAI/Codex, Z.ai, Ollama, LM Studio, and llama.cpp are all supported via Pi's built-in providers or models.json configuration with openai-completions API type. No custom provider adapters needed in @mosaic/agent — only configuration management.
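
Assumption 1 relies on a `VectorStore` interface boundary so that pgvector can later be swapped for Qdrant. The PRD names the interface but not its methods, so the method names and record shapes below are guesses. The in-memory adapter with brute-force cosine similarity is only a stand-in for illustration (for example in unit tests), not the pgvector adapter itself.

```typescript
interface VectorRecord {
  id: string;
  embedding: number[];
}

interface SearchHit {
  id: string;
  score: number; // cosine similarity, higher is closer
}

// Hypothetical contract; a pgvector or Qdrant adapter would implement the same methods.
interface VectorStore {
  upsert(record: VectorRecord): Promise<void>;
  search(query: number[], k: number): Promise<SearchHit[]>;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / Math.sqrt(normA * normB);
}

// Brute-force adapter: scores every stored record and keeps the top k.
class InMemoryVectorStore implements VectorStore {
  private readonly records = new Map<string, number[]>();

  async upsert(record: VectorRecord): Promise<void> {
    this.records.set(record.id, record.embedding);
  }

  async search(query: number[], k: number): Promise<SearchHit[]> {
    return Array.from(this.records.entries())
      .map(([id, embedding]) => ({ id, score: cosineSimilarity(query, embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}
```

Code in @mosaic/memory would depend only on `VectorStore`, so changing the backing store would mean writing a new adapter rather than touching callers.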