PRD: Harness Foundation — Phase 9
Metadata
- Owner: Jason Woltje
- Date: 2026-03-21
- Status: draft
- Phase: 9 (post-MVP)
- Version Target: v0.2.0
- Agent Harness: Pi SDK
- Best-Guess Mode: true
- Repo: git.mosaicstack.dev/mosaic/mosaic-stack
Problem Statement
Mosaic Stack v0.1.0 delivered a functional skeleton — gateway boots, TUI connects, single-agent chat streams, basic auth works. But the system is not usable as a daily-driver harness:
- Chat messages are fire-and-forget. The WebSocket gateway never calls ConversationsRepo. Context is lost on disconnect. Conversations can't be resumed with history. Cross-interface continuity (TUI → WebUI → Matrix) is impossible.
- Single provider (Ollama) with local models only. No access to frontier models (Claude Opus 4.6, Codex gpt-5.4, GLM-5). The routing engine exists but has never been tested with real providers.
- No task-aware agent routing. A coding task and a summarization task route to the same agent with the same model. There is no mechanism to match tasks to agents by capability, cost tier, or specialization.
- Memory is not user-scoped. Insight vector search returns all users' data. Deploying multi-user in this state would be a security violation.
- Agent configs exist in DB but are ignored. Stored system prompts, model preferences, and tool allowlists don't apply to sessions. The /model and /agent slash commands are stubbed.
- No job queue. Background processing (summarization, GC, tier management) runs on fragile cron. No retry, no monitoring, no async task dispatch foundation for future agent orchestration.
- Plugin system is hollow. Zero implementations. No defined message protocol. Blocks all remote interfaces (Matrix, Discord, Telegram) planned for Phase 10+.
What this phase solves: Transform Mosaic from a demo into a real multi-provider, task-routing AI harness that persists everything, routes intelligently, and is architecturally ready for multi-agent and remote control.
Objectives
- Persistent conversations — Every message saved, every conversation resumable, full context available across interfaces
- Multi-provider LLM access — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with proper auth flows
- Task-aware agent routing — Granular routing rules that match tasks to the right agent + model by capability, cost, and domain
- Security isolation — All data queries user-scoped, ready for multi-user deployment
- Session hardening — Agent configs apply, model/agent switching works mid-session
- Reliable background processing — BullMQ job queue replaces fragile cron
- Channel protocol design — Architecture for Matrix and remote interfaces, built into the foundation now
Scope
In Scope
- Conversation persistence — wire ChatGateway to ConversationsRepo, context loading on resume
- Multi-provider integration — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with auth flows
- Task-aware agent routing — granular routing rules with task classification and fallback chains
- Security isolation — user-scoped queries on all data paths (memory, conversations, agents)
- Agent session hardening — configs apply, model/agent switching, session resume
- Job queue — BullMQ replacing cron for background processing
- Channel protocol design — architecture document for Matrix and remote interfaces
- Embedding migration — Ollama-local embeddings replacing OpenAI dependency
Out of Scope
- Matrix homeserver deployment + appservice (Phase 10)
- Multi-agent orchestration / supervisor-worker pattern (Phase 10+)
- WebUI rebuild (future)
- Self-managing memory — compaction, merge, forget (future)
- Team workspace isolation (future)
- Remote channel plugins — WhatsApp, Discord, Telegram (Phase 10+, via Matrix)
- Fine-grained RBAC — project/agent/team roles (future)
- Agent-to-agent communication (Phase 10+)
User/Stakeholder Requirements
- As a user, I can resume a conversation after closing the TUI and the agent remembers the full context
- As a user, I can use frontier models (Claude Opus 4.6, Codex gpt-5.4) without manual provider configuration
- As a user, the system automatically selects the best model for my task (coding → powerful model, simple question → cheap model)
- As a user, I can override the automatic model selection with /model <name> at any time
- As a user, I can switch between specialized agents mid-session with /agent <name>
- As an admin, I can define routing rules that control which models handle which task types
- As an admin, I can monitor background job health and retry failed jobs
- As a user, my conversations, memories, and preferences are invisible to other users
Functional Requirements
- FR-1: ChatGateway persists every message (user, assistant, tool call, thinking) to the conversations/messages tables
- FR-2: On session resume with an existing conversationId, message history is loaded from DB and injected into the agent session context
- FR-3: When conversation history exceeds 80% of the model's context window, older messages are summarized and prepended as a context checkpoint
- FR-4: Five LLM providers are registered with the gateway: Anthropic (Claude Sonnet 4.6, Opus 4.6, Haiku 4.5), OpenAI (Codex gpt-5.4), OpenRouter (dynamic model list), Z.ai (GLM-5), Ollama (local models)
- FR-5: Each provider supports API key auth; Anthropic and OpenAI additionally support OAuth (URL-display + callback pattern)
- FR-6: Provider credentials are stored per-user in the DB (encrypted), not in environment variables
- FR-7: A routing engine classifies each user message by taskType, complexity, domain, and required capabilities, then selects the optimal provider/model via priority-ordered rules
- FR-8: Default routing rules are seeded on first run; admins can customize system-wide rules; users can set per-session overrides
- FR-9: Routing decisions are transparent — the TUI shows which model was selected and why
- FR-10: Agent configs (system prompt, default model, tool allowlist, skills) stored in DB are applied when creating agent sessions
- FR-11: /model <name> switches the active model for subsequent messages in the current session
- FR-12: /agent <name> switches to a different agent config, loading its system prompt, tools, and default model
- FR-13: All memory queries (insight vector search, preferences) filter by userId
- FR-14: BullMQ handles background jobs (summarization, GC, tier management) with retry, backoff, and monitoring
- FR-15: Embeddings are served locally via Ollama (nomic-embed-text or mxbai-embed-large) with no external API dependency
Non-Functional Requirements
- Security: All data queries include userId filter. Provider credentials encrypted at rest. No cross-user data leakage. OAuth tokens stored securely with refresh handling.
- Performance: Message persistence adds <50ms to message relay latency. Routing classification <100ms per message. Provider health checks run on configurable interval (default 60s) without blocking requests.
- Reliability: BullMQ jobs retry with exponential backoff (3 attempts default). Provider failover: if primary provider is unhealthy, fallback chain activates automatically. Conversation context survives TUI restart.
- Observability: Routing decisions logged with classification details. Job execution logged to agent_logs. Provider health status exposed via /api/providers/health. Session metrics (tokens, model switches, duration) persisted in DB.
Acceptance Criteria
- AC-1: Send messages in TUI → restart TUI → resume conversation → agent has full history and context
- AC-2: Route a coding task to Claude Opus 4.6, a simple question to Haiku, a summarization to GLM-5 — all via granular routing rules
- AC-3: Two users exist, User A's memory searches never return User B's data
- AC-4: /model claude-sonnet-4-6 in TUI switches the active model for subsequent messages
- AC-5: /agent coding-agent in TUI switches to a different agent with a different system prompt and tools
- AC-6: BullMQ jobs execute on schedule, failures retry with backoff, admin can inspect via /api/admin/jobs
- AC-7: Channel protocol document exists with Matrix integration points defined, reviewed, and approved
- AC-8: Embeddings run on Ollama local models (no external API dependency for vector operations)
- AC-9: All five providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama) connect, list models, and complete chat requests
- AC-10: Routing transparency — TUI displays which model was selected and the routing reason for each response
Testing and Verification Expectations
- Baseline checks: pnpm typecheck, pnpm lint, pnpm format:check — all green before any push
- Unit tests: routing engine rule matching, task classifier, provider adapter registration, message persistence
- Integration tests: Two-user isolation (M2-007), provider round-trip (M3-012), routing end-to-end (M4-013), session resume with context (M1-008)
- Situational tests per milestone: Each milestone has a verify task that exercises the delivered functionality end-to-end
- Evidence format: Test output + manual verification notes in scratchpad per milestone
Constraints and Dependencies
| Type | Item | Notes |
|---|---|---|
| Dependency | @anthropic-ai/sdk | npm, required for M3-002 |
| Dependency | openai | npm, required for M3-003 |
| Dependency | bullmq | npm, Valkey-compatible, required for M6 |
| Dependency | Ollama embedding models | ollama pull nomic-embed-text, required for M3-009 |
| Dependency | Pi SDK provider adapter support | ASSUMPTION: supported — verify in M3-001 |
| External | Anthropic OAuth credentials | Requires Anthropic Console setup |
| External | OpenAI OAuth credentials | Requires OpenAI Platform setup |
| External | Z.ai API key | Requires Z.ai account |
| External | OpenRouter API key | Requires OpenRouter account |
| Constraint | Valkey 8 compatibility | BullMQ requires Redis 6+; Valkey 8 is compatible |
| Constraint | Embedding dimension migration | Switching from 1536 (OpenAI) to 768/1024 (Ollama) requires re-embedding or fresh start |
Assumptions
- ASSUMPTION: Pi SDK supports custom provider adapters for all target LLM providers. If not, adapters wrap native SDKs behind Pi's interface. Rationale: Gateway already uses Pi with Ollama via a custom adapter pattern.
- ASSUMPTION: BullMQ is Valkey-compatible. Rationale: BullMQ documents Redis 6+ compatibility; Valkey 8 is Redis-compatible.
- ASSUMPTION: Ollama can serve embedding models (nomic-embed-text, mxbai-embed-large) with acceptable quality. Rationale: Ollama supports embedding endpoints natively.
- ASSUMPTION: Anthropic and OpenAI OAuth flows can be handled via URL-display + token callback pattern (same as existing provider auth). Rationale: Both providers offer standard OAuth 2.0 flows.
- ASSUMPTION: Z.ai GLM-5 uses an API format compatible with OpenAI or has a documented SDK. Rationale: Most LLM providers converge on OpenAI-compatible APIs.
- ASSUMPTION: The existing Pi SDK session model supports mid-session model switching without destroying session state. If not, we destroy and recreate with conversation history. Rationale: Acceptable fallback — context is persisted in DB.
- ASSUMPTION: Channel protocol design can be completed without a running Matrix homeserver. Rationale: Matrix protocol is well-documented; design is architecture, not integration.
Milestones
Milestone 1: Conversation Persistence & Context
Goal: Every message persisted. Every conversation resumable with full context.
| Task | Description |
|---|---|
| M1-001 | Wire ChatGateway.handleMessage() → ConversationsRepo.addMessage() for user messages |
| M1-002 | Wire agent event relay → ConversationsRepo.addMessage() for assistant responses (text, tool calls, thinking) |
| M1-003 | Store message metadata: model used, provider, token counts, tool call details, timestamps |
| M1-004 | On session resume (existing conversationId), load message history from DB and inject into Pi session context |
| M1-005 | Context window management: if history exceeds model context, summarize older messages and prepend summary |
| M1-006 | Conversation search: full-text search on messages table via /api/conversations/search |
| M1-007 | TUI: /history command to display conversation message count and context usage |
| M1-008 | Verify: send messages → kill TUI → resume with -c <id> → agent references prior context |
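The persistence wiring in M1-001 through M1-003 can be sketched as follows. The repo method names and message shape are assumptions made for illustration; the real ConversationsRepo is backed by the Postgres conversations/messages tables rather than an in-memory array.

```typescript
// Roles mirror FR-1: user, assistant, tool call, and thinking messages.
type Role = "user" | "assistant" | "tool" | "thinking";

interface StoredMessage {
  conversationId: string;
  role: Role;
  content: string;
  metadata: {
    model?: string;       // M1-003: model used for this message
    provider?: string;    // M1-003: provider that served it
    inputTokens?: number;
    outputTokens?: number;
    createdAt: string;
  };
}

// In-memory stand-in for the DB-backed ConversationsRepo.
class ConversationsRepo {
  private messages: StoredMessage[] = [];

  addMessage(msg: StoredMessage): void {
    this.messages.push(msg);
  }

  findMessages(conversationId: string): StoredMessage[] {
    return this.messages.filter((m) => m.conversationId === conversationId);
  }
}

// What ChatGateway's event relay (M1-002) would call per assistant response.
function persistAssistantMessage(
  repo: ConversationsRepo,
  conversationId: string,
  content: string,
  model: string,
  provider: string,
): void {
  repo.addMessage({
    conversationId,
    role: "assistant",
    content,
    metadata: { model, provider, createdAt: new Date().toISOString() },
  });
}
```

On resume (M1-004), `findMessages(conversationId)` supplies the history that gets injected into the Pi session context.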
Milestone 2: Security & Isolation
Goal: All data queries user-scoped. Safe for multi-user deployment.
| Task | Description |
|---|---|
| M2-001 | Audit InsightsRepo: add userId filter to searchByEmbedding() vector search |
| M2-002 | Audit InsightsRepo: add userId filter to findByUser(), decayOldInsights() |
| M2-003 | Audit PreferencesRepo: verify all queries filter by userId |
| M2-004 | Audit agent memory tools: verify memory_search, memory_save_*, memory_get_* all scope to session user |
| M2-005 | Audit ConversationsRepo: verify ownership check on findById, update, delete, addMessage, findMessages |
| M2-006 | Audit AgentsRepo: verify findAccessible() returns only user's agents + system agents |
| M2-007 | Add integration test: create two users, populate data for each, verify cross-user isolation on every query path |
| M2-008 | Audit Valkey keys: verify session keys include userId or are not enumerable across users |
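The invariant the M2 audits enforce can be illustrated with a minimal sketch. The Insight shape and function names are assumptions; in the real InsightsRepo the scoping is a `WHERE user_id = $1` clause applied alongside the pgvector distance ranking, not an in-memory filter. The key design point: userId is a required parameter, so a query path that forgets it fails to compile.

```typescript
interface Insight {
  userId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity, standing in for the pgvector distance operator.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function searchByEmbedding(
  insights: Insight[],
  userId: string, // required — the M2-001 fix in miniature
  query: number[],
  limit: number,
): Insight[] {
  return insights
    .filter((i) => i.userId === userId) // isolation happens before ranking
    .sort((a, b) => cosine(b.embedding, query) - cosine(a.embedding, query))
    .slice(0, limit);
}
```

M2-007's integration test is the same assertion at full scale: populate two users, run every query path as each, and verify zero cross-user rows.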
Milestone 3: Provider Integration
Goal: Five providers operational with proper auth, health checking, and capability metadata.
| Task | Description |
|---|---|
| M3-001 | Refactor ProviderService into provider adapter pattern: IProviderAdapter interface with register(), listModels(), healthCheck(), createClient() |
| M3-002 | Anthropic adapter: @anthropic-ai/sdk, register Claude Sonnet 4.6 + Opus 4.6, OAuth flow (URL display + callback), API key fallback |
| M3-003 | OpenAI adapter: openai SDK, register Codex gpt-5.4, OAuth flow, API key fallback |
| M3-004 | OpenRouter adapter: OpenAI-compatible client, API key auth, dynamic model list from /api/v1/models |
| M3-005 | Z.ai GLM adapter: register GLM-5, API key auth, research and implement API format |
| M3-006 | Ollama adapter: refactor existing Ollama integration into adapter pattern, add embedding model support |
| M3-007 | Provider health check: periodic probe (configurable interval), status per provider, expose via /api/providers/health |
| M3-008 | Model capability matrix: define per-model metadata (tier, context window, tool support, vision, streaming, embedding capable) |
| M3-009 | Refactor EmbeddingService: replace OpenAI-hardcoded client with provider-agnostic interface, Ollama as default (nomic-embed-text or mxbai-embed-large) |
| M3-010 | OAuth token storage: persist provider tokens per user in DB (encrypted), refresh flow |
| M3-011 | Provider config UI support: /api/providers CRUD for user-scoped provider credentials |
| M3-012 | Verify: each provider connects, lists models, completes a chat request, handles errors gracefully |
Milestone 4: Agent Routing Engine
Goal: Granular, rule-based routing that matches tasks to the right agent and model by capability, cost, and domain specialization.
| Task | Description |
|---|---|
| M4-001 | Define routing rule schema: RoutingRule { name, priority, conditions[], action } stored in DB |
| M4-002 | Condition types: taskType (coding, research, summarization, conversation, analysis, creative), complexity (simple, moderate, complex), domain (frontend, backend, devops, docs, general), costTier (cheap, standard, premium), requiredCapabilities (tools, vision, long-context, reasoning) |
| M4-003 | Action types: routeTo { provider, model, agentConfigId?, systemPromptOverride?, toolAllowlist? } |
| M4-004 | Default routing rules (seed data): coding → Opus 4.6, simple Q&A → Sonnet 4.6, summarization → GLM-5, research → Codex gpt-5.4, local/offline → Ollama llama3.2 |
| M4-005 | Task classification: lightweight classifier that infers taskType + complexity from user message (can be rule-based regex/keyword initially, LLM-assisted later) |
| M4-006 | Routing decision pipeline: classify task → match rules by priority → select best available provider/model → fallback chain if primary unavailable |
| M4-007 | Routing override: user can force a specific model via /model <name> regardless of routing rules |
| M4-008 | Routing transparency: include routing decision in session:info event (why this model was selected) |
| M4-009 | Routing rules CRUD: /api/routing/rules — list, create, update, delete, reorder priority |
| M4-010 | Per-user routing overrides: users can customize default rules for their sessions |
| M4-011 | Agent specialization: agents can declare capabilities in their config (domains, preferred models, tool sets) |
| M4-012 | Routing integration: wire routing engine into ChatGateway — every new message triggers routing decision before agent dispatch |
| M4-013 | Verify: send a coding question → routed to Opus; send "summarize this" → routed to GLM-5; send "what time is it" → routed to cheap tier |
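A rule-based classifier in the spirit of M4-005 might start as simply as the sketch below. The keyword lists and the length-based complexity heuristic are illustrative assumptions, not the shipped rules; the point is that a cheap first pass is enough to drive routing before any LLM-assisted classification exists.

```typescript
type TaskType = "coding" | "research" | "summarization" | "conversation";
type Complexity = "simple" | "moderate" | "complex";

interface Classification {
  taskType: TaskType;
  complexity: Complexity;
}

function classify(message: string): Classification {
  const text = message.toLowerCase();

  // Keyword/regex matching per M4-005; first match wins.
  let taskType: TaskType = "conversation";
  if (/\b(summari[sz]e|tl;?dr|recap)\b/.test(text)) taskType = "summarization";
  else if (/\b(refactor|debug|implement|function|compile|stack trace)\b/.test(text))
    taskType = "coding";
  else if (/\b(research|compare|investigate|survey)\b/.test(text))
    taskType = "research";

  // Crude heuristic: longer, multi-step prompts score as more complex.
  const complexity: Complexity =
    text.length > 600 ? "complex" : text.length > 150 ? "moderate" : "simple";

  return { taskType, complexity };
}
```

The classifier's output feeds RoutingEngine rule matching; swapping this function for an LLM-assisted version later leaves the rest of the pipeline untouched.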
Milestone 5: Agent Session Hardening
Goal: Agent configs apply to sessions. Model and agent switching work mid-session.
| Task | Description |
|---|---|
| M5-001 | Wire ChatGateway: on session create, load agent config from DB (system prompt, model, provider, tool allowlist, skills) |
| M5-002 | /model <name> command: end-to-end wiring — TUI → socket command:execute → gateway switches provider/model → new messages use new model |
| M5-003 | /agent <name> command: switch to different agent config mid-session — loads new system prompt, tools, and default model |
| M5-004 | Session ↔ conversation binding: persist sessionId on conversation record, allow session resume via conversation ID |
| M5-005 | Session info broadcast: on model/agent switch, emit session:info with updated provider, model, agent name |
| M5-006 | Agent creation from TUI: /agent new command creates agent config via gateway API |
| M5-007 | Session metrics: track per-session token usage, model switches, duration — persist in DB |
| M5-008 | Verify: start TUI → /model claude-opus-4-6 → verify response uses Opus → /agent research-bot → verify system prompt changes |
Milestone 6: Job Queue Foundation
Goal: Reliable background processing via BullMQ. Foundation for future agent task orchestration.
| Task | Description |
|---|---|
| M6-001 | Add BullMQ dependency, configure with Valkey connection |
| M6-002 | Create queue service: typed job definitions, worker registration, error handling with exponential backoff |
| M6-003 | Migrate summarization cron → BullMQ repeatable job |
| M6-004 | Migrate GC (session cleanup) → BullMQ repeatable job |
| M6-005 | Migrate tier management (log archival) → BullMQ repeatable job |
| M6-006 | Admin jobs API: GET /api/admin/jobs — list active/completed/failed jobs, retry failed, pause/resume queues |
| M6-007 | Job event logging: emit job start/complete/fail events to agent_logs for observability |
| M6-008 | Verify: jobs execute on schedule, deliberate failure retries with backoff, admin endpoint shows job history |
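The retry policy from M6-002 can be sketched as a default options object mirroring BullMQ's `JobsOptions` shape (`attempts` plus exponential `backoff`). The delay formula below is what BullMQ's exponential strategy computes internally, reproduced here for clarity; the exact option values (3 attempts, 1s base) are this PRD's defaults, not BullMQ's.

```typescript
// Passed to queue.add() / as queue defaultJobOptions in the real service.
const defaultJobOptions = {
  attempts: 3,
  backoff: { type: "exponential" as const, delay: 1000 },
  removeOnComplete: 100, // keep recent history for /api/admin/jobs
};

// Exponential backoff: baseDelay * 2^(attemptsMade - 1) → 1s, 2s, 4s...
function backoffDelayMs(attemptsMade: number, baseDelayMs: number): number {
  return baseDelayMs * Math.pow(2, attemptsMade - 1);
}
```

M6-008's deliberate-failure test should observe exactly these delays between retries before the job lands in the failed set.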
Milestone 7: Channel Protocol Design
Goal: Architecture document defining how remote interfaces (Matrix, Discord, Telegram) will integrate. No code — design only. Built into foundation now so Phase 10+ doesn't require gateway rewrites.
| Task | Description |
|---|---|
| M7-001 | Define IChannelAdapter interface: lifecycle (connect, disconnect, health), message flow (receiveMessage → gateway, sendMessage ← gateway), identity mapping (channel user ↔ Mosaic user) |
| M7-002 | Define channel message protocol: canonical message format that all adapters translate to/from (content, metadata, attachments, thread context) |
| M7-003 | Design Matrix integration: appservice registration, room ↔ conversation mapping, space ↔ team mapping, agent ghost users, power levels for human observation |
| M7-004 | Design conversation multiplexing: same conversation accessible from TUI + WebUI + Matrix simultaneously, real-time sync via gateway events |
| M7-005 | Design remote auth bridging: how a Matrix/Discord message authenticates to Mosaic (token linking, OAuth bridge, invite-based provisioning) |
| M7-006 | Design agent-to-agent communication via Matrix rooms: room per agent pair, human can join to observe, message format for structured agent dialogue |
| M7-007 | Design multi-user isolation in Matrix: space-per-team, room visibility rules, encryption considerations, admin visibility |
| M7-008 | Publish architecture doc: docs/architecture/channel-protocol.md — reviewed and approved before Phase 10 |
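A possible shape for the canonical message format from M7-002 is sketched below. All field names are proposals for the design doc, not settled protocol; the Matrix event fields used in the translation example are from the Matrix client-server event schema, but the mapping itself (room → conversation, sender → channelUserId) is this design's assumption pending M7-003.

```typescript
// Canonical format every IChannelAdapter translates to/from.
interface ChannelMessage {
  channel: "tui" | "webui" | "matrix" | "discord" | "telegram";
  channelUserId: string;   // native identity, mapped to a Mosaic user (M7-001)
  conversationId?: string; // resolved from room/thread mapping when known
  content: string;
  attachments: { mimeType: string; url: string }[];
  threadContext?: { parentMessageId: string };
  receivedAt: string;
}

// Example translation a hypothetical Matrix adapter might perform.
function fromMatrixEvent(ev: {
  sender: string;
  room_id: string;
  content: { body: string };
  origin_server_ts: number;
}): ChannelMessage {
  return {
    channel: "matrix",
    channelUserId: ev.sender,
    conversationId: ev.room_id, // room ↔ conversation mapping (M7-003)
    content: ev.content.body,
    attachments: [],
    receivedAt: new Date(ev.origin_server_ts).toISOString(),
  };
}
```

Because every adapter converges on this one shape, the gateway and conversation multiplexing (M7-004) never see channel-specific payloads.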
Technical Approach
Pi SDK Provider Adapter Pattern
The agent layer stays on Pi SDK. Provider diversity is solved at the adapter layer below Pi:
Provider SDKs (@anthropic-ai/sdk, openai, etc.)
→ IProviderAdapter implementations
→ ProviderRegistry (Pi SDK compatible)
→ Agent Session (Pi SDK) — tool loops, streaming, context
→ AgentService — lifecycle, routing, events
→ ChatGateway — WebSocket to all interfaces
Adding a provider means implementing IProviderAdapter. Everything above stays unchanged.
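A sketch of what that contract might look like, under the assumption that M3-001 lands roughly as described. The method names come from the milestone table; the parameter and return shapes are guesses to be firmed up during implementation.

```typescript
interface ModelInfo {
  id: string;
  tier: "cheap" | "standard" | "premium" | "local";
  contextWindow: number;
  supportsTools: boolean;
  supportsVision: boolean;
  embeddingDims?: number; // set only for embedding-capable models (M3-008)
}

interface IProviderAdapter {
  readonly name: string;
  register(credentials: { apiKey?: string; oauthToken?: string }): Promise<void>;
  listModels(): Promise<ModelInfo[]>;
  healthCheck(): Promise<{ healthy: boolean; latencyMs?: number }>;
  createClient(model: string): unknown; // Pi SDK-compatible client handle
}

// The registry is the only thing the layers above ever touch.
class ProviderRegistry {
  private adapters = new Map<string, IProviderAdapter>();

  add(adapter: IProviderAdapter): void {
    this.adapters.set(adapter.name, adapter);
  }

  get(name: string): IProviderAdapter | undefined {
    return this.adapters.get(name);
  }
}
```

An Anthropic or Z.ai adapter is then a file implementing `IProviderAdapter` plus one `registry.add()` call; routing, sessions, and the gateway are unaffected.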
Routing Decision Flow
User sends message
→ Task classifier (regex/keyword, optionally LLM-assisted)
→ { taskType, complexity, domain, requiredCapabilities }
→ RoutingEngine.resolve(classification, userOverrides, availableProviders)
→ Match rules by priority
→ Check provider health
→ Apply fallback chain
→ Return { provider, model, agentConfigId }
→ AgentService.createOrResumeSession(routingResult)
→ Session uses selected provider/model
→ Emit session:info with routing decision explanation
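The resolve step in the middle of that flow can be sketched as below. Rule and classification shapes follow M4-001/M4-002; the matching semantics (lower priority number wins, unhealthy providers skipped so the next match becomes the fallback) are assumptions this sketch makes concrete.

```typescript
interface Classification {
  taskType: string;
  complexity: string;
}

interface RoutingRule {
  priority: number;                     // lower = matched first
  conditions: Partial<Classification>;  // empty object = matches everything
  action: { provider: string; model: string };
}

function resolve(
  rules: RoutingRule[],
  cls: Classification,
  healthyProviders: Set<string>,
): { provider: string; model: string } | undefined {
  const matches = (r: RoutingRule) =>
    Object.entries(r.conditions).every(
      ([key, value]) => cls[key as keyof Classification] === value,
    );
  return rules
    .slice()
    .sort((a, b) => a.priority - b.priority)
    .filter(matches)
    // Fallback chain: walk matches in priority order until one is healthy.
    .find((r) => healthyProviders.has(r.action.provider))?.action;
}
```

Because health filtering happens after rule matching, taking a provider offline automatically demotes traffic to the next matching rule rather than failing the request.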
Embedding Strategy
Replace OpenAI-hardcoded embedding service with provider-agnostic interface:
- Default: Ollama serving nomic-embed-text (768-dim) or mxbai-embed-large (1024-dim)
- Fallback: any OpenAI-compatible embedding API
- Migration: Update pgvector column dimension if switching from 1536 (OpenAI) to 768/1024 (Ollama models)
- No external API dependency for vector operations in default configuration
Context Window Management
When conversation history exceeds model context:
- Calculate token count of full history
- If exceeds 80% of model context window, trigger summarization
- Summarize oldest N messages into a condensed context block
- Prepend summary + keep recent messages within context budget
- Store summary as a "context checkpoint" message in DB
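The 80% checkpoint rule above can be sketched as a split function. The chars/4 estimate stands in for a real tokenizer, and the function shape is an assumption; it only decides which messages to summarize versus keep, with the actual summarization and checkpoint write happening downstream.

```typescript
interface Msg {
  role: string;
  content: string;
}

// Rough token estimate; replace with the model's tokenizer in practice.
const estimateTokens = (m: Msg): number => Math.ceil(m.content.length / 4);

function splitForCheckpoint(
  history: Msg[],
  contextWindow: number,
): { toSummarize: Msg[]; keep: Msg[] } {
  const budget = Math.floor(contextWindow * 0.8); // the 80% trigger
  const total = history.reduce((n, m) => n + estimateTokens(m), 0);
  if (total <= budget) return { toSummarize: [], keep: history };

  // Walk from the newest message backwards, keeping as much recent
  // context as fits; everything older is summarized into a checkpoint.
  let used = 0;
  let cut = history.length;
  for (let i = history.length - 1; i >= 0; i--) {
    used += estimateTokens(history[i]);
    if (used > budget) break;
    cut = i;
  }
  return { toSummarize: history.slice(0, cut), keep: history.slice(cut) };
}
```

The `toSummarize` slice becomes the condensed context block, stored as a "context checkpoint" message per the last step above.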
Model Reference
| Provider | Model | Tier | Context | Tools | Vision | Embedding |
|---|---|---|---|---|---|---|
| Anthropic | Claude Opus 4.6 | premium | 200K | yes | yes | no |
| Anthropic | Claude Sonnet 4.6 | standard | 200K | yes | yes | no |
| Anthropic | Claude Haiku 4.5 | cheap | 200K | yes | yes | no |
| OpenAI | Codex gpt-5.4 | premium | 128K+ | yes | yes | no |
| Z.ai | GLM-5 | standard | TBD | TBD | TBD | no |
| OpenRouter | varies | varies | varies | varies | varies | no |
| Ollama | llama3.2 | local/free | 128K | yes | no | no |
| Ollama | nomic-embed-text | — | — | — | — | yes (768-dim) |
| Ollama | mxbai-embed-large | — | — | — | — | yes (1024-dim) |
Default Routing Rules (Seed Data)
| Priority | Condition | Route To |
|---|---|---|
| 1 | taskType=coding AND complexity=complex | Opus 4.6 |
| 2 | taskType=coding AND complexity=moderate | Sonnet 4.6 |
| 3 | taskType=coding AND complexity=simple | Codex gpt-5.4 |
| 4 | taskType=research | Codex gpt-5.4 |
| 5 | taskType=summarization | GLM-5 |
| 6 | taskType=analysis AND requiredCapabilities includes reasoning | Opus 4.6 |
| 7 | taskType=conversation | Sonnet 4.6 |
| 8 | taskType=creative | Sonnet 4.6 |
| 9 | costTier=cheap OR domain=general | Haiku 4.5 |
| 10 | fallback (no rule matched) | Sonnet 4.6 |
| 99 | provider=ollama forced OR offline mode | llama3.2 |
Rules are user-customizable. Admins set system defaults; users override for their sessions.
Risks and Open Questions
| Risk | Impact | Mitigation |
|---|---|---|
| Pi SDK doesn't support custom provider adapters cleanly | High — blocks M3 | Verify in M3-001; fallback: wrap native SDKs and bypass Pi's registry, feeding responses into Pi's session format |
| BullMQ + Valkey incompatibility | Medium — blocks M6 | Test in M6-001 before migrating jobs; fallback: use bullmq with ioredis directly |
| Embedding dimension migration (1536 → 768/1024) | Medium — data migration | Run migration script to re-embed existing insights; or start fresh if insight count is low |
| Z.ai GLM-5 API undocumented | Low — blocks one provider | Deprioritize; other 4 providers cover all use cases |
| Context window summarization quality | Medium — affects UX | Start with simple truncation; add LLM summarization iteratively |
| OAuth flow complexity in TUI (no browser redirect) | Medium | URL-display + clipboard + Valkey poll token pattern (already designed in P8-012) |
Open Questions
- What is the Z.ai GLM-5 API format? OpenAI-compatible or custom SDK? (Research in M3-005)
- Should routing classification use LLM-assisted classification from the start, or rule-based only? (ASSUMPTION: rule-based first, LLM-assisted later)
- What Ollama embedding model provides the best quality/performance tradeoff? (Test nomic-embed-text vs mxbai-embed-large in M3-009)
- Should provider credentials be stored in DB per-user, or remain environment-variable based for system-wide providers? (ASSUMPTION: hybrid — env vars for system defaults, DB for per-user overrides)
Milestone / Delivery Intent
- Target version: v0.2.0
- Milestone count: 7
- Definition of done: all 10 acceptance criteria verified with evidence, all quality gates green, PRD status updated to completed
- Delivery order: M1 (persistence) → M2 (security) → M3 (providers) → M4 (routing) → M5 (sessions) → M6 (jobs) → M7 (channel design)
- M1 and M2 are prerequisites — no provider or routing work begins until conversations persist and data is user-scoped