stack/docs/archive/missions/harness-20260321/PRD.md
docs: archive stale mission, scaffold CLI unification mission (#399)
2026-04-05 04:47:54 +00:00


PRD: Harness Foundation — Phase 9

Metadata

  • Owner: Jason Woltje
  • Date: 2026-03-21
  • Status: completed
  • Phase: 9 (post-MVP)
  • Version Target: v0.2.0
  • Agent Harness: Pi SDK
  • Best-Guess Mode: true
  • Repo: git.mosaicstack.dev/mosaic/mosaic-stack

Problem Statement

Mosaic Stack v0.1.0 delivered a functional skeleton — gateway boots, TUI connects, single-agent chat streams, basic auth works. But the system is not usable as a daily-driver harness:

  1. Chat messages are fire-and-forget. The WebSocket gateway never calls ConversationsRepo. Context is lost on disconnect. Conversations can't be resumed with history. Cross-interface continuity (TUI → WebUI → Matrix) is impossible.

  2. Single provider (Ollama) with local models only. No access to frontier models (Claude Opus 4.6, Codex gpt-5.4, GLM-5). The routing engine exists but has never been tested with real providers.

  3. No task-aware agent routing. A coding task and a summarization task route to the same agent with the same model. There is no mechanism to match tasks to agents by capability, cost tier, or specialization.

  4. Memory is not user-scoped. Insight vector search returns all users' data, so deploying multi-user in this state would be a security breach.

  5. Agent configs exist in DB but are ignored. Stored system prompts, model preferences, and tool allowlists don't apply to sessions. The /model and /agent slash commands are stubbed.

  6. No job queue. Background processing (summarization, GC, tier management) runs on fragile cron. No retry, no monitoring, no async task dispatch foundation for future agent orchestration.

  7. Plugin system is hollow. Zero implementations. No defined message protocol. Blocks all remote interfaces (Matrix, Discord, Telegram) planned for Phase 10+.

What this phase solves: Transform Mosaic from a demo into a real multi-provider, task-routing AI harness that persists everything, routes intelligently, and is architecturally ready for multi-agent and remote control.


Objectives

  1. Persistent conversations — Every message saved, every conversation resumable, full context available across interfaces
  2. Multi-provider LLM access — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with proper auth flows
  3. Task-aware agent routing — Granular routing rules that match tasks to the right agent + model by capability, cost, and domain
  4. Security isolation — All data queries user-scoped, ready for multi-user deployment
  5. Session hardening — Agent configs apply, model/agent switching works mid-session
  6. Reliable background processing — BullMQ job queue replaces fragile cron
  7. Channel protocol design — Architecture for Matrix and remote interfaces, built into the foundation now

Scope

In Scope

  1. Conversation persistence — wire ChatGateway to ConversationsRepo, context loading on resume
  2. Multi-provider integration — Anthropic, OpenAI, OpenRouter, Z.ai, Ollama with auth flows
  3. Task-aware agent routing — granular routing rules with task classification and fallback chains
  4. Security isolation — user-scoped queries on all data paths (memory, conversations, agents)
  5. Agent session hardening — configs apply, model/agent switching, session resume
  6. Job queue — BullMQ replacing cron for background processing
  7. Channel protocol design — architecture document for Matrix and remote interfaces
  8. Embedding migration — Ollama-local embeddings replacing OpenAI dependency

Out of Scope

  1. Matrix homeserver deployment + appservice (Phase 10)
  2. Multi-agent orchestration / supervisor-worker pattern (Phase 10+)
  3. WebUI rebuild (future)
  4. Self-managing memory — compaction, merge, forget (future)
  5. Team workspace isolation (future)
  6. Remote channel plugins — WhatsApp, Discord, Telegram (Phase 10+, via Matrix)
  7. Fine-grained RBAC — project/agent/team roles (future)
  8. Agent-to-agent communication (Phase 10+)

User/Stakeholder Requirements

  1. As a user, I can resume a conversation after closing the TUI and the agent remembers the full context
  2. As a user, I can use frontier models (Claude Opus 4.6, Codex gpt-5.4) without manual provider configuration
  3. As a user, the system automatically selects the best model for my task (coding → powerful model, simple question → cheap model)
  4. As a user, I can override the automatic model selection with /model <name> at any time
  5. As a user, I can switch between specialized agents mid-session with /agent <name>
  6. As an admin, I can define routing rules that control which models handle which task types
  7. As an admin, I can monitor background job health and retry failed jobs
  8. As a user, my conversations, memories, and preferences are invisible to other users

Functional Requirements

  1. FR-1: ChatGateway persists every message (user, assistant, tool call, thinking) to the conversations/messages tables
  2. FR-2: On session resume with an existing conversationId, message history is loaded from DB and injected into the agent session context
  3. FR-3: When conversation history exceeds 80% of the model's context window, older messages are summarized and prepended as a context checkpoint
  4. FR-4: Five LLM providers are registered with the gateway: Anthropic (Claude Sonnet 4.6, Opus 4.6, Haiku 4.5), OpenAI (Codex gpt-5.4), OpenRouter (dynamic model list), Z.ai (GLM-5), Ollama (local models)
  5. FR-5: Each provider supports API key auth; Anthropic and OpenAI additionally support OAuth (URL-display + callback pattern)
  6. FR-6: Provider credentials are stored per-user in the DB (encrypted), not in environment variables
  7. FR-7: A routing engine classifies each user message by taskType, complexity, domain, and required capabilities, then selects the optimal provider/model via priority-ordered rules
  8. FR-8: Default routing rules are seeded on first run; admins can customize system-wide rules; users can set per-session overrides
  9. FR-9: Routing decisions are transparent — the TUI shows which model was selected and why
  10. FR-10: Agent configs (system prompt, default model, tool allowlist, skills) stored in DB are applied when creating agent sessions
  11. FR-11: /model <name> switches the active model for subsequent messages in the current session
  12. FR-12: /agent <name> switches to a different agent config, loading its system prompt, tools, and default model
  13. FR-13: All memory queries (insight vector search, preferences) filter by userId
  14. FR-14: BullMQ handles background jobs (summarization, GC, tier management) with retry, backoff, and monitoring
  15. FR-15: Embeddings are served locally via Ollama (nomic-embed-text or mxbai-embed-large) with no external API dependency

Non-Functional Requirements

  1. Security: All data queries include userId filter. Provider credentials encrypted at rest. No cross-user data leakage. OAuth tokens stored securely with refresh handling.
  2. Performance: Message persistence adds <50ms to message relay latency. Routing classification <100ms per message. Provider health checks run on configurable interval (default 60s) without blocking requests.
  3. Reliability: BullMQ jobs retry with exponential backoff (3 attempts default). Provider failover: if primary provider is unhealthy, fallback chain activates automatically. Conversation context survives TUI restart.
  4. Observability: Routing decisions logged with classification details. Job execution logged to agent_logs. Provider health status exposed via /api/providers/health. Session metrics (tokens, model switches, duration) persisted in DB.

Acceptance Criteria

  • AC-1: Send messages in TUI → restart TUI → resume conversation → agent has full history and context
  • AC-2: Route a coding task to Claude Opus 4.6, a simple question to Haiku, a summarization to GLM-5 — all via granular routing rules
  • AC-3: Two users exist, User A's memory searches never return User B's data
  • AC-4: /model claude-sonnet-4-6 in TUI switches the active model for subsequent messages
  • AC-5: /agent coding-agent in TUI switches to a different agent with different system prompt and tools
  • AC-6: BullMQ jobs execute on schedule, failures retry with backoff, admin can inspect via /api/admin/jobs
  • AC-7: Channel protocol document exists with Matrix integration points defined, reviewed, and approved
  • AC-8: Embeddings run on Ollama local models (no external API dependency for vector operations)
  • AC-9: All five providers (Anthropic, OpenAI, OpenRouter, Z.ai, Ollama) connect, list models, and complete chat requests
  • AC-10: Routing transparency — TUI displays which model was selected and the routing reason for each response

Testing and Verification Expectations

  1. Baseline checks: pnpm typecheck, pnpm lint, pnpm format:check — all green before any push
  2. Unit tests: Routing engine rules matching, task classifier, provider adapter registration, message persistence
  3. Integration tests: Two-user isolation (M2-007), provider round-trip (M3-012), routing end-to-end (M4-013), session resume with context (M1-008)
  4. Situational tests per milestone: Each milestone has a verify task that exercises the delivered functionality end-to-end
  5. Evidence format: Test output + manual verification notes in scratchpad per milestone

Constraints and Dependencies

| Type | Item | Notes |
| --- | --- | --- |
| Dependency | @anthropic-ai/sdk | npm, required for M3-002 |
| Dependency | openai | npm, required for M3-003 |
| Dependency | bullmq | npm, Valkey-compatible, required for M6 |
| Dependency | Ollama embedding models | ollama pull nomic-embed-text, required for M3-009 |
| Dependency | Pi SDK provider adapter support | ASSUMPTION: supported — verify in M3-001 |
| External | Anthropic OAuth credentials | Requires Anthropic Console setup |
| External | OpenAI OAuth credentials | Requires OpenAI Platform setup |
| External | Z.ai API key | Requires Z.ai account |
| External | OpenRouter API key | Requires OpenRouter account |
| Constraint | Valkey 8 compatibility | BullMQ requires Redis 6+; Valkey 8 is compatible |
| Constraint | Embedding dimension migration | Switching from 1536 (OpenAI) to 768/1024 (Ollama) requires re-embedding or a fresh start |

Assumptions

  1. ASSUMPTION: Pi SDK supports custom provider adapters for all target LLM providers. If not, adapters wrap native SDKs behind Pi's interface. Rationale: Gateway already uses Pi with Ollama via a custom adapter pattern.
  2. ASSUMPTION: BullMQ is Valkey-compatible. Rationale: BullMQ documents Redis 6+ compatibility; Valkey 8 is Redis-compatible.
  3. ASSUMPTION: Ollama can serve embedding models (nomic-embed-text, mxbai-embed-large) with acceptable quality. Rationale: Ollama supports embedding endpoints natively.
  4. ASSUMPTION: Anthropic and OpenAI OAuth flows can be handled via URL-display + token callback pattern (same as existing provider auth). Rationale: Both providers offer standard OAuth 2.0 flows.
  5. ASSUMPTION: Z.ai GLM-5 uses an API format compatible with OpenAI or has a documented SDK. Rationale: Most LLM providers converge on OpenAI-compatible APIs.
  6. ASSUMPTION: The existing Pi SDK session model supports mid-session model switching without destroying session state. If not, we destroy and recreate with conversation history. Rationale: Acceptable fallback — context is persisted in DB.
  7. ASSUMPTION: Channel protocol design can be completed without a running Matrix homeserver. Rationale: Matrix protocol is well-documented; design is architecture, not integration.

Milestones

Milestone 1: Conversation Persistence & Context

Goal: Every message persisted. Every conversation resumable with full context.

| Task | Description |
| --- | --- |
| M1-001 | Wire ChatGateway.handleMessage() → ConversationsRepo.addMessage() for user messages |
| M1-002 | Wire agent event relay → ConversationsRepo.addMessage() for assistant responses (text, tool calls, thinking) |
| M1-003 | Store message metadata: model used, provider, token counts, tool call details, timestamps |
| M1-004 | On session resume (existing conversationId), load message history from DB and inject into Pi session context |
| M1-005 | Context window management: if history exceeds model context, summarize older messages and prepend summary |
| M1-006 | Conversation search: full-text search on messages table via /api/conversations/search |
| M1-007 | TUI: /history command to display conversation message count and context usage |
| M1-008 | Verify: send messages → kill TUI → resume with -c <id> → agent references prior context |
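A minimal sketch of the record the M1-001 through M1-003 wiring would hand to ConversationsRepo.addMessage(). The repo name comes from this PRD; field names beyond M1-003's metadata list are illustrative assumptions:

```typescript
// Sketch of the persisted message shape (M1-001..M1-003).
// Field names beyond the PRD's metadata list are illustrative.
interface MessageRecord {
  conversationId: string;
  role: 'user' | 'assistant' | 'tool' | 'thinking';
  content: string;
  metadata?: {
    model?: string; // model used (M1-003)
    provider?: string;
    tokensIn?: number;
    tokensOut?: number;
  };
  createdAt: Date;
}

// Pure helper: normalize an inbound or relayed message into a MessageRecord
// before the gateway calls ConversationsRepo.addMessage().
function toMessageRecord(
  conversationId: string,
  role: MessageRecord['role'],
  content: string,
  metadata?: MessageRecord['metadata'],
): MessageRecord {
  return { conversationId, role, content, metadata, createdAt: new Date() };
}
```

Keeping this normalization pure makes it trivial to unit-test the persistence path without a live socket.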

Milestone 2: Security & Isolation

Goal: All data queries user-scoped. Safe for multi-user deployment.

| Task | Description |
| --- | --- |
| M2-001 | Audit InsightsRepo: add userId filter to searchByEmbedding() vector search |
| M2-002 | Audit InsightsRepo: add userId filter to findByUser(), decayOldInsights() |
| M2-003 | Audit PreferencesRepo: verify all queries filter by userId |
| M2-004 | Audit agent memory tools: verify memory_search, memory_save_*, memory_get_* all scope to session user |
| M2-005 | Audit ConversationsRepo: verify ownership check on findById, update, delete, addMessage, findMessages |
| M2-006 | Audit AgentsRepo: verify findAccessible() returns only user's agents + system agents |
| M2-007 | Add integration test: create two users, populate data for each, verify cross-user isolation on every query path |
| M2-008 | Audit Valkey keys: verify session keys include userId or are not enumerable across users |
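The essential property in M2-001 is that the userId filter lives inside the vector query itself, not in application-level post-filtering. A hedged sketch, where the table and column names are assumptions based on this PRD's naming:

```typescript
// User-scoped pgvector search (M2-001 sketch). Table/column names are assumed.
// Because WHERE user_id = $1 sits inside the query, no cross-user row ever
// leaves the database, regardless of bugs in the calling code.
function searchByEmbeddingQuery(): {
  sql: string;
  bind: (userId: string, embedding: number[], limit: number) => unknown[];
} {
  const sql = `
    SELECT id, content, 1 - (embedding <=> $2::vector) AS similarity
    FROM insights
    WHERE user_id = $1
    ORDER BY embedding <=> $2::vector
    LIMIT $3`;
  return {
    sql,
    // pgvector accepts the '[0.1,0.2,...]' text form, which JSON.stringify produces
    bind: (userId, embedding, limit) => [userId, JSON.stringify(embedding), limit],
  };
}
```

The M2-007 integration test can then assert that User A's searches return zero rows owned by User B even when B's embeddings are nearer neighbors.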

Milestone 3: Provider Integration

Goal: Five providers operational with proper auth, health checking, and capability metadata.

| Task | Description |
| --- | --- |
| M3-001 | Refactor ProviderService into provider adapter pattern: IProviderAdapter interface with register(), listModels(), healthCheck(), createClient() |
| M3-002 | Anthropic adapter: @anthropic-ai/sdk, register Claude Sonnet 4.6 + Opus 4.6, OAuth flow (URL display + callback), API key fallback |
| M3-003 | OpenAI adapter: openai SDK, register Codex gpt-5.4, OAuth flow, API key fallback |
| M3-004 | OpenRouter adapter: OpenAI-compatible client, API key auth, dynamic model list from /api/v1/models |
| M3-005 | Z.ai GLM adapter: register GLM-5, API key auth, research and implement API format |
| M3-006 | Ollama adapter: refactor existing Ollama integration into adapter pattern, add embedding model support |
| M3-007 | Provider health check: periodic probe (configurable interval), status per provider, expose via /api/providers/health |
| M3-008 | Model capability matrix: define per-model metadata (tier, context window, tool support, vision, streaming, embedding capable) |
| M3-009 | Refactor EmbeddingService: replace OpenAI-hardcoded client with provider-agnostic interface, Ollama as default (nomic-embed-text or mxbai-embed-large) |
| M3-010 | OAuth token storage: persist provider tokens per user in DB (encrypted), refresh flow |
| M3-011 | Provider config UI support: /api/providers CRUD for user-scoped provider credentials |
| M3-012 | Verify: each provider connects, lists models, completes a chat request, handles errors gracefully |

Milestone 4: Agent Routing Engine

Goal: Granular, rule-based routing that matches tasks to the right agent and model by capability, cost, and domain specialization.

| Task | Description |
| --- | --- |
| M4-001 | Define routing rule schema: RoutingRule { name, priority, conditions[], action } stored in DB |
| M4-002 | Condition types: taskType (coding, research, summarization, conversation, analysis, creative), complexity (simple, moderate, complex), domain (frontend, backend, devops, docs, general), costTier (cheap, standard, premium), requiredCapabilities (tools, vision, long-context, reasoning) |
| M4-003 | Action types: routeTo { provider, model, agentConfigId?, systemPromptOverride?, toolAllowlist? } |
| M4-004 | Default routing rules (seed data): coding → Opus 4.6, simple Q&A → Sonnet 4.6, summarization → GLM-5, research → Codex gpt-5.4, local/offline → Ollama llama3.2 |
| M4-005 | Task classification: lightweight classifier that infers taskType + complexity from user message (can be rule-based regex/keyword initially, LLM-assisted later) |
| M4-006 | Routing decision pipeline: classify task → match rules by priority → select best available provider/model → fallback chain if primary unavailable |
| M4-007 | Routing override: user can force a specific model via /model <name> regardless of routing rules |
| M4-008 | Routing transparency: include routing decision in session:info event (why this model was selected) |
| M4-009 | Routing rules CRUD: /api/routing/rules — list, create, update, delete, reorder priority |
| M4-010 | Per-user routing overrides: users can customize default rules for their sessions |
| M4-011 | Agent specialization: agents can declare capabilities in their config (domains, preferred models, tool sets) |
| M4-012 | Routing integration: wire routing engine into ChatGateway — every new message triggers routing decision before agent dispatch |
| M4-013 | Verify: send a coding question → routed to Opus; send "summarize this" → routed to GLM-5; send "what time is it" → routed to cheap tier |
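The schema named in M4-001 through M4-003 might be typed as follows. The field and condition names come from those tasks; the first-match resolution helper is an illustrative sketch of the M4-006 rule-matching step (it checks only two condition fields for brevity):

```typescript
// Sketch of the M4-001..M4-003 routing rule shape.
type TaskType = 'coding' | 'research' | 'summarization' | 'conversation' | 'analysis' | 'creative';
type Complexity = 'simple' | 'moderate' | 'complex';

interface RoutingCondition {
  taskType?: TaskType;
  complexity?: Complexity;
  domain?: 'frontend' | 'backend' | 'devops' | 'docs' | 'general';
  costTier?: 'cheap' | 'standard' | 'premium';
  requiredCapabilities?: Array<'tools' | 'vision' | 'long-context' | 'reasoning'>;
}

interface RoutingRule {
  name: string;
  priority: number; // lower number = evaluated first
  conditions: RoutingCondition[];
  action: {
    provider: string;
    model: string;
    agentConfigId?: string;
    systemPromptOverride?: string;
    toolAllowlist?: string[];
  };
}

// First-match resolution by ascending priority (partial sketch: only taskType
// and complexity are matched here; a real matcher covers every field).
function matchRule(rules: RoutingRule[], c: RoutingCondition): RoutingRule | undefined {
  const matches = (cond: RoutingCondition) =>
    (cond.taskType === undefined || cond.taskType === c.taskType) &&
    (cond.complexity === undefined || cond.complexity === c.complexity);
  return [...rules]
    .sort((a, b) => a.priority - b.priority)
    .find((r) => r.conditions.some(matches));
}
```

An empty condition object matches everything, which is how the priority-10 fallback rule in the seed data can be expressed without special-casing.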

Milestone 5: Agent Session Hardening

Goal: Agent configs apply to sessions. Model and agent switching work mid-session.

| Task | Description |
| --- | --- |
| M5-001 | Wire ChatGateway: on session create, load agent config from DB (system prompt, model, provider, tool allowlist, skills) |
| M5-002 | /model <name> command: end-to-end wiring — TUI → socket command:execute → gateway switches provider/model → new messages use new model |
| M5-003 | /agent <name> command: switch to different agent config mid-session — loads new system prompt, tools, and default model |
| M5-004 | Session ↔ conversation binding: persist sessionId on conversation record, allow session resume via conversation ID |
| M5-005 | Session info broadcast: on model/agent switch, emit session:info with updated provider, model, agent name |
| M5-006 | Agent creation from TUI: /agent new command creates agent config via gateway API |
| M5-007 | Session metrics: track per-session token usage, model switches, duration — persist in DB |
| M5-008 | Verify: start TUI → /model claude-opus-4-6 → verify response uses Opus → /agent research-bot → verify system prompt changes |

Milestone 6: Job Queue Foundation

Goal: Reliable background processing via BullMQ. Foundation for future agent task orchestration.

| Task | Description |
| --- | --- |
| M6-001 | Add BullMQ dependency, configure with Valkey connection |
| M6-002 | Create queue service: typed job definitions, worker registration, error handling with exponential backoff |
| M6-003 | Migrate summarization cron → BullMQ repeatable job |
| M6-004 | Migrate GC (session cleanup) → BullMQ repeatable job |
| M6-005 | Migrate tier management (log archival) → BullMQ repeatable job |
| M6-006 | Admin jobs API: GET /api/admin/jobs — list active/completed/failed jobs, retry failed, pause/resume queues |
| M6-007 | Job event logging: emit job start/complete/fail events to agent_logs for observability |
| M6-008 | Verify: jobs execute on schedule, deliberate failure retries with backoff, admin endpoint shows job history |

Milestone 7: Channel Protocol Design

Goal: Architecture document defining how remote interfaces (Matrix, Discord, Telegram) will integrate. No code — design only. Built into foundation now so Phase 10+ doesn't require gateway rewrites.

| Task | Description |
| --- | --- |
| M7-001 | Define IChannelAdapter interface: lifecycle (connect, disconnect, health), message flow (receiveMessage → gateway, sendMessage ← gateway), identity mapping (channel user ↔ Mosaic user) |
| M7-002 | Define channel message protocol: canonical message format that all adapters translate to/from (content, metadata, attachments, thread context) |
| M7-003 | Design Matrix integration: appservice registration, room ↔ conversation mapping, space ↔ team mapping, agent ghost users, power levels for human observation |
| M7-004 | Design conversation multiplexing: same conversation accessible from TUI + WebUI + Matrix simultaneously, real-time sync via gateway events |
| M7-005 | Design remote auth bridging: how a Matrix/Discord message authenticates to Mosaic (token linking, OAuth bridge, invite-based provisioning) |
| M7-006 | Design agent-to-agent communication via Matrix rooms: room per agent pair, human can join to observe, message format for structured agent dialogue |
| M7-007 | Design multi-user isolation in Matrix: space-per-team, room visibility rules, encryption considerations, admin visibility |
| M7-008 | Publish architecture doc: docs/architecture/channel-protocol.md — reviewed and approved before Phase 10 |
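Since M7 is design-only, a type sketch is the right level of detail. The lifecycle, message-flow, and identity-mapping groupings come from M7-001 and the canonical fields from M7-002; every concrete name below is a proposal for the design doc, not an implemented API:

```typescript
// Proposal-level sketch for the channel protocol design doc (M7-001/M7-002).
interface ChannelMessage {
  channelId: string;           // e.g. a Matrix room ID
  conversationId?: string;     // mapped Mosaic conversation, once resolved
  senderId: string;            // channel-native user ID
  content: string;
  attachments?: Array<{ url: string; mimeType: string }>;
  threadContext?: string;
  metadata?: Record<string, unknown>;
}

interface IChannelAdapter {
  // lifecycle
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  health(): Promise<'up' | 'down'>;
  // message flow: adapter → gateway, gateway → adapter
  onMessage(handler: (msg: ChannelMessage) => Promise<void>): void;
  sendMessage(msg: ChannelMessage): Promise<void>;
  // identity mapping: channel user ↔ Mosaic user (null = unlinked)
  resolveUser(channelUserId: string): Promise<string | null>;
}

// Illustrative normalizer a Matrix adapter might apply to inbound events.
function normalize(channelId: string, senderId: string, content: string): ChannelMessage {
  return { channelId, senderId, content: content.trim() };
}
```

Because every adapter translates to and from the same ChannelMessage, the gateway never needs channel-specific branches, which is what keeps Phase 10+ from requiring gateway rewrites.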

Technical Approach

Pi SDK Provider Adapter Pattern

The agent layer stays on Pi SDK. Provider diversity is solved at the adapter layer below Pi:

Provider SDKs (@anthropic-ai/sdk, openai, etc.)
  → IProviderAdapter implementations
    → ProviderRegistry (Pi SDK compatible)
      → Agent Session (Pi SDK) — tool loops, streaming, context
        → AgentService — lifecycle, routing, events
          → ChatGateway — WebSocket to all interfaces

Adding a provider means implementing IProviderAdapter. Everything above stays unchanged.
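As a sketch of that boundary, the four method names come from M3-001 and the capability fields from M3-008; parameter and return types are illustrative assumptions:

```typescript
// Sketch of the M3-001 adapter surface and M3-008 capability metadata.
interface ModelInfo {
  id: string;
  tier: 'cheap' | 'standard' | 'premium' | 'local';
  contextWindow: number;
  supportsTools: boolean;
  supportsVision: boolean;
  embeddingCapable: boolean;
}

interface IProviderAdapter {
  readonly name: string; // 'anthropic', 'openai', 'openrouter', 'zai', 'ollama'
  register(registry: unknown): void; // hook into the Pi-compatible registry
  listModels(): Promise<ModelInfo[]>;
  healthCheck(): Promise<boolean>;
  createClient(creds: { apiKey?: string; oauthToken?: string }): unknown;
}

// Illustrative helper the routing engine might use against the capability matrix.
function modelsByTier(models: ModelInfo[], tier: ModelInfo['tier']): ModelInfo[] {
  return models.filter((m) => m.tier === tier);
}
```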

Routing Decision Flow

User sends message
  → Task classifier (regex/keyword, optionally LLM-assisted)
    → { taskType, complexity, domain, requiredCapabilities }
  → RoutingEngine.resolve(classification, userOverrides, availableProviders)
    → Match rules by priority
    → Check provider health
    → Apply fallback chain
    → Return { provider, model, agentConfigId }
  → AgentService.createOrResumeSession(routingResult)
    → Session uses selected provider/model
  → Emit session:info with routing decision explanation

Embedding Strategy

Replace OpenAI-hardcoded embedding service with provider-agnostic interface:

  • Default: Ollama serving nomic-embed-text (768-dim) or mxbai-embed-large (1024-dim)
  • Fallback: Any OpenAI-compatible embedding API
  • Migration: Update pgvector column dimension if switching from 1536 (OpenAI) to 768/1024 (Ollama models)
  • No external API dependency for vector operations in default configuration
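The default path above can be sketched against Ollama's documented /api/embeddings endpoint. The base URL and model names follow this PRD's defaults; error handling is deliberately minimal:

```typescript
// Provider-agnostic embedding call backed by Ollama (M3-009 sketch).
async function embed(text: string, model = 'nomic-embed-text'): Promise<number[]> {
  const res = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt: text }),
  });
  if (!res.ok) throw new Error(`embedding request failed: ${res.status}`);
  const data = (await res.json()) as { embedding: number[] };
  return data.embedding;
}

// Expected dimensions per the migration note above; useful for validating the
// pgvector column dimension before writing.
function expectedDim(model: string): number {
  return model === 'mxbai-embed-large' ? 1024 : 768;
}
```

Checking expectedDim() against the pgvector column at startup catches the 1536 → 768/1024 migration hazard before any insert fails.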

Context Window Management

When conversation history exceeds model context:

  1. Calculate token count of full history
  2. If exceeds 80% of model context window, trigger summarization
  3. Summarize oldest N messages into a condensed context block
  4. Prepend summary + keep recent messages within context budget
  5. Store summary as a "context checkpoint" message in DB
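Steps 1 and 2 above reduce to a budget split that can be sketched as follows. Token counting is approximated here (roughly 4 characters per token); a real implementation would use the provider's tokenizer:

```typescript
// Sketch of the FR-3 80% trigger: walk from newest to oldest, keeping messages
// until the context budget is spent; everything older gets summarized.
interface Msg {
  content: string;
}

const approxTokens = (text: string): number => Math.ceil(text.length / 4);

function splitForSummarization(
  history: Msg[],
  contextWindow: number,
  threshold = 0.8,
): { toSummarize: Msg[]; toKeep: Msg[] } {
  const budget = contextWindow * threshold;
  let used = 0;
  const toKeep: Msg[] = [];
  for (let i = history.length - 1; i >= 0; i--) {
    const t = approxTokens(history[i].content);
    if (used + t > budget) break; // budget spent; older messages get summarized
    used += t;
    toKeep.unshift(history[i]);
  }
  return { toSummarize: history.slice(0, history.length - toKeep.length), toKeep };
}
```

The summary produced from `toSummarize` then becomes the "context checkpoint" message of step 5, prepended ahead of `toKeep` on the next session resume.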

Model Reference

| Provider | Model | Tier | Context | Tools | Vision | Embedding |
| --- | --- | --- | --- | --- | --- | --- |
| Anthropic | Claude Opus 4.6 | premium | 200K | yes | yes | no |
| Anthropic | Claude Sonnet 4.6 | standard | 200K | yes | yes | no |
| Anthropic | Claude Haiku 4.5 | cheap | 200K | yes | yes | no |
| OpenAI | Codex gpt-5.4 | premium | 128K+ | yes | yes | no |
| Z.ai | GLM-5 | standard | TBD | TBD | TBD | no |
| OpenRouter | varies | varies | varies | varies | varies | no |
| Ollama | llama3.2 | local/free | 128K | yes | no | no |
| Ollama | nomic-embed-text | n/a | n/a | n/a | n/a | yes (768-dim) |
| Ollama | mxbai-embed-large | n/a | n/a | n/a | n/a | yes (1024-dim) |

Default Routing Rules (Seed Data)

| Priority | Condition | Route To |
| --- | --- | --- |
| 1 | taskType=coding AND complexity=complex | Opus 4.6 |
| 2 | taskType=coding AND complexity=moderate | Sonnet 4.6 |
| 3 | taskType=coding AND complexity=simple | Codex gpt-5.4 |
| 4 | taskType=research | Codex gpt-5.4 |
| 5 | taskType=summarization | GLM-5 |
| 6 | taskType=analysis AND requiredCapabilities includes reasoning | Opus 4.6 |
| 7 | taskType=conversation | Sonnet 4.6 |
| 8 | taskType=creative | Sonnet 4.6 |
| 9 | costTier=cheap OR domain=general | Haiku 4.5 |
| 10 | fallback (no rule matched) | Sonnet 4.6 |
| 99 | provider=ollama forced OR offline mode | llama3.2 |

Rules are user-customizable. Admins set system defaults; users override for their sessions.


Risks and Open Questions

| Risk | Impact | Mitigation |
| --- | --- | --- |
| Pi SDK doesn't support custom provider adapters cleanly | High — blocks M3 | Verify in M3-001; fallback: wrap native SDKs and bypass Pi's registry, feeding responses into Pi's session format |
| BullMQ + Valkey incompatibility | Medium — blocks M6 | Test in M6-001 before migrating jobs; fallback: use bullmq with ioredis directly |
| Embedding dimension migration (1536 → 768/1024) | Medium — data migration | Run migration script to re-embed existing insights; or start fresh if insight count is low |
| Z.ai GLM-5 API undocumented | Low — blocks one provider | Deprioritize; other 4 providers cover all use cases |
| Context window summarization quality | Medium — affects UX | Start with simple truncation; add LLM summarization iteratively |
| OAuth flow complexity in TUI (no browser redirect) | Medium | URL-display + clipboard + Valkey poll token pattern (already designed in P8-012) |

Open Questions

  1. What is the Z.ai GLM-5 API format? OpenAI-compatible or custom SDK? (Research in M3-005)
  2. Should routing classification use LLM-assisted classification from the start, or rule-based only? (ASSUMPTION: rule-based first, LLM-assisted later)
  3. What Ollama embedding model provides the best quality/performance tradeoff? (Test nomic-embed-text vs mxbai-embed-large in M3-009)
  4. Should provider credentials be stored in DB per-user, or remain environment-variable based for system-wide providers? (ASSUMPTION: hybrid — env vars for system defaults, DB for per-user overrides)

Milestone / Delivery Intent

  1. Target version: v0.2.0
  2. Milestone count: 7
  3. Definition of done: All 10 acceptance criteria verified with evidence, all quality gates green, PRD status updated to completed
  4. Delivery order: M1 (persistence) → M2 (security) → M3 (providers) → M4 (routing) → M5 (sessions) → M6 (jobs) → M7 (channel design)
  5. M1 and M2 are prerequisites — no provider or routing work begins until conversations persist and data is user-scoped